3. The Music Detection Algorithm
The music detection algorithm is a new function introduced in Annex C+ to support the integration of Annex E.
3.1. Purpose and Operation
The algorithm’s primary role is to override the VAD’s “non-speech” decision when it detects the presence of music. This capability is active only when the integrated G.729 coder is operating at 11.8 kbit/s (Annex E). The function is called continuously to keep its internal states current, but it can only alter a VAD decision from “non-speech” to “speech,” not the other way around.
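The one-way nature of the override can be illustrated with a minimal sketch. The function name `apply_music_override`, the flags `music_detected` and `annex_e_active`, and the 0/1 encoding of the decision are assumptions made for illustration; they are not taken from the reference implementation.

```python
SPEECH = 1      # assumed encoding of a "speech" VAD decision
NON_SPEECH = 0  # assumed encoding of a "non-speech" VAD decision


def apply_music_override(vad_deci: int, music_detected: bool, annex_e_active: bool) -> int:
    """Return the VAD decision after the one-way music override.

    Hypothetical helper: the decision is only ever promoted from
    NON_SPEECH to SPEECH, and only while the 11.8 kbit/s mode is active.
    """
    if annex_e_active and music_detected and vad_deci == NON_SPEECH:
        return SPEECH
    # A "speech" decision is never downgraded, and nothing changes
    # outside the Annex E rate.
    return vad_deci
```

In this sketch the detector's state would still be updated on every frame; only the final decision is gated on the operating rate.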
3.2. Algorithmic Process
The algorithm consists of two main parts: the computation of relevant signal parameters and a final classification based on these parameters.
- Computation of Relevant Parameters: The algorithm analyzes the input signal using parameters derived from the LPC analysis and VAD modules. Key computed metrics include:
  - Partial Normalized Residual Energy
  - Spectral Difference and Running Mean of Background Noise
  - Pitch Lag Standard Deviation
  - Running Mean of Pitch Gain
  - Pitch Lag Smoothness and Voicing Strength Indicator (Pflag)
- Use of Stationary Counters: To track the signal’s characteristics over time, a set of counters is defined and updated. These counters track the number of consecutive frames that exhibit certain properties, such as high stationarity in reflection coefficients, use of backward adaptive LPC, or a strong voicing indicator. Examples include count_music, mcount_music (running mean of count_music), count_pflag, and mcount_pflag.
- Classification: Based on the computed parameters and counters, a final classification logic is applied. If the signal characteristics satisfy a set of predefined thresholds (e.g., for spectral difference, energy levels, and stationarity), the VAD decision (Vad_deci) is changed from “non-speech” to “speech.”
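The interaction between the counters and the final classification can be sketched as follows. This is an illustrative simplification, not the procedure from the specification: the smoothing factor `alpha`, both thresholds, the per-frame boolean inputs, and the way the two running means are combined are all placeholder assumptions. Only the names count_music, mcount_music, count_pflag, mcount_pflag, and Vad_deci come from the text above.

```python
from dataclasses import dataclass

SPEECH = 1      # assumed encoding of a "speech" VAD decision
NON_SPEECH = 0  # assumed encoding of a "non-speech" VAD decision


@dataclass
class MusicState:
    count_music: int = 0       # consecutive frames with music-like stationarity
    mcount_music: float = 0.0  # running mean of count_music
    count_pflag: int = 0       # consecutive frames with a strong voicing indicator
    mcount_pflag: float = 0.0  # running mean of count_pflag


def update_state(state: MusicState, stationary: bool, pflag: bool,
                 alpha: float = 0.9) -> None:
    """Update the counters for one frame (alpha is a placeholder value)."""
    # A consecutive-frame counter grows while the property holds and resets otherwise.
    state.count_music = state.count_music + 1 if stationary else 0
    state.count_pflag = state.count_pflag + 1 if pflag else 0
    # Running means implemented here as first-order exponential averages (an assumption).
    state.mcount_music = alpha * state.mcount_music + (1.0 - alpha) * state.count_music
    state.mcount_pflag = alpha * state.mcount_pflag + (1.0 - alpha) * state.count_pflag


def classify(state: MusicState, vad_deci: int,
             music_threshold: float = 3.0, pflag_threshold: float = 3.0) -> int:
    """Promote NON_SPEECH to SPEECH when the placeholder thresholds are exceeded."""
    looks_like_music = (state.mcount_music > music_threshold
                        or state.mcount_pflag > pflag_threshold)
    if vad_deci == NON_SPEECH and looks_like_music:
        return SPEECH
    return vad_deci


# Usage: a sustained stationary stretch eventually triggers the override.
state = MusicState()
for _ in range(20):
    update_state(state, stationary=True, pflag=False)
decision = classify(state, NON_SPEECH)
```

Smoothing the counters rather than thresholding them directly means that a single non-stationary frame resets the counter without immediately dropping the running mean, which is one plausible reason for tracking both quantities.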