ITU-T G.729 Annex C+ Floating-Point Implementation

3. The Music Detection Algorithm

The music detection algorithm is a new function introduced in Annex C+ to support the integration of Annex E.

3.1. Purpose and Operation

The algorithm’s primary role is to override the VAD’s “non-speech” decision when it detects the presence of music. This capability is active only when the integrated G.729 coder is operating at 11.8 kbit/s (Annex E). The function is called continuously to keep its internal states current, but it can only alter a VAD decision from “non-speech” to “speech,” not the other way around.
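
The one-way nature of this override can be sketched in a few lines. This is a minimal illustration, not the reference code: the names `apply_music_override`, `MusicState`, `rate_is_annex_e`, and the `SPEECH`/`NON_SPEECH` constants are hypothetical, and the detector's internal state is reduced to a single frame counter.

```python
from dataclasses import dataclass

# Hypothetical encodings of the VAD decision (assumed for this sketch).
NON_SPEECH = 0
SPEECH = 1


@dataclass
class MusicState:
    """Stand-in for the detector's internal memories (hypothetical)."""
    frames_seen: int = 0


def apply_music_override(vad_deci: int, music_detected: bool,
                         rate_is_annex_e: bool, state: MusicState) -> int:
    """Return the VAD decision after the music detector has had its say."""
    # The state is updated on every call, whatever the operating rate,
    # so that it stays current.
    state.frames_seen += 1

    # The override applies only at the Annex E rate, and only in one
    # direction: "non-speech" may become "speech", never the reverse.
    if rate_is_annex_e and music_detected and vad_deci == NON_SPEECH:
        return SPEECH
    return vad_deci
```

Calling the function with `vad_deci == SPEECH` always returns `SPEECH`, whatever the detector reports, which is the asymmetry described above.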

3.2. Algorithmic Process

The algorithm consists of two main parts: the computation of relevant signal parameters and a final classification based on these parameters.

Computation of Relevant Parameters: The algorithm analyzes the input signal using parameters derived from the LPC analysis and VAD modules. Key computed metrics include:

Partial Normalized Residual Energy
Spectral Difference and Running Mean of Background Noise
Pitch Lag Standard Deviation
Running Mean of Pitch Gain
Pitch Lag Smoothness and Voicing Strength Indicator (Pflag)
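
Two of these metrics, a running mean and a standard deviation over recent pitch lags, follow standard formulas and can be sketched as below. The smoothing factor `alpha` and the helper names are assumptions made for illustration; the window length and constants used in the Recommendation are not reproduced here.

```python
import math
from typing import Sequence


def running_mean(prev_mean: float, sample: float, alpha: float = 0.9) -> float:
    """First-order recursive average; alpha is a hypothetical smoothing factor."""
    return alpha * prev_mean + (1.0 - alpha) * sample


def pitch_lag_std(lags: Sequence[float]) -> float:
    """Population standard deviation of a window of recent pitch lags."""
    n = len(lags)
    if n == 0:
        raise ValueError("at least one pitch lag is required")
    mean = sum(lags) / n
    variance = sum((lag - mean) ** 2 for lag in lags) / n
    return math.sqrt(variance)
```

A steady pitch contour gives a standard deviation near zero, while erratic lag estimates give a large one, which is why the metric is useful as a stationarity cue.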

Use of Stationary Counters: To track the signal’s characteristics over time, a set of counters is defined and updated. These counters record the number of consecutive frames that exhibit certain properties, such as high stationarity in the reflection coefficients, use of backward-adaptive LPC, or a strong voicing indicator. Examples include count_music, mcount_music (the running mean of count_music), count_pflag, and mcount_pflag.

Classification: The final classification logic is applied to the computed parameters and counters. If the signal characteristics satisfy a set of predefined thresholds (e.g., for spectral difference, energy levels, and stationarity), the VAD decision (Vad_deci) is changed from “non-speech” to “speech.”
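
The interplay between a consecutive-frame counter, its running mean, and a threshold test can be sketched as follows. This is a simplified illustration under stated assumptions: only count_music and mcount_music are modelled, and the thresholds `sd_max` and `mcount_min`, the smoothing factor `alpha`, and the direction of each comparison are hypothetical rather than taken from the Recommendation.

```python
from dataclasses import dataclass

# Hypothetical encodings of Vad_deci (assumed for this sketch).
NON_SPEECH = 0
SPEECH = 1


@dataclass
class MusicCounters:
    count_music: int = 0        # consecutive frames with the tracked property
    mcount_music: float = 0.0   # running mean of count_music


def update_counters(c: MusicCounters, frame_is_stationary: bool,
                    alpha: float = 0.9) -> None:
    """Extend or reset the consecutive-frame count, then smooth it."""
    c.count_music = c.count_music + 1 if frame_is_stationary else 0
    c.mcount_music = alpha * c.mcount_music + (1.0 - alpha) * c.count_music


def classify(vad_deci: int, spectral_diff: float, c: MusicCounters,
             sd_max: float = 0.5, mcount_min: float = 2.0) -> int:
    """Change "non-speech" to "speech" when the illustrative thresholds are met."""
    looks_like_music = spectral_diff < sd_max and c.mcount_music > mcount_min
    if vad_deci == NON_SPEECH and looks_like_music:
        return SPEECH
    return vad_deci
```

Using the running mean rather than the raw count means a single non-stationary frame resets count_music but lowers mcount_music only gradually, so the decision does not flip on isolated outliers.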
