2.0 Key Functional Enhancements in Annex C+
To integrate Annexes B, D, and E into a single codec, Annex C+ introduces several modifications and additions to the G.729 algorithm. These enhancements were needed to handle the distinct operational requirements of variable bit rates and discontinuous transmission, and they improve the overall versatility and efficiency of the codec.
2.1 Integration of Discontinuous Transmission (DTX) with Annex D
The integration of Annex B functionality (which specifies Discontinuous Transmission, or DTX) with the 6.4 kbit/s operation of Annex D is described as “straightforward”. The core components for managing periods of silence in speech, namely Voice Activity Detection (VAD), Silence Description (SID), and Comfort Noise Generation (CNG), are simply reused from Annex B’s existing specification. This direct reuse simplifies the implementation logic when operating at the lower bit rate.
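As a rough illustration of how VAD, SID, and DTX relate, the sketch below picks a frame type for one frame. It is a simplification under stated assumptions, not the Annex B procedure: the names `FrameType`, `select_frame_type`, and the `sid_update_needed` flag are hypothetical, and the real criterion for sending a SID update is not reproduced here.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = "speech"   # active frame, coded at the current bit rate
    SID = "sid"         # silence description update for comfort noise
    NO_TX = "no_tx"     # nothing transmitted; decoder keeps generating noise


def select_frame_type(vad_is_speech: bool, sid_update_needed: bool) -> FrameType:
    """Choose what to send for one frame under DTX (simplified).

    `sid_update_needed` is a hypothetical flag standing in for whatever
    rule decides that the comfort-noise description must be refreshed.
    """
    if vad_is_speech:
        return FrameType.SPEECH
    # Inactive frame: send a SID update only when required, else stay silent.
    return FrameType.SID if sid_update_needed else FrameType.NO_TX
```

Because the decision depends only on the VAD output and the SID update rule, the same logic can sit in front of either the main-body or the Annex D speech coder, which is consistent with the reuse described above.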
2.2 Advanced Integration with Annex E
In contrast, the integration with the higher-bit-rate Annex E is characterized as “more involved”. To preserve audio quality at the 11.8 kbit/s rate, a more sophisticated approach is required. The Voice Activity Detection (VAD) function is performed after the 10th-order forward-adaptive Linear Predictive Coding (LPC) analysis but before the backward-adaptive LPC analysis of Annex E. This sequencing is critical for maintaining the performance and speech quality expected during Annex E operation.
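The ordering described above can be sketched as a small pipeline. This is only an illustration of the step order: the function name `analyse_frame_annex_e`, the callable parameters, and the stub lambdas are all hypothetical, the 80-sample frame is an assumption (10 ms at 8 kHz), and no real signal processing is performed.

```python
from typing import Callable, List, Tuple


def analyse_frame_annex_e(
    frame: List[float],
    forward_lpc: Callable[[List[float]], List[float]],
    vad: Callable[[List[float], List[float]], bool],
    backward_lpc: Callable[[List[float]], List[float]],
) -> Tuple[bool, List[str]]:
    """Run the three analysis steps in the order the text describes.

    Returns the VAD decision and a trace of the step order.
    """
    trace: List[str] = []

    # Step 1: 10th-order forward-adaptive LPC analysis.
    forward_coeffs = forward_lpc(frame)
    trace.append("forward_lpc")

    # Step 2: VAD runs next, so the forward LPC result is available to it.
    is_speech = vad(frame, forward_coeffs)
    trace.append("vad")

    # Step 3: the backward-adaptive LPC analysis of Annex E runs last.
    # (Its real input is simplified away; the stub just receives the frame.)
    backward_lpc(frame)
    trace.append("backward_lpc")

    return is_speech, trace


# Usage with placeholder stubs in place of the real analysis routines.
decision, order = analyse_frame_annex_e(
    frame=[0.0] * 80,
    forward_lpc=lambda f: [0.0] * 10,
    vad=lambda f, coeffs: any(abs(x) > 0.01 for x in f),
    backward_lpc=lambda f: [],
)
print(order)
```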
2.3 Introduction of a Music Detection Algorithm
A significant new function introduced in Annex C+ is the music detection algorithm. This module was developed specifically to support integration with Annex E, because the main G.729 body and Annex B had no strict requirements for performance with music signals. The addition is intended to preserve quality when the input signal contains music, for which the core algorithm is not optimized. Key characteristics of this function include:
- It is active only during Annex E operation.
- Its Voice Activity Detection (VAD) decision is updated continuously to adapt to signal characteristics.
- It changes a VAD decision only from ‘non-speech’ to ‘speech’, never the reverse, which makes it a conservative mechanism for identifying musical content.
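The one-way override in the list above can be sketched as follows. This is a minimal illustration, not the Annex C+ algorithm: the names `Mode` and `apply_music_detection` are hypothetical, the music detector itself is reduced to a boolean input, and the 8 kbit/s main-body rate is included as the standard G.729 rate rather than taken from this section.

```python
from enum import Enum


class Mode(Enum):
    ANNEX_D = "6.4 kbit/s"
    MAIN_BODY = "8 kbit/s"
    ANNEX_E = "11.8 kbit/s"


def apply_music_detection(vad_is_speech: bool, music_detected: bool, mode: Mode) -> bool:
    """Return the final VAD decision after the music-detection override."""
    # Outside Annex E operation the override is inactive,
    # so the original VAD decision passes through unchanged.
    if mode is not Mode.ANNEX_E:
        return vad_is_speech
    # During Annex E, detected music can only promote 'non-speech' to
    # 'speech'; a 'speech' decision is never downgraded.
    return vad_is_speech or music_detected
```

With this shape, a frame the VAD marked as speech stays speech regardless of the music flag, and a non-speech frame is promoted only when music is detected during Annex E operation.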
These technical enhancements were formally defined and approved through the ITU-T’s rigorous standardization process.