Research on speech coding and contributions to the related standardization activities have traditionally been among the core competences of the IND. More recently, certain aspects of audio coding have attracted attention as well.
In digital voice communication or storage systems, sampling and quantizing an analogue speech signal normally yields a digital representation as a PCM signal. In ISDN, for example, telephone-band speech (0.3-3.4 kHz) requires a sampling rate of 8 kHz. For a faithful reconstruction, A-law quantization with 8 bits per sample is used, leading to an initial bit rate of 64 kbit/s.
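As a rough illustration of the figures above, the following Python sketch applies the continuous A-law characteristic followed by a uniform 8-bit quantizer and computes the resulting PCM bit rate. The value A = 87.6 and all function names are assumptions made for illustration; standardized A-law PCM (ITU-T G.711) uses a segmented, piecewise linear approximation of this curve, which is not reproduced here.

```python
import math

A = 87.6  # assumed compression parameter, the value commonly quoted for A-law


def alaw_compress(x: float) -> float:
    """Map a sample x in [-1, 1] to its companded value in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                   # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantize(y: float, bits: int = 8) -> int:
    """Uniform midrise quantizer on [-1, 1]; returns an index in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    step = 2.0 / levels
    return min(int((y + 1.0) / step), levels - 1)


def dequantize(index: int, bits: int = 8) -> float:
    """Reconstruct the midpoint of the quantization cell."""
    step = 2.0 / 2 ** bits
    return -1.0 + (index + 0.5) * step


def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of a PCM signal."""
    return sampling_rate_hz * bits_per_sample


if __name__ == "__main__":
    x = 0.01  # a low-level sample, where companding helps most
    x_hat = alaw_expand(dequantize(quantize(alaw_compress(x))))
    print("reconstructed sample:", x_hat)
    print("bit rate in bit/s:", pcm_bit_rate(8000, 8))
```

The logarithmic characteristic spends more quantization levels on small amplitudes, which is why 8 bits per sample can suffice for telephone-band speech.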
For the bit-rate-efficient transmission or storage of speech signals, however, further compression is often necessary, e.g., in mobile communications. Speech coding algorithms may exploit both the redundancy and the irrelevancy of the signal. At the same time, the reconstruction must retain the best possible (subjective) quality, even in the case of transmission over noisy channels.
The design of a speech coding algorithm is determined by several major requirements:
For a given application, a trade-off between these often conflicting requirements has to be found. In (public) voice communication systems in particular, the use of standardized coding algorithms is crucial to ensure interoperability between different products. Therefore, speech coding research is often tied to international standardization activities within, e.g., ITU-T, ETSI/3GPP, or ISO-MPEG.
Compared with audio signals, speech signals are characterized by a rather low analogue bandwidth and by particular model assumptions that can be exploited in the design of the coding algorithm. In standard communications applications, a telephone bandwidth of 0.3-3.4 kHz allows a digital representation at a sampling frequency of 8 kHz. General audio signals, e.g. music, have a bandwidth of about 15-20 kHz and thus require a sampling frequency of 32-48 kHz. In between, wideband speech signals (bandwidth 7 kHz, sampling rate 16 kHz) have been attracting increasing interest for high-quality applications such as VoIP telephony or videoconferencing services. Currently, the deployment of wideband speech coding is also being considered for cellular networks such as GSM or UMTS.
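The sampling rates quoted above are consistent with the sampling theorem, which requires a sampling frequency of more than twice the highest signal frequency. The short check below is only a sketch; the pairing of audio bandwidths with specific sampling rates (15 kHz with 32 kHz, 20 kHz with 48 kHz) is an assumption chosen from the ranges given in the text, and the names are hypothetical.

```python
def satisfies_sampling_theorem(bandwidth_hz: float, sampling_rate_hz: float) -> bool:
    """True if the sampling rate exceeds twice the upper band edge."""
    return sampling_rate_hz > 2.0 * bandwidth_hz


# (signal class, upper band edge in Hz, sampling rate in Hz)
SIGNAL_CLASSES = [
    ("telephone-band speech", 3400.0, 8000.0),
    ("wideband speech", 7000.0, 16000.0),
    ("audio, 15 kHz bandwidth", 15000.0, 32000.0),  # assumed pairing
    ("audio, 20 kHz bandwidth", 20000.0, 48000.0),  # assumed pairing
]

if __name__ == "__main__":
    for name, bandwidth, rate in SIGNAL_CLASSES:
        ok = satisfies_sampling_theorem(bandwidth, rate)
        print(f"{name}: fs = {rate / 1000:g} kHz, sufficient = {ok}")
```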
Most known speech coding algorithms are explicitly based on a model of speech production. At low and medium bit rates (about 0.5-2 bits per sample, i.e. 4-16 kbit/s at a sampling rate of 8 kHz), properties of the human ear are exploited as well. In contrast, audio coding algorithms (see below) cannot benefit from general models of the signal source, but make extensive use of perception models, in particular the masking properties of the ear.
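The relation between bits per sample and bit rate quoted above is a simple product, sketched below together with the resulting compression factor relative to 64 kbit/s PCM. The function names and the choice of 64 kbit/s as the reference are assumptions for illustration.

```python
def coded_bit_rate(bits_per_sample: float, sampling_rate_hz: float = 8000.0) -> float:
    """Average bit rate in bit/s for a given number of bits per sample."""
    return bits_per_sample * sampling_rate_hz


def compression_factor(bit_rate: float, reference_bit_rate: float = 64000.0) -> float:
    """Ratio of the reference PCM bit rate to the coded bit rate."""
    return reference_bit_rate / bit_rate


if __name__ == "__main__":
    for bps in (0.5, 1.0, 2.0):
        rate = coded_bit_rate(bps)
        print(f"{bps} bit/sample: {rate / 1000:g} kbit/s, "
              f"compression factor {compression_factor(rate):g}")
```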