RWTH Aachen University
Institute for Communication Systems and Data Processing

Voice over IP – Speech Coding & Transmission Protocols

Speech Coding for Voice over IP

For Voice over IP (VoIP) transmission, the speech signal is split into frames, usually 20-30 ms long. Depending on the available transmission bandwidth, i.e. the data rate on the access and core networks, each speech frame is first compressed by a suitable speech codec. Commonly used codecs encode speech at data rates of, e.g., 5.3 or 6.3 kbit/s (ITU-T G.723.1), 8 kbit/s (ITU-T G.729), 12.2 kbit/s (GSM-EFR), and up to 64 kbit/s (ITU-T G.711) for PCM transmission without compression. Besides these narrow-band speech codecs (300 Hz - 3.4 kHz audio bandwidth), wide-band speech codecs (50 Hz - 7 kHz audio bandwidth) with superior speech quality are of particular interest for VoIP. Because IP transmission is flexible, such codecs can be introduced easily without changing the network infrastructure.
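The framing step can be illustrated with a minimal Python sketch. It assumes a narrow-band sampling rate of 8 kHz and 20 ms frames. The function names `split_into_frames` and `payload_bytes` are hypothetical helpers introduced only for this illustration, and no actual codec is invoked.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split a sequence of samples into consecutive frames of frame_ms each.

    Assumption: a trailing partial frame is simply dropped in this sketch.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def payload_bytes(bitrate_bps, frame_ms=20):
    """Size of one encoded frame in bytes for a given codec bit rate."""
    return bitrate_bps * frame_ms / 1000 / 8


one_second = [0] * 8000                  # one second of silence at 8 kHz (assumed rate)
frames = split_into_frames(one_second)
print(len(frames), len(frames[0]))       # 50 frames of 160 samples each
print(payload_bytes(8000))               # 20.0 bytes per frame at 8 kbit/s
print(payload_bytes(64000))              # 160.0 bytes per frame at 64 kbit/s
```

At 20 ms per frame, the codec data rate translates directly into the payload size that each packet has to carry.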

The IND has contributed to a new codec standard [ITU-T G.729.1] for application in heterogeneous packet networks with different access data rates. It is based on the principle of scalable coding, providing narrow-band speech quality at lower data rates and wide-band quality at higher data rates.

Transmission Protocols

The encoded speech frames are then encapsulated in RTP/UDP/IP packets for transmission over the IP network. Because of the real-time constraints, retransmissions are generally not feasible for real-time multimedia transmission in packet networks. Therefore, VoIP uses RTP over UDP instead of TCP.
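As a rough sketch of the encapsulation, the following Python snippet prepends a 12-byte fixed RTP header (layout as in RFC 3550, without CSRC entries or header extension) to one encoded frame. The function name `build_rtp_packet` and the example values for sequence number, timestamp, and SSRC are assumptions chosen for illustration. UDP and IP headers would be added by the operating system when such a packet is handed to a UDP socket.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0, marker=False):
    """Prepend a fixed 12-byte RTP header to an encoded speech frame.

    payload_type=0 is the static RTP payload type for G.711 mu-law (PCMU).
    """
    byte0 = RTP_VERSION << 6                          # no padding, no extension, CC = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                     # network byte order, 12 bytes total
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


frame = b"\x00" * 160                                 # one 20 ms G.711 frame (placeholder data)
packet = build_rtp_packet(frame, seq=1, timestamp=160, ssrc=0x12345678)
print(len(packet))                                    # 172 = 12 header bytes + 160 payload bytes
```

The sequence number and timestamp let the receiver detect lost packets and restore playout timing, since lost packets are not retransmitted.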

The RTP/UDP/IPv4 protocol headers together occupy 12 + 8 + 20 = 40 bytes of every packet, which amounts to a data rate of, e.g., 16 kbit/s for header information alone if a packet is sent every 20 ms. Depending on the speech codec used, the total data rate of a VoIP call is then about 22-80 kbit/s in each direction, of which 20-73% is header information. This considerable protocol overhead makes VoIP inefficient in terms of data rate, which becomes an issue when a VoIP call is transmitted over low-rate wireless links. In this case, header compression techniques (e.g. ROHC) may be necessary.
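The overhead arithmetic above can be reproduced with a short Python sketch. It assumes one packet every 20 ms and counts only the RTP/UDP/IPv4 headers, so link-layer overhead is ignored. The function name `voip_rates` is a hypothetical helper.

```python
HEADER_BYTES = 12 + 8 + 20      # RTP + UDP + IPv4


def voip_rates(codec_kbps, packet_interval_ms=20, header_bytes=HEADER_BYTES):
    """Return (header rate, total rate, header share) for one direction of a call."""
    header_kbps = header_bytes * 8 / packet_interval_ms   # bits per ms equals kbit/s
    total_kbps = codec_kbps + header_kbps
    return header_kbps, total_kbps, header_kbps / total_kbps


for name, rate in [("G.729", 8.0), ("GSM-EFR", 12.2), ("G.711", 64.0)]:
    header, total, share = voip_rates(rate)
    print(f"{name}: {header:.0f} kbit/s header, {total:.1f} kbit/s total, {share:.0%} header share")
```

For G.711 this yields 80 kbit/s in total with a 20% header share. The lower the codec rate, the larger the fraction of the link that is spent on headers.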

Depending on the available data rate and the tolerable delay, the number of frames per packet may be varied, and redundant information may be added for higher robustness against transmission errors.
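The trade-off between data rate and delay can be sketched as follows, again in Python. The helper `packetization` and its redundancy model (each packet additionally repeats a number of earlier frames) are assumptions made for illustration and do not describe a specific standardized scheme.

```python
def packetization(codec_kbps, frames_per_packet, redundant_frames=0,
                  frame_ms=20, header_bytes=40):
    """Return (packetization delay in ms, total rate in kbit/s, header share).

    Assumption: redundancy is modelled as repeating `redundant_frames`
    earlier frames in every packet.
    """
    interval_ms = frames_per_packet * frame_ms
    header_kbps = header_bytes * 8 / interval_ms
    payload_kbps = codec_kbps * (frames_per_packet + redundant_frames) / frames_per_packet
    total_kbps = payload_kbps + header_kbps
    return interval_ms, total_kbps, header_kbps / total_kbps


# G.729 at 8 kbit/s with 1, 2, or 3 frames per packet
for n in (1, 2, 3):
    delay, total, share = packetization(8.0, n)
    print(f"{n} frame(s)/packet: {delay} ms, {total:.1f} kbit/s, {share:.0%} header")

# One frame per packet plus one repeated earlier frame
print(packetization(8.0, 1, redundant_frames=1))
```

Packing more frames into a packet lowers the header share but increases the packetization delay, and a lost packet then removes a longer stretch of speech. Added redundancy raises the payload rate in exchange for robustness.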