RWTH Aachen University, Institute for Communication Systems and Data Processing

Near-End Listening Enhancement

Mobile communication is often conducted in the presence of acoustical background noise, which leads to two major problems:

  1. The noise is recorded by the microphone along with the speech and transmitted over the telephone network to the far-end listener. Several preprocessing algorithms have been proposed to reduce the noise in the near-end microphone signal before speech coding and transmission in order to regain speech intelligibility for the far-end listener.
  2. The near-end listener also experiences an increased listening effort and possibly reduced speech intelligibility, because they are located in the noisy environment and perceive a mixture of the clean far-end (downlink) speech and the acoustical background noise, as illustrated by the figure.

Figure: The problem of near-end listening enhancement.

In near-end listening enhancement, as opposed to noise reduction, the noise signal cannot be influenced: the near-end listener is located in the noisy environment, and the noise reaches the ears with hardly any possibility of intercepting it. A reasonable option for improving intelligibility by digital signal processing is therefore to manipulate the clean far-end speech signal depending on the local acoustical background noise.

In the 1960s and 1970s, some research was done on this topic. Niederjohn et al., for example, proposed high-pass filtering to enhance the higher formants, followed by rapid amplitude compression, to improve intelligibility in white noise and in a power-generating noise environment.

In [sauert09], we derived a near-end listening enhancement algorithm that maximizes the Speech Intelligibility Index (SII), and thus speech intelligibility, by a frequency-selective increase of the speech signal power. By construction, this algorithm raises the overall speech signal power.
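
To make the idea of a frequency-selective power increase concrete, the following is a minimal sketch and not the algorithm of [sauert09]. It assumes a strongly simplified SII in which each band's audibility rises linearly from 0 at -15 dB SNR to 1 at +15 dB SNR and is weighted by a band importance; masking, level distortion and the hearing threshold are ignored. All function names and the example values are hypothetical.

```python
import math

def band_audibility(speech_power, noise_power):
    """Simplified band audibility: linear in SNR between -15 dB and +15 dB."""
    snr_db = 10.0 * math.log10(speech_power / noise_power)
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def simplified_sii(speech_powers, noise_powers, importances):
    """Importance-weighted sum of band audibilities (importances sum to 1)."""
    return sum(w * band_audibility(s, n)
               for s, n, w in zip(speech_powers, noise_powers, importances))

def boost_to_target_snr(speech_powers, noise_powers, target_snr_db=15.0):
    """Raise each band's speech power just enough to reach the target SNR.

    Bands that already meet the target are left unchanged, so the increase
    is frequency selective. Assumption: no limit on the output power.
    """
    factor = 10.0 ** (target_snr_db / 10.0)
    return [max(s, n * factor) for s, n in zip(speech_powers, noise_powers)]

# Hypothetical three-band example.
speech = [1.0, 1.0, 1.0]
noise = [1.0, 10.0, 0.01]
importance = [0.3, 0.4, 0.3]

sii_before = simplified_sii(speech, noise, importance)
boosted = boost_to_target_snr(speech, noise)
sii_after = simplified_sii(boosted, noise, importance)
```

In this toy example only the two noisy bands are amplified, and the total speech power grows considerably, which is the reason a power constraint matters in practice.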

However, in some applications the overall power of the loudspeaker signal is constrained to a maximum, e.g., the power of the original signal or a constant maximum power. In particular, for small loudspeakers as used in mobile phones, the thermal load during continuous playback is one major limitation. These applications are considered in [sauert10a] and [sauert11], where a recursive closed-form solution to the optimization of the spectral speech signal power allocation is derived, which maximizes the SII under this constraint.
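
The effect of such a power constraint can be illustrated with a small numerical sketch. This is not the recursive closed-form solution of [sauert10a] and [sauert11]; it assumes a simplified SII whose band contribution is linear in the band SNR (in dB) up to +15 dB. Under that assumption the Lagrange condition gives a band power proportional to the band importance, capped where the band reaches +15 dB SNR. The sketch ignores the lower limit at -15 dB, requires strictly positive importances, and finds the scaling factor by bisection. All names are hypothetical.

```python
def allocate_power(noise_powers, importances, power_budget,
                   max_snr_db=15.0, iterations=60):
    """Distribute a total power budget over bands (simplified SII model).

    Each band gets mu * importance, capped at the power that yields
    max_snr_db in that band; mu is found by bisection so that the
    total stays within power_budget.
    """
    cap_factor = 10.0 ** (max_snr_db / 10.0)
    caps = [n * cap_factor for n in noise_powers]

    # If every band can be saturated within the budget, nothing to optimize.
    if sum(caps) <= power_budget:
        return caps

    def total(mu):
        return sum(min(mu * w, c) for w, c in zip(importances, caps))

    low, high = 0.0, max(c / w for w, c in zip(importances, caps))
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if total(mid) > power_budget:
            high = mid
        else:
            low = mid
    # 'low' never exceeds the budget, so the constraint is respected.
    return [min(low * w, c) for w, c in zip(importances, caps)]

# Hypothetical example: the first band has little noise and saturates early,
# so the remaining budget is shared by the other two bands.
powers = allocate_power(noise_powers=[0.01, 1.0, 1.0],
                        importances=[0.5, 0.25, 0.25],
                        power_budget=2.0)
```

Bisection is used here only because it is easy to verify; a closed-form solution, as derived in the cited papers, avoids the iteration.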

The modification of the far-end speech can be implemented with a common DFT analysis-synthesis filterbank or with a filterbank equalizer, with either uniform or non-uniform spectral resolution. The non-uniform variant, a frequency-warped filterbank equalizer, performs time-domain filtering with filter coefficients that are adapted in the frequency domain. It allows processing with an approximately Bark-scaled spectral resolution, in line with the human auditory system, and with low signal delay. In [sauert08], we compared these implementation structures based on an early version of the algorithm.
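
As an illustration of the first option, a uniform DFT analysis-synthesis filterbank can be sketched as windowed overlap-add processing with one gain per frequency bin. This is a minimal sketch under stated assumptions, not the structure evaluated in [sauert08]: it uses a square-root Hann window with 50 % overlap, fixed real gains, and a naive DFT from the standard library for self-containment (a practical implementation would use an FFT). All names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT; stands in for an FFT in this sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of the result."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def filterbank_process(signal, gains, frame_len=16):
    """Apply per-bin gains with a DFT analysis-synthesis filterbank.

    gains needs frame_len entries; for a real output they should be
    symmetric, i.e. gains[k] == gains[frame_len - k].
    """
    hop = frame_len // 2
    # Square-root periodic Hann: analysis and synthesis windows multiply
    # to a Hann window, which sums to one at 50 % overlap.
    window = [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len))
              for n in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        modified = [g * s for g, s in zip(gains, dft(frame))]
        synthesized = idft(modified)
        for n in range(frame_len):
            out[start + n] += synthesized[n] * window[n]
    return out

# With unity gains the interior of the signal is reconstructed; the first
# and last half frame are covered by only one window and are attenuated.
test_signal = [math.sin(0.3 * n) for n in range(64)]
reconstructed = filterbank_process(test_signal, [1.0] * 16)
```

In a near-end listening enhancement setting, the gains would be recomputed over time from the estimated noise spectrum instead of being fixed as in this sketch.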

Besides mobile telephony in handset as well as hands-free mode, near-end listening enhancement can further be applied in

  • public announcement systems,
  • digital hearing aids,
  • car radios, and
  • in-car communication systems.