Spectral weighting means that different spectral regions of the mixed speech-and-noise signal are attenuated by different factors. The aim is an audio signal that contains less noise than the original one. Besides keeping the distortion of the original speech minimal, it is also important that the residual noise, i.e. the noise remaining in the processed signal, does not sound unnatural.
The spectral weighting is usually performed in a transformed domain, e.g. the frequency domain. A common transform is the Fourier transform, which provides an equidistant frequency resolution.
Assuming that the speech signal s(k) and the noise signal n(k) are superimposed additively, the (microphone) input signal x(k) is given by
x(k) = s(k) + n(k).
After segmentation and windowing this equation leads to
X(f) = S(f) + N(f)
in the frequency domain. The actual spectral weighting is now performed by multiplying the spectrum X(f) with a weighting function G(f) (see next page for more details). We call G(f) a weighting function or weighting rule. The result Y(f) is then given by
Y(f) = X(f) · G(f).
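The two frequency-domain relations above can be checked numerically on a single frame. The following is a minimal sketch, not the method of this text: the helper `weight_frame`, the Hann window, the test tone, and the fixed gain `G` are all hypothetical choices made for illustration, since the actual weighting rules are only introduced later.

```python
import numpy as np

def weight_frame(x_frame, gain):
    """Window one frame, transform it, and apply a real-valued gain per bin."""
    window = np.hanning(len(x_frame))      # assumed window; any taper would do
    X = np.fft.rfft(window * x_frame)      # X(f)
    Y = X * gain                           # Y(f) = X(f) · G(f), bin by bin
    return X, Y

rng = np.random.default_rng(0)
N = 256
k = np.arange(N)
s = np.sin(2 * np.pi * 8 * k / N)          # stand-in for speech: a tone in bin 8
n = 0.3 * rng.standard_normal(N)           # stand-in for noise
x = s + n                                  # x(k) = s(k) + n(k)

# Hypothetical gain: pass bins below 32 unchanged, attenuate the rest by 0.1.
G = np.ones(N // 2 + 1)
G[32:] = 0.1

X, Y = weight_frame(x, G)
S, _ = weight_frame(s, G)
Nf, _ = weight_frame(n, G)

# Windowing and the Fourier transform are linear, so X(f) = S(f) + N(f).
is_additive = np.allclose(X, S + Nf)
```

Because the gain is real-valued, only the magnitude of each bin changes; the phase of X(f) is carried over to Y(f) unmodified.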
The weighting function G(f) is usually a function of the spectrum X(f) and of the noise power spectral density (PSD) Rnn(f). Thus, calculating G(f) requires an estimate of the noise that is to be reduced. Two basic methods exist for estimating the noise.
Finally, the output signal y(k) of the system is obtained by transforming Y(f) back into the time domain and applying overlap-add. The complete system is depicted in the following block diagram.
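The complete chain (segmentation, windowing, transform, weighting, inverse transform, overlap-add) can be sketched in a few lines. This is an illustration under stated assumptions rather than the system of this text: the function name `spectral_weighting` and its parameters are hypothetical, the gain is a simple power-subtraction rule with a spectral floor chosen only as a stand-in for G(f), and Rnn(f) is estimated by assuming that the first few frames contain noise only.

```python
import numpy as np

def spectral_weighting(x, frame_len=256, noise_frames=4, floor=0.1):
    """Frame-wise spectral weighting with 50% overlap and overlap-add."""
    hop = frame_len // 2
    # Periodic sqrt-Hann window, applied before and after the transform;
    # its squares sum to one at 50% overlap, so G(f) = 1 returns the input.
    m = np.arange(frame_len)
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * m / frame_len))

    starts = range(0, len(x) - frame_len + 1, hop)
    spectra = [np.fft.rfft(window * x[i:i + frame_len]) for i in starts]

    # Assumption: the first `noise_frames` frames are noise only.
    Rnn = np.mean([np.abs(X) ** 2 for X in spectra[:noise_frames]], axis=0)

    y = np.zeros(len(x))
    for i, X in zip(starts, spectra):
        # Stand-in weighting rule: power subtraction, limited by a floor.
        Pxx = np.maximum(np.abs(X) ** 2, 1e-12)
        G = np.maximum(1.0 - Rnn / Pxx, floor)
        # Y(f) = X(f) · G(f), back to the time domain, then overlap-add.
        y[i:i + frame_len] += window * np.fft.irfft(X * G, n=frame_len)
    return y

# Usage: a noise-only lead-in followed by a tone in noise.
rng = np.random.default_rng(0)
fs = 8000
k = np.arange(8192)
s = np.where(k >= 2048, np.sin(2 * np.pi * 440 * k / fs), 0.0)
n = 0.2 * rng.standard_normal(len(k))
x = s + n
y = spectral_weighting(x)
```

The floor keeps the gain from dropping to zero in bins where the noise estimate exceeds the current frame power, which is one common way to make the residual noise sound less unnatural. The first and last half frame are covered by only one window and are therefore attenuated.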
