Frequency Analysis from the Basics (31) - "Cepstrum"

This time, I'll be talking about cepstrum analysis, a technique frequently used in the field of speech processing.

Cepstrum analysis allows us to obtain a power spectrum from a time waveform, treat that spectral data as a waveform, and perform an FFT on it to determine the periodicity of the spectral waveform.

Furthermore, the cepstrum makes it possible to estimate the transfer characteristics of a filter system from a time signal that has passed through that system.

Generally, we consider a measured time signal x(t) to consist of two types of time signals, f(t) and g(t), and then we try to extract signal information of either f(t) or g(t) from x(t).

First, as a simple example, let's consider the case where x(t) is the sum of two signals. That is,

x(t) = f(t) + g(t)    (1)

Because addition is a linear operation, taking the Fourier transform of both sides of equation (1) gives

X(f) = F(f) + G(f)    (2)

In other words, even in the frequency domain, the frequency component X(f) is the sum of the frequency components F(f) and G(f) of the two signals.

Here, as shown in Figure 1, if the two frequency components occupy different bands, they can easily be separated and extracted by filtering (Figure 2).

Figure 1. Spectrum of sum signal x(t)

Figure 2. Signal separation by filtering process
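The additive case can be sketched numerically. The following is a minimal illustration, not taken from the original article: the sampling rate, the two sine frequencies (50 Hz and 200 Hz), and the 100 Hz band boundary are all assumed values chosen so that the two components fall in clearly different bands.

```python
import numpy as np

fs = 1000                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                      # 1 s of samples -> 1 Hz resolution
f_sig = np.sin(2 * np.pi * 50 * t)          # low-band component f(t)
g_sig = 0.5 * np.sin(2 * np.pi * 200 * t)   # high-band component g(t)
x = f_sig + g_sig                           # equation (1): x(t) = f(t) + g(t)

# Equation (2): the spectrum of the sum is the sum of the spectra.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

# Ideal band split at a hypothetical 100 Hz boundary.
cutoff = 100.0
F_est = np.where(freqs < cutoff, X, 0)      # keep only the low band
G_est = np.where(freqs >= cutoff, X, 0)     # keep only the high band

# Back to the time domain: each band recovers one of the two signals.
f_rec = np.fft.irfft(F_est, n=len(x))
g_rec = np.fft.irfft(G_est, n=len(x))
```

Because both sine waves complete a whole number of cycles in the frame, there is no spectral leakage and the split is essentially exact; with real measured signals the separation is only as good as the band separation shown in Figure 1.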

In reality, signals obtained are rarely this simple; more commonly, they are obtained as the output of a linear system, such as a filter.
Now, if f(t) is the input signal to a linear system with impulse response g(t) and x(t) is its output signal, then

x(t) = ∫ f(τ) g(t − τ) dτ    (3)

That is, the output time signal x(t) is the convolution integral of f(t) and g(t). Taking the Fourier transform of both sides of equation (3) gives

X(f) = F(f) G(f)    (4)

In other words, the frequency component X(f) is the product of the frequency components F(f) and G(f) of the two signals. To recover the original input signal f(t) from equation (4), we would need the inverse filter of G(f), which is very difficult to find when g(t) is also unknown. If the product in equation (4) can be converted into a sum, however, the band-separation method described above can be applied. Specifically, we compute the power spectrum from equation (4) and take its logarithm, which expresses it as the sum of two components.
From equation (4),

P_xx(f) = P_ff(f) P_gg(f)    (5)

Here, P_xx(f), P_ff(f), and P_gg(f) are the power spectra of x(t), f(t), and g(t), respectively. Taking the logarithm of both sides gives

log P_xx(f) = log P_ff(f) + log P_gg(f)    (6)

The product has now become a sum. Therefore, by the same reasoning as for equation (1), if we regard the left side of equation (6) as time data and Fourier transform it, the two components can be separated by band.
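The chain from convolution to a sum of logarithms can be checked numerically. The sketch below uses made-up data: a random sequence stands in for f(t) and a decaying exponential stands in for the impulse response g(t). Both are assumptions for illustration only. The FFT length is set to the full length of the convolution so that the discrete (circular) transform matches the linear convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
f_sig = rng.standard_normal(64)        # input signal f(t) (synthetic)
g_sig = 0.8 ** np.arange(16)           # impulse response g(t) (assumed shape)

# Equation (3): the output is the convolution of input and impulse response.
x = np.convolve(f_sig, g_sig)          # length 64 + 16 - 1 = 79

# Zero-pad both factors to the output length before transforming.
n = len(x)
F = np.fft.fft(f_sig, n)
G = np.fft.fft(g_sig, n)
X = np.fft.fft(x)

# Power spectra of x(t), f(t), and g(t).
P_xx = np.abs(X) ** 2
P_ff = np.abs(F) ** 2
P_gg = np.abs(G) ** 2

print(np.allclose(X, F * G))                                   # equation (4)
print(np.allclose(P_xx, P_ff * P_gg))                          # power spectra multiply
print(np.allclose(np.log(P_xx), np.log(P_ff) + np.log(P_gg)))  # logs add
```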

This method, in which the original time signal is Fourier transformed to obtain the power spectrum and the logarithm of that power spectrum is then Fourier transformed again, is called cepstrum analysis. The horizontal axis of the cepstrum has the dimension of time, but because it is not the same as the time axis of an ordinary time waveform, it is called quefrency, a play on the word frequency. Likewise, the operation of limiting the band on the quefrency axis is called a lifter, a play on the word filter, which acts on the frequency axis. Table 1 compares the terminology of the frequency spectrum and the cepstrum.

Table 1 Comparison of Spectrum and Cepstrum Terminology

Item	Spectrum	Cepstrum
Horizontal axis	Frequency	Quefrency
Band separation	Filter	Lifter
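The procedure described above can be sketched as a small function. This is one possible implementation under stated assumptions, not the article's own code: the function name `power_cepstrum`, the Hann window, and the small constant added before the logarithm are all choices made for this illustration. It uses the inverse FFT for the second transform; because the log power spectrum of a real signal is real and symmetric, this differs from a forward FFT only by a scale factor of 1/N.

```python
import numpy as np

def power_cepstrum(x, fs):
    """Return (quefrency in seconds, cepstrum) of a real signal x sampled at fs Hz."""
    n = len(x)
    # First transform: time signal -> power spectrum (Hann window is an assumption).
    spectrum = np.fft.fft(x * np.hanning(n))
    # Logarithm of the power spectrum; the tiny floor avoids log(0).
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)
    # Second transform: treat the log spectrum as a waveform and transform again.
    cepstrum = np.fft.ifft(log_power).real
    # The horizontal axis is quefrency, in units of time.
    quefrency = np.arange(n) / fs
    return quefrency, cepstrum

# Usage with an arbitrary test signal (random noise at an assumed 8 kHz rate).
quefrency, cepstrum = power_cepstrum(np.random.default_rng(0).standard_normal(256), 8000.0)
```

The returned cepstrum is symmetric about its midpoint, just as the spectrum of a real signal is, so only the first half of the quefrency axis is normally displayed.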

As a concrete example, let's consider the analysis of speech. The human voice is produced when the pulse-like vibration of the vocal cords, acting as the sound source, is filtered by the vocal tract (the throat, the oral cavity, and so on).

Figure 3: Speech generation model

In Figure 3, the sound source (a) corresponds to f(t), the characteristic of the vocal tract (b) corresponds to g(t), and the generated sound (c) corresponds to the observed time signal x(t).

Figure 4. Time waveform and power spectrum of a female voice (vertical axis is logarithmic).

Figure 4 shows the time signal of an adult female voice (the vowel "a") and its analysis result. From it, we can see that the fundamental frequency (pitch) of this voice is approximately 225 Hz. Applying a further FFT to the logarithmic power spectrum in the lower part of Figure 4 yields the cepstrum shown in Figure 5.

Figure 5. Cepstrum of a female voice /a/

The horizontal axis in Figure 5 is quefrency, with the unit being time (s).
From this figure, we can see the following. The quefrency band above the red dotted line in the figure (approximately 3.5 ms) is the cepstrum component corresponding to the sound source, that is, the vocal cord vibration, and it represents the periodicity of the original spectrum (the repetition of the fundamental frequency and its harmonic components). The peak lies at approximately 4.5 ms, whose reciprocal is approximately 220 Hz. This is the interval at which the spectrum repeats, so this band carries information about the 220 Hz fundamental frequency and its harmonic components.
Furthermore, the quefrency band below the red dotted line in the figure (approximately 3.5 ms) represents slow changes along the original frequency axis, and therefore represents the spectrum of the vocal tract.
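The pitch reading described above can be reproduced on synthetic data. The sketch below does not use the recorded voice from the figures. Instead it builds a stand-in signal under assumed values: a 22 kHz sampling rate, a pulse train with a 100-sample period (220 Hz) as the source, and a single damped resonance near 800 Hz as the vocal tract. The upper search limit of 8 ms is also an assumption.

```python
import numpy as np

fs = 22000                        # sampling rate in Hz (assumed)
n = 2048                          # frame length in samples
period = 100                      # pitch period in samples -> 220 Hz

# Source f(t): pulse train standing in for the vocal cord vibration.
source = np.zeros(n)
source[::period] = 1.0

# Vocal tract g(t): one damped resonance near 800 Hz (hypothetical values).
k = np.arange(200)
tract = np.exp(-np.pi * 200 * k / fs) * np.sin(2 * np.pi * 800 * k / fs)

# Observed signal x(t): convolution of source and tract, trimmed to one frame.
x = np.convolve(source, tract)[:n]

# Cepstrum: windowed FFT -> log power spectrum -> inverse FFT.
log_power = np.log(np.abs(np.fft.fft(x * np.hanning(n))) ** 2 + 1e-12)
cepstrum = np.fft.ifft(log_power).real
quefrency = np.arange(n) / fs

# Look for the largest peak in the high-quefrency band (3.5 ms to 8 ms).
band = (quefrency > 3.5e-3) & (quefrency < 8e-3)
peak_index = int(np.argmax(np.where(band, cepstrum, -np.inf)))
pitch_period = quefrency[peak_index]          # seconds
f0_estimate = 1.0 / pitch_period              # Hz
print(f"peak at {pitch_period * 1e3:.2f} ms -> about {f0_estimate:.0f} Hz")
```

The reciprocal of the peak quefrency is the pitch estimate, which is the same arithmetic as reading roughly 4.5 ms from Figure 5 and obtaining roughly 220 Hz.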

Next, liftering (low-pass liftering) is performed by setting the high-quefrency components above the red dotted line (approximately 3.5 ms) to zero, and an FFT is then applied again to return to the frequency axis. The result is the upper part of Figure 6. This is called the lifted spectrum, and it corresponds to the envelope of the original power spectrum. Figure 7 shows the original power spectrum and the lifted spectrum superimposed.

Thus, applications of cepstrum analysis include extracting periodic information from spectra and determining the spectral envelope. Other applications include separating reflected sound waves.

Figure 6. Lifted spectrum and original power spectrum

Figure 7: Overlay display of the lifted spectrum and the original power spectrum.
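Low-pass liftering can be sketched in the same way. The example below again uses a synthetic stand-in for the voice (assumed values: 22 kHz sampling rate, 220 Hz pulse train, one resonance near 800 Hz) and a rectangular lifter with the 3.5 ms cutoff mentioned in the text. The rectangular lifter shape is an assumption; the article does not state which shape was used.

```python
import numpy as np

fs = 22000                        # sampling rate in Hz (assumed)
n = 2048                          # frame length in samples

# Synthetic voice: pulse train (source) convolved with a damped resonance (tract).
source = np.zeros(n)
source[::100] = 1.0               # 100-sample period -> 220 Hz
k = np.arange(200)
tract = np.exp(-np.pi * 200 * k / fs) * np.sin(2 * np.pi * 800 * k / fs)
x = np.convolve(source, tract)[:n]

# Log power spectrum and cepstrum.
log_power = np.log(np.abs(np.fft.fft(x * np.hanning(n))) ** 2 + 1e-12)
cepstrum = np.fft.ifft(log_power).real

# Low-pass lifter: keep quefrencies below 3.5 ms, zero everything above.
cutoff = int(3.5e-3 * fs)         # 77 samples
lifter = np.zeros(n)
lifter[:cutoff] = 1.0             # quefrency 0 up to just below the cutoff
lifter[-(cutoff - 1):] = 1.0      # mirrored half, so the result stays real

# Transform back to the frequency axis: the lifted spectrum (log scale).
envelope = np.fft.fft(cepstrum * lifter).real
freqs = np.arange(n) * fs / n     # frequency axis in Hz for plotting
```

Plotting `envelope` over `log_power` against `freqs` gives the same kind of overlay as Figure 7: the harmonic ripple disappears and only the slowly varying shape remains.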

Finally, here's a summary.

The cepstrum is obtained by treating the logarithmic power spectrum as a time signal and performing a Fourier transform on it.
Because the cepstrum is the Fourier transform of a spectrum whose horizontal axis is frequency, its horizontal axis returns to the dimension of time (unit: s) and is called quefrency.
Applying cepstrum analysis to speech makes it possible to detect the fundamental frequency of the vocal cords and the spectral shape of the vocal tract.
When only the low-quefrency part of the cepstrum is kept by liftering and the Fourier transform is performed again, the envelope of the original spectrum is obtained; this is called the lifted spectrum.
Applications of the cepstrum include:
① Extraction of periodic information from the spectrum
② Detection of the spectral envelope
③ Separation of reflected sound waves

【keyword】

Cepstrum, impulse response, convolution integral, quefrency, lifter, fundamental frequency (pitch), lifted spectrum, envelope


(Excerpt from the email newsletter issued on January 24, 2017)
