Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

A course from École Polytechnique Fédérale de Lausanne

Digital Signal Processing

From this lesson

Module 3: Part 1 - Basics of Fourier Analysis

- Paolo Prandoni, Lecturer, School of Computer and Communication Sciences

- Martin Vetterli, Professor, School of Computer and Communication Sciences

Spectrograms are particularly useful and particularly popular in speech analysis.

Speech is a particularly difficult signal to analyze because the mechanism of speech

production alternates between widely different modes of operation.

When you pronounce a vowel sound like "aah," for

instance, you're producing a harmonic sound that resonates in your body

and that has a very well-structured harmonic content.

On the other hand, consonants have a noise-like structure mostly, and

their spectral makeup is completely different.

So it would be futile to try to come up with a global spectral representation for

an entire speech utterance.

Instead, we need to split the speech into pieces and

analyze these pieces in sequence.

So here's an example.

This is a sentence from a speech corpus

that is used in speech analysis algorithms.

>> There is a lag between thought and act.

>> So, in the time domain, the waveform can be split like this.

This part here is "there is a lag."

This is "between thought and act."

So here we are in the presence of a portion of speech that is rich in vowels.

And here we have single words that have both vowels and consonants.

In particular, for instance, look at this.

This is "act."

And so we have "a" and then "c" and "t," which are noise-like pulses.

The speech signal was sampled at 8 kilohertz, and let's try and

see what a wideband spectrogram can tell us about the structure of this utterance.

So if we take an 8-millisecond analysis window, which gives us

frequency bins of 125 hertz, which is quite wide, we get a plot like this.

We cannot see much in terms of frequency resolution,

although we start to see some patterns here that will be clearer later.

But what we do have is very precise onsets for the consonants.

Remember, this is "act": here you see the beginning of the "c," and here you

see the beginning of the "t," each marked by the onset of a high-energy band in the spectrogram.

And the same could be said for the other consonants.

If we use a narrowband spectrogram,

that is, if we increase the analysis window to 32 milliseconds,

we have a resolution of about 31 hertz.
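
As a rough illustration of where these numbers come from, here is a small Python sketch that converts a window duration into a number of samples and a DFT bin width. It assumes a sampling rate of 8 kHz and a rectangular window, and the function name `analysis_window` is made up for this example.

```python
def analysis_window(fs_hz, window_ms):
    """Return (samples per window, DFT bin width in Hz) for a given duration."""
    samples = round(fs_hz * window_ms / 1000)
    bin_hz = fs_hz / samples
    return samples, bin_hz

FS = 8000  # assumed sampling rate in Hz

for ms in (8, 32):
    samples, bin_hz = analysis_window(FS, ms)
    print(f"{ms} ms window: {samples} samples, {bin_hz} Hz per bin")
# 8 ms window: 64 samples, 125.0 Hz per bin
# 32 ms window: 256 samples, 31.25 Hz per bin
```

Note that the bin width is simply the inverse of the window duration, so it would be the same at any sampling rate; only the number of samples per window changes.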

And here we see the harmonic structure of speech.

Here the focus is on the vowels.

And here you can see that each vowel contains

a harmonic structure that depends on the pitch of the vowel.

And it will change from speaker to speaker,

from male speaker to female speaker.

You can also see how in speaking, we modulate and

change the frequency of the vowels to give a certain intonation to our utterance.

So a narrowband spectrogram will give us

information on the harmonic parts of speech.

A wideband spectrogram will give us information on the pulse-like and

noise-like consonant sounds in speech.
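
To make this contrast concrete, here is a minimal sketch in plain Python. It is not the tool used to produce the plots in the lecture. It assumes an 8 kHz sampling rate, a rectangular window and no overlap between frames. The test signal is synthetic: two tones 62.5 Hz apart stand in for harmonic content, and a one-sample click stands in for a consonant onset. The names `stft_magnitudes` and `loudest_frame` are hypothetical.

```python
import cmath
import math

def stft_magnitudes(x, L):
    """Magnitude STFT with a rectangular window of L samples and no overlap.

    Returns one list per frame with |X[k]| for k = 0 .. L // 2.
    """
    frames = []
    for start in range(0, len(x) - L + 1, L):
        chunk = x[start:start + L]
        frames.append([
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / L) for n in range(L)))
            for k in range(L // 2 + 1)
        ])
    return frames

def loudest_frame(frames, k):
    """Index of the frame with the largest magnitude in bin k."""
    return max(range(len(frames)), key=lambda i: frames[i][k])

FS = 8000       # assumed sampling rate in Hz
CLICK_AT = 300  # sample index of the synthetic click

# Two steady tones at 500 Hz and 562.5 Hz, plus a single large sample.
x = [math.sin(2 * math.pi * 500.0 * n / FS) + math.sin(2 * math.pi * 562.5 * n / FS)
     for n in range(1024)]
x[CLICK_AT] += 20.0

wide = stft_magnitudes(x, 64)     # 8 ms window, 125 Hz bins
narrow = stft_magnitudes(x, 256)  # 32 ms window, 31.25 Hz bins

# Time localization: look for the click in the highest bin, far from the tones.
wide_frame = loudest_frame(wide, 32)
narrow_frame = loudest_frame(narrow, 128)
print("wideband: click within samples", wide_frame * 64, "to", wide_frame * 64 + 63)
print("narrowband: click within samples", narrow_frame * 256, "to", narrow_frame * 256 + 255)

# Frequency resolution: compare the bins around the two tones in the first frame.
print("narrowband bins 16, 17, 18:", [round(m, 1) for m in narrow[0][16:19]])
print("wideband bins 3, 4, 5, 6:", [round(m, 1) for m in wide[0][3:7]])
```

With these settings the wideband analysis pins the click down to a 64-sample frame, while the narrowband analysis only places it somewhere in a 256-sample frame. Conversely, the narrowband frame shows the two tones as separate peaks in bins 16 and 18 with an empty bin between them, whereas in the wideband frame both tones fall within neighboring 125 Hz bins and cannot be told apart as distinct lines.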

The short-time Fourier transform determines a tiling

of the time-frequency plane where the size of each tile is

specified by the time and the frequency resolution of the STFT.

Suppose we choose a window size equal to 20: what we'd have is

a subdivision of the time axis into chunks that are 20 samples long, and

a subdivision of the frequency axis into bins, each one of which is 2π/20 wide.

So each tile in the time-frequency plane will have a horizontal size of 20 and

a vertical size of 2π/20.

And everything that happens in the time-frequency plane within this tile

will be summed up by just one STFT value.

If we change the size of the window, suppose we take L equal to ten,

then we narrow the size of the tile in the time axis,

but we widen the size of the tile in the frequency axis.

So although the shape of the tiles changes, the number of tiles remains the same,

because the area of each tile remains constant.

Similarly, if we shorten the window even more, we have

a different arrangement of the tiles but the area of each tile remains the same.

This is actually quite self-evident: if the time resolution is L,

the frequency resolution is 2π/L, and

therefore the product, namely the area of each tile, is the constant 2π.
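
The bookkeeping behind this tiling can be sketched in a few lines of Python. This is only an illustration under simple assumptions: non-overlapping windows, a frequency axis spanning 2π, and a hypothetical signal length of 100 samples chosen so that each window length divides it evenly. The function name `stft_tiling` is made up for this example.

```python
import math

def stft_tiling(num_samples, window_len):
    """Describe the uniform tiling induced by a length-L STFT without overlap.

    Returns (tile width in samples, tile height in radians,
             tile area, total number of tiles).
    """
    width = window_len
    height = 2 * math.pi / window_len
    time_chunks = num_samples // window_len  # tiles along the time axis
    freq_bins = window_len                   # tiles along the frequency axis
    return width, height, width * height, time_chunks * freq_bins

N = 100  # hypothetical signal length

for L in (20, 10, 5):
    width, height, area, count = stft_tiling(N, L)
    print(f"L={L}: tile {width} x {height:.3f} rad, area {area:.3f}, {count} tiles")
```

Whatever the window length, the area stays at 2π and the tile count stays equal to the number of samples; only the aspect ratio of the tiles changes.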

In a nutshell, this is the uncertainty principle in time-frequency analysis.

It states that we cannot arbitrarily narrow our focus both in time and

in frequency.

If we want a higher time resolution, we will necessarily have to give up frequency

resolution and vice versa.

The short-time Fourier transform leads to a very simple,

uniform tiling of the time-frequency plane.

More sophisticated tilings have been the subject of much research and

in particular of a branch of signal processing called wavelet analysis.

For those of you who are curious about wavelets, we recommend you check out

the links that we provide in the bibliography for the class.