Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

A course from École Polytechnique Fédérale de Lausanne

Digital Signal Processing

Module 3: Part 1 - Basics of Fourier Analysis

- Paolo Prandoni, Lecturer, School of Computer and Communication Sciences

- Martin Vetterli, Professor, School of Computer and Communication Sciences

The spectrogram is a clever way of

showing this time-varying spectral information in a single plot.

If you think about it, the short time Fourier transform

is a complex valued function of two variables, m and k.

And so to plot it properly we would need a four dimensional plot,

which of course is not possible.

We can restrict ourselves to the magnitude of the DFT, at which point the STFT becomes

a real valued function of two variables, which requires a three dimensional plot.

Now, this is not only quite hard to do, but also rather difficult to interpret.

To make it easier to understand, what we do instead is color-code

the magnitude of the Fourier transform, and we use dark hues for

small values and bright, whitish hues for large values.

We also take the logarithm of the magnitude in order to compress the range

of magnitude values and to map them better onto a color scale.

And we put the spectral slices one after another

to obtain an image-like picture of the time-varying spectrum.
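
The construction just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `log_spectrogram`, the non-overlapping placement of the windows, and the small constant `eps` are choices made here, not something the lecture specifies.

```python
import numpy as np

def log_spectrogram(x, L=256, hop=None, eps=1e-10):
    """Stack log-magnitude DFT slices of x side by side (one column per slice)."""
    hop = L if hop is None else hop            # assumption: non-overlapping chunks by default
    starts = range(0, len(x) - L + 1, hop)     # m: starting sample of each spectral slice
    # rfft keeps only DFT indices 0 .. L/2, which is all we need for a real signal
    slices = [np.abs(np.fft.rfft(x[m:m + L])) for m in starts]
    S = np.array(slices).T                     # rows: DFT index k, columns: slice position
    return 20 * np.log10(S + eps)              # log compresses the range; eps avoids log(0)

# Toy input: a sinusoid at 0.125 cycles per sample, which falls exactly on bin 0.125 * 256 = 32
n = np.arange(2048)
x = np.cos(2 * np.pi * 0.125 * n)
S_db = log_spectrogram(x)
print(S_db.shape)                # (DFT indices, number of slices)
print(np.argmax(S_db[:, 0]))     # brightest DFT index in the first slice
```

The resulting array can be handed to any image-display routine that maps small values to dark colors and large values to bright ones.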

So this is the spectrogram of the DTMF signal.

On the horizontal axis we put the variable m,

that is, the starting point for each spectral slice. On the vertical axis,

we put the index k of the DFT coefficient.

We have a real signal so

we just go from zero to L over two where L is the size of our DFT window.

You can see here that the black pixels in the picture correspond to very small

values for the DFT.

So these black areas indicate the silence regions in the DTMF signal.

At the same time the bright bands here

correspond to high values of the DFT coefficients.

So these are actually the frequencies in each digit being dialed.

And so the plot shows, at the same time, both the time information,

since we have a good estimate of where each digit begins and ends, and

the frequency content that is associated with each digit.

So we can read this picture and find out that the digits were 1-5-9 in sequence.
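
Reading the digits off the picture can also be mimicked in code. The sketch below is an illustration, not the lecture's method: it synthesizes its own test signal, uses the standard DTMF frequency table (which the lecture does not list), assumes an 8 kHz sampling rate and a 256-point window, and picks tone and silence durations that are exact multiples of the window so that every slice is either all tone or all silence. The names `tone`, `read_digits` and the threshold value are hypothetical.

```python
import numpy as np

FS = 8000                          # sampling rate in Hz (assumed)
L = 256                            # analysis window length (assumed)
ROWS = [697, 770, 852, 941]        # standard DTMF row frequencies in Hz
COLS = [1209, 1336, 1477, 1633]    # standard DTMF column frequencies in Hz
KEYS = ["123A", "456B", "789C", "*0#D"]

def tone(key, n_samples):
    """Sum of the row and column sinusoids for one key."""
    r = next(i for i, row in enumerate(KEYS) if key in row)
    c = KEYS[r].index(key)
    t = np.arange(n_samples) / FS
    return np.sin(2 * np.pi * ROWS[r] * t) + np.sin(2 * np.pi * COLS[c] * t)

def read_digits(x, threshold=10.0):
    """Scan non-overlapping slices; dark slices are silence, bright bands name the key."""
    freqs = np.fft.rfftfreq(L, d=1 / FS)
    low = freqs < 1075             # boundary between the row band and the column band
    keys, prev = [], None
    for m in range(0, len(x) - L + 1, L):
        mag = np.abs(np.fft.rfft(x[m:m + L]))
        if mag.max() < threshold:  # a "black" slice: silence between digits
            prev = None
            continue
        f_row = freqs[low][np.argmax(mag[low])]      # brightest band below the boundary
        f_col = freqs[~low][np.argmax(mag[~low])]    # brightest band above the boundary
        r = int(np.argmin(np.abs(np.array(ROWS) - f_row)))
        c = int(np.argmin(np.abs(np.array(COLS) - f_col)))
        key = KEYS[r][c]
        if key != prev:            # consecutive slices of the same tone count once
            keys.append(key)
        prev = key
    return "".join(keys)

gap = np.zeros(3 * L)              # silence, aligned to the slice grid for simplicity
x = np.concatenate([gap] + [np.concatenate([tone(d, 6 * L), gap]) for d in "159"])
decoded = read_digits(x)
print(decoded)
```

A real recording will not line up with the slice grid this neatly, so slices that straddle a tone boundary would need more careful handling than this sketch provides.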

If we know the system clock for the signal, that is,

the sampling rate, we can label the axes just in the same way we did for the DFT.

So remember, the highest positive frequency is

Fs over 2, where Fs is the sampling rate of the signal.

The frequency resolution, that is, how fine a frequency we can resolve in a DFT,

is given by Fs over L. The width of the time slice,

that is, the time resolution, is L times Ts seconds, where Ts is one over Fs.

So if we apply this to the DTMF signal,

which was sampled at eight kilohertz, we can label the axes like so.

We have a maximum frequency of 4kHz and

a total duration of the signal of 2.1s.
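
As a quick check of these numbers, the axis labels can be computed directly from Fs, L and the signal duration. The variable names are illustrative, and the slice count assumes non-overlapping windows, which the lecture has not fixed at this point.

```python
import numpy as np

fs = 8000.0      # sampling rate in Hz
L = 256          # analysis window length
duration = 2.1   # total signal duration in seconds

freq_res = fs / L                              # 8000 / 256 = 31.25 Hz per DFT index
time_res = L / fs                              # 256 * Ts = 0.032 s per slice
freq_axis = np.fft.rfftfreq(L, d=1 / fs)       # labels for DFT indices 0 .. L/2, in Hz
n_slices = round(duration * fs) // L           # assumption: windows do not overlap
time_axis = np.arange(n_slices) * time_res     # starting time of each slice, in seconds

print(f"frequency axis: 0 to {freq_axis[-1]:.0f} Hz in steps of {freq_res} Hz")
print(f"time axis: {n_slices} slices of {time_res * 1000:.0f} ms each")
```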

The natural questions that should come to mind

at this point concern the width of the analysis window.

We chose 256 points.

Why?

Is it the optimal size?

What happens if we take a larger window?

What happens if we take a smaller window?

How do we position these windows along the signal?

Do they overlap, and if so, by how much?

And what is the shape of the window that we should use?

By shape here, I mean the following.

Here we're taking chunks of L samples and just taking a DFT of the raw data.

Now suppose my signal is a very smooth signal over this window

that goes like this.

So this is a smooth signal and

my DFT should have basically just low frequency coefficients.

But now remember that to the DFT, everything is a periodic sequence.

So what the DFT really sees is something like this.

Now here, we have all of a sudden a big jump at this discontinuity.

And this will create spurious high frequency content in the DFT coefficients.

We can counteract this side effect

by taking the raw data in a chunk and applying a tapering window to it.

So for instance, suppose we have a tapering window shaped like a triangle,

we multiply each sample by the value of the window.

And we will have a signal that is pretty much identical to the original data

in the middle of the window, but then it tapers to zero at the extremities.

And so at the end you will get something without jumps

in the periodized version like so.
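
The effect of tapering can be checked numerically. In this sketch the "smooth signal" is assumed to be a simple ramp, which is smooth inside the window but jumps back to zero when periodized, and the triangular taper is NumPy's `bartlett` window. The helper `high_freq_fraction` and the choice of the upper half of the positive frequency band as "high frequency" are illustrative.

```python
import numpy as np

L = 256
n = np.arange(L)
x = n / L                   # smooth over the window, but periodizing it creates a jump
w = np.bartlett(L)          # triangular window that tapers to zero at both ends

def high_freq_fraction(chunk):
    """Fraction of the DFT energy that sits at DFT indices L/4 and above."""
    energy = np.abs(np.fft.rfft(chunk)) ** 2
    return energy[L // 4:].sum() / energy.sum()

raw = high_freq_fraction(x)            # DFT of the raw chunk
tapered = high_freq_fraction(x * w)    # DFT after multiplying each sample by the window
print(f"raw: {raw:.2e}, tapered: {tapered:.2e}")
```

The tapered chunk carries a far smaller share of its energy in the high DFT indices, since the periodized version no longer has a jump.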

So, the whole story is that we could really spend weeks talking about

all the tricks and tweaks that we can apply to a spectrogram in order to extract

some kind of information from a real-world signal.

But since we don't have that kind of time here, we will just talk about the main

trade-off, which is related to the size of the analysis window.

Spectrograms can be either wide band or

narrow band according to the frequency resolution of the associated DFT.

So if we choose a long window, if our L is big,

in that case we have a narrow band spectrogram.

Why is that?

Because a long window will give us more DFT points and

therefore more frequency resolution.

Remember the frequency resolution in the end is equal to the sampling frequency

divided by L or 2 Pi divided by L if we remain in the abstract discrete time.

However, in a long window, more things can happen.

And so we have less precision in the time resolution.

In the limit, the longest possible window gives the DFT of the entire signal.

And we have seen that this completely obliterates the time information.

Conversely, if we choose a short window, then we have a wide band spectrogram.

A short window will create many time slices because you will divide

the whole support of the signal into more chunks.

And so we have a much more precise location of the transitions.

But a short window will give us fewer DFT points, so

the frequency resolution will be poor.

So let's use our DTMF signal once again and

look at the difference between a wide band and narrow band spectrogram.

Here is an example of a wide band spectrogram

where the analysis window is just 32 samples.

With such a short analysis window, we have a very good localization in time of

the start and stop point for each burst of sound.

But you can see that the frequency bands are extremely wide.

Also having a very short window creates artifacts in the high frequency range

because as we sweep the window over this signal, we will be encompassing

uneven numbers of periods for the underlying sinusoids.

This is the spectrogram that we saw in the beginning.

The window now is 256 points.

So let's say that this spectrogram is in between an extremely narrow band and

an extremely wide band.

This is a very good compromise to interpret what's going on in the signal.

If we increase the size of the window to 1024,

so four times larger, then we have an extremely narrowband spectrogram.

You see now that the frequencies are localized extremely precisely, but

on the other hand, the time resolution is very poor.

We're completely missing the silence here in the beginning for instance and

the silence between these two digits.
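
To put numbers on the three window sizes discussed above, here is a small sketch that applies the Fs over L and L times Ts formulas at the 8 kHz sampling rate of the DTMF signal. The function name `resolutions` is illustrative. The 73 Hz figure in the comment is the smallest spacing between standard DTMF tones (697 Hz and 770 Hz), a fact that comes from the DTMF standard rather than from the lecture.

```python
FS = 8000  # sampling rate of the DTMF signal in Hz

def resolutions(L, fs=FS):
    """Return (frequency resolution in Hz, time resolution in seconds) for window length L."""
    return fs / L, L / fs

table = {L: resolutions(L) for L in (32, 256, 1024)}
for L, (df, dt) in table.items():
    print(f"L = {L:4d}: frequency resolution {df:8.4f} Hz, time resolution {dt * 1000:5.1f} ms")

# Closest pair of standard DTMF tones: 770 - 697 = 73 Hz.
# Only windows whose frequency resolution is finer than that can tell them apart.
separable = {L: df < 73 for L, (df, _) in table.items()}
print(separable)
```

Whatever L is chosen, the product of the two resolutions is (Fs / L) times (L / Fs) = 1, which is the trade-off in one line: any gain in frequency resolution is paid for in time resolution.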