0:00

Welcome again to the course on Audio signal processing for music applications.

Last week, we talked about the short-time Fourier transform.

That offered a sound representation from which we can synthesize

sounds without losing any information.

And at the same time, it's a good tool for understanding, describing,

and transforming sound.

This week, we go a step further in the direction of obtaining a higher level

representation: in exchange for losing a bit of the identity property

of the STFT, we gain quite a lot of flexibility and

a higher level of abstraction in the representation.

This is what we call the sinusoidal model.

And we will cover this topic in three theory lectures.

So this is the first one.

We will first present the model, the sinusoidal model.

Then we will talk about how these sinusoids can be expressed in the spectrum.

1:03

So the model is quite simple, it's just a sum of time varying sinusoids.

So this equation, we have seen it before, but here we are emphasizing two aspects.

One, is the idea of summing a finite number of sinusoids, R.

So we have R sinusoids, and each of these sinusoids is time varying.

It has an instantaneous amplitude and a frequency value that change in time.
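As a reminder of the equation being described, one common way to write the model is sketched below. The symbols are assumptions, since the slide itself is not part of this transcript: A_r[n] and f_r[n] stand for the instantaneous amplitude and frequency of sinusoid r, with f_r in cycles per sample, and R is the number of sinusoids. Writing the phase as 2π f_r[n] n is a simplification that is exact only when the frequency is constant.

```latex
y[n] = \sum_{r=1}^{R} A_r[n] \cos\left(2\pi f_r[n]\, n\right)
```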

1:42

So let's see what that looks like.

Again, we have seen this equation before.

So, if we start from a signal x that is a real sinewave, and

then we take the DFT of the windowed version of this sinewave,

we see that of course, the sine wave can be expressed as the sum

of two complex sinewaves that are then multiplied by the window we use.

And being the sum of two complex exponentials,

we can split that into two summations, so two separate DFTs.

So we'll have the DFT of the negative frequency and

the DFT of the positive frequency.

Each one, again, multiplied by a window.

And these complex exponentials can be grouped together.

And basically each term is a shifted version of the transform of the window.

Okay, so basically, at the end we see that

the first summation is the transform of the window, W.

So it's W with the frequency index shifted, so

we have shifted the window transform, and it is scaled by

half of the amplitude of the cosine.

And the other element is the same window, but

shifted by the positive frequency, and

also scaled by the same amplitude.
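The derivation just described can be written out as follows. This is a reconstruction under assumed notation, since the slide is not reproduced here: A and k_0 are the amplitude and frequency index of the cosine (with zero initial phase), w[n] is the window, W[k] is its transform, and N is the DFT size. The first term is the window transform centered at the negative frequency and the second at the positive frequency.

```latex
\begin{align}
X[k] &= \sum_{n} w[n]\, A \cos\!\left(\frac{2\pi k_0 n}{N}\right) e^{-j 2\pi k n / N} \\
     &= \sum_{n} w[n] \left( \frac{A}{2} e^{-j 2\pi k_0 n / N} + \frac{A}{2} e^{j 2\pi k_0 n / N} \right) e^{-j 2\pi k n / N} \\
     &= \frac{A}{2} \sum_{n} w[n]\, e^{-j 2\pi (k + k_0) n / N} + \frac{A}{2} \sum_{n} w[n]\, e^{-j 2\pi (k - k_0) n / N} \\
     &= \frac{A}{2}\, W[k + k_0] + \frac{A}{2}\, W[k - k_0]
\end{align}
```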

So if we start from a sinewave and we want to show the plot

of a single spectrum of this windowed sinusoid, we can see it like this.

So this is the positive spectrum.

So we don't see the two windows, we only see the positive one.

So we are seeing the positive frequencies and

so the contribution of the positive exponential.

And we see the shape of the window that we use,

but of course, centered at the frequency of the sinusoid, which is 440 hertz.

And of course we can listen to the sinewave.

[SOUND] And this is its spectrum.

So a peak centered at 440 hertz.

And the phase, which is flat across the main lobe,

corresponds to the phase of that sinewave at location zero.
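A minimal sketch of this single-sinusoid case, using only the Python standard library. The window size of 1001 samples, the 10 Hz frequency grid, and the helper names `hamming` and `dtft_mag` are assumptions made for illustration. The spectrum is evaluated directly at chosen frequencies instead of with an FFT, which is slower but keeps the example self-contained.

```python
import cmath
import math


def hamming(M):
    """Symmetric Hamming window of length M."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]


def dtft_mag(x, freq_hz, fs):
    """Magnitude of the Fourier transform of x at a single frequency in hertz."""
    step = -2j * math.pi * freq_hz / fs
    return abs(sum(v * cmath.exp(step * n) for n, v in enumerate(x)))


fs = 44100   # sampling rate in hertz
M = 1001     # window size in samples (hypothetical choice)
f0 = 440.0   # frequency of the sinusoid in hertz

win = hamming(M)
x = [win[n] * math.cos(2 * math.pi * f0 * n / fs) for n in range(M)]

# Evaluate the positive part of the spectrum from 0 to 2000 Hz in 10 Hz steps.
freqs = list(range(0, 2001, 10))
mags = [dtft_mag(x, f, fs) for f in freqs]

# The location of the maximum gives the frequency; its height, normalized by
# the window sum and doubled (only the positive lobe is seen), the amplitude.
peak_freq = freqs[mags.index(max(mags))]
peak_amp = 2 * max(mags) / sum(win)
print(peak_freq, round(peak_amp, 3))
```

The peak should land on the 440 Hz grid point, with a normalized amplitude close to 1, the amplitude of the cosine.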

4:20

Let's make it a little bit more difficult.

What happens when our sound is made up of two sinewaves?

So these are two sinewaves, one at 440 hertz,

the other at 490 hertz, together.

And we can also listen to that.

[SOUND] Okay, so, clearly it sounds like a modulated signal.

And in the time domain, we can see these modulations.

So we see the low frequency which is the modulation, and the high frequency.

4:54

And if we compute the spectrum of that, the positive part of that,

we are seeing the two contributions of the two sinusoids.

So, we see the two peaks of the two sinusoids, and

in the phase we see the phases of these two sinusoids.

5:15

And now let's show an example of a real sound.

A sound that includes many sinewaves, like the sound of an oboe.

Let's listen to the oboe first.

[SOUND] Okay, so this is an oboe playing an A4 note,

so its fundamental is around 440 hertz,

and in the spectrum, we clearly see all these

sinusoids which are the harmonics of the sound.

Here, we're only plotting the first 4000 hertz so

we're only seeing the first few harmonics, this sound has many more harmonics.

But this is a good way to zoom into these shapes of the windows and

also we see the phase spectrum of that.

But, how do we detect the frequency, amplitude,

and phase of each of these sinewaves?

A simple way to identify a sinusoid in the spectrum is by just focusing

on the magnitude spectrum: on the location of a peak and on its height.

So the location is the frequency and the height is the amplitude of the sinusoid.

So therefore, we consider a sinusoid as a peak in the magnitude spectrum.

And of course, the issue is that the resolution of

a magnitude spectrum is discrete, it's finite.

And the best precision we'll be able to get is half of

the distance between two frequency samples, between two bins.

So that's the best frequency precision that we will get in measuring a sinusoid.

7:15

So we can do zero-padding to get a bigger FFT so that we get more samples.

And we can also do interpolation directly on the resulting samples to further

refine the frequency and amplitude values.
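One common way to do this interpolation is to fit a parabola through the peak bin and its two neighbors in the dB magnitude spectrum. The lecture does not say which interpolation is meant, so parabolic interpolation is an assumption here, and the function names `interpolate_peak` and `bin_to_hz` are hypothetical.

```python
def interpolate_peak(mag_db, k):
    """Refine the peak at bin k by fitting a parabola through bins k-1, k, k+1.

    mag_db is a magnitude spectrum in dB, and k must be a local maximum that
    is neither the first nor the last bin.
    Returns (fractional bin location, interpolated dB value).
    """
    left, mid, right = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    offset = 0.5 * (left - right) / (left - 2.0 * mid + right)
    return k + offset, mid - 0.25 * (left - right) * offset


def bin_to_hz(bin_index, fs, fft_size):
    """Convert a (possibly fractional) bin index to hertz."""
    return bin_index * fs / fft_size


# Toy spectrum: an exact parabola whose true maximum lies at bin 5.3.
mag_db = [-(k - 5.3) ** 2 for k in range(10)]
k = mag_db.index(max(mag_db))          # coarse peak: the highest bin
peak_bin, peak_db = interpolate_peak(mag_db, k)
print(k, peak_bin, peak_db)
```

On this toy input the coarse peak is bin 5 and the refined location recovers the true maximum, because the data is an exact parabola. On a real spectrum the main lobe is only approximately parabolic, so the result is an estimate.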

To detect the spectral peaks, we have to understand the effect of the window.

And the most important factor is the window size.

So, for a particular window, one important

concept is the bandwidth of the main lobe in the spectral domain.

So the bandwidth of the main-lobe is B sub f expressed in hertz,

so that would be the main-lobe bandwidth of the window in hertz, and

that's defined in terms of B sub s,

the main-lobe width of the window expressed in samples, that is, in bins:

B sub s multiplied by the sampling rate and divided by the window size.

8:20

And so that will be the width of the main-lobe.
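Written as a formula, under the notation used in the lecture (B_s is the main-lobe width in bins, f_s the sampling rate, M the window size), the definition reads as below. The slide is not part of the transcript, so this is a reconstruction from the spoken description.

```latex
B_f = B_s \, \frac{f_s}{M}
```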

And then, we consider a particular delta,

the distance between two frequencies that we want to resolve.

So we have two frequencies, f sub k plus one, and f sub k.

So the absolute value of the difference

is the delta frequency that we want to resolve.

8:44

So what would be the window size, M, so

that two main-lobes of the window are disjoint, so

that we can see these two frequencies as separate peaks in the spectrum?

So this equation here shows what has to happen.

So M has to be greater than or equal to B sub s,

the number of samples of the window in the main-lobe,

multiplied by the sampling rate and divided by this delta.

Or we can also express this delta as the absolute value of

the difference of these two frequencies.

9:25

But in many cases this difference between the two frequencies corresponds to

the fundamental frequency, because if it's a harmonic sound, the distance

between two consecutive harmonics is equal to the fundamental frequency.

So if we consider the fundamental frequency as

this delta that we want to be able to discriminate,

then the bandwidth in hertz of the main lobe of the window has to be smaller than or

equal to this fundamental frequency.

So we see these lobes separate and therefore M,

the window size, will have to be greater than or

equal to B sub s multiplied by F sub s, divided by F sub 0.

Or, if we express the period instead of the fundamental frequency,

we express the cycle length as the period in samples,

then this M has to be greater than or equal to B sub s multiplied by the period,

the period of the harmonic sound expressed in samples, which is this P.
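The three conditions described in this passage can be collected as follows. This is a reconstruction from the spoken description, writing M for the window size, f_k and f_{k+1} for the two frequencies to resolve, f_0 for the fundamental frequency, and P = f_s / f_0 for the period in samples.

```latex
\begin{align}
M &\ge B_s \, \frac{f_s}{\left| f_{k+1} - f_k \right|}
  && \text{(two arbitrary frequencies)} \\
M &\ge B_s \, \frac{f_s}{f_0}
  && \text{(consecutive harmonics, spaced by } f_0 \text{)} \\
M &\ge B_s \, P
  && \text{(same condition, with } P = f_s / f_0 \text{)}
\end{align}
```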

So, let's show an example, let's start from a given window,

like the Hamming window, for which B sub s is equal to 4.

So the main-lobe width is equal to 4 bins.

And we have a given sampling rate, and

we have two particular frequencies that we want to distinguish.

The ones we showed before, 440 hertz and 490 hertz, so the difference

is this 50 hertz.

11:06

So we can calculate M that allows us to distinguish these two frequencies.

So M will have to be greater than or equal to B sub s, 4, multiplied

by the sampling rate, 44,100, and divided by

the absolute value of this difference, which is 50 hertz.

And that M will be 3,528 samples.
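The arithmetic of this example can be checked with a few lines of Python. The function name `min_window_size` is hypothetical. It rounds up to the next integer, since a window size must be a whole number of samples.

```python
import math


def min_window_size(bs_bins, fs, delta_hz):
    """Smallest integer M satisfying M >= bs_bins * fs / delta_hz."""
    return math.ceil(bs_bins * fs / delta_hz)


# Hamming window (main lobe of 4 bins), 44,100 Hz, sinusoids at 440 and 490 Hz.
M = min_window_size(4, 44100, abs(490 - 440))
print(M)  # 3528
```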

So, if we take 3,528 samples of this signal, and

we compute the DFT, with some zero padding so

that we see a smooth spectrum, we see this magnitude spectrum.

We see two clearly distinct peaks,

each one corresponding to the transform of a Hamming window, and

of course in the phase spectrum, we also see the corresponding phases.
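The sketch below tries this out numerically for the 440 and 490 hertz pair, comparing the window of 3,528 samples with a much shorter one. Several things are assumptions made for illustration: the shorter size of 441 samples, the 1 Hz frequency grid from 400 to 530 Hz, the -20 dB threshold used to ignore side lobes, the function name `resolved_peaks`, and the choice to center the time axis on the window so that both cosines are in phase at its center. The spectrum is evaluated directly on the grid rather than with a zero-padded FFT.

```python
import cmath
import math


def resolved_peaks(M, fs=44100, f1=440.0, f2=490.0, floor_db=-20.0):
    """Frequencies (Hz) of the spectral peaks within floor_db of the strongest."""
    # Time axis in seconds, centered on the window.
    t = [(n - (M - 1) / 2) / fs for n in range(M)]
    win = [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
    x = [w * (math.cos(2 * math.pi * f1 * tn) + math.cos(2 * math.pi * f2 * tn))
         for w, tn in zip(win, t)]

    # Magnitude spectrum on a 1 Hz grid around the two sinusoids.
    freqs = list(range(400, 531))
    mags = [abs(sum(v * cmath.exp(-2j * math.pi * f * tn) for v, tn in zip(x, t)))
            for f in freqs]

    # Keep local maxima that are not far below the strongest value.
    threshold = max(mags) * 10 ** (floor_db / 20)
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]
            and mags[i] >= threshold]


print(resolved_peaks(3528))  # window size from the formula
print(resolved_peaks(441))   # much shorter window (hypothetical)
```

With 3,528 samples the two main lobes are disjoint and two peaks appear near 440 and 490 hertz. With the shorter window the lobes overlap and merge into a single peak between the two frequencies.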

12:15

which is 440 hertz, what should M be?

So if we take an M of 401 samples, these 401 samples

basically correspond to four periods of this oboe sound, okay?

And then, so, if we compute the DFT of this

signal multiplied by a Hamming window, and

again with a zero padding to get an N equal to 1,024,

we see these harmonics quite clearly separated.

So each harmonic corresponds to one main-lobe of this Hamming window and

they're quite clearly one after the other.

But now let's see what happens if we increase this window size:

instead of having these 401 samples,

we have twice as many, so we have 801 samples.

And then we do the same thing, we apply the Hamming window and

then we take the FFT, which is larger, and we see this spectrum.

And so here, because we took more samples,

the frequency distance that we can now discriminate is smaller.

So, in fact, we see that the main lobes are narrower than what we need,

that is, we even see the side lobes in between

the main lobes, because we took a bigger window size and

therefore we are able to discriminate even more than the fundamental frequency.
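To put numbers on the two window sizes, the sketch below computes how many periods of a 440 hertz sound each window covers and how wide the Hamming main lobe is in hertz. The function name `window_report` is hypothetical, and the exact fundamental of 440 hertz is an idealization of the oboe note.

```python
def window_report(M, f0=440.0, fs=44100, bs_bins=4):
    """Return (periods of f0 covered by M samples, main-lobe bandwidth in Hz)."""
    periods = M * f0 / fs
    bandwidth_hz = bs_bins * fs / M
    return periods, bandwidth_hz


for M in (401, 801):
    periods, bandwidth_hz = window_report(M)
    print(M, round(periods, 2), round(bandwidth_hz, 1))
```

With 401 samples the main lobe is about 439.9 hertz wide, just under the 440 hertz spacing of the harmonics, so consecutive lobes sit one right after the other. With 801 samples it narrows to about 220.2 hertz, which leaves room for side lobes to show between the harmonics.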

14:00

On the topics covered until now, there were quite a few references, but

starting from this lecture on, the techniques are more specific

to music applications and quite a bit less has been published.

That is good for me and for the course:

you'll have to pay more attention to what I'm going to be talking about.

14:23

Anyway, so apart from the standard references,

in Wikipedia you can find a little bit about it.

And of course, again in Julius's references, you can find quite a bit more and

more in-depth discussion about these things.

14:42

And that's all for this lecture.

We have presented the sinusoidal model, a sound representation that can be built on

top of the short-time Fourier transform and

that can reduce the amount of spectral information to be considered.

However, to use it, we have to understand a bit about spectra and about windows.

Hopefully, you understood some of that in this lecture.