0:00

Hello, welcome back to the course on audio signal processing for music applications.

This week, we're talking about the sinusoidal model and

we are considering sinusoids in the frequency domain, in the spectrum.

And more particularly, we are considering that a sinusoid is a peak, spectral peak.

And from which we can measure the values of the sinusoid.

So this is what we want to go over in this programming lecture

from a programming perspective.

Therefore, developing some code that would allow us to measure

the values of the sinusoid.

And these are the equations we presented in the lecture class to measure

the frequency, the amplitude, and phase of a sinusoid using parabolic interpolation.

So this first equation is the way to

compute a parabola on the tip of a peak.

And obtaining the refined frequency value as this

is as allocation from zero to n which is an FFT size.

Then we can obtain the frequency value in hertz by

multiplying by the sampling rate and divided by capital N.

And then we can plot the location value in the parabola and

obtain the tip of it, which will be the interpolated amplitude value.

And finally we can also get the phase value by

reading the location of the frequency.

Maybe using a linear interpolation of that in the phase spectrum, so

that we get a refined phase value.

1:55

And in here I wrote some code to actually do this detection of a spectral peak.

So here I first imported,

some of the packages that I need like the numpy and

matplotlib, the window from scipy.

Then I go to append the directory from which

I have some code that I'm going to use.

Is specifically the DFT model so that our DFT implementation and

set of utility functions that I have in the SNS tools package.

Okay, after I have all these packages, I can start reading a sound file.

So I read the sound file that includes a sinewave

of 440 hertz so I know exactly what frequency I should get.

3:15

And t is threshold that I will be using for the peak detection.

So then I can get the window, I get a humming window of size M.

Then I read into the file, in the middle of the sound file, so

I can just get M, capital M samples of the middle of the sound.

And then I can compute the spectrum using the DFT I return the magnitude and

phase, and then I am computing the peak.

So this peak detection, this is a function that is declared in the utility functions.

So let's go there.

4:06

magnitude spectrum and t, which is the threshold.

The magnitude spectrum, which is the array of values,

of amplitude values, in decibels.

And t is the threshold in decibels that is going to be used to conjure the minimum

value of below which we are not going to consider to have peaks.

So the computation is quite simple.

We check for three conditions in the whole magnitude spectrum.

4:37

Well, the whole,

we are discarding the initial value, value zero and the last value.

So we then between one and the size minus one,

we are looking for the values that are above a threshold, above t.

The values that are higher than the previous and

the next values, so that they are a local maximum.

And then we are considering the peaks as being the values that fulfill

the three criteria.

They are above a threshold, they are above the next value and

they are above the previous value.

5:22

And then we add one in order to undo this one, the condition that we had.

And it returns the locations, an array of peaks of the locations of those peaks.

You can go back to our code, so we call the peak detection.

It returns the locations and then to get the magnitudes of those locations,

we just read those values from the magnitude spectrum.

5:54

To plot that, we have these lines for

plotting in which we define first frequency axis,

so we're going to be able to see the x-axis in hertz.

Then we plot the whole magnitude spectrum, so

we've the frequency axis and the magnitude spectrum.

And we plot the peak locations on top of that.

So we plot the locations and the magnitudes.

And we're going to plot an x on those

locations without any lines, so we just see those locations.

So let's run that, we are in the workspace of SMS tools.

And I have this test.py so I can just run test.

6:46

Okay, and this is the spectrum of the sinusoid.

And of course, it has a main slope of the humming.

So let's zoom into that location.

So let's just go into just the peak.

And this is the peak of the sinusoid.

We can see the cross, and

we can see some of the higher samples of the magnitude spectrum.

7:18

Okay, so in here we can see that the cross is at

location 429 hertz, more or less, or 430 hertz.

That's clearly quite far from the 440 hertz.

Why is that?

Well, this is because our size, the FFT size, was 512 samples.

7:43

And with 512 samples,

we have quite a bit of distance between two consecutive samples.

In order to compute exactly what is that distance,

we can just do the sampling rate 44,100 divided by the FFT size which was 512.

And 86, this is the distance in hertz between

one sample which is here it says 400 and around 28.

And in here is 515 kind of thing, so

this is the 86 samples distance between these two.

8:27

So to improve the resolution, we can now increase the FFT size.

So in here the N, let's make it four times bigger.

For example, let's make it 2,048 samples.

So this should give us better frequency resolution, let's try that.

So we save that, and in here, well, let's remember this plot so

we can now If we run again and

we zoom into the tip of the spectrum.

9:04

Okay.

Okay, yeah, we can see that there is more samples now.

But if we even zoom more,

we can see that now the tip,

the peak has been computed to be around 430 hertz.

And clearly the distance between two consecutive samples has reduced.

So in fact it should be four times smaller.

It would compute that, we can just compute 44,100 divided by 2,048 so

now it's 21 hertz, the distance.

But still, it's not the frequency we would like to have, it's a little bit far.

In fact this 21 hertz resolution is not ideal.

So what we're going to do, is to do this parabolic interpolation.

So in the utility functions file there is this peakInterp function

that performs parabolic interpolation on these locations that we have found.

So from the magnitude spectrum, the phase and the locations that it found,

these local maximum, this function computes the interpolated values.

So from the three highest peaks of

each local maximum, the actual local maximum, left and right values.

It performs the parabolic interpolation to refind the frequency and

find the center of the parabola.

Then it finds the tip of the parabola, the magnitude of it.

And then it uses the location to look into the phase spectrum and

performs a linear interpolation.

Because the phase during this value should be quite flat and it performs

a linear interpolation to find the actual face value, and it returns this string.

11:27

So we will obtain these by calling the peak interpolation function,

so let's copy that too, copy and put it here,

okay, so this is our peak interpolation function.

No need the computer demag.

So now it returns the interpolated location, interpolated magnitude and

interpolated phase.

By calling the peakInterp and

sending it in the magnified phase and location values.

So now we can plot the iploc and ipmag.

12:07

Okay, so let's see if we improve this plot now so that we had the peak

at 430 hertz, see what we can do with now this interpolation.

So let's run again test, it complains about peakInterp because it's

part of the package UF so I have to specify that it comes from there.

12:49

Okay, so here we see the values we had before.

So the tip before it was this tip.

So it was around 430, and

now it has found the cross is in between these two locations.

So it has moved the tip because can change the tip of the parabola and

now the cross, it's closer to 440.

Still it's not at 440, in fact it's around 439 or so.

So we're still one hertz difference but

definitely is much better than what we started with.

So that's basically, I wanted to show, so with this code

we can analyze a spectrum of a sound.

And identify the spectral peak locations that are within that spectrum.

Of course, if the sound is more complex than a sine wave and the threshold

is also lower, it will not just find one single peak, it will find many peaks.

13:59

Okay, let's go back to the slides.

So we have been using the sinusoidal model and

part of it the concept of peak detection.

And using some of the packages of Python, numpy, scipy, matplotlib and

some of the functions available in the SMS tools package.

We have been able to measure the spectral peaks of sound and

with that we are now ready to go to the next stage.

Which we'll be talking about in the next programming class.

Of actually handling more real sounds and

both analyzing and synthesizing these sinusoids.

So I hope to see you next class.

Bye-bye.