0:52

So, let's go directly to the interface that

we have in the sms-tools package, through which we can access all the models,

which is this models GUI interface.

And well, here we have the Harmonic model as one of the options but

let's start from the DFT and let's start by analyzing a simple sound,

a sound of which we know the fundamental frequency and

that is very clear, so this is a sawtooth sound.

We can listen to this sound.

[SOUND] Okay, so this is an electronically generated sound, and

now what we want to do is to just first look at the single DFT.

So that then we can understand better the sound and

decide what are the appropriate parameters for analyzing the harmonics of the sound.

So the first decision we have to make is what window do we use.

Being a simple

electronic sound, honestly the type of window is not that critical.

So let's start with a simple window.

For example, let's start with the Hamming window.

2:20

By default here, it is 511, but how do we decide the best window size?

And we went over that in theory class.

So the window size, which we also denote with the variable M,

can be computed by multiplying the width of the main lobe of the Hamming

window, which is 4 bins, by the sampling rate

of the sound, which is 44,100, and dividing by the fundamental frequency that we have.

And in this case, it's 440 Hertz,

which is the note A4.

So, we divide by 440.

And the result is basically 401 samples.

This would be four periods of this particular sound so let's do that.
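As a quick check of that arithmetic, here is a minimal sketch. The function name `window_size` is hypothetical and not part of sms-tools; it simply rounds the result of the formula described above.

```python
def window_size(main_lobe_bins, fs, f0):
    """Window length in samples: main-lobe width (in bins) times fs, divided by f0."""
    return int(round(main_lobe_bins * fs / f0))

# Hamming window (main lobe 4 bins wide), 44.1 kHz sampling rate, A4 at 440 Hz
print(window_size(4, 44100, 440))
```

An odd length such as 401 is also convenient because the window can then be centered on a single sample.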

Let's put 401 samples as the window size. And in terms of the FFT

3:26

size, well, we want it to be bigger than the window size.

Here, we can just use a big FFT size so

that we have a lot of zero padding and a smooth spectrum.

So let's put for example 2,048, and

we have to choose where we are going to perform this analysis.

This is a one second sound, so here, 0.2 seconds,

that sounds like a good point at which to choose these 401 samples.

So let's compute.
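For readers following along without the GUI, a rough NumPy sketch of this single-frame analysis could look as follows. It is not the sms-tools code: the sawtooth is synthesized here as a sum of harmonics with amplitude 1/k (an assumption standing in for the sound file), and all variable names are illustrative.

```python
import numpy as np

fs, f0 = 44100, 440.0   # sampling rate and fundamental from the lecture
M, N = 401, 2048        # window size and (zero-padded) FFT size from the lecture

# Assumed stand-in for the sound file: harmonics k*f0 below fs/2, amplitude 1/k
n_harm = int((fs / 2) // f0)
t = 0.2 + (np.arange(M) - M // 2) / fs          # M samples centered at 0.2 s
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

w = np.hamming(M)
w = w / w.sum()                                  # normalize the window gain
X = np.fft.rfft(x * w, N)                        # rfft zero-pads the frame to N samples
mag_db = 20 * np.log10(np.abs(X) + 1e-12)        # small offset avoids log(0)
freqs = np.fft.rfftfreq(N, 1 / fs)

peak_bin = int(np.argmax(mag_db))
print(freqs[peak_bin])                           # strongest peak sits near the fundamental
```

With N = 2048 the bin spacing is about 21.5 Hz, so the strongest bin can only land within roughly one bin of 440 Hz; locating it more precisely would need peak interpolation.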

Okay, so these are the analysis results.

And the input sound, as we chose,

is four periods of the.

4:58

Okay, so these peaks correspond to the harmonics, and we have harmonics going from

440 hertz up to half of the sampling rate.

So in fact, if you look at the shape of the sawtooth,

it is not a perfect sawtooth,

in the sense that it doesn't have the smooth sawtooth shape.

It has this kind of oscillations here.

And this is because we have a finite number of sinusoids.

It is not an infinite, continuous waveform; it is a discrete waveform

and it has a limited number of sinusoids.
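The oscillations can be reproduced with a small additive-synthesis sketch. This assumes the sawtooth is built from harmonics with amplitude 1/k up to half the sampling rate; the function name `bandlimited_sawtooth` is hypothetical.

```python
import numpy as np

def bandlimited_sawtooth(f0, fs, duration):
    """Sum of the harmonics k*f0 below fs/2, each with amplitude 1/k."""
    n_harm = int((fs / 2) // f0)
    t = np.arange(int(duration * fs)) / fs
    x = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        x += np.sin(2 * np.pi * k * f0 * t) / k
    # 2/pi scales the ideal (infinite-sum) sawtooth to the range -1..1
    return x * (2 / np.pi), n_harm

x, n_harm = bandlimited_sawtooth(440.0, 44100, 0.1)
print(n_harm, x.max())
```

An ideal sawtooth scaled this way would stay within -1 and 1. The truncated sum overshoots near each jump, which is the ripple visible in the plotted waveform.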

Okay, and then if we do the inverse of that,

we obtain the reconstructed waveform, but

of course it is a reconstructed waveform with the window that we applied to it.

So, we applied a Hamming window.

So, this is a windowed sawtooth waveform.

Anyways, so this works quite well and we can just analyze

6:35

decide what is going to be a peak or let's say, a partial of this sound.

So for the magnitude threshold,

here we see that things go pretty far down.

So we can just put, for example, -100.
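To make the role of the threshold concrete, here is a simplified sketch of threshold-based peak picking on a dB magnitude spectrum. The function `detect_peaks` is illustrative only and is not claimed to match the sms-tools implementation.

```python
import numpy as np

def detect_peaks(mag_db, threshold_db):
    """Indices of local maxima of mag_db that lie above threshold_db."""
    mag_db = np.asarray(mag_db, dtype=float)
    mid = mag_db[1:-1]
    is_peak = (mid > threshold_db) & (mid > mag_db[:-2]) & (mid > mag_db[2:])
    return np.nonzero(is_peak)[0] + 1   # +1 because mid starts at index 1

# Toy spectrum: the local maximum at index 3 (-110 dB) is below the threshold
spectrum = [-120, -20, -120, -110, -120, -40, -120]
print(detect_peaks(spectrum, -100))
```

Lowering the threshold lets more local maxima through, including weak ones that may only be window side lobes.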

Then we can decide the minimum duration of the sinusoids;

this being an electronic, very stable sound,

this really doesn't matter that much. Then the number of sinusoids to track.

Well, we can just put a big number.

We can just put, for example, a hundred to find, and

then we can also set the deviation that we allow from

one frame to the next, in terms of Hertz, with respect

to what it would be at frequency 0, and then it is scaled a little bit as it goes up.

This being a very stable electronic sound,

it's really not an issue; the stability is going to be so

high that this frequency deviation could be very small, and

therefore so could the slope of these deviations.

So the change of these deviations as the frequency goes higher can also be very

small.

So that's not an issue.

So let's compute with these values.
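The frequency-deviation rule described above (an allowed deviation at frequency 0 that grows with frequency) can be sketched as a linear function. The parameter names and the default values below are assumptions chosen for illustration, not values taken from the lecture.

```python
def max_freq_deviation(freq_hz, dev_offset_hz, dev_slope):
    """Allowed frame-to-frame deviation: an offset at 0 Hz plus a slope times frequency."""
    return dev_offset_hz + dev_slope * freq_hz

def continues_track(prev_freq_hz, new_freq_hz, dev_offset_hz=10.0, dev_slope=0.001):
    """True if a new peak is close enough to the previous frame's track frequency."""
    allowed = max_freq_deviation(prev_freq_hz, dev_offset_hz, dev_slope)
    return abs(new_freq_hz - prev_freq_hz) < allowed

print(continues_track(440.0, 445.0))      # 5 Hz jump, about 10.4 Hz allowed
print(continues_track(440.0, 460.0))      # 20 Hz jump is too large at 440 Hz
print(continues_track(20000.0, 20025.0))  # 25 Hz jump, 30 Hz allowed at 20 kHz
```

For a stable electronic sound both the offset and the slope can be small, since the tracks barely move between frames.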

7:54

Okay, and this is the result.

So here, we have the original sound, the complete sound.

Now, we are analyzing the whole sound.

And here are the harmonics, or

basically the sinusoids that it found, and the reconstruction.

So, here these are very much,

in fact, the harmonics of the sawtooth, except at the very bottom.

If we look here at the very bottom, we see these lines

that in fact are not part of the harmonic series.

And why is that? In fact, these are side lobes.

If we go back to the DFT, here at the very low frequencies, we see some side

lobes, and this is what the sinusoidal model is catching at the very bottom.

9:25

Okay and okay, interestingly enough,

we see a very different set of sinusoidal tracks.

We see many more.

What are we seeing here?

Well, we are seeing a lot of the side lobes of

every single harmonic, because the side lobes are quite high.

And therefore, with a threshold of minus 100 decibels,

and with a window size which is large enough so

that there is space to visualize the side lobes,

these appear in the analysis.

Even though, if we play it, [SOUND] it sounds pretty good.

It sounds as if we are only resynthesizing the harmonics, and

this is because, of course, they are part of the spectrum.

So in terms of reconstruction, it's pretty good,

even though from an analysis point of view, it's not so

good because we are seeing, basically, the artifacts of the analysis.

Okay, now let's go to the harmonic model and let's, in fact,

start from these kind of wrong parameters.

So, parameters that are not the best.

So we start from the Hanning window and we use this 600 window

10:52

size and we have this -100 threshold.

And in the harmonic model, an additional set of parameters that we

have to specify relates to the actual fundamental frequency and

the number of harmonics to be detected.

So in terms of number of harmonics, in fact,

we can know it because, given that the fundamental frequency is 440,

we can in fact compute the maximum number of harmonics that will be in the spectrum,

which will be half of the sampling rate,

22050, divided by the fundamental frequency, 440.

So, 50 is in fact the maximum

number of harmonics that will be present in this sound.
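That count is just an integer division, as in this small sketch (the function name `max_harmonics` is hypothetical):

```python
def max_harmonics(fs, f0):
    """Largest k such that k * f0 does not exceed half the sampling rate."""
    return int((fs / 2) // f0)

print(max_harmonics(44100, 440))
```

Here 22050 / 440 is about 50.1, so the 50th harmonic at 22,000 Hz is the last one that fits below half the sampling rate.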

So, we can specify 50 harmonics, and then we have to specify a possible

range of the fundamental frequency, to help the two-way mismatch algorithm

that is being used here in the detection of the fundamental frequency.

So, for example, we can be kind of flexible.

So we can put between 100 and, let's say, 600.

We know that this is 440, so we could be more restrictive.

But this would be just fine.

And let's just compute it with these parameters.

12:35

in the sense that we are now restricting

the sinusoids to be harmonics of the fundamental that was found.

And even though the window size was large, and the number of peaks

identified before were many more and we also identified the side lobes,

this has now constrained the search to the harmonics.

And therefore, we only see the harmonics.
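The effect of that constraint can be illustrated with a simplified sketch: from all detected peaks, keep only the one closest to each multiple of the fundamental. This is not the sms-tools implementation; the function name, the tolerance value, and the toy peak list are assumptions.

```python
import numpy as np

def select_harmonics(peak_freqs, f0, n_harm, tolerance=0.1):
    """For each harmonic k*f0, keep the nearest peak if it is within tolerance*f0."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    harmonics = []
    for k in range(1, n_harm + 1):
        target = k * f0
        i = int(np.argmin(np.abs(peak_freqs - target)))
        if abs(peak_freqs[i] - target) < tolerance * f0:
            harmonics.append(float(peak_freqs[i]))
    return harmonics

# Toy peak list: 610 Hz and 1100 Hz stand in for side-lobe peaks
peaks = [439.8, 610.0, 880.5, 1100.0, 1319.7]
print(select_harmonics(peaks, 440.0, 3))
```

Peaks that fall between harmonics are discarded even though the peak detector found them.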

Of course now, we can go back to the ideal type of analysis parameters.

So, we can go back to the Hamming window, and let's go

13:15

back to 400 samples and let's leave the rest the same.

And we compute, and now in fact we obtain the same thing.

It's the same thing, but now the window size is small, which

is sufficient, and if we play the original [SOUND] and

if we play the reconstructed [SOUND], it is identical.

So we basically have captured all the relevant

information of the sinusoidal components of the sound.

Okay, now let's go to a more real sound, a more natural sound.

So let's close all these plot windows.

And let's start again from the DFT, but let's look at a violin sound.

So there is a violin sound here in the sounds of the sms-tools,

which is a violin with pitch B3, and we can listen to that.

[SOUND] Okay, so B3, the frequency that corresponds to the note B3,

which is lower than the A4 that we had before,

is around 246 hertz, okay?

So, in order to find the best window size,

we can compute it; for

example, if we start from a Hamming window,

we need to compute 4 x 44,100,

divided by the frequency, so 246.

Okay, so this is a lower frequency.

Therefore four periods of the sound is larger: 717 samples.

So we can put here 717, okay.

And we can keep the same FFT size, and

in fact, well, this sound is a little bit longer.

So, let's put 0.5 as the place to be analyzed.

And here is the result. This is not an electronic sound, so

the number of periods that we have chosen is still four,

but it is much more irregular than with the sawtooth.

So in fact here, it's even a little bit harder to see the period.

In fact, the period is like two bumps.

So this would be one period, going from here to here, and another.

Then another, so it's again four periods of the violin sound.

The spectrum is a little more complex than that of the sawtooth,

but we see clearly the harmonics.

So if we zoom a little bit more into the part

that we see as being relevant, we see the first

few peaks, and these are clearly the harmonics of the sound.

But we see a lot of energy or

spectral information that doesn't have these nice-looking

peaks or shapes corresponding to the window.

So in fact, instead of a Hamming window, it might be better to take

a smoother window that can better discriminate this

kind of background residual or noise in these sounds.

So let's use the Blackman window.

And this being smoother, we need more samples.

So in fact we need six periods for this;

the main lobe of the Blackman is six bins wide.

So we use the same equation to compute the window size but

multiplied by six, so now we need at least 1075 samples.

So let's put here 1075 samples, and let's compute the same way.
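The two window-size choices for the violin can be compared side by side. This sketch just evaluates the formula described above for both windows; the names `MAIN_LOBE_BINS` and `min_window_samples` are hypothetical, and 246 Hz is the approximate B3 frequency used in the lecture.

```python
FS = 44100
F0_B3 = 246.0   # approximate B3 frequency used in the lecture

# Main-lobe widths in bins, as stated in the lecture
MAIN_LOBE_BINS = {"hamming": 4, "blackman": 6}

def min_window_samples(window, fs, f0):
    """Unrounded window length: main-lobe width (bins) * fs / f0."""
    return MAIN_LOBE_BINS[window] * fs / f0

for name in ("hamming", "blackman"):
    print(name, round(min_window_samples(name, FS, F0_B3), 2))
```

The unrounded values are about 717.07 and 1075.61 samples, which correspond to the 717 and 1075 entered in the interface. The Blackman window costs 50 percent more samples for the same frequency resolution, in exchange for much lower side lobes.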

And now, we're seeing much better

the harmonics of the sound.

In fact, let's compare it with the previous one and

having zoomed into the same area, so we zoom

19:06

And let's apply the same values.

So, let's apply the Blackman window,

and, if we remember, the size that

we put was 1,075, so, 1,075.

And the FFT size is 2048, which

is okay, it's a good zero-padding.

The truth is that we don't need the threshold that far down.

We need more than in the because as we can see here

these higher harmonics are quite a bit down from the first harmonics.

So minus 100 would be here.

So let's just leave the minus 100.

And then in terms of the duration of the tracks,

this is the idea that if a track doesn't last long enough

we're going to reject it.

In terms of number of harmonics, we can also compute what would be

the maximum possible number of harmonics present in the violin, and

for this we will just take half of the sampling rate,

22050, and divide by the fundamental frequency that a B3 has, which is 246.

So, at maximum, if there were harmonics all the way up to half of

the sampling rate, there would be 89 partials, 89 harmonics.

Well, let's put 89.

And then for the fundamental frequency we have to help the algorithm again;

we know it's 246, so we can just put between 200 and 300.

That should be enough.

And for these other parameters, again, this is a fairly simple sound.

It's not as simple as the sound of the sawtooth,

but I do not think these parameters will matter that much,

so let's just compute it like this.

21:39

But let's listen again to the input and output.

[SOUND] So this is the input that we started from and

this is [SOUND] the output.

Sounds pretty good.

If you listen carefully, there may be some aspect,

especially during the attack that is not quite there.

So this is, let's say, a cleaner version of the original sound.

Or let's say, a smoother version, and maybe it is not as bright as the original sound.

But clearly the color qualities of the sound are here.

And we have been able to capture them.