0:01

Welcome back to the course on audio signal processing for music applications.

This week we are talking about the harmonic model.

And in these demonstration classes,

we are trying to understand this model by actually using it.

By analyzing some sounds and synthesizing them.

In this lecture, I want to go a little bit beyond what we did and

analyze a fragment of a sound.

And see if we can take it to the limit and

see what its potential and its limitations are.

So, in particular, we will be analyzing a few notes of a cello that I played.

And the cello is, of course, a great instrument.

It's a very traditional instrument that you can do a lot of things with.

So a good way to get a grasp of the types of sounds that a cellist makes is to look

at Freesound and just search for violoncello.

Okay, and that will give you a few samples of different types of cello sounds.

So in fact, for example, the first one is kind of an extended technique.

1:09

So let's listen to that.

[NOISE] Okay, that's what is called a seagull effect.

It's kind of an interesting sound.

Of course you can also get some more traditional notes playing what

is called tenuto.

[SOUND] Or also with the cello you can play pizzicato notes,

and this is a pizzicato note.

[SOUND] But, of course, you can find many types of sounds and little fragments.

The sound that we will be using is a short

cello phrase from a very traditional Catalan song,

The Song of the Birds.

And in fact, it's the one that I use in the teaser of the course.

Let's listen to that one.

[MUSIC]

Okay, so let's analyze this sound.

And let's open the SMS Tools GUI, and

we'll first start from the short-time Fourier transform.

This is a time-varying sound,

so we need the short-time Fourier transform to get a grasp of it.

2:37

Okay, and okay, we have to choose the parameters.

And okay, instead of a Hamming window, let's choose the Blackman window.

The main lobe of the Blackman window is wider than that of the Hamming,

but the side lobes are lower, and that may be good for this sound.
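
As a side note, this trade-off can be checked numerically. The following is a minimal sketch using only NumPy; the helper name `lobe_stats` and the 65536-point FFT used to oversample the window transform are assumptions for illustration, not part of the SMS Tools GUI. It measures the main-lobe width (in bins of size fs/M) and the highest side-lobe level of each window.

```python
import numpy as np

def lobe_stats(window, n_fft=65536):
    """Return (main-lobe width in bins, peak side-lobe level in dB) of a window."""
    m = len(window)
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    mag_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    # Walk down from DC until the magnitude stops decreasing: the first null.
    k = 0
    while k < len(mag_db) - 1 and mag_db[k + 1] < mag_db[k]:
        k += 1
    width_bins = 2 * k * m / n_fft   # full main-lobe width in units of fs / m
    side_lobe_db = mag_db[k:].max()  # highest level outside the main lobe
    return width_bins, side_lobe_db

hamming_width, hamming_side = lobe_stats(np.hamming(1001))
blackman_width, blackman_side = lobe_stats(np.blackman(1001))

print(f"Hamming:  main lobe {hamming_width:.2f} bins, side lobes {hamming_side:.1f} dB")
print(f"Blackman: main lobe {blackman_width:.2f} bins, side lobes {blackman_side:.1f} dB")
```

The Hamming main lobe should come out at about 4 bins and the Blackman at about 6 bins, with the Blackman side lobes clearly lower.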

So, let's choose a Blackman window.

And then, okay.

For the window size, we don't have much to go on yet,

but let's, for example, use 1001 samples.

Let's just leave the FFT size at 1024, and the hop size

has to be at most one-fourth of the window size, so let's use 250.

Now we compute that.

This is a longer fragment so it's going to take a little longer.

3:28

And from it we will be able to visualize the magnitude spectrum and

the phase spectrum.

Okay. Now what we are interested in is

deciding what parameters

are needed in order to be able to distinguish the harmonics of this phrase.

So, in fact, an important thing

is to identify what is the lowest fundamental that is being played.

The lowest fundamental will be the one that will determine

the minimum distance between two harmonics.

So let's zoom in to the very bottom of this spectrogram.

Okay.

4:09

Okay, so this is the first harmonic and a little bit of the second.

So it's a very clear harmonic sound.

And we can see that it starts a little bit low, goes up, and then it goes down.

Clearly the lowest frequencies are going to be the first and the last.

But here we see that the resolution is not so good.

In fact, we see these boxes, kind of this quantization in the vertical axis.

The horizontal axis is pretty good.

The hop size is 250 samples, so there are quite a lot of frames.

But there are not that many frequency samples to be able to

visualize and then, further on, analyze the peak of this harmonic.

So let's increase... maybe the window size should not be increased,

because then we would lose time resolution, but

let's increase the FFT size, so we get a more finely sampled spectrum.

For example, let's say 4096, so quite a bit bigger FFT size.

So this will give us quite a bit of zero padding and

therefore will give us many more samples

in the frequency domain, even though the actual data points will be the same.
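
To make the effect of zero padding concrete, here is a small sketch with NumPy. The 348 Hz test sinusoid is an assumption standing in for one harmonic of the cello. It shows that going from an FFT size of 1024 to 4096 makes the bin spacing four times finer, and that every fourth sample of the larger spectrum coincides with the smaller one, so no new information is created.

```python
import numpy as np

fs = 44100       # sampling rate in Hz
M = 1001         # window size in samples
f_test = 348.0   # assumed test frequency in Hz

# One windowed frame of a steady sinusoid.
t = np.arange(M) / fs
frame = np.blackman(M) * np.sin(2 * np.pi * f_test * t)

# rfft zero-pads the frame up to the requested FFT size.
spec_1024 = np.fft.rfft(frame, 1024)
spec_4096 = np.fft.rfft(frame, 4096)

print("bin spacing (Hz):", fs / 1024, "vs", fs / 4096)
print("frequency samples:", len(spec_1024), "vs", len(spec_4096))

# The denser spectrum passes through exactly the same points as the coarse one.
same_points = np.allclose(spec_4096[::4], spec_1024)
print("every 4th bin matches the 1024-point FFT:", same_points)
```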

So again, this takes a little bit to compute.

5:34

Ok, so this is the spectrum.

Clearly there is a finer resolution than before.

Let's do the same thing.

Let's zoom into the very bottom of the spectrum.

Okay.

And okay and yeah.

Now, we definitely have many more frequency samples, and of course it looks

similar, but we are now able to see the center of the window much better.

So if we look at the last note, at the center of this,

now looking at the y-axis, it's around 348 hertz, okay?

So that would be the lowest note.

And the highest note is around 456 hertz, okay?

So this is good information for deciding, now, the window size that we should use.

So in fact, okay let's do the sinusoidal model, and let's look at this information.

So in order to decide what the period length is,

we have to take 44,100 and

divide it by that frequency, which was around 340 Hz.

Okay, so make it a little bit lower.

So this says 129 samples, and then if we use, for example,

the Blackman window, well, we will need six times that.

So, if we take six times this.

7:18

That should be enough to discriminate the harmonics.
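
The arithmetic behind this choice can be sketched as follows. The function name `min_window_size` is hypothetical; the factor of 6 is the main-lobe width of the Blackman window in bins, and the result is rounded up to an odd number of samples.

```python
import math

def min_window_size(fs, f0_min, main_lobe_bins=6):
    """Smallest odd window size spanning main_lobe_bins periods of f0_min."""
    period = fs / f0_min                     # period length in samples
    size = math.ceil(main_lobe_bins * period)
    return size if size % 2 == 1 else size + 1

period = 44100 / 340
blackman_size = min_window_size(44100, 340)                  # 6-bin main lobe
hamming_size = min_window_size(44100, 340, main_lobe_bins=4)  # 4-bin main lobe

print(f"period: {period:.1f} samples")
print("Blackman window size:", blackman_size)
print("Hamming window size:", hamming_size)
```

With a lowest fundamental of 340 Hz this gives 779 samples for the Blackman window; a Hamming window, with its narrower main lobe, would need fewer.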

And the FFT size, I think it was good to have this big FFT size, so

4096 was a good choice,

because it gave us a good resolution, at least visually.

And now, of course, in the sinusoidal model, we can choose the threshold,

the minimum duration of the sinusoids and how many sinusoids we want to track.

The maximum frequency deviation here we should, I think,

have it a little bit bigger because there is quite a bit of variation.

Now let's, of course, choose the cello sound,

the cello-phrase, and we'll compute it.

So this is the sinusoidal analysis.

8:26

Okay, we see well, definitely the harmonics,

but also we see some lines in between them.

And we see some trajectories stopping and continuing.

Anyway, so this is the sinusoidal model.

If we listen to the result.

[MUSIC]

Well, it's pretty good.

Maybe we are losing a little bit of the attacks of the notes, but

it's pretty good.

So let's go directly now to the harmonic model, and

let's use the same cello-phrase.

9:08

And let's use the Blackman window.

Let's use the same 700.

We should have an odd-size window.

So let's use 779. In terms of the FFT size, again,

okay, I think it was a good choice, 4096, a lot of zero padding.

The magnitude threshold minus 90, that's okay.

The duration of the tracks.

Okay, so these will be required to last at least 0.1 seconds.

I think we can even make it bigger.

So let's say 0.2 seconds.

And the maximum number of harmonics, there is clearly no need for 100.

In fact, a way to check how many harmonics are needed is

to divide 44,100 by the lowest frequency.

Okay, no. We have to divide half of that, so

22,050 divided by the lowest frequency.

So 64.

That would be the maximum number of harmonics that we would have if we really

had all the harmonics in the lowest note.

So no need for 100.

Let's say 60 would be plenty.
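
The same count can be written out as a short sketch. The function name `max_harmonics` is hypothetical; it simply counts how many multiples of the fundamental fit below half the sampling rate.

```python
def max_harmonics(fs, f0):
    """Number of harmonics of f0 that fit at or below the Nyquist frequency fs / 2."""
    return int((fs / 2) // f0)

lowest_note = max_harmonics(44100, 340)   # lowest fundamental in the phrase
highest_note = max_harmonics(44100, 456)  # highest fundamental in the phrase

print("harmonics available at 340 Hz:", lowest_note)
print("harmonics available at 456 Hz:", highest_note)
```

The lowest note sets the upper bound, since higher notes have their harmonics spaced further apart and fewer of them fit below 22,050 Hz.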

And here, now is where we have to choose a range that includes all this melody.

So we said that the lowest frequency was around 340,

let's make it safer, so let's make it 300.

And the highest was above 450 or something, so

let's make it quite a bit higher.

Just in case let's make it 500.

Okay?

10:54

And this is an error threshold that will be

quite relevant now for identifying the fundamental frequency, but

let's just leave it as it is now, and see if we have to change it later.

Okay. So, now we'll compute it.

So again, this will take a little bit of time.

11:13

Okay, so these are the harmonics it has obtained, and that looks pretty good.

It found quite a few harmonics. Of course, in the transitions,

that's where the problems, or at least the little deviations, occur.

If we just zoom into just one transition, let's say.

So this is where the harmonics of course get lost and they are picked up again.

If we listen, well, let's plot it again to the original.

And if we listen to the synthesized sound.

[MUSIC]

Okay, that's pretty good.

Now in terms of this error threshold of the algorithm.

If we make it more restrictive, so

that means that unless it's below a certain error it will not be accepted.

We might see, then, that in some of these areas it does not find the fundamental.

So instead of seven, let's put, for example, two, and let's see what it finds.

12:26

Okay, so now we see the result and

we clearly see that in the transitions there are gaps and

this is because, in the transitions, the fundamental is not very clear.

We are in a kind of attack, noisy attack.

So it has lost a little bit of the transitions, and if we listen to that,

in fact, we're going to hear these gaps.

[MUSIC]

Okay, so there are gaps in the transitions because those are the areas where it

didn't find the fundamental and therefore it didn't find any harmonics.
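
To see why a stricter threshold produces gaps, here is a toy sketch of the gating idea. The function `gate_f0` and the per-frame candidate and error values are made up for illustration; they are not output from the SMS Tools analysis, and the actual fundamental detection in the GUI involves more than this. A frame whose error is not below the threshold gets a fundamental of 0, and with no fundamental there are no harmonics for that frame.

```python
def gate_f0(candidates, errors, max_error):
    """Keep an f0 candidate only if its error is below max_error, else return 0."""
    return [f0 if err < max_error else 0.0
            for f0, err in zip(candidates, errors)]

# Hypothetical per-frame values: the third frame sits in a noisy note transition.
candidates = [348.0, 350.0, 390.0, 452.0]
errors = [1.2, 1.5, 4.8, 1.0]

loose = gate_f0(candidates, errors, max_error=7)   # every frame is accepted
strict = gate_f0(candidates, errors, max_error=2)  # the transition frame is dropped

print("threshold 7:", loose)
print("threshold 2:", strict)
```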

And that's basically all I wanted to say.

So let's go back to the slides and well,

we have used the SMS Tools GUI in order

to analyze this cello fragment.

And we have used the short-time Fourier transform, the sinusoidal model,

and now the harmonic model, to see this phrase and to analyze the harmonics.

And we can see that by tweaking the parameters, we can get quite

a bit of difference in the way that these harmonics are analyzed.

So, that's all and this is all for the demo classes of this harmonic model week.

So hopefully this has given you a view of

how the harmonic model can actually be used in practice.

And still, it's not ideal.

There are some parts of the sound,

especially, as in this sound that we just heard, in the attacks,

where we lose a little bit of the sound that is present there.

So next week, we will extend the idea of the sinusoidal and

harmonic model to include that aspect,

to include what we will call the residual or the stochastic component.

Hopefully, that will allow us to generalize our models and

to be able to handle many more types of sound.

So I will see you in the next class.

Thank you very much.