0:00

Welcome to the course on audio signal processing for musical applications.

This week, we are talking about the harmonic model.

And in this programming lecture, I want to talk about the implementation of it.

In particular, the first part of it.

The part that requires detecting the fundamental frequency so

that we can then identify the harmonics of a given sound.

So we will be talking about one particular algorithm, the two-way mismatch algorithm.

That's an algorithm that we presented in the theory lecture, and

it's a frequency-domain algorithm that basically tries to identify

harmonic series, possible harmonic series that match the peaks of the spectrum.

So in this plot we see the measured peaks that we have identified.

And then we keep trying different predicted

fundamental frequencies and the harmonics of it.

And we measure the error,

we measure the distance between these two lists of values.

And we did that by measuring two errors. The first is the predicted-to-measured error,

that is, the distance between the predicted and

the measured values.

And we also have another measure, which is the measured-to-predicted error.
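To make the two directions of the error concrete, here is a minimal sketch in Python. It is deliberately simplified and is not the sms-tools code: the function names and the peak list are hypothetical, and the weighting that the actual algorithm applies when comparing harmonics and peaks is left out. It only shows the idea of measuring nearest-neighbour distances from the predicted harmonics to the measured peaks, and from the measured peaks back to the predicted harmonics.

```python
def nearest_distance(value, others):
    """Distance from value to the closest element of others."""
    return min(abs(value - o) for o in others)


def simple_two_way_error(peak_freqs, f0, n_harmonics=10):
    """Unweighted two-way error between a harmonic series and measured peaks.

    Simplified illustration only; the weighting of the real algorithm is omitted.
    """
    harmonics = [f0 * h for h in range(1, n_harmonics + 1)]
    # Predicted-to-measured: every predicted harmonic looks for its nearest peak.
    pm = sum(nearest_distance(h, peak_freqs) / h for h in harmonics) / len(harmonics)
    # Measured-to-predicted: every measured peak looks for its nearest harmonic.
    mp = sum(nearest_distance(p, harmonics) / p for p in peak_freqs) / len(peak_freqs)
    return pm + mp


# Hypothetical measured peaks of a perfectly harmonic sound at 440 Hz.
peaks = [440.0, 880.0, 1320.0, 1760.0, 2200.0]
for candidate in (220.0, 440.0, 880.0):
    print(candidate, simple_two_way_error(peaks, candidate, n_harmonics=5))
```

With these made-up peaks, the candidate at 440 Hz gives zero error in both directions, while the octave below and the octave above are penalized, each by one of the two measures more than the other. That is why both directions are needed.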

But let's go directly to the actual code.

1:25

Okay, in the sms-tools package, in the utilFunctions file,

there is the code for the two-way mismatch algorithm.

The core of it is a function called two-way mismatch. In fact,

there is a C version and a Python version.

Now we will go through the Python version.

When we run it, we normally use the C version because it's more efficient.

So this algorithm, what it does,

is it receives the peaks, the frequencies, and magnitudes of the peaks.

It receives a list of candidates,

of frequencies of candidates of fundamental frequency.

And it basically identifies which is the candidate that has the smallest error.

2:15

So it does that by measuring the two errors: the predicted-to-measured error,

which is this part, and then the measured-to-predicted error, which is this part.

Within it, it keeps identifying

all the distances between all the values of the harmonic series and the peaks.

And it has different ways of comparing those.

We're not going to go into the details of that, but of course,

feel free to go into it.

And then finally, it just creates an error array,

which is the list of errors of all the candidates, okay?

So we have, in the array, we have all the errors for every single candidate.

And then what we do is we choose the minimum of those errors, and

the fundamental frequency is going to be that candidate that has the minimum error.

3:19

And then this function is wrapped by another one

that is the one responsible for generating the candidates and calling the function.

So this function, f0Twm, receives

again the peaks of the spectrum, and then it receives the control parameters:

the maximum error allowed, which is the error that will be allowed for

the fundamental frequency to be accepted as such, and then the range

of the fundamental frequencies, the minimum and maximum f0.

And then there is one value which is kind of a memory, a tracking value,

that is basically the fundamental frequency of the previous frame.

And this will allow us to refine the fundamental frequency

by requiring that the fundamental frequency be as smooth as possible.

But the algorithm here is very simple.

In fact, it just takes the list of peaks that are within

the minimum and maximum value of the frequencies.

And the rest, it just makes a few more comparisons about that.

There's a lot of room for

improvement in this algorithm in the sense of generating more candidates so

that we do a more exhaustive trial of different frequencies,

but for efficiency reasons we made this simple implementation,

which allows us to compute this quite efficiently.
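The wrapping logic just described can be sketched as follows. This is an illustration under assumptions, not the sms-tools function: the name `select_f0`, the `error_fn` callback, the toy error used in the demo, and the rule of keeping only candidates within half of the previous f0 are all hypothetical choices made for the example.

```python
def select_f0(peak_freqs, error_fn, max_error, min_f0, max_f0, prev_f0=0.0):
    """Return the best f0 candidate, or 0.0 when no candidate is acceptable.

    peak_freqs: frequencies of the spectral peaks, in Hz
    error_fn:   function mapping a candidate frequency to its error
    max_error:  largest error for which a candidate is still accepted
    prev_f0:    f0 of the previous frame (0.0 means no tracking)
    """
    # Candidates are simply the peaks inside the allowed f0 range.
    candidates = [f for f in peak_freqs if min_f0 < f < max_f0]
    if prev_f0 > 0:
        # Tracking: keep candidates close to the previous frame's f0.
        # The prev_f0 / 2 tolerance is an assumption of this sketch.
        candidates = [f for f in candidates if abs(f - prev_f0) < prev_f0 / 2]
    if not candidates:
        return 0.0
    errors = [error_fn(f) for f in candidates]
    best = min(range(len(errors)), key=errors.__getitem__)
    return candidates[best] if errors[best] < max_error else 0.0


def toy_error(f):
    """Toy error for the demo only: relative distance to 440 Hz."""
    return abs(f - 440.0) / 440.0


peaks = [166.0, 440.0, 637.0, 880.0, 2500.0]  # hypothetical peak frequencies
print(select_f0(peaks, toy_error, max_error=0.1, min_f0=50, max_f0=2000))
```

Returning 0.0 when nothing passes the error threshold gives the caller a simple way to mark a frame as having no detected fundamental.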

4:56

Okay, so I wrote a little script that basically does

an analysis of a single spectrum and then it computes the errors of all candidates.

Okay, so in here, I have this little script that, from a sound,

the sawtooth sound, just computes one DFT.

So here it computes one DFT of that particular sound.

It finds the peaks.

It interpolates the peaks, and

then it generates possible candidates of the fundamental frequency

in a similar way to what we just saw, but even more simply,

in the sense that we are taking the candidates as all the peaks that lie

within the ranges that we specified.

And then it calls the two-way mismatch algorithm, but

I modified the function, which I have here, in such a way that,

instead of returning just one value,

the fundamental frequency that has the minimum error, it returns all the errors.

So it returns the array of all the errors for all the candidates,

so that we can look at them and see how they behave.
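The peak interpolation step mentioned above is commonly done by fitting a parabola through a local maximum of the magnitude spectrum and its two neighbours. The sketch below assumes that approach; the function name `interpolate_peak` and the sampling rate and FFT size in the demo are hypothetical and are not taken from the script shown in the lecture.

```python
def interpolate_peak(mag_db, k, fs, n_fft):
    """Refine the peak at bin k of a dB magnitude spectrum with a parabola.

    Returns (frequency in Hz, interpolated magnitude in dB).
    Assumes k is a strict local maximum, so the denominator is non-zero.
    """
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    offset = 0.5 * (a - c) / (a - 2 * b + c)  # fractional bin, between -0.5 and 0.5
    ip_loc = k + offset
    ip_mag = b - 0.25 * (a - c) * offset
    return fs * ip_loc / n_fft, ip_mag


# Demo on samples of an exact parabola whose true maximum is at bin 10.3.
mag_db = [-(x - 10.3) ** 2 for x in range(12)]
freq, mag = interpolate_peak(mag_db, 10, fs=44100, n_fft=1024)
print(freq, mag)
```

Because the demo data is an exact parabola, the interpolation recovers the true location of the maximum. On a real spectrum the fit is only approximate, which is one source of the small f0 deviations seen later in the lecture.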

Okay, and then here it just plots the spectrum and the peaks,

so you can understand a little bit what's going on.

Okay, so let's run this.

So let's run test.

6:36

Okay, and this is the spectrum.

The magnitude spectrum and the peaks that we found.

So let's maybe zoom in a little bit.

So then we can see a little bit what is going on, okay.

So these are clearly the harmonics of the sound.

It has also found peaks, like one before the fundamental frequency and

one after the fundamental frequency.

But apart from that, it has basically

identified just the harmonics.

So now let's plot or let's print some of the intermediate values of all these.

So clearly the first thing is the candidates.

So if we print f0c,

7:26

these are the candidates, and they are going to be the peaks that lie within

the frequency range we specified, which was between 50 and 2000 hertz.

So the candidates are the first five peaks, and

if we print their frequencies, by indexing ipfreq

with the f0 candidates, those are the frequencies

that lie within the frequency range, and that we're going to test in the algorithm.

So we're going to test 166 hertz, 440, 637, etc.

8:05

So now let's print the errors that it returns.

So F0Errors, which is

the output of this algorithm will have the errors for every one of these values.

So for 166, we have an error of 4.8; 440 has minus 0.13,

so clearly this is the smallest of all errors and

this is indeed the fundamental frequency.

It is the candidate that is the best one as a fundamental frequency.
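Choosing the fundamental from the error array amounts to taking the candidate with the minimum error. A minimal sketch, using only the two candidate and error values quoted above (the function name `pick_f0` is hypothetical):

```python
def pick_f0(candidates, errors):
    """Return (frequency, error) of the candidate with the smallest error."""
    best = min(range(len(errors)), key=errors.__getitem__)
    return candidates[best], errors[best]


# The two candidates and errors quoted in the lecture.
candidates = [166.0, 440.0]
errors = [4.8, -0.13]
print(pick_f0(candidates, errors))  # (440.0, -0.13)
```

The comparison uses the signed values, so a negative error such as -0.13 simply ranks below any positive one.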

So these error values are really misleading because they are not

bounded within a particular range.

They can even be negative, like in this case.

But clearly the larger the error, the less

9:07

Okay, so this works quite well.

Now we can go into another

file that basically does this for the whole sound.

So we will be iterating for the whole sound.

We're just doing the exact same thing.

We're taking the sawtooth sound and we are trying a different window.

We keep doing it and see if we can get a different type of result.

We set the FFT size and the minimum and maximum f0, and

we call a function, f0Detection,

which in fact is in the harmonicModel file.

9:50

In the harmonicModel file there is this function

called f0Detection that does everything we talked about.

Basically, accepting as inputs the input sound,

the sampling rate, window, FFT size and the values given by the user,

it iterates over the whole sound and it calls the DFT, the peak detection,

the peak interpolation, and the two-way mismatch algorithm, and

then it decides which one is the best, the fundamental frequency.

There are a few constraints to make sure that the fundamental

is stable in time, related to this tracking that we talked about.

But basically it returns just the fundamental frequency that's

considered to be the best, okay.
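The kind of stability constraint mentioned here can be sketched as a small rule applied frame by frame. This is an assumption-laden illustration rather than the library's code: the function name `track_f0`, the 20% tolerance, and the per-frame estimates are hypothetical.

```python
def track_f0(frame_estimates, tolerance=0.2):
    """Keep an f0 estimate only if it is consistent with the previous stable one.

    frame_estimates: per-frame f0 values in Hz, with 0 meaning "nothing detected"
    tolerance:       allowed relative jump between frames (assumed value)
    Returns the per-frame stable f0, with 0.0 where the estimate was rejected.
    """
    stable = 0.0
    track = []
    for f0 in frame_estimates:
        starting = stable == 0 and f0 > 0
        continuing = stable > 0 and abs(stable - f0) < stable * tolerance
        # Accept the estimate when a track starts or continues smoothly.
        stable = f0 if (starting or continuing) else 0.0
        track.append(stable)
    return track


# Hypothetical per-frame estimates with one octave error in the middle.
print(track_f0([0.0, 439.9, 440.1, 880.0, 440.0]))
# [0.0, 439.9, 440.1, 0.0, 440.0]
```

In this sketch a sudden jump, such as an octave error, resets the track to zero for that frame instead of being reported as the fundamental.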

So let's now look at test1 and let's run it.

So let's run test1, okay and

now in fact we can just show the f0.

Okay, so these are the values that it has returned.

The hop size that is specified was quite large, 1,000, so

there are not that many values, so that's easier to look at.

And okay, clearly there is not a perfect

fundamental frequency identified.

It kind of varies.

It goes from 439.9 to 440 something.

So in fact, if we plot this array,

we will see the variation that we will have here.

11:45

Let's get rid of these and now let's plot it again.

Okay, now we can zoom in to the very top, okay.

So clearly it moves around 440 and

these variations are clearly caused by the errors

of the peak detection algorithm and the interpolation so

that we are not really exactly at 440 but,

of course, the error is very small.

It's less than 1 Hz error.

So, this is 440, and 440.5 and a little bit below.

So this is clearly a quite small deviation

from the nominal value that is 440.

If we change these values, we might get better results.

For example, instead of having the window being 1,001,

let's make it twice as much.

12:53

And the FFT size, let's make it twice as much.

Okay, and, no, times 2, okay.

And now, so we saw that before we were at 439.95, 440 something;

let's see if it does anything different

by looking now at these values, okay?

So this is what we got now and it's a little bit better.

So we can see the difference between these two values.

Now the error is smaller than before.

In fact, if we now plot this f0,

well, there is the scale factor of ten to the minus 2,

so clearly this is a

smaller error range than what we had before.

So the lowest now is 439.9 and

the highest value is 440.042.

So that means that as the window gets larger and

the FFT gets larger, and the resolution increases, we will have better values.
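A rough way to see why larger sizes help is to compute the spacing between DFT bins and the main-lobe width of the window in hertz. The sketch below uses assumed values: a 44100 Hz sampling rate, FFT sizes of 1024 and 2048, a doubled window of 2001 samples, and a main lobe four bins wide (as for a Hann or Hamming window). None of these are confirmed by the lecture except the original window size of 1,001.

```python
def bin_spacing_hz(fs, n_fft):
    """Distance in Hz between two consecutive DFT bins."""
    return fs / n_fft


def main_lobe_width_hz(fs, window_size, lobe_bins=4):
    """Approximate main-lobe width in Hz of a window of window_size samples.

    lobe_bins=4 corresponds to Hann or Hamming windows (assumed here).
    """
    return lobe_bins * fs / window_size


fs = 44100  # assumed sampling rate
for window_size, n_fft in ((1001, 1024), (2001, 2048)):  # assumed sizes
    print(window_size, n_fft,
          round(bin_spacing_hz(fs, n_fft), 2),
          round(main_lobe_width_hz(fs, window_size), 2))
```

Doubling both sizes halves the bin spacing and roughly halves the main-lobe width, so the peaks are sampled more finely and their interpolated frequencies are more accurate.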

Okay, now let's look at a real sound and let's finish

by running it on this oboe sound, doing basically exactly the same thing.

So I can just run this test2, okay,

this will compute the fundamental frequency of this oboe sound.

And now if I plot this f0,

14:44

Okay, and now we will have to zoom into the meaningful range, and

well, there is definitely also a variation, but

here there is both the variation that may be caused by errors

and the variation that is clearly natural to the playing of the oboe sound.

So for example, this sound is clearly higher than 440,

so the oboe sound was played a little bit higher in

frequency than 440, so around 442.

And there is kind of a periodic oscillation that makes sense

to be present in the sound, and of course some of

this oscillation may be caused by some error, but

this is a very interesting way to try to understand what is going on,

both in terms of the algorithm and in terms of the sound, in terms of this oboe

sound and the natural oscillations that may be caused by either the acoustics or

the performer that is playing this note.

15:59

Okay, and that's all I wanted to say.

So basically, we have talked about the implementation

of the two-way mismatch algorithm, and I think that has given us

a view on the issues of how to detect the fundamental frequency.

Of course, we have used Python and a number of its packages, and

the implementations that we have in the sms-tools package.

16:27

So that's all.

So this was the first programming class of this harmonic model week, and

in the next lecture we will add the whole model

and include these fundamental frequencies into a harmonic analysis,

and we'll be able to do both analysis and synthesis of sounds.

So thank you very much and I'll see you next lecture.