0:00

Welcome again to the course in signal processing for music applications.

In the last lecture we introduced the harmonic model and

mentioned that in order for it to work we need to be able to detect

the fundamental frequency of sound, what we call the pitch of a sound.

We will use the terms fundamental frequency and pitch interchangeably but

strictly speaking, they refer to different concepts.

F0 is a signal processing concept, pitch is a perceptual concept.

For our course, F0 is a more appropriate term to use.

0:35

Many methods have been proposed to identify the fundamental frequency of

a sound, and these methods can be grouped into the ones that work

directly on the time domain signal and

the ones that work on the frequency domain representation, the spectrum.

The time domain approaches work well on monophonic signals and

the frequency domain approaches can be made to work on monophonic signals but

also on polyphonic signals,

and that's going to be a very important advantage of this type of approach.

1:17

This is a fragment of an oboe sound.

We can listen.

[SOUND] The time signal clearly shows a periodicity and

we can identify a period, a cycle that keeps repeating.

And the inverse of this length, the period,

is what we actually call the fundamental frequency.

In the frequency domain, in the spectrum, we also see a periodicity and basically

the distance between two consecutive peaks is the fundamental frequency.

So we can also think of algorithms that could measure that.

So we could measure it in the time domain or in the frequency domain and

maybe it's not too hard.

The phase might be useful in certain situations, but

let's not talk much about that now.

A single note of a piano has a clear pitch.

Thus, we should be able to detect its F0.

Let's listen to a piano phrase.

[SOUND] We clearly hear the pitch of the sound.

But if we look in the time domain, well, it's not that trivial.

It doesn't seem to be easy to identify the period of this sound.

In the spectrum it's a little bit easier.

We see some of these peaks, they clearly have a periodicity.

So we can envision some algorithms and we can take advantage of that.

2:50

And then if we deal with polyphonic signals, for

example this is a fragment of this Carnatic piece, let's listen to that again.

[MUSIC]

There are several sound sources but the voice is the most prominent one and

to detect the fundamental frequency of the voice in the time domain is basically

close to impossible.

In the spectrum, well, it's not easy either but we'll see that there are some

algorithms that attempt to identify this prominent voice,

this harmonic component in the frequency domain, and they do a pretty decent job.

All right, so to detect the fundamental frequency in the time domain,

we basically have to identify the length of its repeating periodic cycle.

And the autocorrelation function is a mathematical tool for

finding repeating patterns.

It is the cross correlation of a signal with itself, and informally, we could say

that it's the similarity between samples as a function of a time lag between them.

So in this equation, we see a version of the autocorrelation function that has some

tapering, and what we do is we compute this function for every lag time.

So we try different lag times, where the lag is an integer value,

a number of samples, so we start with l equals zero.

And then we sum, over a particular period of time,

a fragment of the sound multiplied by the same fragment delayed by that lag time.

Of course if we delay by zero it's the same signal and

if we delay with different lags, the multiplication will be different.

So we will get a function of l, so

therefore we'll be measuring how correlated is a fragment of a sound with

the samples delayed by a certain l.
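
The equation shown on the slide is not reproduced in this transcript. As an assumption, one common form of the autocorrelation with tapering is the following, where x[n] is a fragment of N samples and l is the lag in samples. The tapering comes from the fact that fewer products are summed as l grows.

```latex
r_x(l) = \sum_{n=0}^{N-1-l} x[n]\, x[n+l], \qquad l = 0, 1, \dots, N-1
```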

So let's look at a particular example.

So this is the oboe sound again, and below is the autocorrelation function,

which of course at lag zero is one.

It's completely correlated there.

Then the lag increases, and here we have expressed lag time in seconds instead

of samples to make it easier to relate to the top signal.

And clearly we see that at a lag corresponding to one period,

which is about 0.002 seconds, there is a local maximum.

5:37

And clearly it is the biggest local maximum, so that would be a good indication

that this is the period or the inverse of that would be the fundamental frequency.

And then the lag of two periods is also a local maximum, smaller,

and since there is tapering, this keeps decaying with lag time.

But let's say that the autocorrelation function for such a clear periodic sound

is quite a good measure of the period or therefore of the fundamental frequency.
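
The autocorrelation approach described above can be sketched as follows. This is a minimal illustration, not the code used for the plots in the lecture. The function name `autocorrelation_f0`, the search range `f_min` and `f_max`, and the synthetic 440 Hz tone standing in for the oboe are all assumptions made for the example.

```python
import numpy as np

def autocorrelation_f0(x, fs, f_min=100.0, f_max=1000.0):
    """Estimate F0 as fs divided by the lag of the largest autocorrelation peak."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Full autocorrelation; keep only the non-negative lags (0 .. n-1).
    # Fewer products are summed at larger lags, which gives the tapering.
    r = np.correlate(x, x, mode="full")[n - 1:]
    r = r / r[0]  # normalize so that lag zero is 1
    # Only search lags that correspond to the allowed F0 range (assumed range).
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)
    lag = min_lag + np.argmax(r[min_lag:max_lag + 1])
    return fs / lag, r

# Hypothetical test signal: five harmonics of 440 Hz with amplitudes 1/k.
fs = 44100
t = np.arange(2048) / fs
tone = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 6))

f0, r = autocorrelation_f0(tone, fs)
print(f"estimated F0: {f0:.1f} Hz")
```

Because the lag is an integer number of samples, the estimate is quantized to fs divided by an integer, so it will generally be close to, but not exactly, the true frequency.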

6:16

For the case of the piano sound, the time domain waveform is not so well behaved.

So for this fragment of a piano sound where we hear, in fact,

the pitch quite clearly, if we plot the autocorrelation function for

these different lag times it's not that clear.

There are several peaks.

Well, the highest peak does in fact correspond to the fundamental frequency, but

it's very difficult to set a threshold that would give a clear decision on

which is the best peak to identify the fundamental frequency.

7:05

A time domain algorithm that improves on this is YIN. It's based on the difference function,

which is an equation similar to the autocorrelation.

We just take the difference between samples with a given lag,

and then take the square and then sum.

And for a perfectly periodic signal this function is 0 when the lag

is equal to the cycle length.

So we have to find the minima of the function.

And the YIN algorithm adds some extra processing here

to get a good measure of this period.

And it does a pretty good job for monophonic signals, so

in fact it has become a very common algorithm for speech, or for

measuring the fundamental frequency on monophonic musical instruments.
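
As a rough sketch of the idea, the code below computes the squared difference function, applies the cumulative mean normalization used in YIN, and picks the first dip below a threshold. It leaves out other steps of the published algorithm, such as parabolic interpolation, so it should be read as a simplified illustration. The function name `yin_f0`, the default parameter values, and the synthetic test tone are assumptions.

```python
import numpy as np

def yin_f0(x, fs, f_min=100.0, f_max=1000.0, threshold=0.1):
    """Simplified YIN-style F0 estimate for one frame (no interpolation)."""
    x = np.asarray(x, dtype=float)
    max_lag = int(fs / f_min)
    min_lag = int(fs / f_max)
    w = len(x) - max_lag  # number of samples compared at every lag
    # Difference function: sum of squared differences at each lag.
    d = np.array([np.sum((x[:w] - x[lag:lag + w]) ** 2)
                  for lag in range(max_lag + 1)])
    # Cumulative mean normalized difference: d(lag) divided by its running mean.
    cmnd = np.ones_like(d)
    running_sum = np.maximum(np.cumsum(d[1:]), 1e-12)
    cmnd[1:] = d[1:] * np.arange(1, max_lag + 1) / running_sum
    # Take the first lag that dips below the threshold, then walk to the local minimum.
    below = np.where(cmnd[min_lag:] < threshold)[0]
    if below.size:
        lag = below[0] + min_lag
        while lag + 1 <= max_lag and cmnd[lag + 1] < cmnd[lag]:
            lag += 1
    else:
        # No dip below the threshold: fall back to the global minimum.
        lag = min_lag + np.argmin(cmnd[min_lag:])
    return fs / lag

# Hypothetical test signal: five harmonics of 440 Hz with amplitudes 1/k.
fs = 44100
t = np.arange(2048) / fs
tone = sum(np.sin(2 * np.pi * 440 * k * t) / k for k in range(1, 6))
f0 = yin_f0(tone, fs)
print(f"estimated F0: {f0:.1f} Hz")
```

Looking for a minimum below a threshold, rather than the largest autocorrelation peak, is what makes the decision less sensitive to the relative heights of the peaks.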

Let's look at how it does for a particular sound.

So this is the spectrogram of the vignesh sound that we have heard.

And here we have plotted the result,

the black line is the fundamental frequency that the algorithm has detected.

Of course it has detected it on the time domain, not on this spectrogram.

So let's listen to the fundamental frequency [SOUND].

So this is pretty good,

it basically tracks the fundamental frequency very well.

But this type of method does not work for many sounds; in particular it does not work for

polyphonic signals.

So we have to go to the frequency domain.

8:46

We have seen how to identify the sinusoids and the partials of a sound.

For example, on the oboe sound these crosses are the peaks.

And many of them correspond to partials or harmonics of this sound.

But which of these peaks, or

maybe some other part of the spectrum, is the fundamental frequency?

How can we identify the partials that are harmonic?

And then maybe from this information, we can identify which of them or

which other frequency is the fundamental frequency of these partials.

9:30

So the F0, the fundamental frequency in the spectrum of a sound can be defined as

the common divisor of the harmonic series that best explains the spectral peaks.

And this is a very nice and

compact definition that in fact we can develop algorithms for.

So here we see a plot of that oboe sound and the peaks, and

the vertical green lines correspond to one harmonic series.

In fact they correspond to the harmonic series that best explains these

spectral peaks.

So just by visual inspection,

we see clearly that the green lines, which are all multiples of the first green line,

10:24

they are not exactly on top of the peaks.

And there are some peaks that are not taken into account.

The F0 detection problem in the frequency domain can be formulated

as a pattern matching problem in which we have

to find the pattern of the harmonic series that best fits the spectrum.

And the two-way mismatch algorithm,

proposed by Maher and Beauchamp, does exactly that.

11:04

We see the measured peaks on the far right.

These are the peaks that we have obtained, the frequencies of the peaks.

And then we want to check a given prediction,

a given harmonic series, a given F0 and its multiples:

how close it is to these measured peaks, so how well it explains these measured peaks.

So what we are going to do is measure the distance between these pairs of values.

Okay, so we will be measuring the distance from the predicted to the measured, and

also from the measured to the predicted.

That's the reason for the term

two-way mismatch, because in fact these two distances will not be the same.

So this first equation is the predicted to measured error.

So we take every predicted peak or every predicted value, and

we look at the closest measured peak and find the frequency distance.

And then we scale it according to the amplitude.

We also have a value that sort of

favors the lower frequencies compared with the higher frequencies.

So we have some weighting coefficients here that allow us to tune

this equation to the kinds of sounds we want to work with.

We are not going to go into detail,

but feel free to look at the article or this equation to understand it better.

And then we do it the other way around, we compute the measured to predicted error.

So we start by looking at all the measured peaks and

look at the closest ideal peaks,

or the ideal harmonics.

And again we look at the distance, and we apply some weighting factors.

And then we have a total error,

which is the weighted sum of these two errors.

Again we have some weighting coefficients,

so that we can set it to work for our particular situation.

Maher and Beauchamp proposed some values for

these coefficients and variables, and these are the ones that we'll be using.
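
The equations on the slides are not reproduced in this transcript. As a hedged reconstruction, the two-way mismatch errors are commonly written as below; please check them against the original article. Here f_n are the N predicted harmonics, f_k are the K measured peaks, Δf is the distance to the closest peak on the other side, a is the amplitude of the measured peak involved, and A_max is the largest measured amplitude. The values usually quoted for the coefficients are p = 0.5, q = 1.4, r = 0.5 and ρ = 0.33.

```latex
\begin{aligned}
Err_{p \to m} &= \sum_{n=1}^{N} \left[ \Delta f_n \, f_n^{-p}
  + \frac{a_n}{A_{max}} \left( q \, \Delta f_n \, f_n^{-p} - r \right) \right] \\
Err_{m \to p} &= \sum_{k=1}^{K} \left[ \Delta f_k \, f_k^{-p}
  + \frac{a_k}{A_{max}} \left( q \, \Delta f_k \, f_k^{-p} - r \right) \right] \\
Err_{total} &= \frac{Err_{p \to m}}{N} + \rho \, \frac{Err_{m \to p}}{K}
\end{aligned}
```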

13:19

So let's work through an example to explain this algorithm a bit better.

So for example, let's consider a series of peaks that we have measured.

In particular, let's consider that we have measured peaks at 200 hertz,

300, 500, 600, 700, and 800.

And let's check for different fundamental frequencies.

Let's check for harmonic series on top of 50, another on top of 100,

another on top of 200.

And so, in this matrix we see the different errors, predicted to measured,

and measured to predicted, for

these different candidate fundamental frequencies.

And clearly the best result is for 100 hertz.

The series on top of 100 hertz is the harmonic series that best explains the peaks here,

even though the frequency 100 is in fact not there, and

that's a very interesting consequence of this algorithm.

The fundamental frequency doesn't have to be a peak, a measured peak, for

the algorithm to give a value at that particular point.
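
The example with peaks at 200, 300, 500, 600, 700 and 800 hertz can be reproduced with a short sketch. Several things here are assumptions: the function name `twm_errors`, the choice of equal amplitudes for all measured peaks (the lecture gives no amplitudes), the number of predicted harmonics (enough to reach the highest measured peak), and the coefficient values p = 0.5, q = 1.4, r = 0.5 and rho = 0.33. The numbers it prints are therefore not necessarily those shown on the slide.

```python
import numpy as np

def twm_errors(f0, peak_freqs, peak_amps, p=0.5, q=1.4, r=0.5, rho=0.33):
    """Two-way mismatch errors for one candidate F0 (simplified sketch)."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    peak_amps = np.asarray(peak_amps, dtype=float)
    a_max = peak_amps.max()
    # Assumption: predict harmonics up to the highest measured peak.
    n_harm = int(np.ceil(peak_freqs.max() / f0))
    predicted = f0 * np.arange(1, n_harm + 1)
    # Distance between every predicted harmonic (rows) and measured peak (columns).
    dist = np.abs(predicted[:, None] - peak_freqs[None, :])

    # Predicted to measured: closest measured peak for each predicted harmonic.
    nearest = dist.argmin(axis=1)
    w = dist[np.arange(n_harm), nearest] * predicted ** (-p)
    err_pm = np.sum(w + (peak_amps[nearest] / a_max) * (q * w - r))

    # Measured to predicted: closest predicted harmonic for each measured peak.
    w = dist.min(axis=0) * peak_freqs ** (-p)
    err_mp = np.sum(w + (peak_amps / a_max) * (q * w - r))

    total = err_pm / n_harm + rho * err_mp / peak_freqs.size
    return err_pm, err_mp, total

peaks = [200, 300, 500, 600, 700, 800]
amps = np.ones(len(peaks))  # assumed: all measured peaks equally strong
results = {f0: twm_errors(f0, peaks, amps) for f0 in (50, 100, 200)}
for f0, (err_pm, err_mp, total) in results.items():
    print(f"F0 = {f0:3d} Hz  PM = {err_pm:7.2f}  MP = {err_mp:6.2f}  total = {total:5.2f}")
best = min(results, key=lambda f0: results[f0][2])
```

Under these assumptions the candidate at 100 hertz gets the lowest total error. The 50 hertz series predicts many harmonics that have no measured peak nearby, and the 200 hertz series leaves the peaks at 300, 500 and 700 unexplained.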

14:25

And let's look at these error functions for a particular sound,

the sound that we have already been looking at,

the oboe sound.

So here on the bottom we have the three error functions.

The blue is the predicted to measured, the green is the measured to predicted, and

the black is the total one,

for possible fundamental frequencies ranging from zero to 1500 hertz.

So basically, we have swept this whole frequency range and

have tried the algorithm for all possible frequencies at increments of 1 hertz.

15:08

Okay, and clearly, we see that there is one point where there is a minimum.

There is a local minimum, and of course it's at 440 hertz,

which is the fundamental frequency of this oboe sound.

This is an easy case and so here we see that there is not much doubt for

the algorithm that the fundamental frequency is 440 hertz.

But let's look at a sound that is more complicated.

The piano sound that we mentioned before.

And this is the best fundamental

frequency identified by the two-way mismatch algorithm for each of these frames.

So the black line is the minimum of that error function for

the different notes as they vary in time.

And let's listen to the first piano sound.

[SOUND] And then let's listen to this

fundamental frequency as a sinusoid.

[SOUND] Well, there are some glitches,

especially in some areas where we see some gaps, but

it does a pretty decent job in following this fundamental frequency.

In polyphonic signals, this is not so easy, it's much harder.

16:31

So a polyphonic signal

can have many sound sources, with both harmonic and inharmonic components.

And the idea in F0 detection in polyphonic signals is to identify the fundamental

frequencies of all the harmonic instruments that are playing together.

That is, to find all the harmonic series that are present at every frame.

So for example, in this plot, we are showing the harmonic

components of the signal that we talked about before, which we can listen to.

[MUSIC]

And we are plotting, according to some algorithm,

possible harmonic series present.

So there is a harmonic summation formula that allows us to measure

the strength of different harmonic series, in a similar way to the two-way mismatch.

And these are the best, sort of the loudest, harmonic series.

17:33

Or at least the candidate harmonic series, so

these are possible fundamental frequencies of those harmonic series.

Clearly I don't think this is completely right,

but it's a first estimation of that.
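
The harmonic summation idea mentioned above can be illustrated with a very simplified sketch: for every candidate F0, add up the spectral magnitude found at its harmonics, giving less weight to the higher ones. This is not the salience function used in the plot. The function name `harmonic_summation`, the decay factor `alpha`, the number of harmonics, and the synthetic 220 Hz tone are all assumptions.

```python
import numpy as np

def harmonic_summation(mag, freqs, candidates, n_harmonics=10, alpha=0.8):
    """Strength of each candidate F0: weighted sum of magnitude at its harmonics."""
    salience = np.zeros(len(candidates))
    for h in range(1, n_harmonics + 1):
        # Magnitude at the h-th harmonic of every candidate,
        # linearly interpolated between FFT bins; weight decays with h.
        salience += alpha ** (h - 1) * np.interp(h * candidates, freqs, mag, right=0.0)
    return salience

# Hypothetical test signal: five harmonics of 220 Hz with amplitudes 1/k.
fs, n = 44100, 4096
t = np.arange(n) / fs
tone = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 6))
mag = np.abs(np.fft.rfft(tone * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / fs)

candidates = np.arange(100.0, 500.0, 1.0)  # candidate F0s in Hz
salience = harmonic_summation(mag, freqs, candidates)
best = candidates[np.argmax(salience)]
print(f"strongest harmonic series at about {best:.0f} Hz")
```

In a polyphonic signal one would keep several of the strongest candidates per frame rather than only the maximum, which is what gives the multiple candidate contours described above.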

Salamon and Gómez presented an algorithm that, from this type of

harmonic summation contour, is able to identify

which is the lead instrument or the lead voice and therefore,

18:09

the prominent harmonic instrument, which in this case is the singing voice.

And it does a pretty good job. For this sound example,

I will play the prominent pitch it has found for this sound.

[SOUND] So that's pretty good.

Again, there might be some glitches, especially I see one glitch,

but it's able to identify the prominent pitch over this whole sound.

18:43

So the best references for the algorithms that I mentioned in the class

are the original articles in which they were proposed.

So I would encourage you to, for the YIN algorithm,

look at the article by de Cheveigné and Kawahara.

For the two-way mismatch, read the article by Maher and

Beauchamp, and for the melody algorithm by Salamon and Gómez,

we've got this IEEE Transactions article, which describes it in quite some detail.

Again, you can find other information, and there are a lot of algorithms that

have been proposed to do fundamental frequency and pitch detection.

So I encourage you to study more into this and

get a grasp of the techniques that are behind these ideas.

19:32

In this lecture,

we have presented different approaches to fundamental frequency detection.

And this is a research problem that has not been completely solved yet,

especially for complex signals.

In order to not make things too complicated,

we will focus on monophonic signals, but the concepts that we will explain from

now on should also be applicable to any type of signal.

By combining the harmonic model that we presented in the previous lecture

with the F0 detection that we just presented, we can analyze and

synthesize harmonic signals.

But things are not finished yet, so see you in the next lecture,

where we will take this even further and see what happens when we have

sounds for which the harmonic model does not work so well.