0:00

Welcome again to the course on audio signal processing for music applications.

This week we're talking about applications.

We're talking about how to use the models we have been studying

throughout the course for the application of sound transformation.

So we aim at manipulating sounds and changing different aspects of them.

0:25

In the first demonstration class we exemplified the idea of morphing

using the short-time Fourier transform.

In the last class, we talked about time scaling,

how to change the duration of the sound using the sinusoidal model.

And in this class, I want to talk about pitch changes,

how to change the frequencies of a sound.

And we will use the harmonic plus stochastic model.

So we'll basically be changing pitch-related information of harmonic sounds.

0:58

In order to use the harmonic model, we need to understand the sound a little bit.

So for example we will start with this saxophone sound.

Let's listen to this

[MUSIC].

Okay, in order to define, especially the window size,

we need to know the ranges of fundamental frequencies that are present here.

So a good way to do that is to look at the spectrogram of the sound.

And basically zoom in to

the first harmonic so that we see basically the fundamental

frequency which is the first line of this harmonic series.

And see which are the highest and lowest values here.

2:16

and then the highest is this note here, which is around 600-something hertz.

Okay, so this is good information for

now defining the parameters of the harmonic plus stochastic model.

So let's go to the SMS tools model GUI and

let's go directly to the harmonic plus stochastic model.

2:59

And now in order to decide the window size,

it's good to go to the terminal, and

from Python we can quickly do the calculation.

So for example we can say, okay, the Blackman window has

a main lobe of six bins, so we multiply 6 by 44,100.

And we said that the lowest frequency was around 400 something hertz,

so in order to be safe, let's say okay, 400 hertz.

So we divide by 400, and this is the window

size that is appropriate for a frequency of 400 Hz, the lowest one, which is

the meaningful one because it requires the longest window that we will need.

Okay, so as window size we will put, let's say, 661, an odd size.
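
The calculation done in the terminal can be sketched in Python as follows. This is a minimal sketch, assuming the rule of thumb that the window should span as many periods of the lowest fundamental as the main lobe of the window has bins (six for the Blackman window); the function name is hypothetical.

```python
def min_window_size(main_lobe_bins, fs, lowest_f0):
    """Window length in samples needed to separate harmonics spaced lowest_f0 apart."""
    return main_lobe_bins * fs / lowest_f0

# Values used in the demonstration: Blackman window, 44.1 kHz, lowest f0 around 400 Hz.
M_exact = min_window_size(6, 44100, 400)  # 661.5 samples
M = int(M_exact)                          # 661, which is an odd size
```

Since the lowest note is actually a bit above 400 Hz, truncating 661.5 to 661 still leaves some margin.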

For the FFT size, let's make it big so we have zero padding; let's say 2048.

The threshold really doesn't need to be that low, but let's leave it.

So we have a lot of harmonics there.

The minimum duration of sinusoidal tracks, 0.1, that's fine.

The maximum number of harmonics.

The maximum number of harmonics that there will be will be 44,100 divided by 400,

okay that would be,

if it had all the harmonics it's 110 but of course this is the lowest frequency and

this is really if we would have harmonics all the way through.
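
As a quick check of that arithmetic, here is a small sketch. One addition that is not stated in the lecture: only partials below the Nyquist frequency (half the sampling rate) can be present in the sampled signal, so for a 400 Hz fundamental about 55 harmonics actually fit, and a larger setting simply acts as a generous upper limit. The variable names are illustrative.

```python
fs = 44100        # sampling rate in Hz
lowest_f0 = 400   # lowest fundamental frequency in Hz

# The ratio computed in the lecture.
ratio = fs / lowest_f0                       # 110.25

# Harmonics of lowest_f0 that lie below the Nyquist frequency fs / 2.
below_nyquist = int((fs / 2) // lowest_f0)   # 55
```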

So 100 would be fine. Then we need to define the range of the fundamental

frequency, so we can put the one we saw.

It was around 400 and the other was around 600 and something,

so to be safe, let's say 650.

This is the error threshold to identify the fundamental frequency.

Maybe let's be a bit more flexible and put seven.

And this deviation, that's fine like this,

and the stochastic approximation for the residual, we

5:18

Okay, so this is the result, we have the original signal,

the analyzed, the harmonics plus the stochastics and the synthesized.

Let's listen to the different components of it, the sinusoidal component.

[MUSIC]

It clearly captures most of the sound.

Then let's listen to this stochastic.

[SOUND] Well, it's very soft, but it's there, so it's a relevant component.

And of course, the sum of the two.

[MUSIC]

Okay, so this is a good starting point to now run the transformation.

So let's go to, let's quit this.

6:19

Okay, so this is the GUI for the transformations.

And let's go directly to the HPS model with the transformations.

And well, the sax phrase is already here by default.

So let's use the parameters that we used.

If I remember, it was 661, we did FFT of 2048.

The threshold was minus 100, minimum sine

duration was that, number of harmonics 100,

the minimum frequency was 400 and the maximum was 650.

F0 detection, the F0 error threshold was seven and

the stochastic factor we put 0.4.

Okay, now we can analyze, and

this will do exactly the same thing that we did before.

So we can check that the analysis is correct.

[MUSIC]

And that's exactly the same sound that we heard before.

So now we can start playing around with the transformations.

And we have two possibilities for

changing the frequencies and one for changing the time.

So for the time, we're not interested in changing it, so

let's set the time scaling to 0, 0, 1, 1.

So that means that it's not changing anything.
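
To see why 0, 0, 1, 1 changes nothing, here is a minimal sketch. It assumes the list is read as (input time, output time) breakpoint pairs on a normalized time axis with linear interpolation in between; the helper name is hypothetical and this is not necessarily how the tool implements it.

```python
def envelope_value(pairs, t):
    """Linearly interpolate a flat [t0, v0, t1, v1, ...] breakpoint list at time t."""
    times = pairs[0::2]
    values = pairs[1::2]
    if t <= times[0]:
        return values[0]
    for (ta, va), (tb, vb) in zip(zip(times, values), zip(times[1:], values[1:])):
        if t <= tb:
            return va + (vb - va) * (t - ta) / (tb - ta)
    return values[-1]  # hold the last value after the final breakpoint

identity = [0, 0, 1, 1]   # every input time maps to the same output time
doubled = [0, 0, 1, 2]    # the whole sound would last twice as long

same_time = envelope_value(identity, 0.25)   # 0.25
later_time = envelope_value(doubled, 0.5)    # 1.0
```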

Okay, now in frequency scaling, we have two frequency transformations

7:52

given that we are in a harmonic sound.

We know where the harmonics are, and

that's a great advantage compared with the sinusoidal models.

In fact, this type of change could be done with the sinusoidal model, but

of course then we are restricted to some transformations.

And for example, frequency stretching is not possible with the sinusoidal

model because we don't know which sine corresponds to which harmonic.

Okay, maybe let's just use the scaling first, so

let's leave the stretching here without any transformation.

So if you put 0, 1, 1, 1, that means that there is a frequency stretching of one, so

it means no stretching at the beginning and at the end.

And then in the frequency scaling, let's start by lowering,

that is, decreasing, the pitch of this sound.

For example, 0.8 and so at time zero we will have 0.8 and

at time one we'll have also 0.8, okay?
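
A constant scaling factor of 0.8 multiplies every harmonic frequency by the same amount. The sketch below, with hypothetical function names and an illustrative 400 Hz harmonic series, shows the effect and converts the factor into semitones using the standard relation of 12 semitones per octave.

```python
import math

def scale_harmonics(harmonic_freqs, factor):
    """Multiply every harmonic frequency by the same scaling factor."""
    return [f * factor for f in harmonic_freqs]

def factor_to_semitones(factor):
    """Express a frequency ratio as a pitch shift in equal-tempered semitones."""
    return 12 * math.log2(factor)

harmonics = [400.0, 800.0, 1200.0]        # illustrative harmonic series
scaled = scale_harmonics(harmonics, 0.8)  # roughly 320, 640 and 960 Hz
semitones = factor_to_semitones(0.8)      # a bit less than four semitones down
```

The scaled frequencies are still integer multiples of the new fundamental, so the sound stays harmonic.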

And a very important parameter is this timbre preservation.

What this timbre preservation does is it

tries to preserve the shape of the spectrum of the harmonics.

If we put one, it preserves the harmonic shape.

So it should sound more natural than if we put zero, in which case it would just

transpose everything, and so the magnitudes would be affected.
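
One plausible way to picture the difference is sketched below. It assumes the spectral envelope is approximated by linear interpolation between the original harmonic peaks and held constant outside them; the function names and the magnitude values are made up for illustration, and this is not necessarily how the tool implements it.

```python
def envelope_at(freqs, mags_db, f):
    """Spectral envelope at frequency f: linear between peaks, held flat outside them."""
    if f <= freqs[0]:
        return mags_db[0]
    for i in range(len(freqs) - 1):
        if f <= freqs[i + 1]:
            w = (f - freqs[i]) / (freqs[i + 1] - freqs[i])
            return mags_db[i] + w * (mags_db[i + 1] - mags_db[i])
    return mags_db[-1]

def transpose(freqs, mags_db, factor, preserve_timbre):
    """Scale the harmonic frequencies; optionally re-read magnitudes from the envelope."""
    new_freqs = [f * factor for f in freqs]
    if preserve_timbre:
        new_mags = [envelope_at(freqs, mags_db, f) for f in new_freqs]
    else:
        new_mags = list(mags_db)  # magnitudes travel with the harmonics
    return new_freqs, new_mags

freqs = [400.0, 800.0, 1200.0, 1600.0]   # illustrative harmonics in Hz
mags = [-20.0, -30.0, -45.0, -60.0]      # illustrative magnitudes in dB

_, plain = transpose(freqs, mags, 0.8, preserve_timbre=False)
_, kept = transpose(freqs, mags, 0.8, preserve_timbre=True)
```

With preservation off, the whole spectral shape moves down with the pitch; with it on, the harmonics move but the envelope stays where it was.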

So let's apply like this.

9:36

So, let's listen to the result.

[MUSIC]

So it sounds quite natural even though we have transposed it.

Mainly because of this timbre preservation,

we have maintained quite a bit of the quality of the saxophone.

9:54

And then just to finish, let's make some frequency stretching.

So frequency stretching converts a sound into

an inharmonic type of spectrum, in which we are applying

an exponential factor to the harmonic values, let's say.

So at time 0, let's say, let's start with one.

And then at the end, let's stretch everything to let's say, 1.1.

Okay, so we will have a stretching factor at the end and not at the beginning, so

progressively the stretching will increase.
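
A minimal sketch of what such a stretching could look like, assuming the factor is raised to the harmonic index counted from zero, so that the fundamental is left untouched and higher harmonics are pushed progressively further up. The function name and the 400 Hz series are illustrative.

```python
def stretch_harmonics(harmonic_freqs, stretch):
    """Multiply the h-th harmonic (h = 0 for the fundamental) by stretch ** h."""
    return [f * stretch ** h for h, f in enumerate(harmonic_freqs)]

harmonics = [400.0 * (h + 1) for h in range(5)]   # 400, 800, ..., 2000 Hz

unchanged = stretch_harmonics(harmonics, 1.0)     # factor at the beginning
stretched = stretch_harmonics(harmonics, 1.1)     # factor at the end

# Distance between neighbouring partials after stretching.
gaps = [b - a for a, b in zip(stretched, stretched[1:])]
```

With a factor of 1.1 the gaps grow from one partial to the next, so the partials are no longer integer multiples of the fundamental.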

So let's see what that does.

10:52

keep getting apart from each other more and more as the time goes on.

And clearly at the end they are not equally spaced,

so that's an inharmonic spectrum.

Let's listen to that.

[MUSIC].

Okay, so clearly the low frequency is the same but

as time progresses the sound becomes more inharmonic,

kind of more metallic because the harmonics have been stretched.

Of course we can do a lot of things.

So feel free to play around with these parameters and

of course with time scaling.

Time scaling is also very powerful once we have been able to analyze the sound

with the harmonic plus stochastic model.

11:46

the idea of changing the pitch or the frequencies of a sound.

First, we used Sonic Visualiser to understand the sound.

And then we used the sms-tools GUI with the harmonic plus stochastic model

to change the pitch or the frequencies of a sound.

12:03

And so we have been talking about pitch change.

Of course, pitch change can be done with the sinusoidal model, can be done with

the harmonic plus stochastic, or the sinusoidal plus stochastic.

Or with quite a few of the models we have been talking about.

And in Audacity there are also some implementations of that.

So anyway, we just presented a little bit of that, an example using the harmonic

plus stochastic model and the potential of this type of transformation.

So I hope you got an idea of that, and now we'll have still another demonstration

class, where we'll again be talking about the harmonic plus stochastic model,

but with another possibility for transforming sounds, which will

be morphing two sounds, interpolating the representations of the two sounds.

So I hope to see you all in the next class.

Bye-bye.