0:00

Hello, welcome back to the course on Audio Signal Processing for Music Applications.

This week we're talking about sound transformations.

In the demonstration lectures, we have been exemplifying the different

models that we have been talking about during the course,

and some transformations that can be done using those models.

For example, in the first class,

we talked about how the short-time Fourier transform can be used for morphing.

On the second one, we talked about time scaling using the sinusoidal model.

And then on the last one,

we talked about how to do pitch changes using the harmonic plus stochastic model.

Now I want to go back to the idea of morphing, but using a different model.

1:05

So in order to do the morphing, we first have to have a good analysis

of each of the sounds that we want to morph.

So let's start with the GUI, the models GUI of the SMS tools.

And let's go to the harmonic plus stochastic model.

So let's start with one of the sounds we're going to morph.

We're going to morph a violin sound with a soprano sound.

So let's start with the violin sound, okay?

This is the sound.

[SOUND] So we have to choose quite a few parameters.

The Blackman window is a good choice for this stable note,

since its side lobes are quite low, so that's good.

We have to choose the window size, and that always requires some computation.

So B3 is around 246 hertz.

So in order to decide the window size,

we just take the main-lobe width of the window in bins,

which is 6, times the sampling rate, 44100,

divided by the frequency of this note.

So we need around 1,075 samples.

So let's put it here, 1075.
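As a minimal sketch of that window-size rule of thumb (plain Python, with the numbers from the lecture; nothing here is sms-tools specific):

```python
# Window-size rule of thumb: the Blackman window's main lobe spans 6 bins,
# so to resolve harmonics spaced f0 apart we need at least 6 * fs / f0 samples.
fs = 44100            # sampling rate in Hz
f0 = 246.0            # approximate fundamental of the violin's B3
M = int(6 * fs / f0)  # truncate to an integer number of samples
print(M)              # 1075
```

The same formula with f0 = 330 Hz gives the 801 samples used later for the soprano.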

The FFT size has to be larger than the window, so let's make it quite large,

so we get a lot of zero-padding; 4096, for example.

2:50

In terms of the maximum number of harmonics, again,

we want quite a few, as many as we can.

So, 100 is also okay.

Now, in terms of the range of the fundamental frequency,

we said the fundamental was 246 Hz, so the minimum definitely has to be below that.

If we go from 200 to 300, that should be okay.

And the f zero detection error threshold, well, 7 should be fine.

The deviation, yeah, 0.1 is quite a bit, and that's fine.

And the stochastic approximation, yeah, let's not approximate too much,

so we get a good quality residual. Let's use 0.8; the maximum

would be 1, which would keep the whole magnitude spectrum. So 0.8 is okay.

So let's compute it.

Now let's listen to the sinusoids.

[SOUND] Okay, this is fine.

Now the stochastic.

[NOISE] Okay, that's quite noisy but it's soft, it's okay.

I think we can manage that, okay?

And here we can see this representation.

We could try other parameters but let's leave it like that.

And now let's analyze the other sound.

Let's analyze the soprano sound, okay?

And this is an E4.

So now again, let's listen to that.

[SOUND] And let's keep the Blackman; the window doesn't have to be that large.

So let's compute the window size.

It's 6 times 44100 divided by the frequency,

and an E4 is around 330 hertz, more or less.

So let's use 330.0.

Okay, so the window doesn't need to be that large; let's leave it at 801.

And the FFT size, well, let's leave it as it is, so that's good.

Magnitude threshold, -100; minimum duration, 0.5 is fine.

Let's keep the number of harmonics.

It has to be the same number of harmonics because we're going to be interpolating

the two of them.

So 100 is fine. And now, since the frequency was 330,

and the voice has a vibrato, so it will change quite a bit,

let's be safe and go from 250, for example, to 400.

Okay, and now we can just leave the same parameters,

the same error threshold for the f zero detection.

The same deviation and the stochastic factor of 0.8.

So let's compute that.

Okay, it's a little more difficult to analyze this sound because of the formants:

there are some regions of the voice spectrum where there is not much energy.

5:46

But let's listen to the result.

[SOUND] The sinusoids look good.

[NOISE] The stochastic sounds okay, good.

And of course, the sound is fine.

Okay, now we are ready to go and

do the actual morph between these two representations.

So let's close this, go to

the transformations directory, and

type python transformations_GUI.py.

This is the interface for the transformations, so now we can go directly

to the HPS morph option, and in fact, the sounds

that we are going to morph are already the default ones so we will use those.

And now let's change the parameters to the ones we decided to use.

So we decided to use a window size of 1075 for

the violin, and a big FFT size, 4096.

The threshold -100.

The minimum duration of a trajectory, we decided 0.5.

And given that this fundamental is around 246 Hz,

we decided to use from 200 to 300.

And now the error threshold, we set it to 7.

And here maybe we can be a little bit more open and just say 0.05, okay?

And for sound two, the soprano.

We are going to use the same window, blackman.

We are going to use a smaller window because it's a higher pitch,

so 801 is fine.

And we decided to use the same FFT size.

And similar values for the rest.

So for the fundamental frequency range, this is a higher-pitched sound,

around 330 Hz, so 250 for the minimum, and

there is no need to go much higher; 400 should be enough.

And the error threshold of 7, and the rest the same, okay?

So we can now analyze.

8:28

Okay, so these are the two sounds.

Clearly on the violin, it found more harmonics than on the voice.

So that means that we are only going to be able to interpolate the harmonics

of the voice.
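A small sketch of that constraint, with made-up harmonic counts (plain NumPy, not the actual sms-tools code):

```python
import numpy as np

# If the two analyses found different numbers of harmonics, only the
# smaller set can be interpolated, so both are truncated to the common count.
violin_freqs = np.arange(1, 41) * 246.0  # say 40 harmonics found for the violin
voice_freqs = np.arange(1, 31) * 330.0   # and only 30 for the voice
nH = min(len(violin_freqs), len(voice_freqs))
halfway = 0.5 * violin_freqs[:nH] + 0.5 * voice_freqs[:nH]
print(nH)          # 30 harmonics can actually be morphed
print(halfway[0])  # 288.0, halfway between the two fundamentals
```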

So now what the transformation will do

is interpolate these two sets of values.

And there are three sets of values that we can interpolate.

We can interpolate the frequencies of the harmonics.

We can interpolate the magnitude of the harmonics.

Or, we can interpolate the stochastic component.

So, for example, let's take just the frequencies of the first sound.

So we'll set that at time 0 we have the first sound,

which we'll refer to as 0.

And at time 1, we also have the first sound.

So basically the frequencies are those of the violin.

And the magnitudes, let's say, are those of the voice.

So we'll set that at time 0 we have 1, which means the magnitudes of the voice,

and at time 1, we'll also have that.

And for the stochastic, well, we can just stay in between,

so at time 0 we'll put 0.5,

and at time 1, we'll also put 0.5.
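The per-frame interpolation just described can be sketched like this (plain NumPy with hypothetical harmonic values, not the actual sms-tools code; a factor of 0 keeps the first sound and 1 keeps the second):

```python
import numpy as np

def interp_frame(vals1, vals2, factor):
    """Linearly interpolate two equal-length sets of harmonic values.
    factor 0 returns vals1 (first sound), factor 1 returns vals2 (second)."""
    vals1 = np.asarray(vals1, dtype=float)
    vals2 = np.asarray(vals2, dtype=float)
    return (1 - factor) * vals1 + factor * vals2

# Hypothetical harmonic frequencies for one frame of each sound.
violin = np.array([246.0, 492.0, 738.0])
voice = np.array([330.0, 660.0, 990.0])
print(interp_frame(violin, voice, 0.0))  # keeps the violin frequencies
print(interp_frame(violin, voice, 0.5))  # halfway: [288. 576. 864.]
```

The same function would apply to the magnitudes and to the stochastic envelope; only the factor changes per set.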

Okay, let's see what happens.

Okay, so this is the result.

The frequencies clearly have the spacing of the violin, but

for the magnitudes, we can't really tell

from here, because we don't display the magnitude of the harmonic lines.

But let's listen to that.

[NOISE] Yeah, so it clearly sounds like what it is:

a little bit like the magnitudes of the voice,

but at the pitch of the violin.

Now, let's go from one to another.

So if we go from all the values of the violin

to all the values of the voice,

we can just do it by putting 0, 0, 1, 1, meaning a factor of 0 at time 0 and a factor of 1 at time 1, and

again here 0, 0, 1, 1, and here 0, 0, 1, 1, okay?

And let's apply it.

Okay, and here, clearly we see that it's going from one sound to the other, and

in the frequencies we see this kind of upward glide, which is

because the pitch of the voice is higher than the pitch of the violin.

So let's listen to that.

[NOISE] Okay, so clearly we hear this evolution.

And of course, in these envelopes, we can

specify any interpolation, in any time-varying fashion.

So we could have quite sophisticated interpolation envelopes.
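As a sketch of how such a breakpoint envelope could be turned into one interpolation factor per frame (the "0, 0, 1, 1" entries read as time/factor pairs; the frame count here is made up):

```python
import numpy as np

# Breakpoint envelope as typed into the interface: (normalized time, factor)
# pairs, so "0, 0, 1, 1" means factor 0 at time 0 and factor 1 at time 1.
env = np.array([0.0, 0.0, 1.0, 1.0])
times, factors = env[::2], env[1::2]

n_frames = 5  # illustrative frame count
frame_times = np.arange(n_frames) / (n_frames - 1)
per_frame = np.interp(frame_times, times, factors)
print(per_frame)  # [0.   0.25 0.5  0.75 1.  ]
```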

Clearly, this is very different from the short-time Fourier transform morphing that we did.

So okay, let's finish this.

And basically we have talked about a transformation,

the morphing, using the harmonic plus stochastic model.

That's within the SMS tools.

And clearly it's a different type of morphing.

It has different possibilities than the STFT approach.

We can now interpolate basically every set of parameters

and obtain any sound in between.

So even though we are using the same term, morphing, the model has a big

impact on the possibilities that the technique offers and

what we can do with this idea of interpolating between two sounds.