0:00

Hello, welcome to the course on Signal Processing for Music Applications.

This week we introduce the concept of stochastic signals and

a way to model this type of signal using spectral approximation.

So in this demonstration class, I want to use an implementation of this

model from the sms-tools package and see how it works.

0:43

very noisy, so that would be quite good for this type of approximation.

In terms of parameters, the defaults are okay: Hamming window,

window size 1,024, hop size 512. Let's first listen to the sound.

[SOUND] And now let's compute the STFT.

Okay, so here we see the spectrum

of this ocean sound, and it clearly shows

the kinds of things we expected,

in the sense that the magnitude spectrum is very granular and

there is very little repetitive structure.

The only kind of structure we can see is this overall shape

1:44

being shown here as this increase of this red area.

And also we see that the higher frequencies are softer than

the lower frequencies, so there is much more emphasis on the low frequencies.

And in the phase spectrum we basically see random numbers; there is nothing

particular here that we can see in terms of a given structure.

And if we zoom into the phase spectrum,

definitely it will corroborate this idea that these are basically random numbers.

If we do the same on the magnitude spectrogram, maybe in one of the areas

that is more stable, it is also very noisy, although

of course in here there are some more overall trends that we can see.
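
As a rough illustration of what the STFT display is showing, here is a minimal NumPy sketch, not the sms-tools code. It assumes white noise as a stand-in for the ocean recording, and the function name `stft` and its arguments are made up for this example. It uses a Hamming window of 1,024 samples and a hop of 512, and returns the magnitude (in dB) and phase of every frame.

```python
import numpy as np

def stft(x, M=1024, H=512):
    """Return magnitude (dB) and phase spectrograms, one row per frame."""
    w = np.hamming(M)
    w = w / w.sum()                          # normalize the window gain
    n_frames = 1 + (len(x) - M) // H
    frames = np.stack([x[i * H:i * H + M] * w for i in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)          # positive-frequency bins only
    mX = 20 * np.log10(np.abs(X) + 1e-12)    # magnitude in dB
    pX = np.angle(X)                         # phase in radians, (-pi, pi]
    return mX, pX

# Assumption: one second of white noise stands in for the ocean sound.
rng = np.random.default_rng(0)
x = rng.standard_normal(44100)
mX, pX = stft(x)

# For noise, the phases look like uniform random numbers, so their
# standard deviation should be close to pi / sqrt(3).
print(mX.shape, pX[:, 1:-1].std(), np.pi / np.sqrt(3))
```

For a noisy input the phase values spread evenly over the whole range, which is the lack of structure seen when zooming into the phase spectrum.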

2:38

But now let's see what we can do with a sinusoidal model

that we have been talking about and using.

So let's take the sine model and let's get the ocean sound.

Okay, in here we definitely will need a lot of sinusoids. In

terms of the window size, I don't think it matters too much, but

maybe let's take a smaller window, let's say 1,000.

Here, really, since the phases do not matter, we

can just say 1,024 and FFT 1,024;

there is no need for doing anything different from that.

The hop size will be one fourth of that, so that will be fine.

Magnitude threshold, minus 80. Well, for the duration of the sinusoids, here

clearly we need to account for tiny sinusoids that will come in and out.

And in terms of the number of sinusoids, well, we need a lot, so let's put 200.
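
To make the roles of the magnitude threshold and the number of sinusoids concrete, here is a simplified sketch of peak picking in a single frame. It is not the sms-tools implementation, which also refines peak locations and tracks them over time. The function name `pick_peaks` and its arguments are hypothetical.

```python
import numpy as np

def pick_peaks(mX, t=-80.0, max_n=200):
    """Bin indices of local maxima of mX (dB) above threshold t,
    strongest first, keeping at most max_n of them."""
    inner = mX[1:-1]
    is_peak = (inner > t) & (inner > mX[:-2]) & (inner > mX[2:])
    locs = np.nonzero(is_peak)[0] + 1        # +1: inner starts at bin 1
    order = np.argsort(mX[locs])[::-1]       # sort by magnitude, descending
    return locs[order][:max_n]

# Tiny hand-made spectrum: the peak at bin 3 (-90 dB) is below the threshold.
toy = np.array([-100.0, -50.0, -100.0, -90.0, -100.0, -20.0, -100.0])
print(pick_peaks(toy))                       # bins 5 and 1

# Assumption: one windowed frame of white noise stands in for the ocean sound.
rng = np.random.default_rng(0)
frame = rng.standard_normal(1024) * np.hamming(1024)
mX = 20 * np.log10(np.abs(np.fft.rfft(frame)) / 1024 + 1e-12)
peaks = pick_peaks(mX)
print(len(peaks))
```

On a noisy frame almost every small bump counts as a peak, which is why so many sinusoids are needed and why they end up scattered all over the spectrum.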

4:00

Okay, so this is what we get, again maybe what we were expecting:

for the sine waves, these are the frequencies, and they are all over.

We just see that they are scattered all over again

in a very kind of granular random way.

Let's listen to the sound that is synthesized from the sinusoidal model.

[SOUND] It doesn't sound too good;

it has this kind of tonal quality and

we hear pitches that were not really in the original sound.

This is why the sinusoidal model may not be that appropriate for

this type of sound.

You can push it more and make it sound closer to the ocean sound, but

clearly it's not an appropriate modeling approach for this sound.

So now let's go to the stochastic model and let's open the ocean sound, okay.

And in here there are not that many parameters to choose.

One is the hop size, and the FFT size will basically

be twice that, so there is no need to control that.

5:17

And then there is an important parameter which is this smoothing approximation

factor which is basically how much smoothing we're going to perform.

For example, we can start with 0.1, which means that we're going to reduce

the size of the FFT by 90%, so the result is 10% of the overall size.

So that means that we're going to have only one out of every ten bins, or

frequency samples, in the frequency domain.

And of course, the phase spectrum will be random numbers, so

let's wait while it computes that.
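
The analysis and synthesis steps just described can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the sms-tools code: the function names are made up, the smoothing is done by averaging blocks of neighboring bins (a simple stand-in for whatever resampling the real implementation uses), and white noise replaces the ocean recording.

```python
import numpy as np

def stochastic_anal(x, H=128, stocf=0.1):
    """Smoothed magnitude envelopes (dB), one row per frame."""
    N = 2 * H                                # FFT size is twice the hop size
    hN = N // 2                              # positive-frequency bins kept
    w = np.hanning(N)
    n_env = max(int(stocf * hN), 2)          # e.g. 10% of the bins
    envs = []
    for start in range(0, len(x) - N + 1, H):
        X = np.fft.rfft(x[start:start + N] * w)[:hN]
        mX = np.maximum(20 * np.log10(np.abs(X) + 1e-10), -200)
        # Assumption: block averaging as the smoothing/downsampling step.
        envs.append([seg.mean() for seg in np.array_split(mX, n_env)])
    return np.array(envs)

def stochastic_synth(envs, H, rng):
    """Resynthesize: interpolated envelope plus random phases, overlap-add."""
    N = 2 * H
    hN = N // 2
    n_frames, n_env = envs.shape
    w = np.hanning(N)
    centers = (np.arange(n_env) + 0.5) * hN / n_env   # approximate block centers
    y = np.zeros((n_frames + 1) * H)
    for l, env in enumerate(envs):
        mY = np.interp(np.arange(hN), centers, env)   # back to hN bins
        pY = rng.uniform(0, 2 * np.pi, hN)            # random phase spectrum
        Y = np.zeros(hN + 1, dtype=complex)
        Y[:hN] = 10 ** (mY / 20) * np.exp(1j * pY)
        y[l * H:l * H + N] += w * np.fft.irfft(Y, n=N)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)                # stand-in for the ocean sound
envs = stochastic_anal(x, H=128, stocf=0.1)
y = stochastic_synth(envs, H=128, rng=rng)
print(envs.shape, y.shape)
```

Only the coarse envelope of each frame is stored; the phases are never analyzed, they are simply generated at synthesis time. The output level is not calibrated in this sketch.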

5:56

Okay, that's quite fast, and here, of course, what we are seeing,

and maybe we can compare it with the original STFT,

is a magnitude spectrum which is much coarser.

There are far fewer horizontal lines because we have

6:17

an FFT size that was downsampled, so basically it was smoothed out.

Okay, but let's listen to the output sound.

[SOUND] Well it doesn't sound like the original,

let's listen again to the original.

[SOUND] But it definitely sounds like some water,

so the quality is very much the same. It sounds

as if it had a kind of high-pass quality.

So let's try to get a little better:

let's have a hop size smaller than this one, maybe 64,

because these time changes are important. And maybe let's not

reduce this that much, like 0.5, and let's see what happens.

Okay, now we have more information; we have a finer grain in both the horizontal and

vertical axes, so let's hear what the result is.

[SOUND] Yeah, this sounds a little bit better, and

we can play around with these parameters

to get different types of approximations.

Now, to finish this, let's try how this approximation works

with a sound that is not really completely stochastic.

For example, let's open the speech sound, this speech-male sound, okay.

And let's not do so much; maybe let's do 256 and let's

do maybe 0.2 as an approximation. Let's first listen to the sound.

>> Do you hear me?

They don't lie at all.

>> Okay, so now we're going to attempt to approximate this. We're going to get

rid of the phase spectrum; we're going to make it random numbers.

And we're going to smooth out the magnitude spectrogram, so

let's see how it sounds.

8:26

Okay, so this is the approximation. This was the original sound.

This is the stochastic approximation; we see a very coarse type of approximation

to the magnitude spectrum. And this is the resynthesized sound;

let's listen to that.

>> Do you hear me?

They don't lie at all.

>> Okay, that's very interesting. In fact, it sounds like a whisper type of sound:

we have lost all the pitch information, because a lot of this pitch

information is in the phase spectrum, and we have basically got rid of that.

And since the magnitude spectrum is quite smooth, we also got rid of quite a bit of

the pitch information that was present in the magnitude spectrum.
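
The effect of the smoothing on pitch information in the magnitude spectrum can be checked with a small experiment. This is a sketch under stated assumptions: a synthetic harmonic tone at 440 Hz stands in for a voiced sound, the smoothing is a simple block average rather than the sms-tools method, and the helper names `smooth_env` and `roughness` are hypothetical.

```python
import numpy as np

def smooth_env(mX, stocf):
    """Block-average mX (dB) down to int(stocf * len(mX)) values,
    then interpolate back to the original number of bins."""
    n_env = max(int(stocf * len(mX)), 2)
    env = np.array([b.mean() for b in np.array_split(mX, n_env)])
    centers = (np.arange(n_env) + 0.5) * len(mX) / n_env
    return np.interp(np.arange(len(mX)), centers, env)

def roughness(mX):
    """Mean absolute dB difference between neighboring bins."""
    return np.mean(np.abs(np.diff(mX)))

# Assumption: 40 equal-amplitude harmonics of 440 Hz as a pitched test tone.
fs, N, f0 = 44100, 512, 440.0
n = np.arange(N)
x = sum(np.sin(2 * np.pi * k * f0 * n / fs) for k in range(1, 41))
mX = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(N)))[:N // 2] + 1e-10)
mY = smooth_env(mX, stocf=0.2)

# The harmonic peaks make mX jump up and down from bin to bin;
# after smoothing, those jumps are largely gone.
print(roughness(mX), roughness(mY))
```

The harmonic peaks that carry the pitch show up as large bin-to-bin jumps in the original spectrum, and the smoothed envelope flattens them out.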

9:09

Okay, so that's all, that's what I wanted to show.

So, we have looked at one implementation

of this stochastic approximation that we have talked about within the sms-tools.

We have this code that approximates the sound using this model,

and, well, we have used some sounds from Freesound.

So hopefully that has given you a flavor of what it means to

approximate a sound with the stochastic modeling approach.

And of course we're going to be using that for the residual of some signals.

And so in the next demonstration class, we're going to put it together with

the other models that we have been talking about.

So we'll see you next class, bye bye.