0:00

Welcome again to a lecture of the course on Audio Signal Processing for

Music Applications.

Until now, we have been analyzing sounds using the sinusoidal representations.

They work, but there are many types of sounds that are best described

by what we call stochastic models.

Signals like the sound of the ocean or

the bow noise of a violin fit into this category of stochastic signals.

We will talk about these today.

0:31

We will first introduce the concept of stochastic signals.

Then how to model them, that is, what a model of a stochastic signal is, and

then, more specifically, how to deal with sounds using these models.

So, how to approximate a particular sound from a stochastic perspective.

And finally, we will describe the concept of a system that can perform analysis and

synthesis of sounds using these models.

1:00

The stochastic model is complementary to the models that we have covered until now.

In fact, in the following lecture,

we will combine the stochastic model with the sinusoidal-based models.

And we'll be able to take advantage of the best of both types of models.

What is a stochastic signal?

Well, a stochastic signal cannot be described in a deterministic way.

It can only be described probabilistically.

And the field of statistical signal processing deals with this type of signal,

and it's quite an advanced topic.

Here we'll take a very broad approach, which is sufficient for our needs.

So in statistical signal processing, we talk about the laws of

probability as a way to describe these stochastic signals.

And we talk about the mean, the variance, and

the probability distribution of particular signals.

2:01

And there are some mathematical functions that are used to analyze

this type of signal and capture some of its characteristics.

For example, one is the autocorrelation function.

We have already seen this function before.

The autocorrelation function allows us to measure the periodicity of a signal or

the degree of repeating patterns in a particular signal.

We use it for detecting the fundamental frequency.

2:35

So this is a function that can be used to measure how stochastic a signal is.

If there are no repetitions,

that means that it's going to be close to a stochastic signal.

So the lower the autocorrelation function value is at nonzero lags,

the closer the signal is going to be to a stochastic signal, okay?
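
As a rough illustration of this idea, here is a minimal sketch in Python with NumPy. The helper names, the test signals, and the lag range are assumptions made for this note, not something taken from the lecture slides. It compares the autocorrelation of a sinusoid with that of white noise, looking only at lags away from zero.

```python
import numpy as np

def normalized_autocorrelation(x):
    """Autocorrelation of x for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    full = np.correlate(x, x, mode="full")   # lags -(N-1)..(N-1)
    acf = full[len(x) - 1:]                  # keep lags 0..N-1
    return acf / acf[0]

def peak_away_from_zero(acf, min_lag=20, max_lag=400):
    """Largest autocorrelation value in a lag range that excludes lag 0 (hypothetical helper)."""
    return float(np.max(acf[min_lag:max_lag]))

rng = np.random.default_rng(0)
n = np.arange(2048)
sinusoid = np.sin(2 * np.pi * n / 64.0)      # period of 64 samples
noise = rng.standard_normal(n.size)          # white noise

periodic_peak = peak_away_from_zero(normalized_autocorrelation(sinusoid))
noise_peak = peak_away_from_zero(normalized_autocorrelation(noise))
```

With these signals, the sinusoid gives a peak close to 1 at its period, while the peak for the noise stays much smaller.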

Another mathematical function that we can use is what is called the power

spectral density.

And we have also seen a similar version of that.

It's basically the DFT, but with a major difference.

It's basically the DFT taken to the limit.

We take the squared absolute value of the DFT and

we take N, the size of the DFT, to infinity, and

if it converges to a function,

that's our power spectral density.

3:37

And that happens in quite a few signals.
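
One way to get a feel for this limit is to average the squared magnitude of the DFT over many frames of a long signal. The sketch below does that with NumPy for white noise. The division by the frame size and the averaging over frames are assumptions of this sketch (a common way to estimate a power spectral density in practice), and the function name is hypothetical.

```python
import numpy as np

def averaged_periodogram(x, frame_size):
    """Average |DFT|^2 / N over non-overlapping frames of x (a rough PSD estimate)."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_size
    frames = x[:n_frames * frame_size].reshape(n_frames, frame_size)
    spectra = np.fft.rfft(frames, axis=1)    # one spectrum per frame
    return np.mean(np.abs(spectra) ** 2, axis=0) / frame_size

rng = np.random.default_rng(1)
white = rng.standard_normal(512 * 400)       # unit-variance white noise
psd = averaged_periodogram(white, 512)       # 257 bins, roughly flat around 1
```

For unit-variance white noise the estimate hovers around a constant value of 1 at every frequency, which is the flat spectrum mentioned for white noise.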

And there are many models that have been proposed to

deal with this type of stochastic signal.

We'll use a very general model expressed by this equation which is in fact,

the convolution of two signals.

So we'll consider as a stochastic model the idea that

a signal can be expressed as the convolution of white

noise with the impulse response of a filter that approximates our signal.

4:13

So by taking this convolution, we are assuming that

the signal that we are dealing with is well-expressed or

well-represented by its impulse response.

If we look at the same equation from a spectral point of view,

we can understand a few more things.

So a convolution in the time domain becomes, in the frequency domain, the product of the two spectra.

So the product of the spectrum of the white

noise with the spectrum of the impulse response of the filter.

And if we express it in polar coordinates,

then we can express it as the product of the two magnitude spectra,

multiplied by the exponential e to the j times

the sum of the two phase spectra, okay?

So, that's the product of these two spectra.

And if we consider that this is a stochastic signal,

we basically can say that the magnitude spectrum of

white noise is a flat line and we will see that later.

So it's a constant, so therefore we can take it out of the equation.

So we can replace the impulse response of the filter

with the magnitude spectrum of the input signal, or rather an

approximated version of that, which could be the frequency response of a filter.

It could be some other type of function,

a function that approximates the magnitude spectrum of the input signal.

And as the phase of the model, we use the phase of white

noise, because the phase of a stochastic signal is not so relevant.

Therefore, we can just replace the phase representation of

our signal with random numbers, with the random numbers of the white noise.

6:29

Okay, so this is a good way to express this stochastic model.

So, we take an approximation of the magnitude spectrum of our signal and

we take random phases for modeling the phase spectrum.
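
Written out, the model described so far can be summarized as follows. The symbols are assumptions of this note and may differ from the ones on the slides: u is the white noise, h the impulse response of the filter, y the modeled signal, capital letters are their spectra, and the tilde marks the smooth approximation of the magnitude spectrum of the input signal x.

```latex
\begin{align*}
y[n] &= u[n] * h[n] = \sum_{m} u[m]\, h[n-m] \\
Y[k] &= U[k]\, H[k] = |U[k]|\,|H[k]|\; e^{j\left(\angle U[k] + \angle H[k]\right)} \\
\hat{Y}[k] &= |\tilde{X}[k]|\; e^{j \angle U[k]}
\end{align*}
```

The last line is the simplification: the constant magnitude of the white noise is dropped, the filter magnitude is replaced by the approximated magnitude spectrum of the input, and only the random phases of the white noise are kept.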

So, this is an example.

So if we start from a fragment of the sound,

for example, of an ocean sound and

let's listen to that, [SOUND] okay?

So we take just one frame of that.

And then we compute the magnitude and phase spectrum of this ocean sound, so

the red plot is the magnitude spectrum of our input signal.

And the phases, the cyan

function, are the phases of the input signal.

And then the black line on top of the magnitude

spectrum is the approximation of the spectrum.

And we'll talk about different ways to approximate that so,

it's basically a smooth approximation to the magnitude spectrum.

And the black line in the phase spectrum is basically random numbers,

okay, and we claim that these random numbers are an approximation or

a model of the random numbers that in fact are in the ocean phases.

So we are basically saying that the phase spectrum of the ocean sound is just

random numbers and can be approximated with any random number sequence.

Okay, and then if we take the inverse Fourier transform of these two

black lines of the approximation of the magnitude and

the random phases, we get this output signal.

And we are claiming that perceptually this signal is going

to be similar to the first one.

Of course, by looking at it, that might not seem to be the case because it's

clearly a different shape, but given that we're talking about stochastic signals,

the details of the shape are not relevant.

What is important is the statistical properties, and so

we will be able to test whether this type of approximation works.
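
To make the claim about shape versus statistics a bit more concrete, here is a small sketch, assuming NumPy. The envelope is a made-up smooth curve, not the ocean sound, and the function name is hypothetical. It builds two frames from the same magnitude spectrum with two different sets of random phases.

```python
import numpy as np

def random_phase_frame(magnitude, rng):
    """Build one real frame from a half-spectrum magnitude and random phases."""
    magnitude = np.asarray(magnitude, dtype=float)
    phase = rng.uniform(-np.pi, np.pi, size=magnitude.size)
    phase[0] = 0.0                            # DC bin must be real
    phase[-1] = 0.0                           # Nyquist bin must be real (even frame size)
    spectrum = magnitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum)             # frame of 2 * (len(magnitude) - 1) samples

# A made-up smooth magnitude envelope with more energy at low frequencies.
bins = np.arange(257)
envelope = np.exp(-bins / 60.0)

frame_a = random_phase_frame(envelope, np.random.default_rng(10))
frame_b = random_phase_frame(envelope, np.random.default_rng(11))
```

The two frames have different waveforms, but taking the magnitude of the DFT of either one gives back the same envelope, which is the property the model relies on.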

So the main analysis issue for this stochastic model

is the approximation of the time varying magnitude spectrum of the input signals.

So we'll have to compute this approximation at every frame.

9:29

With LPC, linear predictive coding,

we can obtain a set of filter coefficients, a sub k, and the frequency

response of the resulting filter approximates the spectrum of the input sound.

So, here we see the signal X and the idea is that the approximation

of this signal is defined according to this LPC

model as the linear combination of past samples.

Okay, so it's defined as the sum from k = 1 to K of

a sub k multiplied by x of n minus k which are the previous samples.

This is basically the expression of an IIR filter,

an infinite impulse response filter, that is a linear combination of previous samples.

And then the goal of LPC is to find these coefficients,

to find the a sub k that best approximate x, that

generate a similar signal, x hat.

So we define an error function that is the sum of the squared

difference between the original signal and this approximated signal.

And we sum originally from minus infinity to infinity,

of course, then we will narrow it down to finite-length signals.

But with this error measure, basically we can try to identify

the a coefficients that minimize this error signal.

We will not talk about how to actually implement that, but

this is a very common approach for obtaining these coefficients and

therefore for doing what we call the LPC approximation.
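
For readers who want to experiment, here is one possible sketch of that minimization, assuming NumPy. It uses the autocorrelation method and solves the resulting linear system directly; this is only one of several standard ways to do it, and the function names and the test signal are assumptions of this note rather than the method used in the course code.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Coefficients a_1..a_K minimizing the squared prediction error
    (autocorrelation method, solved directly as a linear system)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation values r[0..order].
    r = np.array([np.dot(x[:n - lag], x[lag:]) for lag in range(order + 1)])
    # Toeplitz system R a = r[1:], with R[i, j] = r[|i - j|].
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    return np.linalg.solve(R, r[1:])

def predict(x, a):
    """x_hat[n] = sum_k a[k-1] * x[n-k], using zeros before the start of x."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    for k, a_k in enumerate(a, start=1):
        x_hat[k:] += a_k * x[:-k]
    return x_hat

# Test signal: white noise shaped by a known two-pole resonator (made-up values).
rng = np.random.default_rng(2)
u = rng.standard_normal(20000)
true_a = np.array([1.6, -0.81])
x = np.zeros_like(u)
for i in range(len(u)):
    x[i] = u[i]
    if i >= 1:
        x[i] += true_a[0] * x[i - 1]
    if i >= 2:
        x[i] += true_a[1] * x[i - 2]

a = lpc_coefficients(x, order=2)               # should land near true_a
error_energy = np.sum((x - predict(x, a)) ** 2)
```

On this synthetic signal the estimated coefficients come out close to the ones used to generate it, and the prediction error carries much less energy than the signal itself.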

So if we start from a sound, for example,

of a voice sound like this soprano sound that you can listen to.

>> [SOUND] >> In fact,

this is the type of sound that is commonly approximated with an LPC model.

And what it does is obtain this black line that we see in the bottom plot.

So in the bottom plot,

we see the magnitude spectrum of this fragment of this voice.

And the black line is the magnitude spectrum of the approximation of

this LPC filter that approximates the signal.

And as we can see, it kind of approximates what is a very common

characteristic of the voice, which is these formants; these are resonances.

So, an IIR filter is a way to approximate the resonances of a signal quite

12:30

well, and so LPC works quite well for these types of signals.

But LPC does not work so well for many other types of signals.

And here we present a simpler approximation,

that is just based on low-pass filtering.

And we show it by implementing a low-pass filter using the DFT.

So we start from a signal a[k] and

then we take the DFT of that and we low-pass filter.

Low-pass filtering basically means that we cut the spectrum and

we only keep the lower part of that spectrum.

And then we can take the inverse DFT of that and we get another

signal, this a-tilde, which is an approximation,

a smooth approximation of the original a sub k signal.

13:37

Then we might need to extend the signal

in order to generate the same number of samples or

the same sampling rate as the signal that we started with.

So in order to do that, we might have to take the DFT of that,

zero-pad to extend it to a longer FFT size and

then take the inverse DFT of that.

So then b[k] is of the same length as a[k],

whereas a-tilde is just an approximation and has fewer samples,

which is good because that means that we have an approximation

with a small number of samples. Basically,

these a-tilde values are just the coefficients of the approximation.
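
The two steps just described, cutting the DFT to get fewer samples and zero-padding the DFT to get back to the original length, can be sketched with a single helper, assuming NumPy. The function name, the amplitude rescaling, and the example curve are assumptions of this sketch.

```python
import numpy as np

def fourier_resample(x, num):
    """Resample a real sequence to `num` samples by truncating or zero-padding its DFT."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    resized = np.zeros(num // 2 + 1, dtype=complex)
    keep = min(resized.size, spectrum.size)
    resized[:keep] = spectrum[:keep]          # keep only the lowest bins
    # Rescale so sample values keep their amplitude at the new length.
    return np.fft.irfft(resized, n=num) * (num / len(x))

# A made-up smooth curve plus noise, standing in for a[k].
rng = np.random.default_rng(5)
k = np.arange(256)
smooth = 3 + 2 * np.cos(2 * np.pi * 2 * k / 256)
a = smooth + 0.3 * rng.standard_normal(k.size)

a_tilde = fourier_resample(a, 32)             # low-pass filtered, only 32 samples
b = fourier_resample(a_tilde, 256)            # interpolated back to 256 samples
```

Here `a_tilde` plays the role of the short set of coefficients, and `b` is the smooth curve with the same length as `a`.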

14:38

So now let's talk about the synthesis part of the stochastic model.

If we approximate a sound using LPC or

with any other type of filter design approach, we can synthesize a signal

from the obtained filter coefficients by filtering white noise.

So this equation, which we have already seen before, is the implementation

of an IIR filter in which we're filtering white noise.

We are filtering the signal u with a series of coefficients

a sub k that are the coefficients of the filter.

And the implementation of this equation can be done in different ways.

For example, these two block diagrams are two different

structures that are used to implement this type of filtering.

The top one is called the direct form structure and

the bottom one is the lattice structure.
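
As a minimal sketch of the direct form, assuming NumPy and assuming the recursion has the form y[n] = u[n] + sum of a sub k times y[n - k] (the exact equation is on the slide, not in this transcript), filtering white noise could look like this. The function name and the coefficient values are made up for illustration.

```python
import numpy as np

def all_pole_filter(u, a):
    """Direct-form recursion y[n] = u[n] + sum_k a[k-1] * y[n-k]."""
    u = np.asarray(u, dtype=float)
    y = np.zeros_like(u)
    for n in range(len(u)):
        acc = u[n]
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:                    # samples before the start count as zero
                acc += a_k * y[n - k]
        y[n] = acc
    return y

rng = np.random.default_rng(3)
white = rng.standard_normal(8192)
a = [1.6, -0.81]                              # made-up stable resonator coefficients
colored = all_pole_filter(white, a)
```

The plain Python loop is slow but keeps the recursion visible; the output is noise whose spectrum follows the resonance of the filter.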

15:39

But if you obtain an approximation using the low-pass filtering approach that

we mentioned, we can synthesize the sound directly by computing the inverse DFT.

So in here, we start from our approximation of the spectrum,

of the spectrum of the original signal.

Which is basically this smooth version of the signal, kind

of like what we called the low-pass filter approximation of the signal.

And then we can just take random phases,

the phases of white noise, and we take the inverse DFT, and

that's basically going to be a filtering operation on white noise.

Okay, so we start from the smooth approximation of the signal and

the random phases, and then we take the inverse DFT.

And this will be the method that we will use in our examples.

16:42

So now let's put it all together into an analysis and

synthesis system using this stochastic model.

And here we see the block diagram that we will be implementing,

in which we start from the signal x of n, hopefully a stochastic signal.

We compute the FFT.

We take the absolute value.

And then we do this stochastic approximation which is again,

this idea of low pass filtering.

So approximating the magnitude spectrum with a smooth curve.

And then we can do the synthesis.

The synthesis will be done by taking the inverse FFT of

this stochastic approximation, which might have to be zero-padded and

so interpolated to be a longer spectrum,

and then we generate random numbers for the phase spectrum.

And we can take the inverse FFT of that and

that will return a fragment of a sound.

And then we can just do an overlap-add

in exactly the same way that we did for the sinusoidal modeling.

Here also we will have to take care of some smoothing windows so

that the overlap-add works correctly, but

with this we can reconstruct an approximation of the original signal.
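
The whole chain in the block diagram can be sketched in a few lines, assuming NumPy. This is a simplified illustration and not the course implementation: the function names, the Hann windows, the 50% overlap, the factor of 2 on the synthesis window, and the use of dB magnitudes are all assumptions of this sketch.

```python
import numpy as np

def resample_curve(x, num):
    """Fourier-domain resampling of a real curve to `num` points."""
    spectrum = np.fft.rfft(x)
    resized = np.zeros(num // 2 + 1, dtype=complex)
    keep = min(resized.size, spectrum.size)
    resized[:keep] = spectrum[:keep]
    return np.fft.irfft(resized, n=num) * (num / len(x))

def stochastic_analysis(x, hop, n_coef):
    """Per-frame smooth dB envelopes: window, FFT, magnitude, low-pass and decimate."""
    frame = 2 * hop
    window = np.hanning(frame)
    envelopes = []
    for s in range(0, len(x) - frame + 1, hop):
        mag = np.abs(np.fft.rfft(x[s:s + frame] * window))
        mag_db = 20 * np.log10(np.maximum(mag[:hop], 1e-10))  # Nyquist bin dropped
        envelopes.append(resample_curve(mag_db, n_coef))
    return np.array(envelopes)

def stochastic_synthesis(envelopes, hop, rng):
    """Interpolate each envelope, add random phases, inverse FFT, overlap-add."""
    frame = 2 * hop
    window = 2 * np.hanning(frame)            # assumed gain to roughly keep the level
    y = np.zeros(hop * (len(envelopes) + 1))
    for i, env in enumerate(envelopes):
        mag_db = resample_curve(env, hop)     # back to one value per bin
        phase = rng.uniform(-np.pi, np.pi, hop)
        phase[0] = 0.0                        # keep the DC bin real
        spectrum = np.zeros(hop + 1, dtype=complex)
        spectrum[:hop] = 10 ** (mag_db / 20) * np.exp(1j * phase)
        y[i * hop:i * hop + frame] += window * np.fft.irfft(spectrum, n=frame)
    return y

# Made-up input: white noise coloured by a crude moving-average low-pass filter.
rng = np.random.default_rng(4)
x = np.convolve(rng.standard_normal(16384), np.ones(8) / 8, mode="same")
envelopes = stochastic_analysis(x, hop=256, n_coef=32)
y = stochastic_synthesis(envelopes, hop=256, rng=rng)
```

The output `y` is a different waveform from `x`, but it keeps the same broad spectral shape, here with most of its energy at low frequencies.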

So, let's listen to some examples. Okay, so

this is the ocean sound that we played before. Then

the first plot is the magnitude spectrum, the absolute value of the spectrum,

that is, the spectrogram of this whole sound with a particular window,

FFT size, and hop size.

And then the stochastic approximation is basically

a visualization of these coefficients, which are much fewer.

So in fact here we applied a kind of compression, so that

a group of samples of our magnitude spectrum

is reduced to one, so that's the idea of the approximation.

And then, we can synthesize by combining this

magnitude spectrum with random numbers.

So let's listen to

the synthesized result [NOISE].

If you do an AB comparison with the original ocean,

it sounds different but it clearly sounds like an ocean sound.

So, for stochastic signals, maybe it's not relevant to reproduce

the exact characteristics of the sound, but basically the

general characteristics of the sound, and this is what this approach does.

19:39

So, the field of statistical signal processing is quite an advanced

topic, as I mentioned.

And most of the references are quite complicated, quite advanced.

If you start by looking at these Wikipedia pages, you can get links and

descriptions for all these more complex views of stochastic processes and

statistical signal processing, so feel free to go there and check all these topics.

And that's all.

So in this lecture we talked about the stochastic model.

The goal was to introduce a strategy with which to model some sounds or

parts of sounds that cannot be well represented with sinusoids.

In the next lecture, we will see how we can combine these stochastic models

with the other models we have been discussing, the sinusoidal-based models.

So I'll see you in the next lecture, bye bye.