So why stochastic signal processing? So far in the class, we have assumed that every discrete-time signal we used could be described exhaustively, either via a closed-form representation or, and this is really the advantage of discrete-time signals, simply by enumerating its nonzero values. However, interesting signals are not known in advance. For instance, what I'm going to say next contains information that is not known in advance; but I do know that what I'm going to say next is going to be a speech signal. So I can try to describe stochastic signals in terms of a probabilistic model. The good news is that we can do signal processing with stochastic signals using the same tools that we have developed so far. In this module, we will not try to treat the subject of stochastic signal processing either exhaustively or very rigorously, but we will try to give you enough intuition, and the mathematical tools, to deal with ubiquitous random signals such as noise. So let's start with a simple example of a stochastic signal. Suppose we generate a discrete-time signal by tossing a coin for each sample, and setting the sample's value to plus one if the outcome is heads and minus one if the outcome is tails. Because of the mechanics of the coin toss, each sample is independent of all the others, and each sample has a 50% probability of being plus one and a 50% probability of being minus one. So this is our signal generator, and every time we turn on the generator, every time we repeat this experiment of coin tossing, we get what we call a different realization of the signal. We can plot it: for instance, the first run of coin tosses gives us this series of plus ones and minus ones, and if we repeat the experiment, tossing the coin another 32 times, we get a different realization. We can repeat the experiment as many times as we want, and the outcome will likely be different every time we run the machine. Although we cannot describe in advance the values of the signal, we know the mechanics behind the generation of the signal. And the question is, can we analyze the signal? For instance, can we get a spectral representation of a random signal? Let's try by taking the DFT of a finite set of random samples: we run the machine for 32 samples and then we take the DFT. But every time we repeat the experiment, we get a different plot, and the values don't really seem to follow any pattern. So maybe we tell ourselves that we don't have enough data to discover a pattern in the DFT, and that we should take longer realizations. Instead of 32 points, let's try 64 points. Still we don't see any pattern. Let's try 128 points. Again, the spectrum doesn't seem to show any trait that we can readily understand. So we need a new strategy. When faced with random data, an intuitive response is to take averages, so that the fluctuations are averaged out. In probability theory, the average is computed across realizations: not along the time axis, but across different repetitions of the experiment. For the coin-toss signal, the expectation of each sample is E[x[n]] = (-1) · P(n-th toss is tails) + (+1) · P(n-th toss is heads); but we know that these probabilities are one half each, so this sum ends up being zero. And since the DFT is a linear operator, averaging the DFT values will not work either: if we try to do that, we quickly realize that the average of each DFT coefficient will be zero as well.
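As a quick numerical check of this point, here is a minimal sketch using NumPy (the interval length and the number of realizations are arbitrary illustrative choices, not values prescribed by the course):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 32, 5000                      # samples per realization, number of realizations

# Each realization: N independent coin tosses mapped to +1 / -1
x = rng.choice([-1.0, 1.0], size=(M, N))

# Averaging across realizations (not along time) approximates the expectation
print(np.abs(x.mean(axis=0)).max())  # close to 0: E[x[n]] = 0 for every n

# Since the DFT is linear, the average of the DFT coefficients is small as well
X = np.fft.fft(x, axis=1)
print(np.abs(X.mean(axis=0)).max())  # much smaller than a typical |X[k]|, shrinking as M grows
```

Both averages shrink toward zero as the number of realizations grows, which is exactly why plain averaging tells us nothing useful here.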
However, the signal does move between minus one and plus one, so its energy, or its power, must be nonzero. If you remember the definitions of energy and power of a signal from module 2.1, we can readily see that the energy is infinite: it is the limit, as N goes to infinity, of the sum from n = -N to N of the squared values of the sequence, and since each squared value is equal to one, the sum is simply 2N + 1, which diverges as N goes to infinity. However, the signal does have finite power over any interval: we take the energy over the interval from -N to N and normalize it by the length of the interval, 2N + 1, and we find that the power is actually one, regardless of the interval's length. So let's try the following strategy: let's average the DFT's squared magnitude, normalized. We pick an interval length N and a number of iterations M, that is, the number of times we will repeat the experiment. We run the signal generator M times and obtain M N-point realizations, we compute the DFT of each realization, and we average their squared magnitudes divided by the length of the interval. If we do that, of course, the first DFT will be the random pattern that we have seen before; but as we increase the number of realizations, the points seem to converge to something, and indeed, by the time M hits 5000, we see that the average of the normalized squared magnitude of the DFT seems to converge to the constant one. So we have defined a quantity P[k] = E[|X[k]|²]/N, the expected value of the squared magnitude of the k-th coefficient of the N-point DFT, divided by N, and it looks very much as if this expectation is equal to one for all k. Now, if the squared magnitude of the DFT describes the energy distribution in frequency of a signal, then the normalized squared magnitude of the DFT describes the power distribution, or power density, in frequency. So what we have just derived is a new frequency representation for signals that have infinite energy but finite power, and it is called the power spectral density. Let's try to develop some intuition about the power spectral density of the coin-toss signal. The fact that it is constant means that the power is equally distributed over all frequencies; in other words, we cannot predict whether the signal will move slowly or very fast, and we cannot predict that because each sample is independent of the others. We could actually have a realization where, just by luck, we have a constant signal because all coin tosses come up heads, or we could have a realization in which every coin toss has a different outcome from the previous one, so that the signal oscillates at the maximum digital frequency. The power spectral density embodies this behavior by distributing the power uniformly over the entire frequency axis.
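Here is a minimal NumPy sketch of the averaging procedure just described; the values of N and M mirror the experiment above, and the code is only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 32, 5000

# M independent N-point realizations of the coin-toss signal
x = rng.choice([-1.0, 1.0], size=(M, N))

# Average the normalized squared magnitude of the DFT across realizations:
# P[k] ~ E[ |X[k]|^2 ] / N
P = (np.abs(np.fft.fft(x, axis=1)) ** 2 / N).mean(axis=0)

print(P.round(2))    # every entry is close to 1: a flat power spectral density
```

With M large, every entry of P is close to one, which is the constant power spectral density we observed in the experiment.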
Let's now try to filter a random process. To run this experiment we take our coin-toss signal once again, and we take what is probably the simplest filter we can think of, a two-point moving average filter, so that the output of the filter is the average of two neighboring points in the input random signal. The question we want to answer is: what is the power spectral density of the output? We compute this numerically. We run the experiment as before: we choose an interval length N, we generate the random signal, this time we filter it, and then we take the average over several realizations of the normalized squared magnitude of the DFT of the filtered output. For M equal to one we don't really see a pattern, but as M increases, we see that the power spectral density converges to a very precise shape. Now, if you remember, the frequency response of the two-point moving average is H(e^{jω}) = (1 + e^{-jω})/2, and this shape is nothing but the squared magnitude of this frequency response evaluated at the DFT points, i.e., at multiples of 2π/N. Indeed, if we plot it, we see that the match is almost perfect. The power spectral density of the output seems to be the power spectral density of the input, i.e., the constant one, multiplied by the squared magnitude of the frequency response of the filter, this time computed on the DFT grid. We can generalize these results beyond numerical experiments and move to the world of infinite-support stochastic processes. The details are in the book, but here we will summarize the key points. A stochastic process is characterized in frequency by its power spectral density, and it can be shown that the power spectral density is the DTFT of the autocorrelation of the process, where each sample of the autocorrelation is obtained by taking the expectation of the product of the stochastic signal times a delayed copy of itself: r_x[k] = E[x[n] x[n+k]]. For a filtered stochastic process, the general result is that the power spectral density of the output is equal to the power spectral density of the input times the squared magnitude of the frequency response: P_y(e^{jω}) = |H(e^{jω})|² P_x(e^{jω}). The good news is that this result guarantees that the filters we design in the deterministic case can still be used with stochastic signals: a lowpass will remain a lowpass, and a highpass will remain a highpass. We do, however, lose the concept of phase, and this is understandable since we don't have any advance information on the shape of the stochastic signal; that will depend on the particular realization. All we know is how the power is distributed in frequency.
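As a numerical check of this result, here is a minimal NumPy sketch of the filtering experiment; for simplicity (an assumption on my part, not part of the original experiment) the moving average is implemented with a circular shift, so that the N-point DFT relation holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 32, 5000

x = rng.choice([-1.0, 1.0], size=(M, N))

# Two-point moving average: y[n] = (x[n] + x[n-1]) / 2
# (circular indexing, so Y[k] = H[k] X[k] exactly)
y = (x + np.roll(x, 1, axis=1)) / 2

# Estimated power spectral density of the output
P_y = (np.abs(np.fft.fft(y, axis=1)) ** 2 / N).mean(axis=0)

# Prediction: input PSD (constant 1) times |H(e^{jw})|^2,
# with H(e^{jw}) = (1 + e^{-jw}) / 2, sampled on the DFT grid w = 2*pi*k/N
w = 2 * np.pi * np.arange(N) / N
H2 = np.abs((1 + np.exp(-1j * w)) / 2) ** 2

print(np.abs(P_y - H2).max())   # small: the estimated PSD matches the prediction
```

The estimated output PSD matches the constant input PSD multiplied by the squared magnitude of the frequency response, sampled on the DFT grid.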
Now that we have some stochastic tools in place, let's attack the concept of noise. Noise is everywhere: it appears in the form of thermal noise in circuits; it could be the sum of various extraneous interferences in communication systems; or it could be the quantization and numerical errors that complex digital systems produce and that we cannot predict in advance. Because of our lack of knowledge about the sources of the noise, we model noise as a stochastic signal, and the most important type of noise is white noise. With the term white we indicate a stochastic process in which all the samples are uncorrelated. If the samples are uncorrelated, the autocorrelation of the process will be zero everywhere except at lag zero, where it will take the value of the variance. As a consequence, the power spectral density is the constant σ², where σ² is the variance of the process. Graphically, the power spectral density of a white signal couldn't be any simpler, and we have seen an example of this in the coin-toss experiment. The power spectral density of white noise is independent of the probability distribution of the individual samples; the distribution, however, will be important to estimate the bounds of the noise signal in the time domain. Very often we use a Gaussian distribution to model the underlying probability distribution of the samples. The reason is that the Gaussian distribution is the model of choice when we want to represent the effect of many unknown superimposed sources, as is the case for noise. In this case the noise is called additive white Gaussian noise, or AWGN for short.
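As a small illustration (a sketch assuming NumPy's Gaussian generator; the value of sigma is an arbitrary choice), we can generate AWGN samples and check the two properties just mentioned, the variance at lag zero and the absence of correlation at other lags:

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 0.1                                  # noise standard deviation (illustrative value)
n = rng.normal(0.0, sigma, size=100_000)     # additive white Gaussian noise samples

# Uncorrelated samples: autocorrelation is sigma^2 at lag 0 and about 0 elsewhere,
# so the power spectral density is the constant sigma^2.
print(n.var())                               # close to sigma**2 = 0.01
print(np.mean(n[:-1] * n[1:]))               # lag-1 correlation, close to 0
```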