Next, we're going to look at another application that illustrates the utility of being able to define our own functions, and that's to write programs to work with the Gaussian distribution. You're familiar with the Gaussian distribution. It's a mathematical model, the so-called bell curve that has been used successfully for centuries to fit experimental observations in lots of contexts. This is the function, it's called the Gaussian probability density function. It's Phi of X equals one over square root of two Pi e to the minus X squared over two, and this is a plot of the function. It's centered around zero, that's the mean and the standard deviation measures how it spreads and that's one. So, this is the scores of SATs in some year. It pretty well matches this experimental data if we adjust the parameters. We take a mean of 1,019 and standard deviation from the experimental data 209, then plot the Gaussian function with those parameters, two parameter Gaussian is one over sigma square root of two Pi e to the minus X over Mu squared, x minus Mu squared over two sigma squared. Then it scales the function to match your experimental data. The mean and the sample standard deviation, it fits experimental observations. So often, when writing programs that process data we want to work with the Gaussian distribution, the Gaussian probability density function. There's good math behind it, but for now, all you need is that, that's the formula for Phi of X with a little math to get the two argument version that's scaled, subtract the mean divide by sigma, apply Phi and then divide by Phi. So, that's all the math that you need, just definition and a little algebra. So, Gaussian distribution arises everywhere and these are just some examples from papers pulled off the web. If you're studying polystyrene particles in glycerol or laser beam propagation or optical tweezers or oxidase patches in the visual cortex, Gaussian distribution arises. I don't know if I believe this one, predicted US oil production, that doesn't sound too Gaussian to me, but people try to fit it anyway. German money, well, Gauss was German, and actually, have the curve on their money. So, we want to work with it. So, is it there in Java's math library? Do I have math.Phi? The answer is no. So, why not? It's so important, why is not there? Well, maybe the answer is that you could do it yourself. So, you can create a class Gaussian and you can define a function, we'll call it pdf, probability density function. All we have to do is implement that math formula, one over square root of two Pi e to the minus X squared over two. So, that's Math.exp minus X squared over two divided by math.square root, two times math. PI, it's as simple as that. So again, we're calling a function another module, we've been doing that for awhile. Then if you want the three argument version, then we use that one appropriately. Now in Java, if you have a two function signatures with differing number of arguments, they can have the same name and that's useful in this case. But all we do is use the other one and just apply it with the argument x minus Mu over sigma and then divide the result by sigma. So, and we'll look at some more functions in a minute, but I think the point is that if there's any math functions that you need, you can define libraries yourself that implement those functions. That's an example of modular programming, you don't want to put this math code in every program that does the Gaussian, you just want the program to act as if the function is implemented. So, we put this in a module Gaussian.java and now you can call these functions with Gaussian.pdf and so forth. In a way with libraries, any user can extend the Java system to include the functions that aren't appropriate for the application at hand. It's a very powerful idea. Now, there's another Gaussian function called the cumulative distribution function. So that one is, if you have a particular point on the x-axis, you might ask what percentage of the total is less than or equal to that? That's the same as what's the area of the curve to the left of a certain point? That's called the Gaussian cumulative distribution function and that's useful in applications. This is what it looks like. It's the one over square root of two Pi integral from minus infinity to e minus x squared over 2 dx. That's useful to know, it has a similar, but simpler three argument version where you just have to evaluate it at x minus Mu over sigma. So, that's the math for the cumulative distribution function and it's typical to apply. So in our year, you had to get an 820 to be a division I athlete. You might say, well what fraction of test takers didn't qualify? Well, the answer is given by the Gaussian CDF for that data with that average and that standard deviation. In this case, about 17 percent. So maybe, you want to write code that uses the Gaussian cumulative distribution function to be able to compute information like this. So, what do we do with that? Well, there isn't a close form for that, it was expressed in terms of an integral, how are we going to implement that? The answer is, well, there is a Taylor series that is very accurate, and actually, this is the way that most math functions are implemented on computers. In this case, there's math that shows that Phi of z is one-half, the CDF is one-half, the pdf times this series which the terms get smaller and smaller pretty quickly. So, we can just write code that implements this math formula. So if it's eight, if it's less than minus eight is so small is to be negligible, and if it's bigger than eight it's so close to one as the difference is negligible. So, we start with compute this sum and this just implements this formula where we compute each term by multiplying by z squared and then dividing by the next odd number and that's what we do as long as we're getting anywhere. If we add something and it's still the same then we can stop. Then what we're supposed to return is a half plus what we computed times the pdf. It turns out that this result is accurate to 15 places. Then again, if you want to do the three argument one, it's just the other one minus the mean divided by sigma. So, a pretty small amount of code and we can implement that. Now, from a point of view of a program that's using this, it doesn't know how we implemented, it just wants to compute with the answer, and so, we're able to protect that client from having to deal with all these details. It's a nice way to organize the computation to keep this knowledge in this one place. The bottom line is you have 1,000 years of math formulas to work with, they're not all going to be implemented for you, but when you need to use them, you can encapsulate them yourself so that you can write programs that take use of them in a coherent way. So, that's our summary. Here's a library for Gaussian distribution functions. We've got the pdf both forms and the CDF both forms. The best practice is to include in every library a main function that at least tests all the code, it might as well have it do something useful, so in this case, it takes as argument the mean, the standard deviation from that you're interested in and then the cutoff point and it prints out the Phi value, the cumulative distribution value for that. So, anybody asks you what fraction, and all you need to know is the three numbers and you can get the percentage. So, we actually use this code to evaluate randomness and submitted programs when they're supposed to have randomness. But anyway, really, the bottom line is that for any future program that you have, if you've got a number of related functions like say of this nature, you can build them into a module and then use them. That's the really important concept about libraries. So, how do we actually use the library? Well, you've been doing that, but libraries that you write yourself, if they're in the same directory and you're working directory has your other program, then you just have a copy of it there. There's also your operating system, you can set it up so that you can work with only one copy and you can find details on the both site. Then if it's there, then you can just use those functions from any other module in that directory. We were doing that already for apply that note in other examples. So, all I really want is just draw a plot of it using standard drawer. So, this is just code, just a regular main function that I want to write and I'm going to set up my drawing and just go through and use that pdf function. I'm going to sample it according to an argument taken from the command line, and then just compute the x and y points using CDF and I can make a plot. So, that's samples 200 of the Gaussian pdf. That's just an example, and we'll see another example later on. So, very easy to use a library that you've created. It's really, again, an easy way for you and any user to extend the Java system.