To do machine learning, we're going to use TensorFlow. TensorFlow is a machine learning library that underlies many of Google's products. We open sourced it in 2015, and TensorFlow is actually a C++ engine. The reason it's C++ is so that we can use GPUs, we can use CPUs, we can run on Android phones, et cetera. But people don't want to write code in C++, so you have an API, and that API is in Python. The Python API talks to the C++ engine and gets the job done. TensorFlow is essentially a numerical processing library, but it has a variety of features that make it particularly good for deep neural networks and for training deep neural networks. So, first of all, because we're going to be talking about neural networks, what exactly is a neural network? Let's go ahead and look at this pretty cool site called Playground. So, I'm going to playground.tensorflow.org. Let's go ahead and remove these so that we have an idea of what it is that we want to do. What we want to do is this: we have some data, and the data is that we have blue dots and we have orange dots. The idea is that, given a dot, we want to be able to predict whether it's orange or blue. In order to do that we have two pieces of information: we have the X and we have the Y. The X here goes from minus 6 to 6, and the Y here goes from minus 6 to 6. Given the X and Y, we want to be able to predict the color of a dot at some point. For example, if X is five and Y is four, is that going to be blue or orange? What do you think? I think you would say orange, because everything far away seems to be orange. But at this point, the background image is a prediction. The prediction is that it's going to be blue: everything to the right of this line is going to be blue, everything to the left of this line is going to be orange. The way this prediction comes about is by taking the two inputs, this X and this Y, and adding them together, and that's basically what my result is going to be. So it's basically going to be a sum of the X and the Y, each with a certain weight.
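To make the weighted sum concrete, here is a minimal Python sketch of that kind of prediction. The function names and the weights of 0.5 are made up for illustration and are not the values shown in the demo; the rule that a positive sum means blue and a negative sum means orange follows the walkthrough.

```python
def weighted_sum(x, y, w_x, w_y):
    """Linear combination of the two inputs, one weight per input."""
    return w_x * x + w_y * y


def predict_color(x, y, w_x, w_y):
    """Positive sum -> blue, otherwise orange (a sum of exactly zero is lumped with orange)."""
    return "blue" if weighted_sum(x, y, w_x, w_y) > 0 else "orange"


# Hypothetical weights, chosen only so the example is easy to follow.
W_X, W_Y = 0.5, 0.5

for point in [(5, 4), (-5, -4)]:
    print(point, predict_color(point[0], point[1], W_X, W_Y))
```

With these illustrative weights the far-away point (5, 4) comes out blue and its mirror image (-5, -4) comes out orange. Whatever weights you pick, a sum like this splits the plane along a straight line, which is why it struggles when both far-away points should be orange.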
So, minus 0.18 times X minus 0.28 times Y: add the two things up, and if it's less than zero it's orange, if it's greater than zero it's blue. Because these two weights are negative, you can see that the negative data is here and all the positive data is here. So that's a prediction that's pretty bad, right? The prediction is that everything here is going to be blue and everything there is going to be orange, and that's not true. But let's see if we can change these weights. There's a weight here on X, there's a weight here on Y. Let's go ahead and tune these weights to come up with a better prediction. As you can see, it is not possible to come up with a better prediction that linearly combines X and Y to separate the blue dots from the orange dots. So let's stop this, this is going nowhere, and let's think back about this problem. Remember when I asked, when X is five and Y is four, what was the color? You intuitively said it would be orange, and the reason you thought it would be orange was because of the distance. Everything that is close to the center was blue, everything far away from the center was orange. Going back to elementary school, what was the formula for distance from the center? It's the square root of X squared plus Y squared. So there's an X squared term and there's a Y squared term. So instead of just using X and Y, let's add X squared and let's add Y squared. So now we have four inputs, not just X and Y but also X squared and Y squared. Given these four inputs, let's come up with weights for all four of them in such a way that it separates the blue dots from the orange dots. So let's start, and lo and behold, that's my prediction now. The prediction is that everything inside of this is going to be blue and everything outside of it is going to be orange. That seems to capture the data pretty well; it captures our intuition of what this data says very well. So this idea is called feature engineering.
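Here is a rough Python sketch of why the squared features help. Everything in it is an assumption made for illustration: the synthetic data (blue within radius 2 of the center, orange between radius 3.5 and 5), the helper names, and the hand-picked weights and bias for the four-input case. Nothing is trained here, and the bias term is not something the walkthrough mentions.

```python
import math
import random


def make_ring_data(n_per_class=200, seed=0):
    """Hypothetical stand-in for the demo data: blue near the center, orange in a ring."""
    rng = random.Random(seed)
    points = []
    for label, (r_min, r_max) in (("blue", (0.0, 2.0)), ("orange", (3.5, 5.0))):
        for _ in range(n_per_class):
            r = rng.uniform(r_min, r_max)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((r * math.cos(theta), r * math.sin(theta), label))
    return points


def classify(features, weights, bias=0.0):
    """Weighted sum of the features: positive -> blue, otherwise orange."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return "blue" if score > 0 else "orange"


def accuracy(points, featurize, weights, bias=0.0):
    """Fraction of points whose predicted color matches their label."""
    hits = sum(classify(featurize(x, y), weights, bias) == label for x, y, label in points)
    return hits / len(points)


data = make_ring_data()

# Raw inputs only, with the two weights quoted in the walkthrough.
raw_acc = accuracy(data, lambda x, y: (x, y), (-0.18, -0.28))

# Four inputs (x, y, x^2, y^2) with hand-picked weights: score = 9 - x^2 - y^2,
# which is positive inside a circle of radius 3 and negative outside it.
eng_acc = accuracy(data, lambda x, y: (x, y, x * x, y * y), (0.0, 0.0, -1.0, -1.0), bias=9.0)

print("raw x, y accuracy:", raw_acc)
print("with x^2, y^2 accuracy:", eng_acc)
```

On this made-up data the raw inputs land near coin-flip accuracy, while the four-input version separates the two colors, because a weighted sum of X squared and Y squared can draw a circle where a weighted sum of X and Y can only draw a line.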
So, one of the ways that we can improve our machine learning model's predictions is to bring human insight into the problem. The insight that we had was that this was based on distance. We knew that distance involved X squared and Y squared, so we threw those into the network and we said, train yourself with weights. But let's say we don't have that insight. All I have is X and Y, and I want to do this prediction. So, rather than do feature engineering, another thing that we can do is create a neural network. So, I'll create a layer of these, and what this is doing is that this guy is X and Y added together with weights, and to that I'm applying some function. I could use a Rectified Linear Unit (ReLU), hyperbolic tangent (tanh), sigmoid, whatever. It doesn't really matter which one we choose, so let's just pick tanh. So I'll do tanh there, tanh here, tanh here, tanh here, tanh here. So five different tanh's, add them all up. Why did I pick five? Who knows, I just picked five. Why did I pick tanh? Who knows, I just picked tanh. I just picked something, I have a neural network, and I'll go ahead and train it. Notice it's now no longer just two weights: it's two weights here, two weights here, two weights here, and so on. So that's ten weights here, plus five weights here, so 15 weights that we get to tweak around to find a set of weights that captures this data. It takes a little bit longer, and it comes back with: well, everything inside this triangle-like shape is going to be blue and everything outside it is going to be orange. Is that reasonable? Yeah, it's not going to be perfect, but it's a pretty reasonable approximation to this data, and it'll let you predict with pretty darn good accuracy, and that's the point of a neural network. The idea is to capture what the data are in such a way that you can do your prediction later on.
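As a sketch of what that one-hidden-layer network computes, here is a plain Python version of the forward pass. The function names are invented for this example, the weights are random and untrained, and bias terms are left out so that the count matches the 15 weights tallied above. It shows the shape of the computation, not how the demo itself is implemented.

```python
import math
import random


def init_network(n_inputs=2, n_hidden=5, seed=0):
    """Return random, untrained weights: one list per hidden node, plus the output weights."""
    rng = random.Random(seed)
    hidden_weights = [
        [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)
    ]
    output_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return hidden_weights, output_weights


def forward(inputs, hidden_weights, output_weights):
    """Each hidden node applies tanh to a weighted sum of the inputs;
    the output is a weighted sum of the hidden nodes."""
    hidden_outputs = [
        math.tanh(sum(w * v for w, v in zip(node_weights, inputs)))
        for node_weights in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden_outputs))


def count_weights(hidden_weights, output_weights):
    """Total number of weights: inputs-to-hidden plus hidden-to-output."""
    return sum(len(node_weights) for node_weights in hidden_weights) + len(output_weights)


hidden_weights, output_weights = init_network()
score = forward((5.0, 4.0), hidden_weights, output_weights)

print("weights:", count_weights(hidden_weights, output_weights))  # 2*5 + 5 = 15
print("untrained score at (5, 4):", score)
```

Training would mean adjusting those 15 numbers until the sign of the score matches the dot colors; that step is deliberately not shown here.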
So, notice that what we did here was, rather than rely on our human insight, we were able to use a neural network to get to a good enough end result. So this is a neural network with one hidden layer, that's one layer of these neurons, and there are five of those nodes. We could also create extra hidden layers. So now we have ten weights here, 25 weights here, and then five weights here. So that's now a whole bunch more weights. This is 10, but each of these has five, so that's five times five, that's 25. So, 10 plus 25 is 35, plus five is 40 weights. So we now have a model that is a lot more complex, and again we get reasonably good results. The basic rule of thumb is to go with the simplest possible network that gives you good enough performance. So in this case, we would go with just one hidden layer, but then we have to choose the number of nodes. Let's say we start with two nodes and see whether this does well. It turns out that two nodes are not enough for this problem. It doesn't do very well; there are all these errors here in terms of capturing the data. So let's stop that, let's add a third node, and then start this, and with three nodes it seems to be fine. Except maybe some of these guys, the ones on the edges, are probably not completely right, but it's close enough. So in this situation, I would probably go with just three nodes. That's the simplest neural network that gives me good enough performance.
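The weight arithmetic above can be checked with a few lines of Python. This is only a counting sketch: the helper name is made up, each network is described as a list of layer sizes (two inputs, the hidden layers, one output), and bias terms are ignored, matching the way the weights are counted in the talk.

```python
def weights_in_network(layer_sizes):
    """Count the weights between consecutive fully connected layers (no bias terms)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Layer sizes: 2 inputs (x and y), then the hidden layers, then 1 output.
for layers in ([2, 5, 1], [2, 5, 5, 1], [2, 2, 1], [2, 3, 1]):
    print(layers, "->", weights_in_network(layers), "weights")
```

The first two lines reproduce the 15 and 40 from the walkthrough. By the same counting, the two-node and three-node networks have 6 and 9 weights, which gives a sense of how much smaller the three-node network is than the five-node one.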