So here we are in the TensorFlow Playground. The data set that we have essentially has blue dots in the lower left-hand corner and in the upper right-hand corner, and it has orange dots in the top left and on the bottom right. We have two raw inputs, x1 and x2, and what we want is to use x1 and x2 to train the model. So let's go ahead and train a model that takes x1 and x2 as input on this particular data set. As you can see, it can keep training, but the background image doesn't actually change much, right? It's all washed out, because x1 and x2 with a linear model just doesn't give us much learning capability here. The model doesn't actually learn much. So let's stop this and look at the data again. It turns out that it's a combination of x1 and x2 that actually matters. If x1 is negative and x2 is negative, it's blue. If x1 is positive and x2 is positive, it's blue. And if x1 and x2 have different signs, it seems to be orange. So what does that remind you of? That's a feature cross between x1 and x2. So let's go ahead and add the feature cross of x1 and x2 as another input. Now let's train, and we can see almost immediately that we have a pretty good model that separates the blue from the orange: the background behind the blue dots tends to be blue, the background behind the orange dots tends to be orange, and there is of course some noise where you have misclassifications, but that's to be expected because it's a noisy data set. So the key idea is that by taking this human insight, the insight that it's a combination of x1 and x2 that will allow us to better classify this data set, we're able to add x1 times x2 as a feature. It's not actually a new input; it's feature engineering that we've carried out on the original inputs x1 and x2, and it allows us to separate the blue and the orange pretty well. So now let's take a different case.
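The effect of the cross can be reproduced outside the Playground. Here's a small sketch (my own illustration, not the Playground's code) that fits a plain logistic regression twice on this quadrant pattern: once on the raw x1 and x2, and once with the x1*x2 cross added as a third feature. Only the second run separates the classes.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, steps=2000):
    # Plain gradient-descent logistic regression (convex, so it converges).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Toy version of the Playground data: blue when x1 and x2 share a sign.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 400)
x2 = rng.uniform(-1, 1, 400)
y = (x1 * x2 > 0).astype(float)

X_raw = np.column_stack([x1, x2, np.ones_like(x1)])          # x1, x2, bias
X_cross = np.column_stack([x1, x2, x1 * x2, np.ones_like(x1)])  # + cross

accs = {}
for name, X in [("raw", X_raw), ("with cross", X_cross)]:
    w = fit_logreg(X, y)
    accs[name] = np.mean((X @ w > 0) == (y > 0.5))
print(accs)  # raw stays near chance; the cross gets it almost perfect
```

The linear model with only x1 and x2 hovers around 50% accuracy, because no single line separates the quadrants; the same model with the x1*x2 column learns to put nearly all of its weight on that one feature.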
In this case, you basically have the blue dots in the center and the orange dots out towards the edges. Again, if I just use x1 and x2 and train, the background image is all washed out, because there isn't much that this model can learn. So we should probably look at what kind of feature engineering we can do. Let me stop this. What kind of feature engineering can we do to get the separation? The intuition here is that if x1 and x2 are both small, it's blue, and if they're large, it tends to be orange. But it's not quite that x1 and x2 are both large: if you look at a point here, x1 is very small but x2 is large. So another way to think about it is this. Take the center of the image: points that are close to the center tend to be blue, and points that are far away from the center tend to be orange. What does that remind you of? Close and far away, that's a distance. And what is the equation for a distance? The square root of x squared plus y squared. Well, you don't actually need the square root, because all we're doing here is feeding input features into a model, so x squared and y squared are enough. So let's take x1 squared and x2 squared, both of them, as inputs. Now let's train, and you see that almost immediately you have a good separation between the blue dots and the orange dots. So let's stop this and look at both of these cases. Is the separation boundary a linear boundary? Well, in this case it's pretty obvious that it's not a linear boundary. Even though we're using a linear model here, with no hidden layers and no deep neural network, essentially just a linear combination of the inputs, we're able to get a non-linear boundary. So that's something to realize: if you have feature crosses, even though you're using a linear model, because the feature cross itself is non-linear, you actually have a non-linear model.
So one of the reasons feature crosses work is that they bring the power of non-linearity to bear on the problem, while we don't actually have to pay the price of non-linearity. We don't have to worry about the model being too deep, about lots of training problems, and so on. It's a linear model, and the good thing about a linear model is that the error surface is convex, which means there is a unique global minimum: it's relatively easy to find, and you can find it. So you get the advantages of a linear model, but with the power of non-linearity, and that's the cool thing about a feature cross. Even in the other case, the boundary is also non-linear, because it's two lines, right? It's not a single line, but it's not as obvious to see as it is in this case, where the boundary is an ellipse, and an ellipse is obviously not a line. So that's something to remember: even when we have the power of neural networks and we want to use neural networks, you might want to consider including feature crosses as part of your toolkit, because feature crosses allow you to have a simple model and still get non-linearity.
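The convexity claim can be illustrated with a quick sketch (my own example, not from the lesson): for a linear model with squared-error loss, gradient descent lands on the same weights no matter where it starts, because there is only one minimum to find.

```python
import numpy as np

# A linear regression problem with known true weights plus a little noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def descend(w0, lr=0.1, steps=500):
    # Gradient descent on mean-squared error, starting from w0.
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

w_a = descend(np.zeros(3))                 # start at the origin
w_b = descend(rng.normal(size=3) * 10)     # start far away, at random
print(np.allclose(w_a, w_b, atol=1e-4))    # same unique global minimum
```

A deep non-linear network offers no such guarantee: different initializations can land in different minima, which is part of the "price of non-linearity" the transcript mentions.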