The first method for classification I'm going to talk about is based on analogy. This is what we're going to see in this lesson: two main classification methods based on analogy. One is the support vector machine, or SVM; the other is the nearest-neighbor classifier, kNN. So let's start with SVM. A support vector machine generates mathematical functions that map input variables to desired outputs, for classification or regression type prediction problems. So this means that it can be used both for classification and for prediction tasks. What SVM does is really a mapping: it takes an input space, which is the set of data in the dataset, and transforms it into a more elaborate representation that makes the classes separable. Separable means you can draw a line to separate the classes. To do that, it first uses what's called a kernel function. Each input data point is transformed by this kernel function into a different space; it's like making a projection onto a higher-dimensional space. That way we transform non-linear relationships among the variables into linearly separable feature spaces. I'm talking about two classes here, but of course you can generalize to any number of classes, and to continuous variables as well. Once the two classes are linearly separable, SVM builds a hyperplane that maximizes the distance between the classes, which is called the maximum-margin hyperplane, to optimally separate the different classes from each other based on the training dataset. So as you see, SVM has a solid mathematical formulation. A hyperplane is like a plane; we talk about a hyperplane when we are in higher dimensions. In two dimensions everybody knows what a plane is, and in dimension three you also know what a plane is. But when you go above dimension three, we talk about hyperplanes.
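The two-class setup described above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn (the lecture doesn't prescribe any library) and a made-up toy dataset; the linear kernel here corresponds to the simplest case, where the separating hyperplane lives in the original input space.

```python
# Minimal SVM sketch with scikit-learn (an assumption; the lecture
# names no library). Two blobs of points stand in for the two classes.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear")  # linear kernel: hyperplane in the input space
clf.fit(X, y)

print(clf.score(X, y))           # training accuracy
print(clf.support_vectors_.shape)  # the points that define the margin
```

The fitted model keeps only the support vectors, the training points closest to the separating hyperplane; they are what defines the maximum margin.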
It's a geometric concept used to describe the separation surface between classes of things. Again, a plane in dimension three you can see: it's a flat surface. For example, the surface of this table is a plane in dimension three, which is the dimension I live in. But if we go beyond dimension three, the separating surfaces are not going to be flat in the same visible way: in dimension four, the separating surface will have four minus one, that is three, dimensions. That's why we talk about a hyperplane, but it's really similar to a plane. A kernel function, as I said, is a function that transforms the input variables into another space of variables using what's called the kernel trick. It means tricking the model we're going to build into seeing transformed data rather than the original data, so that a non-linear problem can be solved as a linear problem. An example of a kernel function is the radial basis function, RBF. The example here shows the result of using the kernel trick. Suppose at the beginning our data points were not yellow on the left and blue on the right; they were all mixed up in some way. By applying the kernel function we transform the variables so they end up that way, which is what's called linearly separable: we can put all the yellow on one side of a line and all the blue on the other side. Getting to this point, the kernel function, is of course the most difficult part. SVM tries different kinds of kernel functions, by trial and error, until it finds the one that works best for this particular set of data. That's why its computation is expensive: it is going to try different kernel functions until it finds something as beautiful as that. Well, that would be the perfect case; you would have perfect classification with that.
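The kernel trick described above can be seen on a classic toy case: two concentric rings of points, which no straight line can separate in the original 2-D space. This sketch assumes scikit-learn and its synthetic `make_circles` data; the comparison simply shows a linear kernel failing where the RBF kernel succeeds.

```python
# Kernel-trick illustration (scikit-learn assumed, data is synthetic).
# Concentric circles are NOT linearly separable in the original space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # no line can split the rings
rbf = SVC(kernel="rbf").fit(X, y)        # RBF maps them to a separable space

print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:   ", rbf.score(X, y))
```

The linear kernel scores near chance on this data, while the RBF kernel separates the two rings almost perfectly, which is exactly the "mixed up becomes separable" effect described in the lecture.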
Most often it doesn't manage to get exactly this perfect situation, but it gets close enough. Once you have that, it's very easy to draw a line between the two classes, or a hyperplane if you are above dimension three, with the maximum margin, that is, the maximum separation distance between the two classes. SVMs are the most widely used kernel-learning algorithm for a wide range of classification and regression models, and this is one of the most used classification and prediction models. They have excellent generalization performance, probably because of this kernel function and trying different ones, superior prediction power, ease of use, and a rigorous theoretical foundation. SVM often shows superiority in both regression and classification type prediction problems. Now, like all the methods I'm going to talk about, it has advantages and drawbacks. The advantage of SVM is that, very often, it gets excellent classification and prediction performance. However, it can be very time consuming and computationally intensive, which can be a drawback if you are working on big data. And for all the methods I'm going to talk about, when you specialize in a method, you generally manage to make it better, because none of these methods is applied in a purely straightforward manner: they all involve some kind of tuning process. For example, SVM has different kernels, and choosing among them is a matter of expertise. All the methods I'm going to talk about have some parameters, some fine tuning to do, so specialists, for example of SVM, will generally get better results than nonspecialists, and the same goes for kNN, for neural networks, etcetera. So that's also something to really look into: having specialists in these areas who can do very good work and get very good performance with any of these methods. Another method based on analogy is the k-nearest neighbor.
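The trial-and-error kernel selection mentioned above is, in practice, usually done with cross-validation rather than by hand. A possible sketch, assuming scikit-learn's `GridSearchCV` and a synthetic two-moons dataset:

```python
# Kernel selection by cross-validated trial and error (scikit-learn
# assumed; the dataset and the candidate kernel list are illustrative).
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Try several kernels; keep the one with the best cross-validated score.
search = GridSearchCV(SVC(), {"kernel": ["linear", "poly", "rbf"]}, cv=5)
search.fit(X, y)

print("best kernel:", search.best_params_["kernel"])
print("cv accuracy:", search.best_score_)
```

This is also where the computational cost comes from: every candidate kernel means fitting the model several times over.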
It's a simpler prediction method, but it can produce very competitive results, in particular for big data. It's a prediction model for classification as well as regression type, meaning numerical, prediction tasks. Like SVM, it handles both; but unlike the other methods we're going to see, it's classified as instance-based learning. The difference is that instance-based learning models keep the training data and use it directly on the new data, while the other methods we're going to see all perform some kind of generalization: they build a generalized model and then discard the training data. That's also something to look into, because an advantage of kNN is that it can easily evolve over time: it can grow its set of data, not only the training data but also the new data you're working on, so it can more easily evolve over time. K is the number of neighbors used. This is, again, a parameter that you have to set, and, again, the data is kept after modeling. Now, one of the big questions in this type of model is how many neighbors to take. That's why it's called the k-nearest neighbor: you can use the one nearest neighbor, the two nearest neighbors, the three, and so on, and you have ways of automatically finding the optimal number of neighbors. In this case, for example, we want to classify the new (xi, yi) case, which is the star here. If I just use one nearest neighbor, I would classify the star as green, because its one nearest neighbor is green. But if I take three nearest neighbors, I would see that I have two blue and one green, and I do what's called majority voting, or sometimes an averaging, depending on the task, and I would classify it as blue. And then I could extend the number of neighbors to five, and with five I would have three green and two blue, so I would say that the star would be green.
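The star example above, where the vote flips as k grows, can be reproduced with a tiny hand-built dataset. The points and labels below are made up for illustration (they are not the lecture's actual figure); "g" stands for green and "b" for blue.

```python
# Majority voting in kNN, mirroring the lecture's star example.
# Points and labels are invented so that k = 1, 3, 5 give g, b, g.
import numpy as np

points = np.array([[1.0, 1.0], [2.0, 1.5], [2.5, 2.0],
                   [3.0, 3.0], [3.5, 2.5]])
labels = np.array(["g", "b", "b", "g", "g"])
star = np.array([1.2, 1.1])  # the new case to classify

# Euclidean distances from the star to every training point,
# then the indices of the neighbors from nearest to farthest.
dists = np.linalg.norm(points - star, axis=1)
order = np.argsort(dists)

def knn_vote(k):
    """Majority vote among the k nearest labels."""
    votes = list(labels[order[:k]])
    return max(set(votes), key=votes.count)

for k in (1, 3, 5):
    print(k, knn_vote(k))  # k=1 -> g, k=3 -> b, k=5 -> g
```

As in the lecture, the single nearest neighbor is green, the three nearest are two blue and one green, and the five nearest are three green and two blue, so the predicted class changes with k.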
So as you see, it really depends on the value of k, and you have ways, on the training data, of optimizing this value of k. Now, which distance measure? It depends on the problem. The simplest that can be used for numeric data is the Euclidean distance, but of course the distance measures can be much more complex. So here is what happens with the nearest neighbor, as you can see: you take the training data and separate it into training and validation sets. Based on that, you optimize your value of k, and the distance measure as well, and then you predict. Once you are satisfied with your prediction, you can put it into deployment. But in deployment you're not going to deploy simply a model: you're going to deploy the data that you trained on and apply it to the new data, and also the k that you have learned, and even the distance measure. You can also learn weights associated with the features. So there are a lot of ways of optimizing this particular model. I've placed in the resources videos that you can watch. One of these videos clearly shows how to use a nearest-neighbor classifier and applies it to a very interesting example, so I encourage you to watch it. Thank you.
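The train/validation workflow for choosing k described above can be sketched as follows, assuming scikit-learn and its built-in iris dataset (both are assumptions; the lecture does not name a dataset or library).

```python
# Choosing k on a held-out validation set (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Try several candidate values of k; keep the best validation accuracy.
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7, 9):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

print("best k:", best_k, "validation accuracy:", best_acc)
```

Note that what gets "deployed" afterwards, as the lecture says, is the training data itself together with the chosen k and distance measure, not a compact fitted model.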