While logistic regression is fast, it is only useful for binary classification problems. In some cases, we may have more complex problems to solve. If you look at the logic diagram on the slide here, you'll see such a circumstance: what is called a multi-label classification problem. When we're dealing with multi-label problems, we may have to label something as either A or B, and also as either C or D, and also as either E or F, and so on. You can see here that to solve multi-label classification problems, we can configure multiple binary classifiers and have each of those binary classifiers make a determination between one pair of mutually exclusive classes, either A or B and either C or D. We can then combine the labels assigned by each of those binary classifiers to create a multi-label classifier. In this case, you can see that the input was classified as class B by the first classifier and also received a label from the second, and both labels were assigned because the labels from the two classifiers are not mutually exclusive. In the real world, you might apply this to literary works, where perhaps you're trying to classify the genre of a particular book. You may need to determine: is this fiction or non-fiction? Those are mutually exclusive, but then you may have fiction targeted at adults or at young people, and likewise non-fiction targeted at young people or adults. We could have the classifier A/B determine whether it is fiction or non-fiction, and a second classifier determine whether it is for youth or adults. We come out with four different possible combinations of labels from just two separate binary classifiers. So for multi-label classification, we can simply stack up our logistic regression if we'd like to use it, configuring as many binary classifiers as necessary and combining their labels. But we could also encounter a situation with multi-class classification, where the classes are all mutually exclusive.
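The fiction/non-fiction and youth/adult example above can be sketched in plain Python. The weights, feature values, and function names here are illustrative assumptions, not trained parameters; in practice each binary classifier's weights would come from fitting logistic regression to labeled data.

```python
import math

def sigmoid(z):
    # Logistic function: maps a raw score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hand-set (hypothetical) parameters for two independent binary classifiers:
# one decides fiction vs. non-fiction, the other youth vs. adult.
fiction_weights, fiction_bias = [1.5, -0.5], 0.2
audience_weights, audience_bias = [-0.8, 2.0], -0.1

def binary_label(x, weights, bias, pos_label, neg_label):
    # Standard logistic regression decision: probability > 0.5 -> positive class.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return pos_label if sigmoid(z) > 0.5 else neg_label

def multi_label(x):
    # Combine the two binary decisions into one multi-label output.
    return (binary_label(x, fiction_weights, fiction_bias, "fiction", "non-fiction"),
            binary_label(x, audience_weights, audience_bias, "youth", "adult"))

labels = multi_label([1.0, 1.0])  # one label from each classifier
```

Because the two classifiers are independent, all four combinations (fiction/youth, fiction/adult, non-fiction/youth, non-fiction/adult) are reachable, which is exactly the stacking idea described above.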
Now, with multi-class classification, you will be placing your data examples into one of three or more classes. Again, looking at the logic diagram here on the slide, you can see that in this case the data example is going to belong to one and only one class. As we move down through the logic, you'll see that we first test: does this belong to class 1? If the answer is yes, we exit the logic at that point. If the answer is no, we check to see if it belongs to class 2, and then to class 3; if it does belong to class 3, we output that class. It's going to belong to class 1, or class 2, or class 3, but it cannot belong to any combination of those. It can only belong to a single class because this is a multi-class classification problem. Looking at this, you might wonder: we've seen that we could use multiple binary classifiers to solve a multi-label problem, but how would we solve a multi-class problem? Once again, going to the slide, you can see that we have what's called multinomial logistic regression. This is sometimes referred to as the softmax function or softmax regression. The reason is that, using these formulas, we can calculate the class probability for each of the classes. If we're starting with three classes, then of course the probabilities of a data example belonging to each of those classes will add up to 100 percent, with a certain percentage assigned to each of the three classes. Our class assignment would, of course, be based on the class with the highest probability. Multinomial logistic regression starts by computing a score s_k(x) = θ^(k) · x for each of the classes, where k is the class, θ^(k) is the vector of model parameters for that class, and x is the vector of feature values. This score is computed for each of the classes, of course.
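The per-class score s_k(x) = θ^(k) · x described above can be sketched as a dot product per class. The parameter values and class names here are made up for illustration; real values would come from training.

```python
# Hypothetical per-class parameter vectors theta^(k), one per class.
thetas = {
    "class 1": [2.0, -1.0],
    "class 2": [0.5, 0.5],
    "class 3": [-1.0, 2.0],
}

def scores(x):
    # One score per class: the dot product of that class's parameters with x.
    return {k: sum(t * xi for t, xi in zip(theta, x))
            for k, theta in thetas.items()}

s = scores([1.0, 2.0])
# class 1: 2*1 + (-1)*2 = 0.0; class 2: 0.5 + 1.0 = 1.5; class 3: -1 + 4 = 3.0
```

Each class gets its own parameter vector, so with three classes and two features there are six parameters in total (plus bias terms, omitted here for simplicity).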
Once these scores are computed for every class, we plug them into the softmax function, which calculates the probability that the example x belongs to a certain class given those scores. You can see that what's happening is we're taking the exponential of each class's score, and we're dividing it by the sum of the exponentials of all of the scores. In this way, we're determining each class's share of the total. The output of the softmax function will always produce probabilities that add up to one. If you had class 1, class 2, and class 3, and you added up the probabilities of each, you'd always end up with a value of 1. Returning to our literary works example, if we said this book is either going to be a fantasy novel, a romance, or a horror novel, and the softmax output is 0.89 fantasy, 0.08 romance, and 0.03 horror, or in other words 89 percent, 8 percent, and 3 percent, you can see that they always add up to a total of 1, or 100 percent. This is the softmax function at work. It gives us a simple way of determining which class has the highest probability, and that class becomes the assignment in any multi-class problem.
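The softmax computation just described can be sketched in a few lines. The input scores here are made up; subtracting the maximum score before exponentiating is a standard numerical-stability practice, not something from the slide, and it does not change the result.

```python
import math

def softmax(scores):
    # Exponentiate each score and normalize by the sum of all exponentials.
    # Subtracting the max score first avoids overflow for large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5, -1.0])   # illustrative scores for three classes
predicted = probs.index(max(probs)) # class assignment: highest probability
```

Whatever scores go in, the outputs are positive and sum to 1, so they can be read directly as class probabilities, and the predicted class is simply the index of the largest one.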