In the examples we've used so far for classification,

we've primarily focused on binary classification,

where the target value to be predicted was

a binary value: either the positive or the negative class.

In a lot of real-world datasets the target value to be predicted is actually a category with more than two possible values.

So for example, in our fruit dataset there were

four different categories of fruit to be predicted, not just two.

So, how do we deal with this multiclass classification situation with scikit-learn?

Well, fortunately scikit-learn makes it very easy to

learn multiclass classification models.

Essentially, it does this by converting

a multiclass classification problem into a series

of binary problems. What do I mean by that?

Well, essentially when you pass in

a dataset that has a categorical variable for the target value,

scikit-learn detects this automatically, and then for each class to be predicted,

scikit-learn creates one binary classifier

that predicts that class against all the other classes.

So for example, in the fruit dataset there are four categories of fruit.

So scikit-learn learns four different binary classifiers.

To predict a new data instance,

what it then does is

take that data instance,

whose label is to be predicted,

and run it against each of the binary classifiers in turn,

and the classifier that has the highest score is the one

whose class it uses

as the predicted value.
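
The one-versus-rest idea just described can be sketched by hand. This is only an illustration, not scikit-learn's internal code: it assumes a synthetic four-class, two-feature dataset from `make_blobs` as a stand-in for the fruit data, and the helper name `predict_one_vs_rest` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for the fruit data (assumption): 4 classes, 2 features.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)
classes = np.unique(y)

# Train one binary classifier per class: "this class" (1) versus "the rest" (0).
binary_models = []
for c in classes:
    model = LinearSVC(C=1.0, max_iter=10000, random_state=0)
    model.fit(X, (y == c).astype(int))
    binary_models.append(model)


def predict_one_vs_rest(X_new):
    """Hypothetical helper: score with every binary model, keep the best class."""
    # Each column holds one binary classifier's score for every instance.
    scores = np.column_stack([m.decision_function(X_new) for m in binary_models])
    return classes[np.argmax(scores, axis=1)]


predictions = predict_one_vs_rest(X)
print(len(binary_models), predictions.shape)
```

In practice you would not write this loop yourself; it is here only to make the four-classifiers-then-highest-score procedure concrete.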

So, let's look at a specific example of

multiclass classification with this fruit dataset.

Here, we simply pass in the normal dataset that

has a value from one to four as the category of fruit to be predicted.

And we fit it exactly the same way that we would

fit the model as if it were a binary problem.

And in general, if we're just, you know, fitting,

and then predicting, all of this would be completely transparent.

Scikit-learn would simply do the right thing and it would learn multiple classes,

and it would predict multiple classes,

and we wouldn't really have to do much else.

However, we can get access to what's happening under the hood as it were,

if we look at the coefficients and the intercepts of

the linear models that result from fitting to the training data.

And this is what this example shows.

So, what we're doing here is fitting

a linear support vector machine to the fruit training data.

And if we look at the coefficient values,

we'll see that instead of just one pair of coefficients for a single linear model,

a single classifier, we actually get four pairs of values.

And these values correspond to the four classes of fruit in the training set.

And so, what scikit-learn has done here is it's created four binary classifiers,

one for each class.

And so, you can see there are four pairs of coefficients here and

there are also four intercept values.
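
As a rough sketch of what looking at the coefficients and intercepts involves, the snippet below fits a `LinearSVC` to a four-class, two-feature problem and prints the shapes of `coef_` and `intercept_`. The data is a synthetic `make_blobs` stand-in (an assumption), so the actual numbers will not match the fruit example; only the shapes carry over.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for the fruit data (assumption): 4 classes, 2 features.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)

# Fit exactly as for a binary problem; the target simply has four values.
clf = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X, y)

print(clf.coef_.shape)       # one row (a pair of coefficients) per class
print(clf.intercept_.shape)  # one intercept per class
```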

So, in this case the first pair of coefficients

corresponds to a classifier that classifies apples versus the rest of the fruit,

and so, this pair of coefficients and this intercept define a straight line.

In this case the apples in this visual are the red points, and so,

the coefficients of the apple model define a decision boundary,

a linear decision boundary,

that's marked by this red line here.
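
To make the "pair of coefficients plus intercept define a straight line" step concrete, here is a minimal sketch that solves the boundary equation w0*x0 + w1*x1 + b = 0 for the second feature. It again uses a synthetic `make_blobs` stand-in rather than the fruit data, and it assumes the second coefficient of the first classifier is nonzero.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for the fruit data (assumption): 4 classes, 2 features.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X, y)

# Coefficients and intercept of the first class-versus-rest classifier.
w0, w1 = clf.coef_[0]
b = clf.intercept_[0]

# On the decision boundary the score is zero:
#   w0*x0 + w1*x1 + b = 0  =>  x1 = -(w0*x0 + b) / w1   (assumes w1 != 0)
x0_values = np.array([X[:, 0].min(), X[:, 0].max()])
x1_values = -(w0 * x0_values + b) / w1

# Plugging the points back in should give scores of (numerically) zero.
boundary_scores = w0 * x0_values + w1 * x1_values + b
print(np.allclose(boundary_scores, 0.0))
```

Passing `x0_values` and `x1_values` to a plotting call would draw the line for that one classifier.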

And if you plot it out you'll see that indeed it has

an intercept of negative three, and, you know,

you can actually compute for any data instance using

this linear formula whether the model will predict apple or not an apple.

So, if we take a specific example,

something that has a height of two

and a width of six, so some point here, for example.

This is a linear classifier,

so we can take the coefficients.

This first pair of coefficients here,

and this first intercept value here,

and these form a linear classifier for apples versus not apples.

So we can take the height feature,

multiply it by the first coefficient,

take the width feature,

multiply it by the second coefficient,

and then add in the third term, the intercept.

This is the bias term.

And so, to predict whether this object that

has height of two and width of six is an apple,

we simply compute this linear formula.

It turns out that the value is positive 0.59.

And so, because that value is greater than or equal to zero, that

indicates that the model is predicting that the object is indeed an apple.

And that makes sense because it's on this side of

the apple versus not apple binary classifier decision boundary.

Similarly, we can take another object,

say one that has a height of two and a width of two.

So, in this part of the space over here.

And we can plug in those two feature values into the same apple classifier.

And when we do that it turns out that the prediction value

for the linear model in that case is negative 2.3 and that's less than zero,

which is on this side of the decision boundary.

And so, the linear model for apple versus not apple

here is predicting that this object is not an apple.
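
The hand calculation above (feature times coefficient, plus feature times coefficient, plus intercept, then compare with zero) can be sketched as follows. The data is a synthetic `make_blobs` stand-in (an assumption), so the point `[2.0, 6.0]` is just an arbitrary instance here and the scores will not be the 0.59 and negative 2.3 from the fruit example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for the fruit data (assumption): 4 classes, 2 features.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X, y)

# An arbitrary new instance with two feature values.
x_new = np.array([2.0, 6.0])

# First class-versus-rest classifier: first row of coef_, first intercept.
w = clf.coef_[0]
b = clf.intercept_[0]

# The linear formula: w[0]*feature_0 + w[1]*feature_1 + intercept.
manual_score = w[0] * x_new[0] + w[1] * x_new[1] + b

# The library computes the same score; column 0 belongs to the first class.
library_score = clf.decision_function(x_new.reshape(1, -1))[0, 0]

# A score >= 0 means this binary classifier says "first class", else "not".
says_first_class = manual_score >= 0
print(manual_score, library_score, says_first_class)
```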

And so, again, when scikit-learn has to predict

the class of a new object with potentially multiple classes,

it will go through each of these binary classifiers in turn and it will

predict the class whose classifier has the highest score for that instance.
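
As a quick check of that last statement, this sketch compares `predict` for a multiclass `LinearSVC` against taking the class with the highest `decision_function` score. It uses a synthetic `make_blobs` stand-in for the fruit data (an assumption).

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Synthetic stand-in for the fruit data (assumption): 4 classes, 2 features.
X, y = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000, random_state=0).fit(X, y)

# One score per instance per class-versus-rest classifier.
scores = clf.decision_function(X)

# Pick, for each instance, the class whose classifier scored highest.
highest_score_class = clf.classes_[np.argmax(scores, axis=1)]

# This should agree with what predict returns.
print(np.array_equal(highest_score_class, clf.predict(X)))
```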