0:12

What's a multiclass classification problem?

Here are some examples.

Let's say you want a learning algorithm to automatically put your email into

different folders or to automatically tag your emails so

you might have different folders or different tags for work email,

email from your friends, email from your family, and emails about your hobby.

And so here we have a classification problem with four classes, which we might

assign the labels y = 1, y = 2, y = 3, and y = 4.

And another example: for medical diagnosis,

if a patient comes into your office with maybe a stuffy nose,

the possible diagnoses could be that they're not ill.

Maybe that's y = 1.

Or they have a cold, y = 2.

Or they have the flu, y = 3.

0:59

And a third and final example: if you are using machine learning to classify

the weather, maybe you want to decide whether the weather is sunny, cloudy,

rainy, or snowy. And so in all of these examples,

y can take on a small number of values, maybe one through three, one through four, and

so on, and these are multiclass classification problems.

And by the way, it doesn't really matter whether we index the classes as 0, 1,

2, 3, or as 1, 2, 3, 4.

I tend to index my classes starting from 1 rather than starting from 0,

but either way works, and it really doesn't matter.

Whereas previously, for

a binary classification problem, our data set might look like this.

For a multiclass classification problem, our data set may look like

this, where here I'm using three different symbols to represent our three classes.

So the question is: given a data set with three classes, where

this is an example of one class, that's an example of a different class, and

that's an example of yet a third class,

how do we get a learning algorithm to work in this setting?

We already know how to do binary classification using logistic regression.

We know how to fit, maybe, a straight line to separate the positive and

negative classes.

Using an idea called one-vs-all classification,

we can then take this and make it work for multiclass classification as well.

Here's how one-vs-all classification works.

And this is also sometimes called one-vs-rest.

Let's say we have a training set like that shown on the left,

where we have three classes: if y equals 1, we denote that with a triangle;

if y equals 2, the square; and if y equals 3, the cross.

What we're going to do is take our training set and

turn this into three separate binary classification problems;

that is, three separate two-class classification problems.

So let's start with class one, which is the triangle.

We're going to essentially create a new, sort of fake, training set where classes two and

three get assigned to the negative class,

and class one gets assigned to the positive class.

We create a new training set like that shown on the right, and

we're going to fit a classifier which I'm going to call h subscript theta

superscript one of x where here

the triangles are the positive examples and the circles are the negative examples.

So think of the triangles being assigned the value of one and

the circles assigned the value of zero.

And we're just going to train a standard logistic regression classifier, and

maybe that will give us a decision boundary that looks like that.

Okay?

3:34

This superscript one here stands for class one, so we're doing this for

the triangles of class one.

Next we do the same thing for class two.

We're going to take the squares and assign them as the positive class, and

assign everything else, the triangles and the crosses, to the negative class.

And then we fit a second logistic regression classifier and call this h

superscript two of x, where the superscript two denotes that we're now

treating the square class as the positive class.

And maybe we get a classifier like that.

And finally, we do the same thing for the third class and

fit a third classifier, h superscript three of x, and

maybe this will give us a decision boundary, a classifier,

that separates the positive and negative examples like that.
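The relabel-and-train loop just described can be sketched in code. This is a minimal NumPy illustration, not the course's official implementation: it assumes plain batch gradient descent for each binary logistic regression, and the function names (`train_logistic`, `train_one_vs_all`) are made up for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, iters=2000):
    """Fit one binary logistic regression by batch gradient descent.
    X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    Returns theta with the intercept term as its first entry."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # add intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        # gradient of the logistic (cross-entropy) cost
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def train_one_vs_all(X, y, classes=(1, 2, 3)):
    """One classifier per class: class i becomes the positive class
    (label 1) and every other class the negative class (label 0)."""
    return {i: train_logistic(X, (y == i).astype(float)) for i in classes}
```

The key step is the relabeling `(y == i).astype(float)`: each of the three classifiers sees the same features but a different fake binary training set.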

4:22

So to summarize, what we've done is, we've fit three classifiers.

So, for i = 1, 2,

3, we'll fit a classifier h subscript theta superscript i of x.

That is, each one is trying to estimate the probability that y is equal to class i,

given x and parametrized by theta.
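Written out in symbols, what each of the three classifiers is estimating is:

```latex
h_\theta^{(i)}(x) = P(y = i \mid x;\, \theta), \qquad i = 1, 2, 3
```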

Right? So in the first instance for

this first one up here, this classifier was learning to recognize the triangles.

So it's thinking of the triangles as the positive class, so

h superscript one is essentially trying to estimate what is the probability

that y is equal to one, given x, parametrized by theta.

And similarly, this is treating the square class as a positive class and

so it's trying to estimate the probability that y = 2 and so on.

So we now have three classifiers,

each of which was trained to recognize one of the three classes.

Just to summarize, what we've done is

train a logistic regression classifier h superscript i of x for

each class i to predict the probability that y is equal to i.

Finally, when we're given a new input x and

we want to make a prediction,

what we do is just run all three of our

classifiers on the input x, and we then pick the class i that maximizes the three outputs.

So we just basically pick whichever one of the three classifiers

is most confident, the one that

most enthusiastically says that it thinks x belongs to its class.

So whichever value of i gives us the highest probability

we then predict y to be that value.
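This final argmax step can be sketched as follows. The parameter vectors here are made-up numbers purely to illustrate picking the most confident classifier; in practice they would come from training each binary logistic regression as described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(thetas, x):
    """Run every classifier h^(i) on input x and pick the class i
    whose classifier reports the highest probability."""
    xb = np.concatenate([[1.0], x])  # prepend the intercept term
    return max(thetas, key=lambda i: sigmoid(xb @ thetas[i]))

# Made-up parameter vectors for three classes (intercept first),
# just to demonstrate the argmax over classifier outputs:
thetas = {1: np.array([0.0, 4.0, 0.0]),
          2: np.array([0.0, -4.0, 0.0]),
          3: np.array([0.0, 0.0, 4.0])}
predict_one_vs_all(thetas, np.array([1.0, -1.0]))  # classifier 1 is most confident here
```

Note that the three probabilities need not sum to one; we only compare them and keep the largest.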