0:10

So in binary classification, you're usually predicting one of two categories. So, again, you might be predicting whether someone's alive or dead, or sick or healthy. You might also be predicting whether they will click on an ad or they won't click on that ad.

But your predictions often come out to be quantitative. In other words, almost all the modeling algorithms that we have won't just assign you to one class or the other; they might give you, say, a probability of being alive, or a prediction on a scale of 1 to 10 about whether you will click on the ad or not.

The cutoff you choose gives different results. In other words, if we predict the probability that you're going to be alive is 0.7, and we say all the people above 0.5 will be assigned to be alive, then that's one prediction algorithm. Alternatively, if you say only people with a probability above 0.8 will be predicted to be alive, then that will give you a different prediction for that same person.

And the different cutoffs will have different properties: they will be better at finding people that are alive, or better at finding people that are dead.
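To make the cutoff idea concrete, here's a minimal Python sketch; the probabilities and the 0.5 and 0.8 cutoffs are just illustrative values, not data from the lecture:

```python
# Hypothetical predicted probabilities of being alive; the values are made up.
probs = [0.9, 0.7, 0.55, 0.3, 0.1]

def classify(probabilities, cutoff):
    """Assign "alive" when the predicted probability exceeds the cutoff."""
    return ["alive" if p > cutoff else "dead" for p in probabilities]

# The same person (probability 0.7) lands in a different class
# depending on which cutoff we choose.
print(classify(probs, 0.5))  # 0.7 > 0.5, so that person is "alive"
print(classify(probs, 0.8))  # 0.7 <= 0.8, so that person is "dead"
```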

So people use what's called an ROC curve. The idea is that on the x-axis they usually plot one minus the specificity, in other words the probability of a false positive, and on the y-axis they plot the sensitivity, or the probability of a true positive. What they're trying to do, then, is make a curve where every single point along the curve corresponds to exactly one cutoff.

So for a particular cutoff, you get a certain probability of being a false positive, or a certain specificity, and one minus the specificity is this point right here on the x-axis. At the same time, you get a certain sensitivity, and that's this point here on the y-axis.

And so you can use these ROC curves to define whether an algorithm is good or bad by plotting a different point for every single cutoff that you might choose, and then drawing a curve through those points.
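The construction just described, one (1 − specificity, sensitivity) point per cutoff, can be sketched in plain Python; the labels and scores below are toy values, not the data behind the curves on the slide:

```python
def roc_points(labels, scores, cutoffs):
    """Return one (1 - specificity, sensitivity) point per cutoff.

    labels: true classes (1 = positive, 0 = negative)
    scores: predicted probabilities of being positive
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for c in cutoffs:
        # Predict positive whenever the score exceeds the cutoff.
        tp = sum(1 for y, s in zip(labels, scores) if s > c and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s > c and y == 0)
        points.append((fp / neg, tp / pos))  # (false positive rate, sensitivity)
    return points

# Toy example: cutoff 1.0 predicts nobody positive (bottom left corner of
# the plot), while cutoff 0.0 predicts everybody positive (top right corner).
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.6, 0.55, 0.4, 0.2]
print(roc_points(y, p, [1.0, 0.5, 0.0]))
```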

So this is an example of a couple of algorithms that are used to predict; these are examples of what real ROC curves look like. So, for example, if you want to look at, say, the NetChop algorithm, and you look at where one minus the specificity is equal to 0.2, in other words the specificity is quite high, 0.8, you can trace that number up to about here on the curve, the red curve in this case, and that corresponds to a sensitivity of only about 0.4. So it's very highly specific but not very sensitive. And similarly, if you move out here to the right on the x-axis, you'll get less and less specificity, and more and more sensitivity.

And so this curve tells you a little bit about the tradeoffs. Now, although the curve tells you a little bit about the tradeoffs, you may actually want to know which of these prediction algorithms is better. And one way that people quantify one curve versus the other is to calculate the area under the curve. In other words, they follow, say, the red curve here and trace the entire area underneath that curve, so that'd be this area down here, and that area quantifies how good that particular prediction algorithm is.

So, the higher the area, the better the predictor is, and there are some standard values that make sense to pay attention to. If the area under the curve, or AUC, is equal to 0.5, that's equivalent to a prediction algorithm that's just on the 45 degree line, which is equivalent to basically randomly guessing whether you're going to be a true positive or a false positive. So 0.5 is actually quite bad, and anything less than 0.5 is actually worse than random guessing; it's worse than flipping a coin. An AUC of 1 is a perfect classifier: in other words, you get perfect sensitivity and specificity for a certain value of the prediction algorithm.

In general, it depends on the field, and it depends on the problem you are looking at, but people often think of an AUC above 0.8 as a good AUC for a particular prediction algorithm.
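As a sanity check on these reference values, the area under a curve given as (1 − specificity, sensitivity) points can be approximated with the trapezoid rule; this is a sketch under that assumption, not the method used for the curves on the slide:

```python
def auc(points):
    """Approximate the area under an ROC curve from (fpr, sensitivity) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area

# The 45 degree line (random guessing) has area 0.5.
print(auc([(0, 0), (0.5, 0.5), (1, 1)]))  # 0.5
# A perfect classifier goes straight up, then straight over: area 1.0.
print(auc([(0, 0), (0, 1), (1, 1)]))      # 1.0
```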

In general, something to pay attention to is how you look at an ROC curve and decide whether it's a good ROC curve or a bad ROC curve.

So remember, the 45 degree line corresponds to just guessing. In other words, the probability of a true positive matches the probability of a false positive all along that diagonal line.

Â 4:11

So, a perfect classifier starts where one minus the specificity is zero, meaning perfect specificity, and it jumps straight up to perfect sensitivity and then straight over. So a curve represents a perfect classifier when it goes straight up to this corner. So the further you are towards the upper left hand corner of the plot, the better the ROC curve is, and the farther you are down towards the bottom right hand corner of the plot, the worse it is.

And so curves along the 45 degree line or below are considered pretty bad, and you want your curve to go as straight up towards that top left hand corner, and over, as much as you possibly can. So that's how you interpret an ROC curve, which will be used later when we're picking out particular cutoffs for predictors of binary outputs that output quantitative numbers.
