0:00

If you run a learning algorithm and it doesn't do as well as you were hoping, almost all the time it will be because you have either a high bias problem or a high variance problem. In other words, either an underfitting problem or an overfitting problem. In this case it's very important to figure out which of these two problems, bias or variance or a bit of both, you actually have, because knowing which of these two things is happening gives a very strong indicator of which ways of improving your algorithm are promising to try.

In this video, I'd like to delve more deeply into this bias and variance issue and understand it better, as well as figure out how to look at a learning algorithm and evaluate, or diagnose, whether we might have a bias problem or a variance problem, since this will be critical for figuring out how to improve the performance of a learning algorithm that you may implement.

You've already seen this figure a few times: if you fit too simple a hypothesis, like a straight line, it underfits the data. If you fit too complex a hypothesis, it might fit the training set perfectly but overfit the data. And a hypothesis of some intermediate level of complexity, maybe a degree-two polynomial, with a degree that is neither too low nor too high, is just right and gives you the best generalization error of all of these options.

Now that we're armed with the notion of training, validation, and test sets, we can understand the concepts of bias and variance a little bit better.

Concretely, let our training error and cross-validation error be defined as in the previous videos: say, the average squared error as measured on the training set or as measured on the cross-validation set.

Now let's plot the following figure. On the horizontal axis I'm going to plot the degree of polynomial, d. So as I go to the right, I'm going to be fitting higher and higher order polynomials. Way on the left of this figure, where maybe d = 1, we're going to be fitting very simple functions, whereas way over on the right of the horizontal axis we have much larger values of d, that is, a much higher degree polynomial, and so that's going to correspond to fitting much more complex functions to the training set. Let's look at the training error and the cross-validation error and plot them on this figure.
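For reference, the two error measures plotted here can be written out as below. This assumes the half mean squared error convention (the 1/(2m) factor) from the earlier linear regression videos, with h_θ the fitted hypothesis, θ learned from the training set only, and m_train, m_cv the sizes of the training and cross-validation sets. The factor of 1/2 does not change where either curve reaches its minimum.

```latex
\begin{aligned}
J_{\text{train}}(\theta) &= \frac{1}{2m_{\text{train}}} \sum_{i=1}^{m_{\text{train}}} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right)^2 \\
J_{\text{cv}}(\theta) &= \frac{1}{2m_{\text{cv}}} \sum_{i=1}^{m_{\text{cv}}} \left( h_\theta\big(x^{(i)}_{\text{cv}}\big) - y^{(i)}_{\text{cv}} \right)^2
\end{aligned}
```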

Let's start with the training error. As we increase the degree of the polynomial, we're going to be able to fit our training set better and better. So if d = 1, we may have a relatively high training error, whereas if we have a very high degree polynomial, our training error is going to be really low, maybe even zero, because we'll fit the training set really well. So as we increase the degree of polynomial, we typically find that the training error decreases. I'm going to write J_train(θ) there, because our training error tends to decrease with the degree of the polynomial that we fit to the data.
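A minimal sketch of this trend, assuming NumPy and made-up data (noisy samples of a quadratic, chosen so that d = 2 would be the "just right" degree). Each degree is fit by least squares with `np.polyfit`, and J_train(θ) is the half mean squared error on the same training points; the helper name `half_mse` is hypothetical. Because each higher-degree polynomial family contains the lower-degree ones, the least-squares training error cannot go up as d grows.

```python
import numpy as np

def half_mse(theta, x, y):
    """Half mean squared error of the polynomial with coefficients theta."""
    residuals = np.polyval(theta, x) - y
    return np.sum(residuals ** 2) / (2 * len(y))

# Hypothetical data: noisy samples of 1 + 2x - 3x^2, so d = 2 is "just right".
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=12)
y_train = 1.0 + 2.0 * x_train - 3.0 * x_train**2 + rng.normal(0.0, 0.2, size=12)

degrees = list(range(1, 9))
j_train = []
for d in degrees:
    theta = np.polyfit(x_train, y_train, d)   # least-squares fit on the training set
    j_train.append(half_mse(theta, x_train, y_train))

for d, j in zip(degrees, j_train):
    print(f"d={d}: J_train={j:.4f}")
```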

Next, let's look at the cross-validation error. For that matter, if we look at the test set error, we'll get a pretty similar result to plotting the cross-validation error. We know that if d = 1, we're fitting a very simple function, so we may be underfitting the training set and are going to have a very high cross-validation error.

If we fit an intermediate degree polynomial, such as d = 2 in our example on the previous slide, we're going to have a much lower cross-validation error, because we're finding a much better fit to the data. Conversely, if d were too high, say it took on a value of 4, then we're overfitting, and so we end up with a high value for the cross-validation error.

If you were to vary d smoothly and plot a curve, you might end up with a curve like that, which is J_cv(θ). And again, if you plot J_test(θ), you get something very similar.
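The sketch below extends the same idea to the cross-validation error, again assuming NumPy and hypothetical noisy-quadratic data (the helpers `half_mse` and `make_data` are illustrative names). Each degree is fit on a small training set only, then scored on a separate cross-validation set, and the degree with the lowest J_cv(θ) is picked. The exact numbers depend on the random seed, so the selected degree is not guaranteed to be 2, but degree 1 should show a clearly higher J_cv(θ) than degree 2.

```python
import numpy as np

def half_mse(theta, x, y):
    """Half mean squared error of the polynomial with coefficients theta."""
    return np.sum((np.polyval(theta, x) - y) ** 2) / (2 * len(y))

def make_data(rng, m):
    """Hypothetical data: m noisy samples of 1 + 2x - 3x^2 on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, size=m)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.2, size=m)
    return x, y

rng = np.random.default_rng(1)
x_train, y_train = make_data(rng, 10)
x_cv, y_cv = make_data(rng, 50)

degrees = list(range(1, 9))
j_train, j_cv = [], []
for d in degrees:
    theta = np.polyfit(x_train, y_train, d)        # fit on the training set only
    j_train.append(half_mse(theta, x_train, y_train))
    j_cv.append(half_mse(theta, x_cv, y_cv))        # evaluate on held-out data

# Model selection: keep the degree whose cross-validation error is lowest.
best_d = degrees[int(np.argmin(j_cv))]
print("best degree by J_cv:", best_d)
```

Plotting `j_train` and `j_cv` against `degrees` should reproduce the shape of the figure: a falling training curve and a roughly U-shaped cross-validation curve.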

This sort of plot also helps us better understand the notions of bias and variance. Concretely, suppose you've applied a learning algorithm and it's not performing as well as you were hoping, so your cross-validation set error or your test set error is high. How can we figure out whether the learning algorithm is suffering from high bias or from high variance? A high cross-validation error corresponds to either the regime on the left of this plot or the regime on the right.

The regime on the left corresponds to a high bias problem: you are fitting an overly low order polynomial, such as d = 1, when you really needed a higher order polynomial to fit the data. In contrast, the regime on the right corresponds to a high variance problem: the degree of polynomial d was too large for the data set we have. This figure gives us a clue for how to distinguish between these two cases.

Concretely, for the high bias case, that is, the case of underfitting, what we find is that both the cross-validation error and the training error are going to be high. So if your algorithm is suffering from a bias problem, the training set error will be high, and you might find that the cross-validation error will also be high. It might be close to, maybe just slightly higher than, the training error. So if you see this combination, that's a sign that your algorithm may be suffering from high bias.

In contrast, if your algorithm is suffering from high variance, then if you look at the right-hand regime, you'll notice that J_train(θ), the training error, is going to be low: the hypothesis is fitting the training set very well. Your cross-validation error, on the other hand, assuming this is, say, the squared error we're trying to minimize, or equivalently your cost function on the cross-validation set, will be much bigger than your training set error. So there's a double greater-than sign, J_cv(θ) >> J_train(θ). That's the math symbol for "much greater than", denoted by two greater-than signs.
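Written compactly, the two signatures described in this video are as follows. "High" and "low" are relative to the error level you were hoping for; the video does not attach specific numbers to them.

```latex
\begin{aligned}
&\text{High bias (underfitting):} && J_{\text{train}}(\theta)\ \text{high}, && J_{\text{cv}}(\theta) \approx J_{\text{train}}(\theta) \\
&\text{High variance (overfitting):} && J_{\text{train}}(\theta)\ \text{low}, && J_{\text{cv}}(\theta) \gg J_{\text{train}}(\theta)
\end{aligned}
```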

If you see this combination of values, that's a clue that your learning algorithm may be suffering from high variance and might be overfitting. The key that distinguishes these two cases is this: if you have a high bias problem, your training set error will also be high, because your hypothesis is just not fitting the training set well. If you have a high variance problem, your training set error will usually be low, that is, much lower than your cross-validation error.
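The rule of thumb above can be turned into a rough check. This is only a sketch: the video gives no numeric thresholds, so `target_error` (what counts as a "high" training error) and `gap_ratio` (what counts as "much greater than") are hypothetical knobs you would set for your own problem, and the function name `diagnose` is illustrative.

```python
def diagnose(j_train: float, j_cv: float,
             target_error: float, gap_ratio: float = 2.0) -> str:
    """Rough bias/variance diagnosis from training and cross-validation error.

    target_error and gap_ratio are hypothetical thresholds; what counts as
    "high" error and "much greater than" depends on the problem.
    """
    high_train = j_train > target_error      # not even fitting the training set well
    big_gap = j_cv > gap_ratio * j_train     # J_cv >> J_train
    if high_train and big_gap:
        return "high bias and high variance"
    if high_train:
        return "high bias (underfitting)"
    if big_gap:
        return "high variance (overfitting)"
    return "no clear bias or variance problem"

print(diagnose(0.30, 0.32, target_error=0.05))  # high bias (underfitting)
print(diagnose(0.01, 0.25, target_error=0.05))  # high variance (overfitting)
```

A ratio test is just one simple way to encode "much greater than"; an absolute gap between J_cv(θ) and J_train(θ) would be an equally reasonable choice.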

Hopefully this gives you a somewhat better understanding of the two problems of bias and variance. I still have a lot more to say about bias and variance in the next few videos, and I'll show you even more details on how to diagnose them later. What we'll see is that figuring out whether a learning algorithm may be suffering from high bias, high variance, or a combination of both gives us much better guidance about which things might be promising to try in order to improve the performance of a learning algorithm.