The term high variance is another

historical or technical one.

But, the intuition is that,

if we're fitting such a high

order polynomial, then, the

hypothesis can fit almost

any function, and this space

of possible hypotheses is just too large, it's too variable.

And we don't have enough data

to constrain it to give

us a good hypothesis so that's called overfitting.

And in the middle, there isn't really

a name but I'm just going to write, you know, just right.

Where a second-degree polynomial, a quadratic function,

seems to be just right for fitting this data.

To recap a bit, the

problem of overfitting comes

when, if we have

too many features, then the

learned hypothesis may fit the training set very well.

So, your cost function

may actually be very close

to zero, or maybe

even exactly zero, but you

may then end up with a

curve like this that, you

know, tries too hard to

fit the training set, so that it

even fails to generalize to

new examples and fails to

predict prices on new examples

as well. And here the

term "generalize" refers to

how well a hypothesis applies even to new examples.

That is, to data on

houses that it has not seen in the training set.

On this slide, we looked at

overfitting for the case of linear regression.
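As a side note, the linear-regression picture described above can be reproduced with a short numerical sketch. The data here is synthetic and hypothetical, a noisy quadratic trend standing in for the housing data, using plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical housing-style data: a noisy quadratic trend.
x_train = np.sort(rng.uniform(0, 3, 10))
y_train = 1.0 + 2.0 * x_train - 0.5 * x_train**2 + rng.normal(0, 0.1, 10)
x_test = np.sort(rng.uniform(0, 3, 50))
y_test = 1.0 + 2.0 * x_test - 0.5 * x_test**2 + rng.normal(0, 0.1, 50)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Underfit (degree 1), "just right" (degree 2), overfit (degree 9).
train_mse, test_mse = {}, {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(coeffs, x_train, y_train)
    test_mse[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train {train_mse[degree]:.6f}, "
          f"test {test_mse[degree]:.6f}")
```

The degree-9 polynomial can pass through all ten training points, so its training error is essentially zero, yet its test error is far worse than the quadratic's, which is exactly the overfitting versus "just right" pattern described above.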

A similar thing can apply to logistic regression as well.

Here is a logistic regression

example with two features, X1 and X2.

One thing we could do, is

fit logistic regression with

just a simple hypothesis like this,

where, as usual, G is my sigmoid function.

And if you do that, you end up

with a hypothesis, trying to

use, maybe, just a straight

line to separate the positive and the negative examples.

And this doesn't look like a very good fit to the data.

So, once again, this

is an example of underfitting

or of the hypothesis having high bias.

In contrast, if you were

to add to your features

these quadratic terms, then,

you could get a decision

boundary that might look more like this.

And, you know, that's a pretty good fit to the data.

Probably, about as

good as we could get, on this training set.

And, finally, at the other

extreme, if you were to

fit a very high-order polynomial, if

you were to generate lots of

high-order polynomial terms as features,

then, logistic regression may contort

itself, may try really

hard to find a

decision boundary that fits

your training data or go

to great lengths to contort itself,

to fit every single training example well.
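To make the logistic-regression part concrete, here is a small sketch with and without quadratic features, written in plain NumPy with batch gradient descent. The dataset (labels positive inside a circle), learning rate, and step count are all made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=3000):
    """Plain batch gradient descent on the logistic-regression cost."""
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    theta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

def accuracy(theta, X, y):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return float(np.mean((sigmoid(Xb @ theta) >= 0.5) == y))

rng = np.random.default_rng(0)
# Hypothetical data: label 1 inside the unit circle, 0 outside,
# so no straight line in (x1, x2) can separate the classes.
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(float)

theta_lin = fit_logistic(X, y)  # features x1, x2 only: underfits
# Add the quadratic terms x1^2, x1*x2, x2^2 as extra features.
X_quad = np.hstack([X, X[:, :1] ** 2, X[:, :1] * X[:, 1:], X[:, 1:] ** 2])
theta_quad = fit_logistic(X_quad, y)

print("linear accuracy:   ", accuracy(theta_lin, X, y))
print("quadratic accuracy:", accuracy(theta_quad, X_quad, y))
```

The straight-line hypothesis underfits (high bias), while the quadratic features let the decision boundary curve around the positive examples, matching the middle case described above.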

And, you know, if the

features X1 and

X2 are for predicting, maybe,

whether breast tumors

are malignant or benign,

this really doesn't

look like a very good hypothesis for making predictions.

And so, once again, this is

an instance of overfitting

and, of a hypothesis having

high variance, and being unlikely to generalize well to new examples.

Later, in this course, when we

talk about debugging and diagnosing

things that can go wrong with

learning algorithms, we'll give you

specific tools to recognize

when overfitting and, also,

when underfitting may be occurring.

But, for now, let's talk about

the problem of, if we

think overfitting is occurring,

what can we do to address it?

In the previous examples, we had

one- or two-dimensional data, so

we could just plot the hypothesis and see what was going

on and select the appropriate degree polynomial.

So, earlier for the housing

prices example, we could just

plot the hypothesis and, you

know, maybe see that it

was fitting the sort of

very wiggly function that goes all over the place to predict housing prices.

And we could then use figures

like these to select an appropriate degree polynomial.

So, plotting the hypothesis could

be one way to try to

decide what degree polynomial to use.

But that doesn't always work.

And, in fact, more often we

may have learning problems where we just have a lot of features.

And it is not

just a matter of selecting what degree polynomial to use.

And, in fact, when we

have so many features, it also

becomes much harder to plot

the data and it becomes

much harder to visualize it,

to decide what features to keep or not.

So, concretely, if we're trying

to predict housing prices, sometimes we can just have a lot of different features.

And all of these features seem, you know, maybe they seem kind of useful.

But, if we have a

lot of features, and, very little

training data, then

overfitting can become a problem.

In order to address

overfitting, there are two

main options for things that we can do.

The first option is, to try

to reduce the number of features.

Concretely, one thing we

could do is manually look through

the list of features, and, use

that to try to decide which

are the more important features, and, therefore,

which are the features we should

keep, and, which are the features we should throw out.

Later in this course, we'll also

talk about model selection algorithms.

These are algorithms for automatically

deciding which features

to keep and which features to throw out.

This idea of reducing the

number of features can work

well, and can reduce overfitting.

And, when we talk about model

selection, we'll go into this in much greater depth.
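As a rough illustration of what a model selection algorithm might do (the actual algorithms come later in the course), here is a hypothetical greedy forward-selection sketch that keeps adding whichever feature most reduces the error on a held-out validation set. The data, split sizes, and stopping rule are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical housing-style data: 6 candidate features, but the
# price really depends on only the first two.
n = 60
X = rng.normal(size=(n, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, n)

X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

def val_error(features):
    """Fit least squares on the training split, score on validation."""
    A = X_train[:, features]
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return float(np.mean((X_val[:, features] @ w - y_val) ** 2))

# Greedy forward selection: keep adding whichever feature helps most.
selected, remaining = [], list(range(6))
best = float("inf")
while remaining:
    errs = {j: val_error(selected + [j]) for j in remaining}
    j_best = min(errs, key=errs.get)
    if errs[j_best] >= best:
        break  # no candidate improves validation error: stop
    selected.append(j_best)
    remaining.remove(j_best)
    best = errs[j_best]

print("kept features:", sorted(selected))
```

On this synthetic data the loop keeps the two informative features and tends to discard the rest, which is the upside of feature reduction; the downside, as noted above, is that any feature thrown out takes its information with it.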

But, the disadvantage is that, by

throwing away some of the

features, we are also throwing

away some of the information we have about the problem.

For example, maybe, all of

those features are actually useful

for predicting the price of a

house, so, maybe, we don't actually

want to throw away some of

our information, or some of our features.