By the way, by convention the summation here starts from one, so I am not actually going to penalize theta zero being large. That is sort of the convention: the sum runs from i equals one through n, rather than from i equals zero through n. But in practice it makes very little difference; whether or not you include theta zero makes very little difference to the results. By convention, though, we usually regularize only theta one through theta one hundred. Writing down

our regularized optimization objective, our regularized cost function, again: here it is. Here's J of theta, where this term on the right is a regularization term, and lambda here is called the regularization parameter. What lambda does is control a trade-off between two different goals.

The first goal, captured by the first term of the objective, is that we would like to fit the training data well; we would like to fit the training set well. And the second goal is that we want to keep the parameters small, and that's captured by the second term, by the regularization term.

And what lambda, the regularization parameter, does is control the trade-off between these two goals: between the goal of fitting the training set well and the goal of keeping the parameters small, and therefore keeping the hypothesis relatively simple, to avoid overfitting.
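As a concrete sketch of the cost function being described, here is how the regularized objective could look in code. This is my own illustration, not from the lecture; the names `theta`, `X`, `y`, and `lam` are assumed variable names, and the regularization sum is divided by 2m here, one common convention.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).

    The regularization sum starts at theta[1], so theta[0]
    (the intercept) is not penalized, matching the convention
    described above.
    """
    m = len(y)
    residuals = X @ theta - y                           # h(x) - y for each example
    fit_term = (residuals @ residuals) / (2 * m)        # goal 1: fit the training set well
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)  # goal 2: keep the parameters small
    return fit_term + reg_term
```

The parameter `lam` plays the role of lambda: with `lam = 0` only the fit term matters, and as `lam` grows, large parameter values are penalized more heavily.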

For our housing price prediction example, whereas previously, if we had fit a very high order polynomial, we may have wound up with a very wiggly or curvy function like this. If you still fit a high order polynomial, with all the polynomial features in there, but instead just make sure to use this sort of regularized objective, then what you can get out is in fact a curve that isn't quite a quadratic function, but is much smoother and much simpler, and maybe a curve like the magenta line that gives a much better hypothesis for this data.
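To see this smoothing effect numerically, here is a small sketch, again my own illustration rather than the lecture's code. It fits a degree-9 polynomial to roughly quadratic data, once without regularization and once with a regularized normal equation, assuming a penalty weight of 1.0; the regularized fit ends up with much smaller parameters and hence a smoother curve.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 12)
y = x**2 + rng.normal(scale=0.05, size=x.size)  # roughly quadratic data with noise

# Degree-9 polynomial features; column 0 is the intercept term.
X = np.vander(x, 10, increasing=True)

# Unregularized least-squares fit: tends to produce huge, wiggly coefficients.
theta_unreg = np.linalg.lstsq(X, y, rcond=None)[0]

# Regularized normal equation: (X^T X + lam * L)^-1 X^T y,
# where L is the identity with its top-left entry zeroed so the
# intercept theta_0 is not penalized.
L = np.eye(X.shape[1])
L[0, 0] = 0.0
lam = 1.0
theta_reg = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Comparing `np.abs(theta_unreg).max()` with `np.abs(theta_reg).max()` shows the regularized parameters are far smaller, which is exactly the "smoother, simpler hypothesis" effect described above.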

Once again, I realize it can be a bit difficult to see why shrinking the parameters can have this effect, but if you implement it yourself with regularization, you will be able to see this effect firsthand.