Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Ridge Regression

You have examined how the performance of a model varies with increasing model complexity, and can describe the potential pitfall of complex models becoming overfit to the training data. In this module, you will explore a very simple, but extremely effective technique for automatically coping with this issue. This method is called "ridge regression". You start out with a complex model, but now fit the model in a manner that not only incorporates a measure of fit to the training data, but also a term that biases the solution away from overfitted functions. To this end, you will explore symptoms of overfitted functions and use this to define a quantitative measure to use in your revised optimization objective. You will derive both a closed-form and gradient descent algorithm for fitting the ridge regression objective; these forms are small modifications from the original algorithms you derived for multiple regression. To select the strength of the bias away from overfitting, you will explore a general-purpose method called "cross validation". You will implement both cross-validation and gradient descent to fit a ridge regression model and select the regularization constant.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So now let's talk about a way to automatically address this issue by

modifying the cost term that we're minimizing when we're

addressing how good our fit is.

So, in particular we're looking at this orange box, this quality metric.

And before our quality metric just depended on the difference between

our predicted house sales price, and our actual house sales price.

In particular, we were looking at residual sum of squares as our measure of fit.

But now we're gonna modify this quality metric to also take into account

a measure of the complexity of the model.

In particular, in order to bias us toward simpler models.

So when we're thinking about defining this modified cost function, what we're gonna

want to do is balance between how well the function fits the data and

a measure of how complex, or how potentially overfit, the model is.

And what did we see was an indicator of that?

The magnitude of our estimated coefficients.

So, what we're going to balance between is the fit of the model to the data and

the magnitude of the coefficients of the model.

Okay, so we can write down a total cost that has these two terms.

Where this is our new measure of the quality of the fit, and

when I say measure of fit here, what I mean is that a small

number indicates that there's a good fit to the data.

And on the other hand, for the measure of the magnitude of the coefficients, if that

number is small, that means the coefficients are small and

we're unlikely to be in this setting of a very overfit model.

Okay, so clearly we want to balance between these two measures,

because if I just optimize the magnitude of the coefficients,

I'd set all the coefficients to zero and that would surely not be overfit,

but it also would not fit the data well.

So that would be a very high bias solution.

On the other hand, if I just focused on optimizing the measure of fit,

that's what we did before.

That's the thing that was subject to becoming overfit in the face of

complex models.

So somehow we want to trade off between these two terms, and

that's what we're going to discuss now.

Okay, what's our measure of fit?

At this point you guys should be pretty sick of hearing me say this.

It's our residual sum of squares, which I've written here and

hopefully this formula is quite familiar to you at this point.

But sometimes we also write it as follows where

remember this is our predicted value using w

in our model to make these predictions.

And just remember that a small residual sum

of squares is indicative of the model

fitting the training data well.

So just as we said on the previous slide, when we're thinking about measure of fit,

a small number is gonna indicate a good fit.
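
The measure of fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `residual_sum_of_squares`, the use of NumPy, and the toy data are all assumptions made for the example.

```python
import numpy as np

def residual_sum_of_squares(features, targets, weights):
    """Sum of squared differences between actual and predicted values."""
    predictions = features @ weights      # predicted values using w
    residuals = targets - predictions     # actual minus predicted
    return float(residuals @ residuals)

# Hypothetical toy data: a constant column plus one feature.
features = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
targets = np.array([2.0, 4.0, 7.0])

close_fit = residual_sum_of_squares(features, targets, np.array([0.0, 2.0]))
poor_fit = residual_sum_of_squares(features, targets, np.array([0.0, 0.0]))
print(close_fit, poor_fit)  # 1.0 69.0
```

The coefficients that track the data give the smaller number, which matches the reading that a small residual sum of squares indicates a good fit to the training data.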

Okay, so now what we need is a measure of the magnitude of the coefficients.

So what summary number might be indicative of the size of the regression

coefficients?

Well maybe you think about just summing all the coefficients together?

Is this gonna be a good measure of the overall magnitude of the coefficients?

Probably not in a lot of cases because

you might end up with a situation where,

let's say, w0 is 1,527,301 and

w1 is -1,605,253,

and let's say w0 and

w1 are the only two coefficients in our model.

If I look at w0 + w1, this is gonna be some small number.

Despite the fact that each of the coefficients themselves were quite large.
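
The arithmetic behind this objection is easy to check. The first pair of values below is the one quoted in the lecture; the second pair is a hypothetical extreme case added only to show complete cancellation.

```python
# Coefficient values quoted in the lecture.
w0 = 1_527_301
w1 = -1_605_253

plain_sum = w0 + w1
print(plain_sum)  # -77952

# Hypothetical extreme case: two large coefficients that cancel exactly.
cancelling_sum = 1_000_000 + (-1_000_000)
print(cancelling_sum)  # 0
```

The lecture's sum is roughly twenty times smaller in magnitude than either coefficient, and in the hypothetical case the sum is zero even though neither coefficient is small. A plain sum lets positive and negative coefficients hide each other.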

Okay, so you might say, I know how to fix this,

I'll just look at the absolute value.

So, maybe what I'll do, is I'll look at the absolute

value of w0, plus the absolute value of w1, plus all the way up to the absolute value of wD, and

this is, I'll just write this compactly,

sum from j=0 to capital D, the number of features we have.

Absolute value of wD, sorry, wj.

And this is defined to be equal to what's called the one norm

of the vector of coefficients w.

So we write it, so this is a vector, I'll try and make this a thick font here,

sub 1, and this is called the L1 norm.

And this is actually a very reasonable choice.

And we're gonna discuss this more in the next module.
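
As a rough sketch of the sum of absolute values just described, the snippet below computes it by hand and with NumPy's `np.linalg.norm` using `ord=1`. The variable names are assumptions for illustration; the coefficient values are the ones quoted earlier in the lecture.

```python
import numpy as np

# Coefficient vector built from the values quoted in the lecture.
w = np.array([1_527_301, -1_605_253])

# Sum over j of |w_j|, written out directly.
l1_by_hand = np.sum(np.abs(w))

# The same quantity via NumPy's vector norm with ord=1.
l1_numpy = np.linalg.norm(w, ord=1)

print(l1_by_hand, l1_numpy)  # 3132554 3132554.0
```

Unlike the plain sum, this number stays large when the individual coefficients are large, because the signs can no longer cancel.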

But for now the thing that we're gonna

consider is the sum of the squares of the coefficients.

So w0 squared plus w1 squared, all the way up to wD squared.

So this is the sum j equals zero to capital D, of wj squared.

And this is defined to be equal to, we've actually seen this norm

many times in this class so far, it's the two norm squared.

So this is called our L2 norm, or really the L2 norm squared and

this is gonna be the focus of this module.
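
The sum of squared coefficients can be sketched the same way. The vector below is a hypothetical example chosen so the numbers are easy to verify by hand; it is not taken from the lecture.

```python
import numpy as np

# Hypothetical coefficient vector (w0, w1).
w = np.array([3.0, -4.0])

# Sum over j of w_j squared, written out directly.
sum_of_squares = float(np.sum(w ** 2))

# The L2 norm is the square root of that sum, so squaring it recovers the sum.
l2_norm = float(np.linalg.norm(w))   # default norm for a vector is the 2-norm
l2_norm_squared = l2_norm ** 2

print(sum_of_squares)  # 25.0
print(l2_norm)
```

The distinction the lecture draws between "the L2 norm" and "the L2 norm squared" shows up here: the norm of this vector is 5, while the quantity used as the measure of coefficient magnitude is 25.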

Okay.

So again, what we have, just to summarize, is we have our total cost

is a sum of the measure of fit + a measure of the magnitude of coefficients and

we said our measure of fit is our residual sum of squares.

And our measure of the magnitude of the coefficients for

this module is going to be this two norm of the w vector squared.
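
Putting the two terms together, a minimal sketch of the total cost as summarized here might look like the following. The function name `total_cost` and the toy data are assumptions for illustration. The module description mentions a regularization constant that sets the strength of the second term; this sketch leaves it out because this segment simply adds the two terms.

```python
import numpy as np

def total_cost(features, targets, weights):
    """Residual sum of squares plus the squared L2 norm of the coefficients."""
    residuals = targets - features @ weights
    measure_of_fit = float(residuals @ residuals)      # RSS(w)
    coefficient_magnitude = float(weights @ weights)   # sum of w_j squared
    return measure_of_fit + coefficient_magnitude

# Hypothetical toy data: a constant column plus one feature.
features = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
targets = np.array([2.0, 4.0, 6.0])

zero_weights = np.zeros(2)             # zero magnitude, but a poor fit
exact_weights = np.array([0.0, 2.0])   # perfect fit, nonzero magnitude

print(total_cost(features, targets, zero_weights))   # 56.0
print(total_cost(features, targets, exact_weights))  # 4.0
```

Setting every coefficient to zero removes the magnitude term entirely but leaves a large residual sum of squares, which is the high-bias extreme mentioned earlier. The second set of coefficients pays a small magnitude cost in exchange for a much better fit.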

[MUSIC]