Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Ridge Regression

You have examined how the performance of a model varies with increasing model complexity, and can describe the potential pitfall of complex models becoming overfit to the training data. In this module, you will explore a very simple, but extremely effective technique for automatically coping with this issue. This method is called "ridge regression". You start out with a complex model, but now fit the model in a manner that not only incorporates a measure of fit to the training data, but also a term that biases the solution away from overfitted functions. To this end, you will explore symptoms of overfitted functions and use this to define a quantitative measure to use in your revised optimization objective. You will derive both a closed-form and gradient descent algorithm for fitting the ridge regression objective; these forms are small modifications from the original algorithms you derived for multiple regression. To select the strength of the bias away from overfitting, you will explore a general-purpose method called "cross validation". You will implement both cross-validation and gradient descent to fit a ridge regression model and select the regularization constant.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Now let's turn to this important question of how to choose the lambda tuning

parameter.

And what we mentioned last module was that if we have some tuning parameter that

controls model complexity, then for

every value of that tuning parameter, we can fit our model on our training data.

Then we can assess the performance of that fitted model on a validation set,

and we can tabulate this for all values of lambda that we might consider.

And choose the specific model complexity according to the error on this validation

set, and then assess the performance of the selected model on our test set.
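The procedure just described can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it assumes a single feature with no intercept, so the ridge fit has the one-line closed form w = sum(x*y) / (sum(x*x) + lambda). The synthetic data, the split sizes, the lambda grid, and the helper names `fit_ridge_1d` and `mse` are all assumptions made for the example.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for one feature and no intercept:
    w = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): y = 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(300)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

# Training / validation / test split (sizes are arbitrary choices).
x_train, y_train = xs[:200], ys[:200]
x_valid, y_valid = xs[200:250], ys[200:250]
x_test, y_test = xs[250:], ys[250:]

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]

# Fit on the training data for every lambda and tabulate validation error.
valid_errors = {}
for lam in lambdas:
    w = fit_ridge_1d(x_train, y_train, lam)
    valid_errors[lam] = mse(x_valid, y_valid, w)

# Choose the lambda with the lowest validation error.
best_lam = min(valid_errors, key=valid_errors.get)

# Only now touch the test set, to assess the selected model.
w_best = fit_ridge_1d(x_train, y_train, best_lam)
test_error = mse(x_test, y_test, w_best)
print("selected lambda:", best_lam, "test error:", test_error)
```

The test set is used exactly once, after lambda has been chosen, which is what keeps the final error estimate honest.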

Well, now what we've seen is that ridge regression is a special case

of an objective where there's a tuning parameter, lambda,

that's controlling model complexity.

And we'd like to see how we can select this tuning parameter.
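For reference, the ridge objective can be written as below. The notation is an assumption made here (N observations, feature vector h(x_i), coefficient vector w); the structure is the standard one: a fit term plus lambda times the squared L2 norm of the coefficients.

```latex
\hat{w}^{\text{ridge}}
  = \arg\min_{w} \;
    \underbrace{\sum_{i=1}^{N} \bigl( y_i - h(x_i)^{\top} w \bigr)^2}_{\text{RSS}(w)}
    \; + \; \lambda \, \lVert w \rVert_2^2 ,
  \qquad \lambda \ge 0
```

With lambda = 0 this reduces to ordinary least squares, and as lambda grows the coefficients are shrunk toward zero, which is why lambda acts as a knob on model complexity.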

So, of course we can use exactly this procedure that we described last module.

And that's assuming that we have sufficient data to do

this training/validation/test split.

But now let's ask the question of, what if we don't have enough data to reasonably

divide it into these three different sets?

What can you do?

And we're gonna talk about this in the context of ridge regression, but again,

this holds for any tuning parameter lambda that you might have in selecting

between different model complexities.

Or any other tuning parameter controlling your model.

Okay, so we're assuming that we're starting with a smallish dataset.

And as always, we need to break off some test dataset that we're gonna hide, okay?

We always need to have some test set that's never touched during training or

validation of our model.

So now we took this smallish dataset and

we have an even smaller dataset to do our training and model validation.

So how are we gonna do this?

Well, we wanna do this in some way that's a bit smarter than

the naive approach of just forming our validation set.

So let's just remember this naive approach, where we took whatever data

was remaining after we split off our test set and we defined some validation set.

And a question is, in this case, when we have just a small amount of data, so

necessarily this validation set will just be a small number of observations,

is this sufficient for comparing between different model complexities and

assessing which one is best?

Well, no, clearly the answer is no.

We're saying that it's just a small set.

It's not gonna be representative of the space of things that we

might see out there.
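To make the point concrete, here is a small simulation, offered as a sketch under stated assumptions rather than anything from the lecture. It holds one model fixed (a hypothetical slope `w_fixed`), repeatedly draws validation sets of a given size from synthetic data, and measures how much the validation error estimate moves from draw to draw. The data-generating process, the sizes 5 and 500, and the function name `validation_error_spread` are all made up for illustration.

```python
import random
import statistics

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def validation_error_spread(n_valid, n_repeats, rng):
    """Draw n_repeats validation sets of size n_valid and return the
    standard deviation of the error estimates for one fixed model."""
    w_fixed = 2.0  # hypothetical already-fitted model
    estimates = []
    for _ in range(n_repeats):
        # Synthetic data (an assumption): y = 2x + Gaussian noise.
        xs = [rng.uniform(-1, 1) for _ in range(n_valid)]
        ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
        estimates.append(mse(xs, ys, w_fixed))
    return statistics.pstdev(estimates)

rng = random.Random(0)
spread_small = validation_error_spread(n_valid=5, n_repeats=200, rng=rng)
spread_large = validation_error_spread(n_valid=500, n_repeats=200, rng=rng)
print("spread with 5 observations:  ", spread_small)
print("spread with 500 observations:", spread_large)
```

The model is identical in every repetition, so any variation in the reported error comes purely from which few observations happened to land in the validation set.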

Okay, so what can we do better?

We're stuck with just this dataset.

Well, did we have to use the last set of tabulated observations as

the observations to define this validation set?

No, I could have used the first few observations, or the next set of observations,

or any random subset of observations in this dataset.

And a question is,

which subset of observations should I use as my validation set?

And the answer, and this is the key insight, is use all of the data subsets.

Because if you're doing that,

then you can think about averaging your performance across these validation sets.

And avoid any sensitivity you might have to one specific choice of validation set

that might give some strange numbers because it just has a few observations.

Such a set doesn't give a good comparison between different model

complexities.
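The averaging idea can be sketched as follows. This is an illustration under assumptions, not the course's code: the data that remain after the test set is removed are cut into k contiguous blocks, each block takes one turn as the validation set while the model is fit on the rest, and the k validation errors are averaged. The one-feature closed-form ridge fit, the synthetic data, the choice k = 5, and the names `fit_ridge_1d`, `mse`, and `average_validation_error` are all hypothetical.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for one feature and no intercept:
    w = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def average_validation_error(xs, ys, lam, k):
    """Use each of k contiguous blocks once as the validation set,
    fit on the remaining data, and average the k validation errors."""
    n = len(xs)
    errors = []
    for i in range(k):
        start, end = i * n // k, (i + 1) * n // k
        x_valid, y_valid = xs[start:end], ys[start:end]
        x_train = xs[:start] + xs[end:]
        y_train = ys[:start] + ys[end:]
        w = fit_ridge_1d(x_train, y_train, lam)
        errors.append(mse(x_valid, y_valid, w))
    return sum(errors) / k

# A smallish synthetic dataset (an assumption), standing in for what is
# left after the test set has been split off and hidden.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
avg_errors = {lam: average_validation_error(xs, ys, lam, k=5) for lam in lambdas}
best_lam = min(avg_errors, key=avg_errors.get)
print("selected lambda:", best_lam)
```

Every observation is used for validation exactly once, so no single small validation set decides which lambda wins.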

[MUSIC]