Case Study: Predicting House Prices


A course from the University of Washington

Machine Learning: Regression



From the lesson

Assessing Performance

Having learned about linear regression models and algorithms for estimating the parameters of such models, you are now ready to assess how well your considered method should perform in predicting new data. You are also ready to select amongst possible models to choose the best-performing one.

This module is all about these important topics of model selection and assessment. You will examine both theoretical and practical aspects of such analyses. You will first explore the concept of measuring the "loss" of your predictions, and use this to define training, test, and generalization error. For these measures of error, you will analyze how they vary with model complexity and how they might be utilized to form a valid assessment of predictive performance. This leads directly to an important conversation about the bias-variance tradeoff, which is fundamental to machine learning. Finally, you will devise a method to first select amongst models and then assess the performance of the selected model.

The concepts described in this module are key to all machine learning problems, well beyond the regression setting addressed in this course.

Emily Fox, Amazon Professor of Machine Learning, Statistics

Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Okay.

Let's wrap up by talking about two really important tasks when you're doing

regression.

And through this discussion,

it's gonna motivate another important concept of thinking about validation sets.

So, of the two important tasks in regression,

the first is that we need to choose a specific model complexity.

So for example, when we're talking about polynomial regression,

what's the degree of that polynomial?

And then for our selected model, we assess its performance.

And actually these two steps aren't specific to regression.

We're gonna see this in all different aspects of machine learning, where we have

to specify our model and then we need to assess the performance of that model.

So, what we're gonna talk about in this portion of this module

generalizes well beyond regression.

And for this first task, where we're talking about choosing the specific model,

we're gonna talk about it in terms of some set of tuning parameters,

lambda, which control the model complexity.

Again, for example, lambda might specify the degree of the polynomial in

polynomial regression.

So, let's first talk about how we can think about choosing lambda.

And then for a given model specified by lambda, a given model complexity,

let's think about how we're gonna assess the performance of that model.

Well, one really naive approach is to do what we've described before,

where you take your data set and split it into a training set and a test set.

And then, for

our model selection portion, where we're choosing the model complexity lambda,

for every possible choice of lambda, we're gonna estimate the model

parameters associated with that lambda on the training set.

And then we're gonna test the performance of that fitted model on the test set.

And we're gonna tabulate that for every lambda that we're considering.

And we're gonna choose our tuning

parameters as the ones that minimize this test error.

So, the ones that perform best on the test data.

And we're gonna call those parameters lambda star.

So, now I have my model.

I have my specific degree of polynomial that I'm gonna use.

And I wanna go and assess the performance of this specific model.

And the way I'm gonna do this is I'm gonna take my test data again.

And I'm gonna say, well, okay,

I know that test error is an approximation of generalization error.

So, I'm just gonna compute the test error for

this lambda star fitted model.

And I'm gonna use that as my approximation of the performance of this model.
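To make the naive approach concrete, here is a minimal sketch in Python with NumPy. It assumes that the tuning parameter lambda is the polynomial degree, that the loss is mean squared error, and that the data are synthetic (a noisy sine curve invented for illustration); the function name `naive_select_and_assess` is hypothetical. Note that it reuses the same test set both to choose lambda star and to report performance, which is the flaw discussed next.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used here as the (assumed) loss."""
    return float(np.mean((y_true - y_pred) ** 2))


def naive_select_and_assess(x_train, y_train, x_test, y_test, degrees):
    """Hypothetical helper: the FLAWED procedure that uses the test set twice."""
    test_errors = {}
    for deg in degrees:
        # Fit the parameters for this model complexity on the training set.
        w_hat = np.polyfit(x_train, y_train, deg)
        # Tabulate the error of the fitted model on the test set.
        test_errors[deg] = mse(y_test, np.polyval(w_hat, x_test))
    # lambda star: the degree with the lowest test error.
    lam_star = min(test_errors, key=test_errors.get)
    # Reporting this same number as a generalization estimate is optimistic,
    # because lambda star was picked precisely for doing well on this test set.
    return lam_star, test_errors[lam_star], test_errors


# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)

lam_star, reported, table = naive_select_and_assess(
    x[:40], y[:40], x[40:], y[40:], range(1, 9)
)
print(lam_star, reported)
```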

Well, what's the issue with this?

Is this gonna perform well?

No, it's really overly optimistic.

So, this issue is just like what we saw when we weren't dealing with this notion

of choosing model complexity.

We just assumed that we had a specific model, like a specific degree polynomial.

But we wanted to assess the performance of the model.

And the naive approach we took there was saying,

well, we fit the model to the training data, and

then we're gonna use training error to assess the performance of the model.

And we said, that was overly optimistic because we were double dipping.

We already used the data to fit our model.

And so,

that error was not a good measure of how we're gonna perform on new data.
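The double dipping with training error can be seen numerically. In this sketch (synthetic data and a range of polynomial degrees chosen for illustration, not taken from the course), each higher-degree least-squares fit contains every lower-degree fit as a special case, so training error can only stay the same or go down as complexity grows, even when the extra flexibility is just fitting noise.

```python
import numpy as np

# Synthetic training data, assumed for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

train_errors = []
for deg in range(0, 8):
    # Least-squares polynomial fit on the training data.
    w_hat = np.polyfit(x, y, deg)
    # Training error: evaluated on the very data used for fitting.
    train_errors.append(float(np.mean((y - np.polyval(w_hat, x)) ** 2)))

print(train_errors)
```

Because the curve only ever goes down, picking the model (or judging it) by training error always favors more complexity.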

Well, it's exactly the same notion here and let's walk through why.

More specifically, when we're thinking about choosing our model complexity,

we were using our test data to compare between different lambda values.

And we chose the lambda value that minimized the error on that test data, the one that

performed the best there.

So, you could think of this as having fit lambda,

this model complexity tuning parameter, on the test data.

And now, we're thinking about using test error

as a notion of approximating how well we'll do on new data.

But the issue is, unless our test data represents everything we might see out

there in the world, that's gonna be way too optimistic.

Because lambda, and hence the model, was chosen to do well on the test data,

that test performance won't carry over to new observations.

So, what's our solution?

Well, we can just create two test data sets.

They won't both be called test sets; we're gonna call one of them a validation set.

So, we're gonna take our entire data set, just to be clear.

And now, we're gonna split it into three data sets.

One will be our training data set, one will be what we call our validation set,

and the other will be our test set.

And then what we're gonna do is, we're going to fit our model parameters always

on our training data, for every given model complexity that we're considering.

But then we're gonna select our model complexity as the model that

performs best on the validation set, the one that has the lowest validation error.

And then we're gonna assess the performance of that

selected model on the test set.

And we're gonna say that that test error is now an approximation of our

generalization error.

Because that test set was never used in either fitting our parameters, w hat,

or selecting our model complexity lambda, that other tuning parameter.

So, that data was completely held out, never touched, and

it now forms a fair estimate of our generalization error.

So in summary, we're gonna fit our model parameters for

any given complexity on our training set.

Then, for every model complexity and its fitted model,

we're gonna assess the performance on our validation set and tabulate it.

And we're gonna use that to select the optimal set of tuning parameters

lambda star.

And then for that resulting model, that w hat sub lambda star,

we're gonna assess a notion of the generalization error using our test set.
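The train/validation/test procedure summarized above can be sketched as follows. As before, this assumes polynomial degree as the tuning parameter lambda, mean squared error as the loss, synthetic data, and a hypothetical helper name `select_and_assess`; it is an illustration, not the course's own code. The key property is that the test set is touched exactly once, after lambda star has been fixed.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used here as the (assumed) loss."""
    return float(np.mean((y_true - y_pred) ** 2))


def select_and_assess(train, valid, test, degrees):
    """Hypothetical helper: fit on train, select on validation, assess on test."""
    (x_tr, y_tr), (x_va, y_va), (x_te, y_te) = train, valid, test
    fits, val_errors = {}, {}
    for deg in degrees:
        # w hat sub lambda: parameters fit on the training set only.
        fits[deg] = np.polyfit(x_tr, y_tr, deg)
        # Tabulate validation error for this model complexity.
        val_errors[deg] = mse(y_va, np.polyval(fits[deg], x_va))
    # lambda star: the complexity with the lowest validation error.
    lam_star = min(val_errors, key=val_errors.get)
    # The held-out test set is used only here, to estimate generalization error.
    test_error = mse(y_te, np.polyval(fits[lam_star], x_te))
    return lam_star, test_error, val_errors


# Synthetic data, assumed for illustration only; 80/10/10 split of 200 rows.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
idx = rng.permutation(n)
tr, va, te = idx[:160], idx[160:180], idx[180:]

lam_star, test_error, val_errors = select_and_assess(
    (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te]), range(1, 9)
)
print(lam_star, round(test_error, 3))
```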

And so a question is, how should we think about

doing the split between our training set, validation set, and test set?

And there's no hard and fast rule here,

there's no one answer that's the right answer.

But typical splits that you see out there are something like an 80-10-10 split.

So, 80% of your data for training, 10% for validation, and 10% for test.

Or another common split is 50%, 25%, 25%.
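A simple way to produce either split is to shuffle the row indices once and cut them by the chosen fractions. The helper `three_way_split` below is a hypothetical sketch using NumPy; it assumes the rows are exchangeable, meaning there is no time ordering or grouping that a random shuffle would break.

```python
import numpy as np


def three_way_split(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical helper: shuffle indices 0..n-1 and cut into train/validation/test."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    # The test set takes whatever remains, so every row lands in exactly one set.
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


train_idx, valid_idx, test_idx = three_way_split(1000)         # 80/10/10
print(len(train_idx), len(valid_idx), len(test_idx))           # 800 100 100

a, b, c = three_way_split(1000, fractions=(0.5, 0.25, 0.25))   # 50/25/25
print(len(a), len(b), len(c))                                  # 500 250 250
```

Shuffling before cutting matters when the data file is sorted, for example by sale date or price; otherwise the three sets could come from systematically different parts of the data.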

But again, this is assuming that you have enough data to do this type of split and

still get reasonable estimates of your model parameters,

reasonable notions of how different model complexities compare.

Because you have a large enough validation set, and

you still have a large enough test set

in order to assess the generalization error of the resulting model.

And if this isn't the case, we're gonna talk about other methods that

allow us to do these same types of notions, but

not with this type of hard division between training, validation, and test.

[MUSIC]