Case Study: Predicting Housing Prices

A course from University of Washington

Machine Learning: Regression

From this lesson

Assessing Performance

Having learned about linear regression models and algorithms for estimating the parameters of such models, you are now ready to assess how well your considered method should perform in predicting new data. You are also ready to select amongst possible models to choose the best performing.

This module is all about these important topics of model selection and assessment. You will examine both theoretical and practical aspects of such analyses. You will first explore the concept of measuring the "loss" of your predictions, and use this to define training, test, and generalization error. For these measures of error, you will analyze how they vary with model complexity and how they might be utilized to form a valid assessment of predictive performance. This leads directly to an important conversation about the bias-variance tradeoff, which is fundamental to machine learning. Finally, you will devise a method to first select amongst models and then assess the performance of the selected model.

The concepts described in this module are key to all machine learning problems, well beyond the regression setting addressed in this course.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Okay, so we can't compute generalization error, but we want some better measure of

our predictive performance than training error gives us.

And so this takes us to something called test error,

and what test error is going to allow us to do is approximate generalization error.

And the way we're gonna do this is by approximating the error,

looking at houses that aren't in our training set.

So to do that, we have to hold out some houses.

So instead of including all these colored houses in our training set,

where these colored houses make up our entire recorded data set,

we're gonna shade out some of them, these shaded gray houses and

we're gonna make these into what's called a test set.

Okay. So here we have houses that are not

included in our training set; the training set is the remaining colored houses here.
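The holding-out step described here can be sketched in a few lines of Python. This is only an illustration, not the course's own code: the function name `train_test_split_indices`, the 30% test fraction, and the fixed random seed are all assumptions made for the example.

```python
import numpy as np

def train_test_split_indices(n_houses, test_fraction=0.2, seed=0):
    """Randomly partition house indices 0..n_houses-1 into (train, test)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_houses)          # random order of all houses
    n_test = int(round(test_fraction * n_houses)) # how many houses to hold out
    test_idx = shuffled[:n_test]                  # the "grey" held-out houses
    train_idx = shuffled[n_test:]                 # the remaining "colored" houses
    return train_idx, test_idx

# Hypothetical data set of 10 recorded houses, holding out 30% of them.
train_idx, test_idx = train_test_split_indices(10, test_fraction=0.3)
print("training houses:", sorted(train_idx.tolist()))
print("test houses:", sorted(test_idx.tolist()))
```

Every house ends up in exactly one of the two sets, so no test house is ever used for fitting.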

And when we go to fit our models,

we're just going to fit our models on the training data set.

But then when we go to assess the performance of that model,

we can look at these test houses, and these are hopefully

going to serve as a proxy of everything out there in the world.

So hopefully, our test data set is a good measure of other houses that we might see,

or at least good enough to judge how well a given model is performing.

Okay, so test error is gonna be our average loss

computed over the houses in our test data set.

So formally, we write it as follows, where we have 1 over N test,

where N test is the number of houses in our test data set,

times the sum of the loss defined over those test set houses.

But I wanna emphasize, and this is really, really important,

that the estimated parameters W hat were fit on the training data set.

Okay, so even though this function looks very, very, very much like training error,

the sum is over the test houses, but

the function we're looking at was fit on training data.

Okay, so these parameters in this fitted function never saw the test data.
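The formula described verbally above can be written out as follows. The slide itself is not part of the transcript, so the notation is an assumption: $L$ stands for the loss, $y_i$ for the true sale price of house $i$, $x_i$ for its input, and $f_{\hat{w}}$ for the function fitted on the training set only.

```latex
\text{Test error} \;=\; \frac{1}{N_{\text{test}}} \sum_{i \,\in\, \text{test set}} L\bigl(y_i,\; f_{\hat{w}}(x_i)\bigr),
\qquad \hat{w} \text{ estimated from the training set only.}
```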

So just to illustrate this, like in our previous example,

we might think of fitting a quadratic function through this data,

where we're gonna minimize the residual sum of squares on the training points,

those blue circles, to get our estimated parameters W hat.

Then when we go to compute our test error, where in this case again we're gonna

use squared error as an example, we're computing this error

over the test points, all these different grey circles here.

So test error is 1 over N test times the sum of the difference between our

true house sales prices and our predicted price

squared summing over all houses in our test data set.

Okay, so this is where the difference arises,

where this function was fit with the blue circles.

But when we're assessing its performance, we're looking at these grey circles.
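As a rough sketch of this example, the snippet below fits a quadratic by least squares on training points and then evaluates squared error on held-out test points. The data are synthetic and invented purely for illustration (the variable names, the coefficients used to generate prices, and the 30/10 split are all assumptions), and `np.polyfit` is used as one convenient way to minimize the residual sum of squares.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic, made-up data: square footage (in thousands) and noisy prices.
rng = np.random.default_rng(0)
sqft = rng.uniform(0.5, 4.0, size=40)
price = 50 + 120 * sqft - 10 * sqft**2 + rng.normal(0, 15, size=40)

# First 30 houses play the role of the blue training circles,
# the last 10 play the role of the grey test circles.
train_sqft, test_sqft = sqft[:30], sqft[30:]
train_price, test_price = price[:30], price[30:]

# w_hat is estimated from the training points only (least squares, degree 2).
w_hat = np.polyfit(train_sqft, train_price, deg=2)

# Same fitted function, evaluated on two different sets of houses.
train_error = mean_squared_error(train_price, np.polyval(w_hat, train_sqft))
test_error = mean_squared_error(test_price, np.polyval(w_hat, test_sqft))
print(f"training error: {train_error:.1f}")
print(f"test error:     {test_error:.1f}")
```

The only difference between the two error computations is which houses are summed over; `w_hat` is identical in both.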

Okay, so let's summarize our measures of error as a function of model complexity.

And what we saw was that our training error

decreased with increasing model complexity.

So here, this is our training error.

And in contrast, our generalization error went down for some period of time.

But then we started getting to overly complex models that

didn't generalize well, and the generalization error started increasing.

So here we have generalization error.

Or true error.

And what is our test error?

Well, our test error is a noisy approximation of generalization error.

Because if our test data set included everything we might ever see in the world

in proportion to how likely it was to be seen,

then that would be exactly our generalization error.

But of course, our test data set is just some finite data set, and

we're using it to approximate generalization error, so

it's gonna be some noisy version of this curve here.

So this is our test error.

Okay, so test error is the thing that we can actually compute.

Generalization error is the thing that we really want.
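To see these curves numerically, one could sweep model complexity and record both errors. The sketch below is an assumption-laden illustration: polynomial degree stands in for model complexity, and the data are synthetic (a noisy sine curve), not the housing data from the lecture. Training error is guaranteed not to increase with degree here because each polynomial family contains the previous one; the test error is just whatever the particular held-out sample gives, so it is a noisy stand-in for generalization error.

```python
import numpy as np

# Synthetic, made-up data: a smooth trend plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(0, 0.3, size=60)

# Hold out the last 20 points as the test set.
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]

def errors_by_degree(max_degree):
    """Return a list of (degree, training error, test error) tuples."""
    results = []
    for degree in range(1, max_degree + 1):
        # Fit on training data only.
        w_hat = np.polyfit(x_train, y_train, deg=degree)
        train_err = np.mean((y_train - np.polyval(w_hat, x_train)) ** 2)
        # Evaluate the same fitted function on the held-out points.
        test_err = np.mean((y_test - np.polyval(w_hat, x_test)) ** 2)
        results.append((degree, float(train_err), float(test_err)))
    return results

results = errors_by_degree(8)
for degree, train_err, test_err in results:
    print(f"degree {degree}: train {train_err:.4f}  test {test_err:.4f}")
```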

[MUSIC]