Case Study: Predicting House Prices

A course from University of Washington

Machine Learning: Regression

3809 ratings

From the lesson

Assessing Performance

Having learned about linear regression models and algorithms for estimating the parameters of such models, you are now ready to assess how well your considered method should perform in predicting new data. You are also ready to select amongst possible models to choose the best performing.

This module is all about these important topics of model selection and assessment. You will examine both theoretical and practical aspects of such analyses. You will first explore the concept of measuring the "loss" of your predictions, and use this to define training, test, and generalization error. For these measures of error, you will analyze how they vary with model complexity and how they might be utilized to form a valid assessment of predictive performance. This leads directly to an important conversation about the bias-variance tradeoff, which is fundamental to machine learning. Finally, you will devise a method to first select amongst models and then assess the performance of the selected model.

The concepts described in this module are key to all machine learning problems, well beyond the regression setting addressed in this course.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Okay.

Well, now let's turn to this third component which is a variance.

And what variance is gonna say is,

how different can my specific fits to a given data set be from one another,

as I'm looking at different possible data sets?

And in this case, when we are looking at just this constant model,

we showed with that earlier picture, where I drew points that

were mainly above the true relationship and points that were mainly below,

that the actual resulting fits didn't vary very much.

And when you look at the space of all possible observations,

you see that the fits, they're fairly similar, they're fairly stable.

And so, when you look at the variation in these fits,

which I'm drawing with these grey bars here.

We see that they don't vary very much.

So, for this low complexity model,

we see that there's low variance.

So, to summarize, what this variance is saying is: how much can the fits vary?

And if they could vary dramatically from one data set to the other,

then you would have very erratic predictions.

Your prediction would just be sensitive to what data set you got.

So, that would be a source of error in your predictions.

And to see this, we can start looking at high-complexity models.

So in particular, let's look at this data set again.

And now, let's fit some high-order polynomial to it.

So, that's some fit shown here.

And now, let's take again this same data set.

But let's choose two points, which I'm gonna highlight as these pink circles.

And let's just move them a little bit.

So, out of this whole data set, I've just moved two observations and

not too dramatically, but I get a dramatically different fit.
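The experiment just described can be tried numerically. The following is a minimal sketch, not the lecture's actual data or code: the sine-shaped true relationship, the noise level, the 20 evenly spaced inputs, the choice of which two observations to move (the first and last, each shifted up by 0.3), and the polynomial degree 9 are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical stand-in for the true relationship; the lecture's curve is not given.
def true_function(x):
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                         # assumed inputs
y = true_function(x) + rng.normal(0.0, 0.3, x.size)   # assumed noisy observations

# Move two observations a little (assumed choice: first and last, each up by 0.3).
y_moved = y.copy()
y_moved[[0, -1]] += 0.3

x_grid = np.linspace(0.0, 1.0, 201)

def max_fit_change(degree):
    """Largest change in the fitted curve caused by moving the two points."""
    original = Polynomial.fit(x, y, degree)(x_grid)
    perturbed = Polynomial.fit(x, y_moved, degree)(x_grid)
    return float(np.max(np.abs(perturbed - original)))

change_constant = max_fit_change(0)     # constant model
change_high_order = max_fit_change(9)   # high-order polynomial

print(f"constant model:        max change in fit = {change_constant:.3f}")
print(f"degree-9 polynomial:   max change in fit = {change_high_order:.3f}")
```

Under these assumptions the constant fit can only shift by the change in the mean of the observations, while the high-order fit follows the moved points much more closely, which is the sensitivity the lecture is pointing at.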

So then, when I think about looking over all possible data sets I might get,

I might get some crazy set of curves.

There is an average curve.

And in this case, the average curve is actually pretty well behaved.

Because this wild, wiggly curve is, at any point, equally

likely to have been wild above or wild below.

So, on average over all data sets, it's actually a fairly smooth reasonable curve.

But if I look at the variation between these fits, it's really large.

So, what we're saying is that high-complexity models have high variance.

On the other hand, if I look at the bias of this model, so here again,

I'm showing this average fit, which was this fairly well behaved curve.

And it matches pretty well the true relationship between square feet and

house value, because my model is really flexible.

So on average, it was able to fit pretty precisely that true relationship.

So, these high-complexity models have low bias.

So, we can now talk about this bias-variance tradeoff.

So, in particular, we're gonna plot bias and

variance as a function of model complexity.

And so, what we saw in the past slides is that as our

model complexity increases, our bias decreases.

Because we can better and

better approximate the true relationship between x and y.

So, this curve here is our bias curve.

On the other hand, variance increases.

So, our very simple model had very low variance, and

the high-complexity models had high variance.

So, this is a picture of our variance.

And so, what we see is there's this natural tradeoff between bias and

variance.

And one way to summarize this is something that's called mean squared error.

And so, mean squared error: if you watch the optional

videos that go into all these concepts more in depth,

you'll hear a lot more about mean squared error and a formal definition,

or the derivation of this.

But mean squared error is simply the sum of bias squared plus variance.

Okay.

I guess I'll write out variance to be very clear.

So, this is my little cartoon of bias squared plus variance.

This is my mean squared error curve.
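For reference, the decomposition can be written out as follows. The notation is an assumption introduced here, not taken from the lecture: f_true is the true function, \hat{f}_D is the fit obtained from a training set D of size N, and \bar{f} is the average fit over all such training sets. This is a sketch of the standard argument at a fixed input x.

```latex
\begin{align*}
\bar{f}(x) &= \mathbb{E}_{D}\big[\hat{f}_{D}(x)\big] \\
\mathrm{bias}(x) &= f_{\mathrm{true}}(x) - \bar{f}(x) \\
\mathrm{var}(x) &= \mathbb{E}_{D}\big[\big(\hat{f}_{D}(x) - \bar{f}(x)\big)^{2}\big] \\
\mathrm{MSE}(x) &= \mathbb{E}_{D}\big[\big(f_{\mathrm{true}}(x) - \hat{f}_{D}(x)\big)^{2}\big] \\
&= \big(f_{\mathrm{true}}(x) - \bar{f}(x)\big)^{2}
 + 2\,\big(f_{\mathrm{true}}(x) - \bar{f}(x)\big)\,
   \mathbb{E}_{D}\big[\bar{f}(x) - \hat{f}_{D}(x)\big]
 + \mathbb{E}_{D}\big[\big(\bar{f}(x) - \hat{f}_{D}(x)\big)^{2}\big] \\
&= \mathrm{bias}(x)^{2} + \mathrm{var}(x)
\end{align*}
```

The cross term drops out because the expected difference between the average fit and an individual fit is zero by the definition of \bar{f}.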

And machine learning is all about this tradeoff between bias and variance.

We're gonna see this again and again in this course.

And we're gonna see it throughout the specialization.

And the goal is finding this sweet spot.

This is the sweet spot where we get our minimum error,

the minimum contribution of bias and variance, to our prediction errors.

So, not sweet, sweet.

It is sweet, sweet, but what I'm trying to write is sweet spot.

And this is what we'd love to get at.

That's the model complexity that we'd want.
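The tradeoff curves and the sweet spot can be reproduced in a simulation, where the true function is known by construction. This is a minimal sketch under stated assumptions: the sine-shaped true function, the noise level, the fixed evenly spaced inputs, the use of polynomial degree as the measure of model complexity, and the helper name `bias_variance` are all hypothetical choices for illustration. "All possible data sets" is approximated by 500 simulated ones that differ only in their noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical true relationship; known here only because we simulate the data.
def true_function(x):
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=500, n_points=30, noise_sd=0.3, seed=0):
    """Estimate average squared bias and average variance of polynomial fits."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)           # fixed inputs (assumed)
    fits = np.empty((n_datasets, n_points))
    for i in range(n_datasets):
        # Each simulated data set has the same inputs but fresh noise.
        y = true_function(x) + rng.normal(0.0, noise_sd, n_points)
        fits[i] = Polynomial.fit(x, y, degree)(x)
    average_fit = fits.mean(axis=0)               # average over data sets
    bias_sq = float(np.mean((average_fit - true_function(x)) ** 2))
    variance = float(np.mean(fits.var(axis=0)))   # spread of fits around the average
    return bias_sq, variance

degrees = [0, 1, 3, 5, 9]                         # model complexity (assumed grid)
results = [bias_variance(d) for d in degrees]
bias_sq = np.array([r[0] for r in results])
variance = np.array([r[1] for r in results])
mse = bias_sq + variance                          # bias squared plus variance

sweet_spot_degree = degrees[int(np.argmin(mse))]

for d, b, v, m in zip(degrees, bias_sq, variance, mse):
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}, sum = {m:.4f}")
print("sweet spot (lowest sum) at degree", sweet_spot_degree)
```

With real data neither quantity is available, since both the true function and the distribution over data sets are unknown; the simulation only illustrates the shape of the curves.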

But just like with generalization error, so

I'm gonna write this down with generalization error.

Can we compute this?

So, think about that while I'm writing.

We cannot compute bias and variance,

and thus mean squared error.

And why?

Well, the reason is because just like with generalization error,

they were defined in terms of the true function.

Well, bias was defined very explicitly in terms of

the relationship relative to the true function.

And when we think about defining variance,

we have to average over all possible data sets, and the same was true for bias too.

That is, all possible data sets of size n that we could have gotten from the world,

and we just don't know what that is.

So, we can't compute these things exactly.

But throughout the rest of this course, we're gonna look at ways to optimize this

tradeoff between bias and variance in a practical way.

[MUSIC]