Case Study: Predicting Housing Prices


A course from the University of Washington

Machine Learning: Regression

3658 ratings


From this lesson

Closing Remarks

In the conclusion of the course, we will recap what we have covered. This includes both techniques specific to regression and foundational machine learning concepts that will appear throughout the specialization. We also briefly discuss some important regression techniques we did not cover in this course. We conclude with an overview of what's in store for you in the rest of the specialization.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So in modules one and two we described how to fit different models and

in module two we described how to fit very complex models.

But up until our third module, we had no idea how to assess

whether that fitted model was going to perform well in our prediction tasks.

So in module three, that was our emphasis in assessing the performance

of our fitted model and thinking about how we can select between different models

to get good predictive performance.

So the first notion that we introduced in order to measure how

well our fit was performing was the measure of loss.

So this is kind of a negative measure of performance where we wanna

lose as little as possible in making poor predictions.

We're just under an assumption that our predictions are not perfect.

And we discussed two different examples of loss metrics that

are very commonly used: this absolute error or this squared error.
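As a concrete illustration, the two loss metrics can be computed as follows; the example prices here are made up, not from the course:

```python
import numpy as np

def absolute_error(y_true, y_pred):
    # L1 loss: sum of absolute differences
    return np.sum(np.abs(y_true - y_pred))

def squared_error(y_true, y_pred):
    # L2 loss: sum of squared differences (penalizes large mistakes more)
    return np.sum((y_true - y_pred) ** 2)

# Hypothetical house prices in $1000s
y_true = np.array([300.0, 450.0, 500.0])
y_pred = np.array([310.0, 430.0, 505.0])
abs_err = absolute_error(y_true, y_pred)   # 10 + 20 + 5 = 35
sq_err = squared_error(y_true, y_pred)     # 100 + 400 + 25 = 525
```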

Then with this loss function,

we talked about defining three different measures of error.

The first was our training error,

which we said was not a good assessment of the predictive performance of our model.

Then we defined something called our generalization, or

true error, which is what we really want.

We wanna say how well are we predicting

every possible observation that we might see out there.

And we said, okay, well, we can't actually compute that, so then we defined something

called our test error, which looks at the subset of our data

that was not included in the training set, and looks at the model that was fit

on the training data set but now making predictions on these held-out points.

And we said that test error was a noisy approximation to our generalization error.

And for these three different measures of error,

we talked about how they varied as a function of model complexity.

So training error, we know, goes down with increasing model complexity but

that doesn't indicate that we get better and

better predictions as we increase our model complexity.

But in contrast, if we look at generalization error, true error,

these errors tend to increase after a certain point.

We say that that point is when these models start to become overfit.

Because they perform very well on the training data set,

but they don't generalize well to new data that we have not yet seen.

And again, although we discuss this in the context of regression, this notion of

training, test, generalization error, and variations with model complexity is

a much more general concept that we'll see again in the specialization.

We then characterized three different sources that contribute to our

prediction error.

These are, the noise that's inherent in the data.

This is our irreducible error.

We have no control over it.

It has nothing to do with our model or our estimation procedure but

then we talked about this idea of bias and variance.

So we just described bias as saying how well can our model fit the true

relationship, averaging over all possible training data sets that we might see.

Whereas variance was describing how much can a fitted function

vary from training data set to training data set, all of size N observations.

So of course noise in the data can contribute to our errors in prediction,

but of course if our model can't adequately describe the true relationship

that's also a source of error as well as this variability from

training set to training set.

So of course we want low bias and low variance to have

good predictive performance, but we saw that there's this bias-variance trade-off.

That as you increase model complexity, your bias goes down, but

your variance goes up.

And so there's this sweet spot that trades off between bias and

variance and results in the lowest of what's called the mean squared error.

And that's what we're seeking to find.

And like we've said multiple times,

machine learning is all about exploring this bias variance tradeoff.
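Written out, a standard statement of the decomposition behind this tradeoff (at a fixed input $x$, averaging over training sets and noise) is:

```latex
\underbrace{\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr]}_{\text{expected prediction error}}
  \;=\; \underbrace{\sigma^2}_{\text{noise}}
  \;+\; \underbrace{\bigl(f_{\text{true}}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\Bigl[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^2\Bigr]}_{\text{variance}}
```

Lowering bias (more complex models) tends to raise variance, and vice versa; the sweet spot minimizes the sum of the last two terms, since the noise term is irreducible.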

We then concluded this module by asking,

how are we both going to select our model and assess its performance?

And for this we said,

well we need to actually form something called a validation set.

So we're going to fit our model on the training data set,

we're going to select between different models or thinking about selecting

a tuning parameter describing these different models on our validation set and

then testing the performance on our test set, where we never touched the test data.
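A minimal sketch of that three-way split; the 60/20/20 fractions and the helper name are illustrative assumptions, not from the course:

```python
import numpy as np

def three_way_split(X, y, frac_train=0.6, frac_valid=0.2, seed=0):
    # Shuffle once, then carve the data into train / validation / test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(frac_train * len(y))
    n_valid = int(frac_valid * len(y))
    tr = idx[:n_train]
    va = idx[n_train:n_train + n_valid]
    te = idx[n_train + n_valid:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Tiny demo: 10 observations, 2 features
X = np.arange(20.0).reshape(10, 2)
y = np.arange(10.0)
train_set, valid_set, test_set = three_way_split(X, y)
# Fit on train_set, tune lambda / model choice on valid_set,
# and report error on test_set exactly once.
```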

Then, as we later described,

if you don't have enough data to form this validation set,

you can think about doing cross-validation instead.

Then, in our fourth module, we talked about ridge regression.

And remember that as our models become more and

more complex, we can become overfit and what we saw is the symptom of overfitting

was that the magnitude of our estimated coefficients just exploded.

So what ridge regression does is it trades off between

a measure of fit of our function to our training data and

a measure of the magnitude of the coefficients.

And implicitly by balancing these two terms we're

doing a bias-variance tradeoff.

In particular, we saw that our ridge regression objective sought to minimize

our residual sum of squares plus lambda times the L2 norm of our coefficients,

and we talked about what the coefficient path of our ridge solution looked like, as

we varied this tuning parameter, lambda, the penalty strength on this L2 norm term.

And we saw that as you increase this penalty parameter,

the magnitude of our coefficients become smaller and smaller and smaller.

Then for our ridge objective just like we did

in our standard least squares objective, we computed the gradient,

set it equal to zero to get our closed-form solution and this looks

very similar to our solution we had before except with this additional term.

And what we talked about in this module is the fact that by adding

this lambda times the identity matrix,

we're guaranteed to have a solution,

even when the number of features is larger than the number of observations.

And it allowed for a much more quote, unquote, regularized solution.

That's why it's called a regularized regression technique.

But the complexity of the solution was exactly the same as we had for

least squares: cubic in the number of features that we have.
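A sketch of that closed-form solution in NumPy; the demo matrix, and the simplification of penalizing every coefficient including the intercept, are assumptions for illustration:

```python
import numpy as np

def ridge_closed_form(H, y, lam):
    # w_ridge = (H^T H + lam * I)^{-1} H^T y
    # The lam * I term keeps the matrix invertible even when the
    # number of features exceeds the number of observations.
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)

# Demo: a perfect line y = 0 + 1 * x through three points
H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # columns: [intercept, x]
y = np.array([1.0, 2.0, 3.0])
w_ls = ridge_closed_form(H, y, lam=0.0)      # lam = 0 reduces to least squares
w_reg = ridge_closed_form(H, y, lam=100.0)   # large lam shrinks the coefficients
```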

We also talked about a gradient descent implementation of ridge.

And as we saw, a key question in what solution we would get out of ridge

was determined by this lambda penalty strength.
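The gradient descent implementation mentioned above can be sketched like this; step size, iteration count, and the demo data are illustrative assumptions:

```python
import numpy as np

def ridge_gradient_descent(H, y, lam, eta=0.01, n_iters=5000):
    # Minimize RSS(w) + lam * ||w||_2^2 by gradient descent.
    # Gradient of the objective: -2 H^T (y - H w) + 2 lam w
    w = np.zeros(H.shape[1])
    for _ in range(n_iters):
        grad = -2.0 * H.T @ (y - H @ w) + 2.0 * lam * w
        w -= eta * grad
    return w

# Same demo line as before; with lam = 0 this recovers least squares
H = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
w_gd = ridge_gradient_descent(H, y, lam=0.0)
```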

And so, for this, instead of talking about cutting out a validation set to select

this tuning parameter, we talked about cases where you might not have enough data

to do that, and instead described this cross-validation procedure.
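A rough sketch of choosing lambda by k-fold cross-validation, combining the ridge closed form with held-out folds; the fold count and synthetic demo data are assumptions for illustration:

```python
import numpy as np

def kfold_cv_error(X, y, lam, k=5):
    # Average validation RSS over k folds: each fold is held out once
    # while ridge is fit (closed form) on the remaining folds.
    folds = np.array_split(np.arange(len(y)), k)
    d = X.shape[1]
    errors = []
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.linalg.solve(X[train_idx].T @ X[train_idx] + lam * np.eye(d),
                            X[train_idx].T @ y[train_idx])
        errors.append(np.sum((y[valid_idx] - X[valid_idx] @ w) ** 2))
    return np.mean(errors)

# Synthetic demo data with known weights and small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=50)
err_small = kfold_cv_error(X, y, lam=0.01)
err_huge = kfold_cv_error(X, y, lam=1e6)
# Pick the lambda with the lowest average validation error, e.g.:
# best_lam = min([0.01, 0.1, 1.0, 10.0], key=lambda l: kfold_cv_error(X, y, l))
```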

[MUSIC]