Case Study: Predicting House Prices


A course from the University of Washington

Machine Learning: Regression



From the lesson

Closing Remarks

In the conclusion of the course, we will recap what we have covered. The recap includes both techniques specific to regression and foundational machine learning concepts that will appear throughout the specialization. We also briefly discuss some important regression techniques we did not cover in this course. We conclude with an overview of what's in store for you in the rest of the specialization.

Emily Fox, Amazon Professor of Machine Learning, Statistics

Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

The fifth module was all about feature selection. To motivate it, we noted that every house might have a really long list of attributes associated with it, and, for reasons of both interpretability and efficiency in forming our predictions, we want to select a sparse subset of these features to include in our model.

To perform this feature selection, the first thing we talked about was a set of methods that explicitly searched over models with different numbers of features. The exhaustive approach is called all subsets selection. We also talked about greedy procedures like forward selection, and saw that these give possibly suboptimal solutions but are much more efficient than the all subsets procedure.
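As a rough illustration of the two search strategies, here is a minimal Python/NumPy sketch that scores each candidate feature set by the RSS of a least-squares fit with an intercept. The function names `rss`, `best_subset`, and `forward_selection` are hypothetical, the data are synthetic, and scoring by training RSS is a simplifying assumption; in practice the number of features would be chosen with a validation set or cross-validation rather than training error.

```python
import numpy as np
from itertools import combinations


def rss(X, y, cols):
    """Training RSS of a least-squares fit on the columns in `cols` plus an intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ w
    return float(residual @ residual)


def best_subset(X, y, k):
    """Exhaustive 'all subsets' search over every size-k feature set (C(D, k) fits)."""
    return min(combinations(range(X.shape[1]), k),
               key=lambda cols: rss(X, y, list(cols)))


def forward_selection(X, y, max_features):
    """Greedy search: repeatedly add the single feature that most reduces RSS."""
    selected, remaining, path = [], list(range(X.shape[1])), []
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), rss(X, y, selected)))
    return path


# Hypothetical synthetic data: only features 0 and 2 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

for feats, err in forward_selection(X, y, 3):
    print(feats, round(err, 3))
print("best 2-feature subset:", best_subset(X, y, 2))
```

Forward selection needs only on the order of D fits per step, whereas all subsets needs C(D, k) fits for each size k, which is why the greedy version scales to many more features.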

But instead of explicitly searching over models with different sets of features, we talked about how to use lasso regression to perform this feature selection implicitly. The objective looks just like the ridge regression objective, but instead of the L2 norm of our coefficients, it uses the L1 norm. And we showed how that leads to sparse solutions.
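For reference, the two objectives can be written side by side. The notation here (RSS(w), feature vector h(x_i), N observations, tuning parameter λ) is assumed for illustration; the only difference between the two is the norm used in the penalty.

```latex
\mathrm{RSS}(w) = \sum_{i=1}^{N} \bigl( y_i - h(x_i)^\top w \bigr)^2

\hat{w}^{\text{ridge}} = \arg\min_{w} \; \mathrm{RSS}(w) + \lambda \lVert w \rVert_2^2,
\qquad \lVert w \rVert_2^2 = \sum_{j} w_j^2

\hat{w}^{\text{lasso}} = \arg\min_{w} \; \mathrm{RSS}(w) + \lambda \lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j} \lvert w_j \rvert
```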

In particular, when we looked at the coefficient path associated with lasso, we saw that for a given value of lambda we typically ended up with a sparse solution, which got sparser and sparser as we increased lambda. This was in contrast to what we saw for ridge, where the coefficients just got smaller and smaller; with lasso we actually end up with sparse solutions, which is what leads to this idea of feature selection.
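One way to see this contrast concretely is a special case where both estimators have closed forms. Assuming the feature columns are orthonormal (H^T H = I, no intercept), so that the least-squares solution is w_LS = H^T y, ridge scales every coefficient by 1/(1 + λ) while lasso soft-thresholds each one at λ/2. The sketch below uses that assumption and made-up least-squares coefficients; with correlated features the paths are more complicated, though the qualitative picture is typically similar.

```python
import numpy as np


def ridge_orthonormal(w_ls, lam):
    """Minimizer of RSS(w) + lam * ||w||_2^2 when H^T H = I: uniform shrinkage."""
    return w_ls / (1.0 + lam)


def lasso_orthonormal(w_ls, lam):
    """Minimizer of RSS(w) + lam * ||w||_1 when H^T H = I: soft thresholding at lam / 2."""
    return np.sign(w_ls) * np.maximum(np.abs(w_ls) - lam / 2.0, 0.0)


# Hypothetical least-squares coefficients for four features.
w_ls = np.array([4.0, -2.5, 1.0, 0.3])

for lam in [0.0, 1.0, 3.0, 6.0]:
    ridge = ridge_orthonormal(w_ls, lam)
    lasso = lasso_orthonormal(w_ls, lam)
    print(f"lambda={lam}: ridge nonzeros={np.count_nonzero(ridge)}, "
          f"lasso nonzeros={np.count_nonzero(lasso)}")
```

Across these four values of lambda, ridge keeps all four coefficients nonzero (just smaller), while lasso keeps 4, 3, 2, and then 1.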

Then, to optimize this lasso objective, we talked about a coordinate descent algorithm, where we solved a collection of 1D optimization problems, iterating through the different dimensions of our objective, in particular the different features of our regression model.
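A minimal sketch of cyclic coordinate descent for the lasso objective RSS(w) + λ‖w‖₁ in Python/NumPy. It assumes each feature column has been normalized so that sum_i h_j(x_i)^2 = 1 and that there is no separate unpenalized intercept (for example, because the target has been centered); under those assumptions each 1D subproblem has a closed-form soft-thresholding solution. The names `soft_threshold` and `lasso_coordinate_descent`, the data, and the stopping rule on the largest coefficient change are illustrative choices.

```python
import numpy as np


def soft_threshold(rho, thresh):
    """Shrink rho toward zero by `thresh`, returning exactly 0 inside [-thresh, thresh]."""
    return np.sign(rho) * max(abs(rho) - thresh, 0.0)


def lasso_coordinate_descent(H, y, lam, tol=1e-8, max_sweeps=1000):
    """Cyclic coordinate descent for RSS(w) + lam * ||w||_1.

    Assumes every column of H has unit norm and there is no unpenalized intercept.
    """
    _, d = H.shape
    w = np.zeros(d)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(d):
            # Residual of the prediction that leaves out feature j's contribution.
            partial_residual = y - H @ w + H[:, j] * w[j]
            rho_j = H[:, j] @ partial_residual
            # Solve the 1D problem in w_j exactly.
            new_wj = soft_threshold(rho_j, lam / 2.0)
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:  # converged: no coordinate moved appreciably
            break
    return w


# Hypothetical data with a sparse true coefficient vector.
rng = np.random.default_rng(1)
H = rng.normal(size=(100, 6))
H /= np.linalg.norm(H, axis=0)  # normalize each feature column to unit norm
w_true = np.array([5.0, 0.0, -3.0, 0.0, 0.0, 1.0])
y = H @ w_true + 0.05 * rng.normal(size=100)

for lam in [0.1, 2.0, 8.0]:
    print(lam, np.round(lasso_coordinate_descent(H, y, lam), 3))
```

With lam = 0 the update reduces to the least-squares coordinate update, and once lam exceeds 2 * max_j |h_j^T y| every coefficient stays at zero.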

What we saw was that for lasso, we end up setting our coefficients according to something we called soft thresholding. Within a certain range of the correlation coefficient that we described in this module, we set the coefficient exactly to zero. Outside that range, we shrink the value of the estimated coefficient relative to the least squares solution.
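Written out, the soft-thresholding update can take the following form, assuming the objective RSS(w) + λ‖w‖₁ and normalized features (sum_i h_j(x_i)^2 = 1). Here ρ_j is the correlation between feature j and the residual of the prediction that leaves feature j out; under these assumptions the least-squares coordinate update would simply be ŵ_j = ρ_j, so lasso either sets it to zero or shrinks it toward zero by λ/2.

```latex
\rho_j = \sum_{i=1}^{N} h_j(x_i)\,\bigl(y_i - \hat{y}_i(\hat{w}_{-j})\bigr)

\hat{w}_j =
\begin{cases}
\rho_j + \lambda/2 & \text{if } \rho_j < -\lambda/2 \\
0 & \text{if } \rho_j \in [-\lambda/2,\ \lambda/2] \\
\rho_j - \lambda/2 & \text{if } \rho_j > \lambda/2
\end{cases}
```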

So lasso can lead to these sparse solutions, and it has had an impact across a really large set of applied domains.

In our last module, we talked about a set of nonparametric techniques called nearest neighbor and kernel regression.

One nearest neighbor is a really simple procedure, the most basic one you could imagine, but we showed that it can actually perform really well, especially when you have lots of data. To estimate the value of your house, this method simply looks for the most similar house, looks at its value, and predicts your value to be exactly the same.
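A minimal sketch of one nearest neighbor regression in Python/NumPy, assuming Euclidean distance as the measure of similarity. The houses and prices below are made up for illustration, and square footage is expressed in thousands so that it does not completely dominate the distance; in practice, how features are scaled is part of choosing the distance metric.

```python
import numpy as np


def one_nearest_neighbor(X_train, y_train, x_query):
    """Predict the target of the single closest training point (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(dists)]


# Hypothetical houses: [square feet (thousands), bedrooms] and sale prices.
X_train = np.array([[1.5, 3.0], [2.1, 4.0], [0.9, 2.0]])
y_train = np.array([310_000.0, 455_000.0, 198_000.0])

print(one_nearest_neighbor(X_train, y_train, np.array([2.0, 4.0])))  # → 455000.0
```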

We then talked about making this a bit more robust by looking at a set of k nearest neighbors. When computing your predicted value, you can also weight these k nearest neighbors by how similar they are to you, and then form your prediction as the weighted average of their values.
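Extending the same idea, here is a sketch of distance-weighted k-nearest neighbors in Python/NumPy. Inverse distance is assumed as the weighting scheme (one common choice among several), and the small `eps` term is an illustrative guard against division by zero when the query coincides with a training point; the data are hypothetical.

```python
import numpy as np


def weighted_knn(X_train, y_train, x_query, k, eps=1e-12):
    """Weighted average of the k nearest targets, with weights 1 / distance."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k most similar points
    weights = 1.0 / (dists[nearest] + eps)   # closer neighbors count more
    return float(weights @ y_train[nearest] / weights.sum())


# Hypothetical 1D example with a single input feature.
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 10.0, 20.0, 100.0])

print(weighted_knn(X_train, y_train, np.array([0.5]), k=2))  # ≈ 5.0
print(weighted_knn(X_train, y_train, np.array([0.5]), k=3))  # ≈ 7.143
```

With k = 2 both neighbors are equally close, so the result is a plain average; with k = 3 the farther point at x = 2 gets one third of the weight of each closer neighbor.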

This led directly to the idea of kernel regression, where instead of just weighting a collection of neighbors, you weight every observation in your data set. Many of the kernels we specified set these weights to zero outside a certain range and decay them within that range. What this leads to is the idea of very local fits: we talked about how kernel regression is equivalent to forming locally constant fits, in contrast to our parametric models, which form global fits.
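A minimal sketch of kernel regression (the Nadaraya-Watson estimator) for a single input feature in Python/NumPy. The Epanechnikov kernel is assumed here because, like the kernels described above, it is exactly zero outside a window set by `bandwidth` and decays within it. The prediction at a query point is the kernel-weighted average of all training targets, which is the same as fitting a weighted locally constant model at that point. Function names and data are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Kernel equal to 0.75 * (1 - u^2) for |u| <= 1 and exactly zero outside."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Weighted average of every training target, weighted by kernel similarity."""
    weights = epanechnikov((x_train - x_query) / bandwidth)
    total = weights.sum()
    if total == 0.0:
        return float("nan")  # no training point falls inside the kernel window
    return float(weights @ y_train / total)


# Hypothetical noisy 1D data.
rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0.0, 10.0, size=200))
y_train = np.sin(x_train) + 0.2 * rng.normal(size=200)

grid = np.linspace(0.5, 9.5, 7)
fits = [kernel_regression(x_train, y_train, x0, bandwidth=0.8) for x0 in grid]
print(np.round(fits, 2))
```

A larger `bandwidth` averages over more observations and gives a smoother fit; a smaller one tracks the data more closely.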

Here's a visualization of the kernel regression we saw in this module, and we see how it leads to really nice, smooth fits. These fits are very adaptive to the complexity of the data we see, and they can increase in complexity as we get more and more data.

[MUSIC]