Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model.

Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So finally, I just wanted to present the coordinate descent algorithm for

lasso if you don't normalize your features.

So this is the most generic form of the algorithm,

because of course it applies to normalized features as well.

But let's just remember our algorithm for our normalized features.

So, here it is now.

And relative to this,

the only changes we need to make are what's highlighted in these green boxes.

And what we see is that we need to precompute, for each one of our features,

this term z j, and that's exactly equivalent to the normalizer that we

described when we normalized our features.

So if you don't normalize, you still have to compute this normalizer.

But we're gonna use it in a different way as we're going through this algorithm.

When we go to compute rho j, we're looking at our unnormalized features.

And when we're forming our predictions, y hat sub i, so our prediction for

the ith observation, again, that prediction is using unnormalized features.

So there are two places in the rho j computation where you would need to

change things for unnormalized features.

And then finally, when we're setting w hat j according to the soft thresholding rule,

instead of just looking at rho j plus lambda over two,

or rho j minus lambda over two, or zero,

we're gonna divide each of these terms by z j, this normalizer.
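
For reference, the quantities described above can be written out as follows. This is a sketch under assumed notation that does not appear in the transcript itself: h_j(x_i) denotes the value of feature j for observation i, N is the number of observations, the objective is taken to be the residual sum of squares plus lambda times the L1 norm of the coefficients, and the prediction used inside rho j is assumed to exclude feature j's own contribution.

```latex
% Precomputed once per feature (unnormalized features):
z_j = \sum_{i=1}^{N} h_j(x_i)^2

% Recomputed at each coordinate update, with \hat{y}_i(\hat{w}_{-j})
% the prediction for observation i formed without feature j:
\rho_j = \sum_{i=1}^{N} h_j(x_i)\,\bigl(y_i - \hat{y}_i(\hat{w}_{-j})\bigr)

% Soft thresholding, with each nonzero case divided by z_j:
\hat{w}_j =
\begin{cases}
  (\rho_j + \lambda/2)\,/\,z_j & \text{if } \rho_j < -\lambda/2 \\
  0                            & \text{if } -\lambda/2 \le \rho_j \le \lambda/2 \\
  (\rho_j - \lambda/2)\,/\,z_j & \text{if } \rho_j > \lambda/2
\end{cases}
```

With this definition, setting z_j = 1 for every feature gives back the thresholding rule without the division.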

Okay, so you see that it's fairly straightforward to implement this for

unnormalized features, but the intuition we provided was much clearer for

the case of normalized features.
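
To make the steps concrete, here is a minimal Python sketch of cyclic coordinate descent for lasso with unnormalized features. It is an illustration under stated assumptions, not the course's reference implementation: the function name `lasso_coordinate_descent`, the parameters `tol` and `max_iter`, and the stopping rule (largest coefficient change in a full pass) are hypothetical choices, the objective is assumed to be RSS plus lambda times the L1 norm, and every coefficient is penalized, including any intercept column.

```python
import numpy as np


def lasso_coordinate_descent(H, y, lam, tol=1e-6, max_iter=1000):
    """Cyclic coordinate descent for RSS(w) + lam * ||w||_1.

    H is an (N, D) matrix of unnormalized features and y has length N.
    Hypothetical helper for illustration; all names are assumptions.
    """
    H = np.asarray(H, dtype=float)
    y = np.asarray(y, dtype=float)
    num_features = H.shape[1]

    # Precompute z_j = sum_i h_j(x_i)^2 once for each feature.
    z = (H ** 2).sum(axis=0)
    w = np.zeros(num_features)

    for _ in range(max_iter):
        max_step = 0.0
        for j in range(num_features):
            if z[j] == 0.0:
                continue  # all-zero feature column: leave w_j at 0

            # Prediction from unnormalized features, excluding feature j.
            pred_without_j = H @ w - H[:, j] * w[j]
            rho_j = H[:, j] @ (y - pred_without_j)

            # Soft thresholding, with each nonzero case divided by z_j.
            if rho_j < -lam / 2:
                new_wj = (rho_j + lam / 2) / z[j]
            elif rho_j > lam / 2:
                new_wj = (rho_j - lam / 2) / z[j]
            else:
                new_wj = 0.0

            max_step = max(max_step, abs(new_wj - w[j]))
            w[j] = new_wj

        # Stop once a full pass barely changes any coefficient.
        if max_step < tol:
            break
    return w


if __name__ == "__main__":
    # Tiny example with orthogonal feature columns.
    H = np.array([[1.0, 0.0], [0.0, 2.0]])
    y = np.array([3.0, 4.0])
    print(lasso_coordinate_descent(H, y, lam=2.0))
```

In the tiny example the feature columns are orthogonal, so z is [1, 4] and each coefficient is settled in a single pass. A large enough lambda sets every coefficient to exactly zero, and lambda equal to zero recovers the least squares fit.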

[MUSIC]