案例学习：预测房价

Loading...

来自 华盛顿大学 的课程

机器学习：回归

3517 评分

案例学习：预测房价

从本节课中

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions. <p> To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model. <p>Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

- Emily FoxAmazon Professor of Machine Learning

Statistics - Carlos GuestrinAmazon Professor of Machine Learning

Computer Science and Engineering

[MUSIC]

So now let's talk about the complexity of this forward stepwise algorithm.

And contrast it with the all subsets procedure.

So how many models did we evaluate here?

Well, in the first step, assuming that we have a total of D features,

we searched over D models.

Then, in the second step, we had D-1 models to search over.

So all remaining features we might think about adding.

And then the third step, we had D-2 features and so on.

And a question is, how many steps did we take, how many of these and so

ons did we have?

Well, it depends.

It depends on when we chose to stop, but

at most we're gonna have D steps, which gets us to the full model.

So, the complexity of this is O(D squared).

So at most D steps each on the order of looking over D models.

And that's gonna be much, much less than 2 to the D,

which is what we had for all subsets, when D is reasonably large.

Okay, so this procedure is gonna be significantly

more computationally efficient or feasible than doing all subsets.

Well this was just one of many possible choices you have for

greedy algorithms for doing feature selection.

As an example, instead of always starting from an empty model and growing and

growing and growing, adding more features, you could do the opposite.

You could start from a full model and then choose,

at each iteration, which feature to eliminate.

But you could also think about combining these steps.

So you could do a forward procedure, where you search over which feature to add, but

then every so often you could go back and think about deleting features.

Because like we said, you might in the forward process, add a feature that later,

once you've added another feature, is no longer relevant.

So, as an example of this, maybe, and

it's not completely probably true in this application.

But maybe once you've added number of bedrooms and number of baths, maybe square

feet is no longer so relevant since these other things can act as a proxy for

square feet, or something like that.

And there are lots and lots of other variants of algorithms people have

proposed with different metrics for when you add or remove features.

And different ways to search over the features because this problem of feature

selection is really important.

It's not a new problem, and so

a lot of different procedures have been proposed to solve it.

[MUSIC]