Case Study: Predicting Housing Prices


Course from the University of Washington

Machine Learning: Regression

3445 ratings


From this lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both the interpretability of a model and the efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: a complex model is fit based on a measure of fit to the training data plus a measure of overfitting different from that used in ridge. The lasso method has had an impact in numerous applied domains, and the ideas behind it have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a lasso model.

Coordinate descent is another general optimization technique, which is useful in many areas of machine learning.

- Emily Fox, Amazon Professor of Machine Learning, Statistics
- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Well, in our coordinate descent algorithm for lasso, and actually in all of the coordinate descent algorithms we've presented, we have this line that says "while not converged".

And the question is, how are we assessing convergence?

Well, when should I stop in coordinate descent?

In gradient descent, remember, we're looking at the magnitude of the gradient vector, and stopping when the magnitude of that vector falls below some tolerance epsilon.
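As a quick sketch of that earlier stopping rule (my own function names, on a toy quadratic objective; the course doesn't prescribe this exact code):

```python
import numpy as np

def gradient_descent(grad, w0, step_size=0.01, tol=1e-6, max_iter=100000):
    """Minimize a smooth objective given its gradient, stopping when
    the magnitude of the gradient vector falls below tolerance epsilon."""
    w = np.array(w0, dtype=float)
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) < tol:  # gradient magnitude below epsilon: stop
            break
        w = w - step_size * g
    return w

# Toy example: minimize f(w) = ||w - 1||^2, whose gradient is 2(w - 1);
# the optimum is at w = [1, 1].
w_opt = gradient_descent(lambda w: 2 * (w - np.array([1.0, 1.0])), [0.0, 0.0])
```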

Well, here we aren't computing gradients, so we have to do something else.

One thing we know, though, is that for convex objectives, the steps we take as we go through this algorithm are going to become smaller and smaller as we move towards the optimum.

Well, at least for strongly convex functions, we know that we're converging to the optimal solution.

And so one thing we can do is measure the size of the steps we take through a full cycle of our different coordinates.

Because, I want to emphasize, we have to cycle through all of our coordinates, 0 to d, before judging whether to stop, because it's possible that one coordinate, or a few coordinates, take only small steps, but then you get to another coordinate and still take a large step.

But if, over an entire sweep of all the coordinates, the maximum step you take in that entire cycle is less than your tolerance epsilon, then that's one way you can assess that your algorithm has converged.
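Putting the pieces together, here is a minimal sketch of lasso coordinate descent with that sweep-based stopping rule. The function names are my own, and two simplifying assumptions are made: feature columns are normalized to unit norm (so the course's closed-form update needs no division by z_j), and the l1 penalty is applied to every coefficient, intercept included.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if rho < -lam:
        return rho + lam
    elif rho > lam:
        return rho - lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, tol=1e-6, max_sweeps=1000):
    """Coordinate descent for lasso (RSS + lam * ||w||_1).
    Sketch: assumes columns of X are normalized to unit norm, and the
    penalty is applied to all coefficients. Convergence is judged only
    after a full sweep over all coordinates: stop when the largest
    coordinate step in that sweep is below the tolerance epsilon."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_sweeps):
        max_step = 0.0
        for j in range(d):  # cycle through ALL coordinates...
            w_old = w[j]
            # rho_j: correlation of feature j with the residual that
            # excludes feature j's own current contribution
            rho = X[:, j] @ (y - X @ w + w[j] * X[:, j])
            w[j] = soft_threshold(rho, lam / 2.0)
            max_step = max(max_step, abs(w[j] - w_old))
        if max_step < tol:  # ...before judging convergence
            break
    return w

# Toy example with orthonormal features (identity design matrix):
# each coefficient is just the soft-thresholded response.
X = np.eye(3)
y = np.array([3.0, 0.1, -2.0])
w = lasso_coordinate_descent(X, y, lam=1.0)
```

Note that the convergence test sits outside the inner loop on purpose: checking a single coordinate's step would risk stopping early, for exactly the reason described above.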

I also want to mention that this coordinate descent algorithm is just one of many possible ways of solving the lasso objective.

Classically, lasso was solved using what's called LARS, least angle regression and shrinkage.

And that was popular up until roughly 2008, when an older algorithm was rediscovered and popularized: this coordinate descent approach for lasso.

But more recently, there's been a lot of activity in the area of coming up with efficient parallelized and distributed implementations of lasso solvers.

These include a parallel version of coordinate descent.

And other parallel learning approaches, like parallel stochastic gradient descent, or this kind of distribute-and-average approach, are fairly popular as well.

And one of the most popular approaches specifically for lasso is something called the alternating direction method of multipliers, or ADMM, and that's been widely adopted within the community of people using lasso.

[MUSIC]