Case Study: Predicting House Prices


A course from the University of Washington

Machine Learning: Regression

From this lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: A complex model is fit based on a measure of fit to the training data plus a measure of overfitting different than that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model.

Coordinate descent is another, general, optimization technique, which is useful in many areas of machine learning.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

But now let's go through exactly the same geometric interpretation for our lasso objective. And for our lasso objective, we have our residual sum of squares plus lambda times our L1 norm.
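Written out for the two-coefficient picture used here, the objective might look like the following. The symbols N (number of observations), y_i (observed outputs), and h_0, h_1 (the two features) are notation assumed for this sketch; only RSS, lambda, w0, w1, and the L1 norm are named in the lecture itself.

```latex
\mathrm{RSS}(\mathbf{w}) + \lambda \,\lVert \mathbf{w} \rVert_1
  = \sum_{i=1}^{N} \bigl( y_i - w_0\, h_0(\mathbf{x}_i) - w_1\, h_1(\mathbf{x}_i) \bigr)^2
  + \lambda \bigl( \lvert w_0 \rvert + \lvert w_1 \rvert \bigr)
```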

Okay, so when we look at this first term, which is in this pink, or rather fuchsia, box, we have exactly the same residual sum of squares that we talked about for ridge. So when we visualize the contours associated with residual sum of squares in our lasso objective, it's exactly the same. So the residual sum of squares contours for lasso are exactly the same as those for ridge.

Okay, so I don't need to explain this plot again. You remember what it is from our ridge visualization we just went through.

But now let's look at the term that's different. Here we're looking at an L1 penalty instead of an L2 penalty. And here, if we think of looking at the absolute value of w0 plus the absolute value of w1, equal to some constant, defining one of our level sets in our contour plot, well, what does that look like? That defines a diamond.

Okay. So, as I'm walking along this diamond, every point along this surface, here, this line that I'm drawing, has exactly the same L1 norm. So, here I have my one norm equal to some constant 1. Here I have the one norm equal to some constant 2, greater than constant 1, and so on. And again, just to be very explicit, I can look at any two points: if I look at some w0, w1 pair, and some w0 prime, w1 prime, these points, or any points along this surface, have the same one norm.
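A minimal sketch of that level-set claim, assuming NumPy and a hypothetical constant `c = 1.0` (the helper `l1_norm` and the sampled points are illustrative, not from the lecture): every point generated along the diamond has the same one norm.

```python
import numpy as np

def l1_norm(w):
    """Sum of the absolute values of the coefficients."""
    return float(np.sum(np.abs(w)))

# Hypothetical constant defining one level set |w0| + |w1| = c
c = 1.0

# Walk along the diamond: pick w0 in [-c, c], then w1 = +/-(c - |w0|)
w0 = np.linspace(-c, c, 9)
upper = np.column_stack([w0, c - np.abs(w0)])     # top two edges
lower = np.column_stack([w0, -(c - np.abs(w0))])  # bottom two edges
points = np.vstack([upper, lower])

norms = np.array([l1_norm(p) for p in points])
print(np.allclose(norms, c))  # True when every sampled point shares the same one norm
```

Choosing a larger `c` gives a larger diamond with the same shape, which is the "constant 2 greater than constant 1" picture.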

Okay, so if I'm just trying to minimize my L1 norm, what's the solution? Well, again, just like in ridge, the solution is to make the magnitude as small as possible, which is zero. So this is the min over w of my one norm.

Okay. So, this is a really important visualization for the one norm, and we're going to return to it in a couple of slides.

But first, what I wanna do is show exactly the same type of movie that we showed for the ridge objective, but now for lasso. So again, this is a movie where we're adding the two contour plots, so adding RSS plus lambda times the L1 norm of w in this case. So we're adding ellipses plus some weighting, lambda, of a set of diamonds. And then we're gonna solve for the minimum; that's gonna be the x. So, x is again our optimal w hat for a specific lambda. And we're gonna look at how that solution changes as we increase the value of lambda.

And again, if we set lambda equal to zero, we're gonna be at our least squares solution, so we're gonna start at exactly the same point that we did in our ridge regression movie. But now, as we're increasing lambda, the solution's gonna look very different than what it did for ridge regression. We know that the final solution is gonna go towards zero, but let's look at what the path looks like. Okay, Vanna, you're up, play the movie.

So, what we see is that the solution eventually gets to the point where w0 is exactly equal to 0. So, if we watch this movie again, we see that this x is moving along, shrinking, and then it hits the y axis and it moves along that y axis. So, the first thing that happens is the coefficients shrink, and at some point it hits the point where w0 becomes exactly 0. And then our w1 term, the weighting on this second feature, h1, is going to decrease and decrease and decrease as we continue to increase our penalty term lambda. So, it's going to continue to walk down this axis.

So, let's watch this one more time with this in mind. Our solution hits that zero point, that sparse solution where w0 hat is equal to zero, and then it continues to shrink the coefficients to zero. And you see that our contours become more and more like the diamonds that are defined by that L1 norm as the weighting on that norm increases.
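The path in the movie can be reproduced numerically. Below is a rough sketch, assuming NumPy, a hypothetical three-point toy dataset with two correlated features, and a plain coordinate descent solver for RSS + lambda times the L1 norm (the algorithm this module has you implement, though the function names, data, and lambda grid here are made up for illustration).

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward zero by t; returns exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_coordinate_descent(H, y, lam, w_init=None, max_sweeps=1000, tol=1e-12):
    """Minimize RSS(w) + lam * ||w||_1 by updating one coefficient at a time."""
    n_features = H.shape[1]
    w = np.zeros(n_features) if w_init is None else w_init.astype(float).copy()
    z = np.sum(H ** 2, axis=0)  # squared norm of each feature column
    for _ in range(max_sweeps):
        largest_step = 0.0
        for j in range(n_features):
            # Residual with feature j's current contribution added back in
            partial_residual = y - H @ w + w[j] * H[:, j]
            rho = H[:, j] @ partial_residual
            new_wj = soft_threshold(rho, lam / 2.0) / z[j]
            largest_step = max(largest_step, abs(new_wj - w[j]))
            w[j] = new_wj
        if largest_step < tol:
            break
    return w

# Hypothetical toy data: two correlated features h0, h1, with y built so
# that the least squares solution is w = (0.5, 2.0) exactly.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = H @ np.array([0.5, 2.0])

lambdas = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 12.0]
path = []
w = None
for lam in lambdas:
    w = lasso_coordinate_descent(H, y, lam, w_init=w)  # warm start from previous lambda
    path.append(w.copy())
    print(f"lambda={lam:5.1f}  w0_hat={w[0]: .4f}  w1_hat={w[1]: .4f}")
```

On this toy data the path starts at the least squares solution for lambda = 0, w0 hat is exactly zero by lambda = 4 while w1 hat is still 1.25, and both coefficients are zero at lambda = 12, which mirrors the x hitting the axis and then walking down it.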

Now, let's go ahead and visualize what the lasso solution looks like. And this is where we're gonna get our geometric intuition, beyond what was just shown in the movie, for why lasso solutions are sparse. So, we already saw in the movie that for certain values of lambda, we're gonna get coefficients exactly equal to zero. But now let's look at just one value of lambda. And here is our solution, so let me write this as our solution. And what you see is that because of this diamond shape of our L1 objective, or the penalty that we're adding, we're gonna have some probability of hitting those corners of this diamond. And at those corners we're gonna get sparse solutions.

So, like Carlos likes to say, it's like a ninja star that's stabbing our RSS contours. So, maybe that's a little brutal of a description, but maybe you'll remember it. So, this is why lasso leads to sparse solutions.
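One way to see the contrast with ridge in numbers is the special case of orthonormal features, which is an assumption made only for this sketch and is not the setting drawn on the slide. In that case both solutions have simple closed forms in terms of the least squares coefficients: ridge rescales every coefficient, while lasso soft-thresholds at lambda/2 and can land exactly on zero. The function names and the values of `w_ls` and `lam` are hypothetical.

```python
import numpy as np

def ridge_orthonormal(w_ls, lam):
    """Ridge solution for orthonormal features: uniform rescaling, never exactly zero."""
    return w_ls / (1.0 + lam)

def lasso_orthonormal(w_ls, lam):
    """Lasso solution for orthonormal features: soft thresholding at lam / 2."""
    return np.sign(w_ls) * np.maximum(np.abs(w_ls) - lam / 2.0, 0.0)

w_ls = np.array([0.5, 2.0])  # hypothetical least squares coefficients
lam = 2.0                    # hypothetical penalty strength

w_ridge = ridge_orthonormal(w_ls, lam)
w_lasso = lasso_orthonormal(w_ls, lam)

print("ridge:", w_ridge)  # both coefficients shrunk, both still nonzero
print("lasso:", w_lasso)  # the small coefficient is set exactly to zero
```

Here the lasso solution sits on a corner of the diamond (w0 hat equal to zero), while the ridge solution only moves closer to the origin.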

And another thing I want to mention is that this visualization is just in two dimensions, but as we get to higher dimensions, instead of diamonds they're called rhomboids, and they're very pointy objects. So, in high dimensions we're very likely to hit one of those corners of this L1 penalty for any value of lambda.
