Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Feature Selection & Lasso

A fundamental machine learning task is to select amongst a set of features to include in a model. In this module, you will explore this idea in the context of multiple regression, and describe how such feature selection is important for both interpretability and efficiency of forming predictions.

To start, you will examine methods that search over an enumeration of models including different subsets of features. You will analyze both exhaustive search and greedy algorithms. Then, instead of an explicit enumeration, we turn to Lasso regression, which implicitly performs feature selection in a manner akin to ridge regression: a complex model is fit based on a measure of fit to the training data plus a measure of overfitting different from that used in ridge. This lasso method has had impact in numerous applied domains, and the ideas behind the method have fundamentally changed machine learning and statistics. You will also implement a coordinate descent algorithm for fitting a Lasso model.

Coordinate descent is another general optimization technique, which is useful in many areas of machine learning.

Emily Fox, Amazon Professor of Machine Learning, Statistics

Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Now let's see why we get sparsity in our lasso solutions.

And, to do this, let's interpret the solution geometrically.

But first, to set the stage,

let's interpret the ridge regression solution geometrically.

And, then we'll get to the lasso.

Well, since visualizations are easier in 2D,

let's just look at an example where we have two features, h0 and h1.

So, let me just write this,

two features for

visualization sake.

And what I'm writing in this green box is my

ridge objective simplified just for having two features, and

in this pink box, I'm showing just the residual sum of squares term.
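
Written out, the two boxes on the slide plausibly correspond to the expression below. This is a sketch that assumes the usual notation: y_i for the observed output, h_0 and h_1 for the two features, N for the number of observations, and lambda for the tuning parameter. These symbols are not spelled out in the transcript itself.

```latex
\underbrace{\sum_{i=1}^{N} \bigl( y_i - w_0 h_0(x_i) - w_1 h_1(x_i) \bigr)^2}_{\text{RSS}(w_0, w_1)}
\; + \; \lambda \left( w_0^2 + w_1^2 \right)
```

The first term is the residual sum of squares on its own, and the whole expression is the ridge objective for two features.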

And what we're going to do to start with, we're gonna make a contour plot for

our residual sum of squares in these two dimensions,

w0 by w1, and let's look at this residual sum of squares term.

Inside this sum over N observations, we're

gonna get terms that look like y squared plus

w0 squared times h0 squared, plus w1 squared

times h1 squared, plus all the cross terms,

once I finish expanding this square here.

And if I sum over all my observations, these sums just pass through to each term.

So, if I think of this as a function of w0 and w1, well, what's this defining?

If I set this equal to some constant, which is what a contour

plot does (it shows where the objective equals different values),

well, this is the equation of an ellipse.

Because I have my two parameters, w0 and w1, each squared.

They're multiplied by some weighting and then there's some other terms having to do

with w0 and w1, no power greater than squared, that are coming in here,

and I'm setting it equal to a constant.

That is the equation of an ellipse.
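
The ellipse claim can be checked numerically. The sketch below uses a small made-up data set (the numbers and the names `data`, `rss`, and `rss_expanded` are assumptions for illustration, not from the lecture). It expands the residual sum of squares into a general quadratic in w0 and w1, then tests the standard conic discriminant, which is negative for an ellipse.

```python
# Hypothetical toy data, not from the lecture: each row is (h0, h1, y).
data = [(1.0, 0.5, 2.0), (1.0, 1.5, 3.1), (1.0, 2.5, 3.9), (1.0, 4.0, 6.2)]


def rss(w0, w1):
    """Residual sum of squares of a two-feature linear model."""
    return sum((y - w0 * h0 - w1 * h1) ** 2 for h0, h1, y in data)


# Expanding the square and summing over observations gives
# RSS = a*w0^2 + b*w0*w1 + c*w1^2 + d*w0 + e*w1 + f.
a = sum(h0 * h0 for h0, _, _ in data)
b = 2 * sum(h0 * h1 for h0, h1, _ in data)
c = sum(h1 * h1 for _, h1, _ in data)
d = -2 * sum(y * h0 for h0, _, y in data)
e = -2 * sum(y * h1 for _, h1, y in data)
f = sum(y * y for _, _, y in data)


def rss_expanded(w0, w1):
    """The same quantity, evaluated from the expanded coefficients."""
    return a * w0**2 + b * w0 * w1 + c * w1**2 + d * w0 + e * w1 + f


# A conic a*x^2 + b*x*y + c*y^2 + ... = constant is an ellipse
# when its discriminant b^2 - 4ac is negative.
discriminant = b * b - 4 * a * c
print("discriminant:", discriminant)
print("expansion matches:", abs(rss(0.3, -1.2) - rss_expanded(0.3, -1.2)) < 1e-9)
```

The discriminant is negative whenever the two feature columns are not multiples of each other, which is why the contours come out as ellipses rather than some other conic.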

Okay, so what I see is that, for

my residual sum of squares contour plot, I'm gonna get a series of ellipses.

So I'm highlighting here one ellipse.

And what this ellipse is, is the set where the residual sum of squares of (w0, w1)

equals what I'll call constant one.

This next ellipse is residual sum of

squares of (w0, w1) equal to some constant two,

which is greater than constant one, and so on.

That's what all these curves are, increasing residual sum of squares.

And when I walk around this curve, it's a level set,

it's the set of all points where the function has the same value.

So, if I look at some (w0, w1) pair and

I look at some other point, (w0 prime, w1 prime),

well, both of these points, here and here, have the same

residual sum of squares, which is what I called constant one before.

So as I'm walking around this ellipse,

every solution (w0, w1) has exactly the same residual sum of squares.
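
To make the level-set idea concrete, here is a minimal sketch that finds two different points on the same contour. The toy data, the chosen RSS level of 5.0, and the fixed value w0 = 1.0 are all arbitrary assumptions. With w0 fixed, setting the residual sum of squares equal to the level leaves a quadratic in w1, and its two roots are two points on one ellipse.

```python
import math

# Hypothetical toy data, not from the lecture: each row is (h0, h1, y).
data = [(1.0, 0.5, 2.0), (1.0, 1.5, 3.1), (1.0, 2.5, 3.9), (1.0, 4.0, 6.2)]


def rss(w0, w1):
    """Residual sum of squares of a two-feature linear model."""
    return sum((y - w0 * h0 - w1 * h1) ** 2 for h0, h1, y in data)


# Arbitrary choices for illustration: one contour level and one fixed w0.
level = 5.0
w0 = 1.0

# With w0 fixed, RSS(w0, w1) = level becomes qa*w1^2 + qb*w1 + qc = 0.
qa = sum(h1 * h1 for _, h1, _ in data)
qb = -2 * sum((y - w0 * h0) * h1 for h0, h1, y in data)
qc = sum((y - w0 * h0) ** 2 for h0, _, y in data) - level

# The square root is real because the line w0 = 1.0 crosses this contour.
root = math.sqrt(qb * qb - 4 * qa * qc)
w1_low = (-qb - root) / (2 * qa)
w1_high = (-qb + root) / (2 * qa)

# Two distinct points, same residual sum of squares.
print((w0, w1_low), rss(w0, w1_low))
print((w0, w1_high), rss(w0, w1_high))
```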

Okay so hopefully what this plot is showing is now clear.

And now let's talk about, what if I just minimize residual sum of squares?

Well if I just minimize residual sum of squares,

every time I jump from one of these curves to the next curve to the next curve,

all the way into the smallest curve, just this dot in the middle,

I'm going to smaller and smaller residual sum of squares.

So this x here marks the minimum

over all possible (w0, w1) of residual sum of squares

of (w0, w1), and what is that?

That's our least squares solution, so this is w hat least squares.

Okay, I'm gonna, because this is so important, I'm gonna highlight it in red.

That's what this point is.

I don't want to draw a circle cuz the circle is

exactly what all these other ellipses look like here.

So I'm gonna put a little box around this, and

I'll highlight in red this is w hat least squares.
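
The point at the center of the ellipses can be computed directly. The following is a sketch under assumptions: the toy data is made up, and the two-feature normal equations are solved with Cramer's rule only to keep the example dependency-free. It computes the least squares solution and confirms that every nearby point has a larger residual sum of squares.

```python
# Hypothetical toy data, not from the lecture: each row is (h0, h1, y).
data = [(1.0, 0.5, 2.0), (1.0, 1.5, 3.1), (1.0, 2.5, 3.9), (1.0, 4.0, 6.2)]


def rss(w0, w1):
    """Residual sum of squares of a two-feature linear model."""
    return sum((y - w0 * h0 - w1 * h1) ** 2 for h0, h1, y in data)


# Normal equations (H^T H) w = H^T y, written out for two features.
s00 = sum(h0 * h0 for h0, _, _ in data)
s01 = sum(h0 * h1 for h0, h1, _ in data)
s11 = sum(h1 * h1 for _, h1, _ in data)
t0 = sum(h0 * y for h0, _, y in data)
t1 = sum(h1 * y for _, h1, y in data)

# Solve the 2x2 system with Cramer's rule.
det = s00 * s11 - s01 * s01
w0_hat = (s11 * t0 - s01 * t1) / det
w1_hat = (s00 * t1 - s01 * t0) / det
rss_min = rss(w0_hat, w1_hat)

# Step a little in eight directions; RSS should go up every time.
steps = [(dx, dy) for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)
         if (dx, dy) != (0.0, 0.0)]
all_larger = all(rss(w0_hat + dx, w1_hat + dy) > rss_min for dx, dy in steps)

print("w hat least squares:", (w0_hat, w1_hat))
print("every neighbor has larger RSS:", all_larger)
```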

Okay, so that would be my solution if I were just minimizing

residual sum of squares, but when I'm looking at my

ridge objective, there's also this L2 penalty.

This sum of w0 squared + w1 squared.

And what does that look like?

So I have w0 squared + w1 squared and when I'm looking at my

contour plot I'm looking at setting this equal to some constant.

And changing the value of that constant to get these different contours,

the different colors that I'm seeing here.

Well, what shape is w0 squared plus w1 squared equal to constant?

That is exactly the equation of a circle.

So we see that the circle is centered about zero and

so each one of these curves here, just to be clear,

is my two norm of w squared,

equal to, let's say constant one.

This next circle is the two norm of w squared equal to some constant two,

which is greater than constant one, and so on.

And again, if I look at any point w0 w1 and

some other point w0 prime w1 prime,

these things have the same norm of the w vector squared.

And that's true for all points around the circle.
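
As a quick check of the circle picture, this sketch walks around a circle centered at zero and evaluates the penalty at each point. The radius and the number of sample points are arbitrary assumptions, and `l2_penalty` is a name made up for illustration.

```python
import math


def l2_penalty(w0, w1):
    """Squared two norm of the coefficient vector (w0, w1)."""
    return w0**2 + w1**2


# Arbitrary choices for illustration: circle radius and number of samples.
radius = 1.5
num_points = 12

# Points spaced evenly around a circle centered at (0, 0).
points = [
    (radius * math.cos(2 * math.pi * k / num_points),
     radius * math.sin(2 * math.pi * k / num_points))
    for k in range(num_points)
]

# Every point on the circle should give the same penalty, radius squared.
penalties = [l2_penalty(w0, w1) for w0, w1 in points]
same_value = all(abs(p - radius**2) < 1e-12 for p in penalties)
print("all penalties equal radius^2:", same_value)
```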

So let's say I'm just trying to minimize my two norm.

What's the solution to that?

Well I'm going to jump down these contours to my minimum.

And my minimum, let me do it directly in red this time.

My minimum, oops, that didn't switch colors.

Directly in red.

My minimum is setting w0 and w1 equal to zero.

So this is the min over w0, w1 of my squared two norm,

which I'll write explicitly in this 2D case,

is w0 squared + w1 squared,

and the solution is 0.

Okay, so this would be our ridge regression

solution if lambda, the weighting on this 2 norm, were infinity.

We talked about that before.
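
The limiting behavior can be seen numerically. The sketch below assumes the ridge objective is the residual sum of squares plus lambda times (w0 squared + w1 squared), with both coefficients penalized as in this 2D example. The toy data, the list of lambda values, and the name `ridge_solution` are made up for illustration.

```python
# Hypothetical toy data, not from the lecture: each row is (h0, h1, y).
data = [(1.0, 0.5, 2.0), (1.0, 1.5, 3.1), (1.0, 2.5, 3.9), (1.0, 4.0, 6.2)]


def ridge_solution(lam):
    """Minimizer of RSS(w0, w1) + lam * (w0^2 + w1^2) for two features.

    Solves (H^T H + lam * I) w = H^T y with Cramer's rule.
    """
    s00 = sum(h0 * h0 for h0, _, _ in data) + lam
    s01 = sum(h0 * h1 for h0, h1, _ in data)
    s11 = sum(h1 * h1 for _, h1, _ in data) + lam
    t0 = sum(h0 * y for h0, _, y in data)
    t1 = sum(h1 * y for _, h1, y in data)
    det = s00 * s11 - s01 * s01
    return (s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det


# Arbitrary grid of penalty strengths; lam = 0 recovers least squares.
lambdas = [0.0, 1.0, 10.0, 100.0, 1e6]
squared_norms = []
for lam in lambdas:
    w0, w1 = ridge_solution(lam)
    squared_norms.append(w0**2 + w1**2)
    print(f"lambda={lam:g}  w0={w0:.6f}  w1={w1:.6f}")
```

As lambda grows, the printed coefficients shrink toward the origin, which is the center of the circles in the penalty's contour plot.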

[MUSIC]