案例学习：预测房价

Loading...

来自 University of Washington 的课程

机器学习：回归

3658 个评分

案例学习：预测房价

从本节课中

Simple Linear Regression

Our course starts from the most basic regression model: Just fitting a line to data. This simple model for forming predictions from a single, univariate feature of the data is appropriately called "simple linear regression".<p> In this module, we describe the high-level regression task and then specialize these concepts to the simple linear regression case. You will learn how to formulate a simple regression model and fit the model to data using both a closed-form solution as well as an iterative optimization algorithm called gradient descent. Based on this fitted function, you will interpret the estimated model parameters and form predictions. You will also analyze the sensitivity of your fit to outlying observations.<p> You will examine all of these concepts in the context of a case study of predicting house prices from the square feet of the house.

- Emily FoxAmazon Professor of Machine Learning

Statistics - Carlos GuestrinAmazon Professor of Machine Learning

Computer Science and Engineering

[MUSIC] Okay, so

now that all of you are optimization experts we can think about applying these

optimization notions and

optimization algorithms that we described to our specific scenario of interest.

Which is searching over all possible lines and

finding the line that best fits our data.

So the first thing that's important to mention is the fact that our objective

is Convex.

So you can go and show that this is a convex function.

And what this is implies is that the solution

to this minimization problem is unique.

We know there's a unique minimum to this function.

And likewise,

we know that our gradient descent algorithm will converge to this minimum.

Gradient descent algorithm

will converge to the minimum.

Okay, so this is good news for trying to find the line that best fits

our data because this is a very, it sounds like a very complicated problem at least.

We have to search over all possible lines.

And, of course we couldn't possibly go and test each one of these lines, and

it's very, very helpful that there is this property, so

we can use these very straightforward algorithms that we described to go and

solve for the solution to this problem.

Okay, so let's return to the definition of our cost,

which is the residual sum of squares of our two parameters,

(wo,w1), which I've written again right here.

But before we get to taking the gradient, which is gonna be so crucial for

our gradient descent algorithm, let's just note the following fact about derivatives,

where if we take the derivative of the sum of functions over some

parameter, some variable w.

Then we can write this equivalently,

so this sigma notation, it's just shorthand for this.

Full sum over the N terms.

So N different functions, g1 to gN, the derivative distributes across the sum.

And we can rewrite this as the sum of the derivative, okay?

So the derivative of the sum of functions is the same as the sum of the derivative

of the individual functions so in our case, We have that gi(w).

The gi that I'm writing here is equal to (yi-[w0+w0+w1xi]) squared.

And we see that the residual

sum of squares is indeed

a sum over n different

functions of w0 and w1.

And so in our case when we're thinking about taking

the partial of the residual sum of squares.

With respect to, for example w0,

this is going to be equal to the sum, I equals 1 to N,

of the partial with respect to W0.

(Yi- [W0 + W1Xi])

squared, and the same holds for W1.

Okay, so now let's go ahead and actually compute this gradient.

And the first thing we're gonna do is take the derivative or

the partial with respect to the W0.

Okay, so I'm gonna use this fact that,

I showed on the previous slide to take the sum to the outside.

And then I'm gonna take the partial with respect to the inside.

So, I have a function raised to a power.

So i'm gonna bring that power down.

I'm gonna get a 2 here, rewrite the function.

That's W1 XI, and now the power is just gonna be 1 here.

But then, I have to take the derivative of the inside.

And so what's the derivative of the inside when I'm taking this

derivative with respect to W0?

Well, what I have is I have a -1 multiplying W0, and

everything else I'm just treating as a constant.

So, I need to multiply by -1.

So I'll just rewrite this.

I'm gonna multiply the 2 by this -1.

I'm gonna pull it outside the sum.

So I'm gonna get -2 sum i equals one to n.

Yi-w0+w1xi, promise I'll try and

minimize how many derivations we're doing in the slides like this.

But I think this is really instructive.

So, let's go ahead and now take the derivative or

the partial with respect to W1.

So in this case, again I'm pulling

the sum out same thing happens where I'm gonna bring the 2 down.

And I'm gonna rewrite the function here,

the inside part of the function, raise it just to the 1 power.

And then, when I take the derivative of this part, this inside part,

with respect to W1, What do I have?

Well all of these things are constants with respect to W1, but

what's multiplying W1?

I have a -xi so I'm going to need to multiply by -xi.

Okay, so again rewriting this.

I'm going to take the -2 to the outside of the sum.

And I'm gonna have yi-w0 +w1xi.

It's a good thing you guys can put me on two speed and

just get through all of this very quickly.

But here we're multiply by xi and we have this extra term

in the derivative with respect to w1 okay, well let's put this all together and look,

I went ahead and, I typed it out so, I don't need to write it here, that's good.

So this is the gradient of our residual sum of squares, and

it's a vector of two dimensions because we have two variables, w0 and w1.

Now what can w think about doing?

Well of course we can think about doing the gradient descent algorithm.

But let's hold off on that because what do we know is another way to solve for

the minimum of this function?

Well we know we can, just like we talked about in one D, taking the derivative and

setting it equal to zero, that was the first approach for

solving for the minimum.

Well here we can take the gradient and set it equal to zero.

[MUSIC]