案例学习：预测房价

Loading...

来自 华盛顿大学 的课程

机器学习：回归

3442 评分

案例学习：预测房价

从本节课中

Multiple Regression

The next step in moving beyond simple linear regression is to consider "multiple regression" where multiple features of the data are used to form predictions. <p> More specifically, in this module, you will learn how to build models of more complex relationship between a single variable (e.g., 'square feet') and the observed response (like 'house sales price'). This includes things like fitting a polynomial to your data, or capturing seasonal changes in the response value. You will also learn how to incorporate multiple input variables (e.g., 'square feet', '# bedrooms', '# bathrooms'). You will then be able to describe how all of these models can still be cast within the linear regression framework, but now using multiple "features". Within this multiple regression framework, you will fit models to data, interpret estimated coefficients, and form predictions. <p>Here, you will also implement a gradient descent algorithm for fitting a multiple regression model.

- Emily FoxAmazon Professor of Machine Learning

Statistics - Carlos GuestrinAmazon Professor of Machine Learning

Computer Science and Engineering

[MUSIC]

So just like we did in simple linear regression,

we can think about interpreting the coefficients of our fitted function.

But first let's remember what we did for the simple linear regression model,

where when we looked at our estimated function, our fitted function.

It was just some fitted line.

We had a w hat zero and a w hat one.

Those were our estimated parameters of this model.

Well, we said w hat zero,

that was just, it had an interpretation in terms of the intercept of this line.

So, the value of a house with no square feet, let's just ignore that one,

that one's gonna have the same interpretation throughout

all of these models that we're talking about.

But let's talk about w hat 1.

Okay, so in our simple linear regression model what we said was w hat 1

was the predicted change in our output per unit change in our input.

So, predicted change in the house value for

one square foot change in the house that we're looking at.

Okay.

So now let's think about,

how do we interpret the coefficients if we have two inputs, instead of just one?

Well in this case, what we can do is,

we can think about just fixing the other input.

So let's fix our first input, the number of square feet of the house.

And then think about what's the effect of the number of bathrooms.

And if we fix the number of square feet, what that's equivalent to is we're taking

a slice through this space for however many square feet we're looking at.

And when we take a slice though this hyperplane, so there's this hyperplane.

Let me see if I can draw this.

There's this hyperplane living here in 3D space we take a slice, and

once you slice that hyper plane, you just get a line.

So, that's this line here, for fixed number of square feet.

And, then when we think about the interpretation of w hat 2,

the coefficient on the number of bathrooms in the house,

what that's saying is, well it's the slope of this line, so

that means for a single unit change in our input, which here is number bathroom.

So, adding one bathroom, what's the predicted change in the output?

The value of the house?

So just to be very clear, we're taking a house of a fixed size and

we're adding a bathroom, maybe we're changing some room to be a bathroom or

knocking something down, putting in a bathroom.

And we're saying,

how much does that increase the value of my house to make that change?

Not changing the size of the house, that's gonna stay the same, but

adding a bathroom to the number of counts of bathrooms in this house?

So, that's the interpretation of w hat 2.

Well, what happens when we go to a more general model,

with some little d different inputs.

Well, in this case, when we think about interpreting any given coefficient

of our fitted model, we're gonna fix all the other inputs in the model, and

we're just gonna look at that one that we can vary.

So, for example, number of bedrooms, okay?

And so when we're looking at the coefficient associated with number of

bedrooms, the interpretation is what's the predicted change in the value of my house

if I change just adding one bedroom to the house, with everything else fixed?

So when you think about the interpretation of the coefficient you have to be very

careful that the interpretation is in the context of what you have in the model.

So let me give you an example where if I think about

the coefficient on the number of bedrooms.

If I have square footage of the house in my model,

well what does it mean that I increase the number of bedrooms?

What I've had to do is make some other room smaller.

Let's say I am always working with manipulating the existing bedrooms.

So the house currently has three bedrooms.

And I'm going to modify it to have four bedrooms.

Well what that probably means for the house, is that that house has four smaller

bedrooms than a house of the same size with three bedrooms.

So for a lot of people, that might not actually add value,

that might decrease the value.

To have four small bedrooms might have less value than three nicer,

more grand bedrooms.

Depends on the person or family of course but you can end up with a negative

coefficient on the influence of the number of bedrooms to the value of the house.

Because what it's saying is that as you keep adding,

imagine adding all the way up to ten bedrooms,

of course the impact on the value of the house can go down, if you're not able

to increase the size of the house, if you're fixing the size of the house.

But what if you remove square feet from the model, and you refit this model.

Well, let's think about the coefficient on number of bedrooms.

In this case, it's most likely going to be some positive number

because number of bedrooms is indicative of the size of the house.

We can use that as a proxy when we don't have square feet in the model itself.

So if I think of reading some listing with a house with ten bedrooms,

probably I don't imagine them to be these tiny tiny little bedrooms,

I just imagine a really big house.

So I imagine the value of that house would be quite large compared to a house with

one or two bedrooms.

So what I'm saying is that thinking about the coefficient, this maybe positive,

big positive number, when I don't have square feet versus the coefficient

associated with number of bedrooms if I include square feet being negative.

I can have totally different interpretations unless I think of

the interpretation as I should in the context of what's included in the model.

And I'm saying this again and again and again,

because it's a really common mistake to just look at the coefficient and

say, oh, as I increase the number of bedrooms the value goes down.

No, think about the coefficient and the context of what you've put into the model.

Okay, now I think I've said that enough times that you guys will remember that.

Okay, so now let's try to interpret the coefficient

in a polynomial regression model.

And here I'm just assuming that we have one input, like number of square feet, and

our features are different powers of this input.

So what would be the interpretation of the jth coefficient?

Well I have to think about fixing everything else in the model, but

if everything else is a power of this one input that I'm changing, I can't do that.

So here, unfortunately, we can't think about interpreting the coefficients in

the way that we did in the simple hyperplane example of the previous slide.

So, just a little word of warning that if you're ever in a situation where you can't

hold all your other features fixed,

then you can't think about interpreting the coefficient.

[MUSIC]