Case Study: Predicting Housing Prices

A course from University of Washington

Machine Learning: Regression

From this lesson

Multiple Regression

The next step in moving beyond simple linear regression is to consider "multiple regression", where multiple features of the data are used to form predictions.

More specifically, in this module, you will learn how to build models of more complex relationships between a single variable (e.g., 'square feet') and the observed response (like 'house sales price'). This includes things like fitting a polynomial to your data, or capturing seasonal changes in the response value. You will also learn how to incorporate multiple input variables (e.g., 'square feet', '# bedrooms', '# bathrooms'). You will then be able to describe how all of these models can still be cast within the linear regression framework, but now using multiple "features". Within this multiple regression framework, you will fit models to data, interpret estimated coefficients, and form predictions.

Here, you will also implement a gradient descent algorithm for fitting a multiple regression model.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

Okay, so let's think a little bit about the form of this closed-form solution.

What we see is that we have this (H transpose H) inverse, and

let's talk about that a little bit more.

Remember, H was that big green matrix: it's the matrix of all the features for

each one of our observations.

So each row is a different observation and we have that matrix.

And we're premultiplying by the transpose, where we take it and set it on its side.

So, this inner part here is this green matrix on its side times the regular green

matrix and what's the result of that multiplication?

Well, remember, how many rows are there in this matrix?

Well there are however many observations we have in our dataset,

which is N, that's how many rows there are.

And how many columns?

Well, it's however many features we're using.

And what's our notation for that?

That's just capital D.

Okay. So, we multiply these two matrices. In contrast,

when I take the transpose, I have D rows and N columns.

And the result of multiplying a D by N matrix by an N by

D matrix is just a D by D matrix.

So, it's a square matrix that's D rows by D columns.

So, let me be a little bit more explicit.

It's number of features by number of features.
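
The dimension argument above can be checked on a small example. This is only a minimal NumPy sketch: the sizes N = 6 and D = 3 and the random data are made up for illustration and do not come from the lecture.

```python
import numpy as np

# Illustrative sizes (assumed): 6 observations, 3 features.
N, D = 6, 3
rng = np.random.default_rng(0)
H = rng.normal(size=(N, D))  # each row is one observation

# (D x N) times (N x D) gives a square D x D matrix.
HtH = H.T @ H

print(H.shape)    # (6, 3)
print(H.T.shape)  # (3, 6)
print(HtH.shape)  # (3, 3)
```

Note that the size of the result depends only on the number of features, not on the number of observations.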

And then we need to take the inverse of this matrix.

So, this resulting matrix is gonna be invertible

in general, so I'll say in most cases,

if the number of observations we have is larger than the number of features.

Okay, that means that this matrix is full rank, and

then we can take its inverse.

If you don't know what full rank is that's perfectly fine for this course.

But if you do that's what we're referring to here.

And when I say "in most cases", it's because there's a little caveat:

it's not enough for the number of

observations that we have to be greater than the number of features.

What matters is the number of linearly

independent observations.

So, I should say really, instead of capital N, it's the number of linearly

independent observations that needs to be at least as large as the number of features.

And again, if that didn't make sense to you,

that's actually fine. Just think about the fact, and we'll talk about it a lot

in later modules of this course, that this matrix might not be invertible.
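
To make the caveat concrete, here is a rough sketch that compares two feature matrices with the same number of observations. The data and the helper name `gram_is_invertible` are hypothetical, chosen only for illustration. The check uses `np.linalg.matrix_rank` to test whether H transpose H has full rank.

```python
import numpy as np

# Illustrative data (assumed): 4 observations, 3 features,
# with enough linearly independent rows.
H_good = np.array([[1.0, 2.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [1.0, 3.0, 5.0],
                   [1.0, 1.0, 2.0]])

# Same number of rows, but every row is a multiple of the first one,
# so there is only one linearly independent observation.
H_bad = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [3.0, 6.0, 9.0],
                  [4.0, 8.0, 12.0]])

def gram_is_invertible(H):
    """Hypothetical helper: True when H^T H has full rank D."""
    D = H.shape[1]
    return bool(np.linalg.matrix_rank(H.T @ H) == D)

print(gram_is_invertible(H_good))  # True
print(gram_is_invertible(H_bad))   # False
```

Both matrices have N = 4 and D = 3, so counting rows alone does not tell you whether the inverse exists.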

Okay, so what's the complexity of the inverse though?

Let's assume that we can actually invert this matrix.

Well, the complexity is often denoted with this big O notation.

So I'm writing a big O, just the letter O, of the number of features cubed, O(D^3), and

what that means is that the number of operations we have to do to invert

this matrix scales cubically with the number of features in our model.

Okay, so if you have lots and lots and lots of features, this can be really,

really, really computationally intensive to do.

So computationally intensive that it might actually be

computationally impossible to do.
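
A back-of-the-envelope calculation shows what cubic scaling means in practice. This sketch assumes the cost is proportional to D cubed and ignores constant factors. The function name `relative_inverse_cost` and the baseline of 100 features are hypothetical choices for illustration.

```python
def relative_inverse_cost(D, baseline_D=100):
    """Approximate cost of inverting a D x D matrix relative to a
    baseline_D x baseline_D one, assuming cost grows as D**3."""
    return (D / baseline_D) ** 3

for D in (100, 1_000, 10_000):
    print(D, relative_inverse_cost(D))
# 100 1.0
# 1000 1000.0
# 10000 1000000.0
```

Under this assumption, ten times as many features means roughly a thousand times as much work for the inverse.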

So, especially if we're looking at applications with lots and

lots of features, and again assuming we still have more observations

than the number of features, we're gonna wanna

use some other solution than forming this big matrix and taking its inverse.

There are actually some really fancy ways of doing this matrix inverse,

and so know that those fancy ways exist. But still,

there are some very simple alternatives to this closed-form solution.
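
As one illustration of such an alternative, the sketch below fits the same model two ways on synthetic data. The first uses the standard least-squares normal equations, solved with `np.linalg.solve` so the inverse is never formed explicitly. The second uses gradient descent on the residual sum of squares, the algorithm named in the module description. The data, true weights, step size, and iteration count are all assumptions picked for this toy problem, not values from the lecture.

```python
import numpy as np

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
N, D = 200, 4
# One constant feature plus three random features.
H = np.column_stack([np.ones(N), rng.normal(size=(N, D - 1))])
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = H @ w_true + 0.1 * rng.normal(size=N)

# Closed form: solve (H^T H) w = H^T y instead of inverting H^T H.
w_closed = np.linalg.solve(H.T @ H, H.T @ y)

# Gradient descent on RSS(w) = ||y - H w||^2.
# The gradient is -2 H^T (y - H w). The step size and the number of
# iterations are hand-picked for this data set.
w_gd = np.zeros(D)
step_size = 1e-3
for _ in range(500):
    gradient = -2.0 * H.T @ (y - H @ w_gd)
    w_gd = w_gd - step_size * gradient

print(np.round(w_closed, 3))
print(np.round(w_gd, 3))
```

On this well-conditioned toy problem the two estimates agree closely. Each gradient step only needs matrix-vector products with H, so no D by D matrix has to be inverted.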
