Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Closing Remarks

In the conclusion of the course, we will recap what we have covered. This includes both techniques specific to regression and foundational machine learning concepts that will appear throughout the specialization. We also briefly discuss some important regression techniques we did not cover in this course. We conclude with an overview of what's in store for you in the rest of the specialization.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

Okay, so here's a summary of the large set of topics that we've covered in this

course.

We talked about a bunch of models, including

different forms of linear regression, from simple regression to multiple regression.

We talked about doing ridge regression and Lasso.

And then nearest neighbors and kernel regression.

And we also talked about some very important optimization algorithms like

gradient descent and coordinate descent.

And really just this notion of what is optimization and

how do you go about doing it.
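
As a reminder of what gradient descent looks like in practice, here is a minimal sketch for simple regression with an intercept `w0` and a slope `w1`, minimizing the residual sum of squares. The toy data, step size, tolerance, and function names are illustrative assumptions, not something taken from the lecture.

```python
# Minimal sketch: gradient descent on the residual sum of squares (RSS)
# for simple regression y ~ w0 + w1 * x.
# Data, step size, and tolerance below are illustrative assumptions.

def rss_gradient(w0, w1, xs, ys):
    """Gradient of RSS(w0, w1) = sum((y - (w0 + w1 * x)) ** 2)."""
    g0 = sum(-2.0 * (y - (w0 + w1 * x)) for x, y in zip(xs, ys))
    g1 = sum(-2.0 * (y - (w0 + w1 * x)) * x for x, y in zip(xs, ys))
    return g0, g1

def gradient_descent(xs, ys, step_size=0.01, tolerance=1e-9, max_iter=100000):
    """Step against the gradient until its magnitude falls below tolerance."""
    w0, w1 = 0.0, 0.0
    for _ in range(max_iter):
        g0, g1 = rss_gradient(w0, w1, xs, ys)
        if (g0 * g0 + g1 * g1) ** 0.5 < tolerance:
            break
        w0 -= step_size * g0
        w1 -= step_size * g1
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly from y = 1 + 2x
w0, w1 = gradient_descent(xs, ys)
print(w0, w1)
```

The step size matters: too large a value makes the iterates diverge on this data, which is why a small fixed value is assumed here.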

And then we talked about concepts that generalize well beyond regression.

These include things like loss functions and

this very important concept of the bias-variance trade-off,

as well as cross-validation, sparsity, overfitting,

feature selection, and model selection.

And these are ideas that we're going to see in most of the courses

in this specialization.

So, we spent a lot of time teaching the methods of this module, and

now I've spent a lot of time summarizing what we learned, but

I want to take a minute to talk about what we didn't cover in this course.

So there are actually a few important topics that, unfortunately,

we didn't have time to go through in this course, and I want to highlight them here.

One is the fact that in this course, we focused on just having a univariate output,

Which, for example, was the value of a house or the sales price of a house.

But of course you could have a multivariate output.

And in cases where the dimensions of that multivariate output

are correlated, you need to do slightly more complicated things.

But in contrast, if you assume that each of these outputs

is independent of the others, then you can just apply

the methods we described independently for each dimension.
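
The independent-outputs case can be sketched as follows: fit one simple regression per output dimension and keep the fits side by side. This is a minimal illustration under the independence assumption stated above; the function names and the toy data are hypothetical.

```python
# Minimal sketch: treat each output dimension as its own regression problem.
# Assumes outputs are independent of each other; names and data are illustrative.

def fit_simple_regression(xs, ys):
    """Closed-form least squares fit of y ~ intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def fit_independent_outputs(xs, output_rows):
    """output_rows[i][k] is the k-th output observed for input xs[i]."""
    num_outputs = len(output_rows[0])
    return [
        fit_simple_regression(xs, [row[k] for row in output_rows])
        for k in range(num_outputs)
    ]

xs = [1.0, 2.0, 3.0, 4.0]
# Two outputs per observation: first is 1 + 2x, second is 11 - x.
output_rows = [[3.0, 10.0], [5.0, 9.0], [7.0, 8.0], [9.0, 7.0]]
models = fit_independent_outputs(xs, output_rows)
print(models)
```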

The other thing that we haven't covered yet,

is this idea of what's called maximum likelihood estimation.

We're gonna go through that in the classification course, but

I wanna mention that in the context of regression,

if you've heard of maximum likelihood estimation,

it results in exactly the same objective we had when

minimizing our residual sum of squares,

assuming that your model has what are called normal, or Gaussian, errors.

That is the epsilon term that we've talked about.

Remember, y equals w times x plus epsilon.

Well, that epsilon, if we assume it's normally distributed,

or sometimes people say Gaussian distributed, then maximum likelihood

estimation is exactly equivalent to what we've talked about in this course.
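
The equivalence can be sketched in a few lines. The notation here is an assumption for illustration: observations $(x_i, y_i)$ for $i = 1, \dots, N$, errors that are independent with a common, fixed variance $\sigma^2$, and $w^\top x_i$ standing in for the "w times x" of the lecture.

```latex
\begin{align*}
y_i &= w^\top x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{independently} \\
p(y_i \mid x_i, w) &= \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(y_i - w^\top x_i)^2}{2\sigma^2} \right) \\
\log \prod_{i=1}^{N} p(y_i \mid x_i, w)
    &= -\frac{N}{2}\log(2\pi\sigma^2)
       - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - w^\top x_i)^2 \\
\hat{w}_{\mathrm{ML}} = \arg\max_{w} \log \prod_{i=1}^{N} p(y_i \mid x_i, w)
    &= \arg\min_{w} \sum_{i=1}^{N} (y_i - w^\top x_i)^2
     = \arg\min_{w} \mathrm{RSS}(w)
\end{align*}
```

The first term of the log-likelihood does not depend on $w$, and the second is a negative constant times the residual sum of squares, so maximizing one is the same as minimizing the other.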

And like I said,

we'll learn more about maximum likelihood estimation in the classification course.

But one really, really important thing that we didn't talk about in this course,

which truthfully pains me, being a statistician, is statistical inference.

We just focused on what is called point estimation.

We just returned a w-hat value, our estimated coefficients, but

we didn't talk about any measure of our uncertainty about those

estimated coefficients or our predictions.

So, again, there's noise inherent to the data so we can think of having

measures of uncertainty about our predictions or our estimated coefficients.

So this is referred to as inference and

it's a really important topic that we did not go through here.
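
To give a flavor of what such a measure of uncertainty looks like, here is a minimal sketch that computes the textbook standard error of the slope in simple regression, alongside the point estimate. It assumes independent errors with constant variance; the data and function name are made up for illustration, and this is only one of many inferential quantities.

```python
# Minimal sketch: point estimate plus a standard error for the slope
# in simple regression. Assumes independent, constant-variance errors.
# Data and names are illustrative, not from the course.

def slope_with_standard_error(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares of the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Unbiased noise variance estimate: n - 2 because two coefficients were fit.
    noise_variance = rss / (n - 2)
    standard_error = (noise_variance / sxx) ** 0.5
    return slope, standard_error

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, standard_error = slope_with_standard_error(xs, ys)
print(slope, standard_error)
```

A small standard error relative to the slope suggests the data pin the coefficient down well; turning it into a confidence interval would additionally need a t-distribution quantile, which is omitted here.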

Another cool set of methods are what are called generalized linear models and

we're actually gonna see an example of a generalized linear model in the

classification course, so you will get to see this, but I want to bring it up here.

And what generalized linear models allow you to do,

is form regressions when you have certain restrictions on your output.

Like the output is always

positive, or bounded, or positive and bounded.

Or it's gonna be discrete-valued, like we're gonna talk

about in the classification course.

Just a yes or no response.

Well, if we're assuming that our errors are Gaussian,

like we talked about with maximum likelihood estimation, or,

as we discussed in this course, that they have zero mean, so

the observations are equally likely to be above or below the true function and

are actually unbounded in how far they can be above or below that true

function, then the regression models that we've talked about so

far are inappropriate for forming predictions if those predicted values have

these types of constraints or specific structures to them.

And generalized linear models allow us to cope with

certain types of these structures very efficiently.
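
One way to picture the idea is that a generalized linear model still computes a linear score, but passes it through a function that respects the constraint on the output. The sketch below shows two common choices: an exponential for strictly positive outputs and a sigmoid for outputs between 0 and 1. The coefficients are made up, the function names are hypothetical, and fitting the coefficients is deliberately left out.

```python
import math

# Minimal sketch: the same linear score, mapped into a constrained range.
# Coefficients are made up for illustration; no fitting is performed.

def linear_score(w0, w1, x):
    """Unconstrained score, as in ordinary linear regression."""
    return w0 + w1 * x

def predict_positive(w0, w1, x):
    """Exponential of the score: always strictly positive."""
    return math.exp(linear_score(w0, w1, x))

def predict_probability(w0, w1, x):
    """Sigmoid of the score: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_score(w0, w1, x)))

w0, w1 = -1.0, 0.5
inputs = [-10.0, -2.0, 0.0, 2.0, 10.0]
raw_scores = [linear_score(w0, w1, x) for x in inputs]
positive_predictions = [predict_positive(w0, w1, x) for x in inputs]
probabilities = [predict_probability(w0, w1, x) for x in inputs]
print(raw_scores)
print(positive_predictions)
print(probabilities)
```

The raw scores go negative for some inputs, which is exactly the problem for a positive or yes/no output; the transformed predictions stay inside their allowed range.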

Another really powerful tool that we

didn't describe in this course is something called the regression tree.

And that's because we're gonna cover it in the classification course.

Actually, more generally, these methods are referred to as CART,

which stands for Classification And Regression Trees.

Because what you do is you form a tree.

And that structure's the same whether we're looking at classification or

regression.

But we're gonna focus on describing these structures in the context

of classification because they're a lot simpler to understand in that context.

But I wanna emphasize that those same tools that we're gonna

learn in the next course can be used in regression as well.
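
As a preview of how a tree can do regression, here is a minimal sketch of the smallest possible regression tree: a single split on one feature, predicting the mean of the training outputs on each side. The split is chosen to minimize the residual sum of squares. This is an illustration under simplifying assumptions (one feature, one split, at least two distinct input values), not the full CART procedure, and the names and data are hypothetical.

```python
# Minimal sketch: a one-split regression tree (a "stump") on a single feature.
# Assumes at least two distinct x values; names and data are illustrative.

def mean(values):
    return sum(values) / len(values)

def rss(values):
    """Residual sum of squares around the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(xs, ys):
    """Try every midpoint between distinct sorted x values; keep the best."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = rss(left) + rss(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
print(stump, predict_stump(stump, 2.5), predict_stump(stump, 11.0))
```

A deeper tree would apply the same split search recursively within each side.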

And of course, there are lots and

lots of other methods that we haven't described in this course.

Regression has an extremely long history in statistics, so

there are lots of things that are potentially of interest.

But in this course, we really tried to focus on the main concepts that are useful

in modern machine learning applications of regression.