Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Welcome

Regression is one of the most important and broadly used machine learning and statistics tools out there. It allows you to make predictions from data by learning the relationship between features of your data and some observed, continuous-valued response. Regression is used in a massive number of applications ranging from predicting stock prices to understanding gene regulatory networks.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.

- Emily Fox, Amazon Professor of Machine Learning, Statistics
- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

As one example of a way to handle this bias variance trade off, we're gonna talk

about something called ridge regression, which not only includes a term

that measures the fit of the function to the data, which is what we talked

about before, but also incorporates a term that encodes what the model complexity is.

Not quite directly, but rather indirectly, as we're gonna describe in this module.

But a key question then is how are we gonna define the balance

between how much we emphasize the fit to data versus this model complexity term.

For this in ridge regression there's a parameter

that balances between these two terms.
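As a concrete sketch of these two terms, here is a minimal ridge regression in numpy. The names (`ridge_cost`, `ridge_fit`, `l2_penalty`) are my own for illustration, not the course's; this is just one way to write the objective and its closed-form solution, assuming a design matrix `X` and targets `y`:

```python
import numpy as np

def ridge_cost(w, X, y, l2_penalty):
    """Ridge objective: residual sum of squares (fit to data)
    plus a penalty on the size of the coefficients (model complexity)."""
    residuals = y - X @ w
    return residuals @ residuals + l2_penalty * (w @ w)

def ridge_fit(X, y, l2_penalty):
    """Closed-form ridge solution: w = (X'X + lambda*I)^(-1) X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + l2_penalty * np.eye(n_features),
                           X.T @ y)
```

The `l2_penalty` parameter is exactly the balance mentioned above: at zero you recover ordinary least squares, and as it grows the coefficients are shrunk toward zero.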

And to define this parameter,

we're gonna discuss choosing it using something called cross validation.

And this again is a tool that's much more general than just for regression,

and it's an idea for how to choose these tuning parameters in

any machine learning model that we might look at.
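The cross-validation idea can be sketched in a few lines: hold out each of k folds in turn, fit on the rest, and pick the tuning parameter with the lowest average validation error. These helpers are hypothetical (the course's own implementation may differ), and for simplicity they refit ridge regression inside the loop:

```python
import numpy as np

def cv_error(X, y, l2_penalty, k=5):
    """Average held-out mean squared error of ridge regression
    across k validation folds (hypothetical helper)."""
    n, d = X.shape
    errors = []
    for val_idx in np.array_split(np.arange(n), k):
        train = np.ones(n, dtype=bool)
        train[val_idx] = False
        # fit ridge on the training folds only
        w = np.linalg.solve(X[train].T @ X[train] + l2_penalty * np.eye(d),
                            X[train].T @ y[train])
        errors.append(np.mean((y[val_idx] - X[val_idx] @ w) ** 2))
    return np.mean(errors)

def choose_penalty(X, y, candidates, k=5):
    """Pick the candidate penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(X, y, lam, k))
```

Nothing here is ridge-specific: swapping the inner fit for any other model gives the same recipe for choosing that model's tuning parameters.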

Next we're gonna discuss a feature selection task.

So for example, I have my house that I wanna list for sale and I might have

a really, really long list of house attributes associated with this house.

And I wanna figure out which subset of attributes

is really informative for assessing the value of my house.

So, for example, maybe the fact that my house has a microwave

doesn't really matter when I'm going to predict the value of the house.

So, for reasons of interpretability,

it can be really useful to do this feature selection task.

And in addition, we're gonna show that if we have just a small set of

features in our model, after we've done this feature selection, then that can

lead to significant gains in efficiency when forming our predictions.

And so, to do this feature selection task, the first thing we're gonna talk about is

ways to explicitly search between models that include different sets of features.
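A minimal sketch of what explicitly searching between models could look like is an exhaustive "all subsets" search, scored here by residual sum of squares on the training data for brevity (in practice you would score candidate subsets on held-out data, and the search cost grows exponentially in the number of features). The function name and interface are mine, not the course's:

```python
import itertools
import numpy as np

def best_subset(X, y, max_features):
    """Exhaustively try every feature subset up to a given size and
    return the subset with the lowest residual sum of squares."""
    n, d = X.shape
    best_rss, best_feats = np.inf, ()
    for k in range(1, max_features + 1):
        for feats in itertools.combinations(range(d), k):
            Xs = X[:, feats]
            # least-squares fit restricted to this feature subset
            w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ w) ** 2)
            if rss < best_rss:
                best_rss, best_feats = rss, feats
    return best_feats, best_rss
```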

But then we're gonna turn to a method that's really, really similar in spirit to

ridge regression that allows us to do this feature selection task implicitly.

In particular, again we're gonna have this measure of fit of our function to our data

and a measure of the model complexity but

it's gonna be a different measure than what we used for ridge.

And this measure in particular is what's gonna lead to

what are called sparse solutions, where only a few

of the features are actually present in our estimated model.
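The difference between the two complexity measures can be written down directly: ridge penalizes the sum of squared coefficients, while the lasso penalizes the sum of their absolute values, and it is the absolute-value penalty that drives some coefficients exactly to zero. A minimal sketch of the two objectives side by side (names are mine for illustration):

```python
import numpy as np

def ridge_objective(w, X, y, penalty):
    """Fit term plus an L2 complexity term (ridge)."""
    r = y - X @ w
    return r @ r + penalty * np.sum(w ** 2)

def lasso_objective(w, X, y, penalty):
    """Fit term plus an L1 complexity term (lasso); the L1 penalty
    is what makes some estimated coefficients exactly zero."""
    r = y - X @ w
    return r @ r + penalty * np.sum(np.abs(w))
```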

And we're gonna use this lasso regression task as an opportunity

to teach about another optimization method that's called coordinate descent.

So we talked about gradient descent earlier, and

this is another one of these really important optimization methods that we're

gonna see again later in this specialization.

And what coordinate descent does is, instead of solving a big,

high dimensional optimization objective, it's gonna go coordinate by coordinate.

So variable by variable, optimizing each in turn.

So we're gonna end up making these axis aligned moves

as we iterate in this algorithm.
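These axis-aligned, one-variable-at-a-time updates can be sketched for the lasso objective above. This is a bare-bones illustration, assuming a fixed number of full sweeps rather than a proper convergence check; the `soft_threshold` helper and variable names are my own:

```python
import numpy as np

def soft_threshold(rho, threshold):
    """Shrink rho toward zero; exactly zero inside the threshold band."""
    if rho < -threshold:
        return rho + threshold
    if rho > threshold:
        return rho - threshold
    return 0.0

def lasso_coordinate_descent(X, y, l1_penalty, n_sweeps=100):
    """Minimize ||y - Xw||^2 + l1_penalty * ||w||_1
    by optimizing one coordinate of w at a time."""
    n, d = X.shape
    w = np.zeros(d)
    z = np.sum(X ** 2, axis=0)          # per-feature normalizers
    for _ in range(n_sweeps):
        for j in range(d):              # one axis-aligned move per step
            # correlation of feature j with the residual that
            # ignores feature j's own current contribution
            rho = X[:, j] @ (y - X @ w + w[j] * X[:, j])
            w[j] = soft_threshold(rho, l1_penalty / 2.0) / z[j]
    return w
```

Note how the soft threshold sets a coordinate exactly to zero when its correlation with the residual is small: that is the sparsity mechanism showing up inside the optimizer.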

So again, just like gradient descent, it's an iterative procedure.

But it's fundamentally a different formulation for

how these iterates are defined.

Finally we're gonna conclude by discussing something called nearest neighbor

regression, which is a really simple, but very, very powerful technique.

So in the simplest case that we're gonna describe, if I'm interested in predicting

the value of my house, what I'm gonna do is I'm gonna go through my data set and

I'm gonna find the most similar house to mine.

Then, I'm simply gonna look at how much that house sold for and

I'm gonna say that's what I'm predicting my house's sale price to be.
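In code, this simplest case is a one-liner over the data set: find the closest training example and return its observed value. A minimal sketch, assuming Euclidean distance over the feature vectors (the function name is mine):

```python
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    """1-nearest-neighbor regression: predict the query's value as the
    value of its single closest training example."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argmin(dists)]
```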

Well you can generalize this idea of just looking at the most similar house to

looking at a set of similar houses and

then taking the average value of those houses as your prediction, but

what you can also do is something that's called kernel regression,

where you actually include every observation in

your data set in forming your predicted value.

But when you go to compute this average you're gonna weight the houses

by how close they are to you.

So houses that are very similar which are quote, unquote, nearby to you in the space

of similarity are gonna be weighted very heavily in this weighted average, and

houses that are very dissimilar are gonna be down weighted a lot.
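A sketch of this weighted average, using a Gaussian kernel on distance so that nearby houses get large weights and distant ones get weights near zero. The `bandwidth` parameter (my name for the kernel's width, not terminology from this passage) controls how quickly the weights fall off with distance:

```python
import numpy as np

def kernel_regression_predict(X_train, y_train, x_query, bandwidth):
    """Kernel regression: a weighted average over every training example,
    with weights decaying smoothly in the distance to the query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    weights = np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))
    return np.sum(weights * y_train) / np.sum(weights)
```

Down-weighting rather than discarding distant points is what makes the resulting fits smooth, and shrinking the bandwidth as you collect more data is one way to get the adaptivity described next.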

And this leads to these really nice fits for regression and they're very adaptive.

As you get more data you can describe more and more complicated relationships.

So these methods are useful when you have lots of data, and

we're gonna discuss this data versus complexity trade off in this module.

So in summary, we're gonna cover a lot of ground in this course.

So we're gonna talk about all different kinds of models for regression, but

we're also gonna talk about very general purpose

optimization algorithms like gradient descent and coordinate descent, and

a whole bunch of concepts that are really foundational to machine learning.

Including things like the bias variance trade off, cross validation for

selecting tuning parameters, ideas of sparsity and over fitting and

how to do model selection and feature selection.

So, this is gonna be a really, really important course in our specialization.

[MUSIC]