Case Study: Predicting House Prices

A course from the University of Washington

Machine Learning: Regression

From this lesson

Nearest Neighbors & Kernel Regression

Up to this point, we have focused on methods that fit parametric functions, like polynomials and hyperplanes, to the entire dataset. In this module, we instead turn our attention to a class of "nonparametric" methods. These methods allow the complexity of the model to increase as more data are observed, and result in fits that adapt locally to the observations.

We start by considering a simple and intuitive example of a nonparametric method, nearest neighbor regression: the prediction for a query point is based on the outputs of the most similar observations in the training set. This approach is extremely simple, but can provide excellent predictions, especially for large datasets. You will deploy algorithms to search for the nearest neighbors and form predictions based on the discovered neighbors. Building on this idea, we turn to kernel regression. Instead of forming predictions based on a small set of neighboring observations, kernel regression uses all observations in the dataset, but the impact of each observation on the predicted value is weighted by its similarity to the query point. You will analyze the theoretical performance of these methods in the limit of infinite training data, and explore the scenarios in which these methods work well and those in which they struggle. You will also implement these techniques and observe their practical behavior.

- Emily Fox, Amazon Professor of Machine Learning, Statistics

- Carlos Guestrin, Amazon Professor of Machine Learning, Computer Science and Engineering

[MUSIC]

So we've talked about using k-NN for regression, but

these methods can also be very, very straightforwardly used for classification.

So this is a little warm-up for the next course in this specialization,

which is our classification course.

And let's start out by just recalling our classification task,

and we're gonna do this in the context of spam filtering.

Here we have some email as our input, and

the output is gonna be whether the email is spam or not spam.

And we're gonna make this decision based on the text of the email,

maybe information about the sender IP, and things like this.

Well, what we can do is use k-NN for classification.

Visually, we can think about just taking all of the emails that we have labeled as

spam or not spam and throwing them down in some space,

where the distance between emails in this space

represents how similar the text or the sender IP information is,

that is, all the inputs or features we're using to represent these emails.

And then we get some query email that comes in.

So, that's this little gray email here.

And we're gonna say, is it spam or not spam?

There's a very intuitive way to do this, which is just to search for

the nearest neighbors of this email,

the emails most similar to this email.

And then we're just gonna do a majority vote on

the nearest neighbors to decide whether this email is spam or not spam.

And what we see in this case is that four of the neighbors are spam, and

only one neighbor is not spam, so we're gonna label this email as spam.

And so this is the really, really straightforward approach of using k-NN for

classification.
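The majority-vote idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: the function name `knn_classify`, the two-dimensional feature vectors, and the use of Euclidean distance are all assumptions made for the example. In practice each email would be represented by features derived from its text and sender information, and any suitable distance could be used.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=5):
    """Label `query` by majority vote among its k nearest labeled examples.

    `examples` is a list of (feature_vector, label) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    # Sort the labeled examples by distance to the query; keep the k closest.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Count the labels among those neighbors and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features for already-labeled emails (made-up numbers).
emails = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.9), "spam"),
    ((0.8, 1.1), "spam"),
    ((1.1, 1.3), "spam"),
    ((1.4, 1.4), "not spam"),
    ((4.0, 4.0), "not spam"),
    ((4.2, 3.8), "not spam"),
    ((3.9, 4.3), "not spam"),
]

# The query email: four of its five nearest neighbors are spam.
query = (1.1, 1.1)
print(knn_classify(query, emails, k=5))
```

With these made-up points, the five nearest neighbors of the query are four spam emails and one not-spam email, mirroring the example in the lecture, so the vote comes out as spam. For a two-class problem, choosing an odd k avoids tied votes.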

[MUSIC]