案例学习：预测房价

This is an excellent course. The presentation is clear, the graphs are very informative, the homework is well-structured and it does not beat around the bush with unnecessary theoretical tangents.

Loading...

案例学习：预测房价

Linear Regression, Ridge Regression, Lasso (Statistics), Regression Analysis

4.8（4,214 个评分）

- 5 stars3,427 ratings
- 4 stars667 ratings
- 3 stars70 ratings
- 2 stars18 ratings
- 1 star32 ratings

PH

Apr 07, 2016

This is an excellent course. The presentation is clear, the graphs are very informative, the homework is well-structured and it does not beat around the bush with unnecessary theoretical tangents.

FR

Jan 02, 2017

This course is great. Things are very clearly explained. I am particularly happy because it helped me to understand many mathematical concepts. I will try not to be scared about formulas anymore.

从本节课中

Simple Linear Regression

Our course starts from the most basic regression model: Just fitting a line to data. This simple model for forming predictions from a single, univariate feature of the data is appropriately called "simple linear regression".<p> In this module, we describe the high-level regression task and then specialize these concepts to the simple linear regression case. You will learn how to formulate a simple regression model and fit the model to data using both a closed-form solution as well as an iterative optimization algorithm called gradient descent. Based on this fitted function, you will interpret the estimated model parameters and form predictions. You will also analyze the sensitivity of your fit to outlying observations.<p> You will examine all of these concepts in the context of a case study of predicting house prices from the square feet of the house.

#### Emily Fox

Amazon Professor of Machine Learning#### Carlos Guestrin

Amazon Professor of Machine Learning

[MUSIC]

So in these algorithms, I said that there's a stepsize.

And you might be wondering, well, how do you choose that stepsize?

So, remember, the stepsize we denoted as eta.

And there are lots of different choices you can make here, and

clearly they're very important.

This determines how much you're changing

your W at every iteration of this algorithm.

One choice you can look at is something called fixed stepsize or constant

stepsize, where, as the name implies, you just set eta equal to some constant.

So for example, maybe 0.1.

But what can happen is that you can jump around quite a lot.

So let's say I start far away.

There's a really big radiant.

I take a very big step and I keep taking these big steps.

And then here, I end up jumping over the optimal to a point here and

then I jump back and then I'm going back and forth, and back and forth.

And I converge very slowly to the optimal itself.

And so an analogy here is you can think of a snowboarder in a half-pipe where they

just keeping ramping up, and back, and up, and back, and slowly it's decaying.

And slowly they're gonna just settle in to the groove of that half-pipe.

But you can imagine going back and forth and back and

forth quite a number of times.

But I should mention that this fixed stepsize setting

will work for a lot of cases we're going to look at in this module.

Because in most of the cases these functions are going to be something

that's called strongly convex.

That's just a really well behaved convex function.

It's very clearly convex.

It just never flattens out very much.

And in those settings, you will converge using this fixed stepsize,

it just might take a little while.

On the other hand, a common choice is to decrease

the stepsize as the number of iterations increase.

Okay? So this means that there's some stepsize,

is often called a schedule, so I can write that.

Or stepsize schedule that you need to set, and you need.

And common choices for this are that the stepsize you're using at iteration t,

is equal to some alpha over t or another common choice

is some alpha over square root of t.

And let's just plot what these functions look like.

Both of these functions look something like this.

So the stepsize, so this would be the plot of eta of t over iterations t.

So the stepsize is decreasing with the number of iterations.

So why does this make sense?

Well, when I start out, I'm typically assuming that I'm gonna be far or

decently far from the optimal, so I want to take large steps,

but as I'm running my algorithm, hoping I'm getting closer, and

I want to hone in more rapidly to the optimal.

So that's why I start decreasing the stepsize.

So that when I'm out here, initially maybe I still take a really large jump.

But then as I'm going in, maybe I'll jump across once,

but I'm gonna hone in much more rapidly to this minimum here.

But one thing you have to be careful about is not

decreasing the stepsize too rapidly.

Because if you're doing that, you're gonna, again, take a while to converge.

Because you're just gonna be taking very, very, very small steps.

Okay, so in summary choosing your stepsize is just a bit of an art.

But here we've gone through a couple of examples of things that are commonly

used out there.

Okay, so the other part of the algorithm that we left unspecified

is this part where we say, while not converged.

How are we going to assess our convergence?

Well, we know that the optimum occurs when the derivative of the function

is equal to 0.

But what we're gonna see in practice,

is that the derivative, it's gonna get smaller and smaller and

smaller and smaller and smaller, it's gonna get very, very, very, very, very,

very, very, very small, but it won't be exactly 0.

At some point, we're going to want to say, okay, that's good enough.

We're close enough to the optimum.

I'm gonna terminate this algorithm.

And I'm gonna say that this is my solution.

So what we're saying is that, what we're gonna need to specify

is something where we say, when the absolute value of the derivative,

I don't care if I'm a little bit to the right or

a little bit to the left of the optimum, but just what the absolute value is.

If that is less than sum epsilon.

This is a threshold I'm setting.

Then if this is satisfied, then I'm gonna terminate the algorithm and

return whatever solution I have Wt.

So in practice, we're just gonna choose epsilon to be very small.

And I wanna emphasize that what very small means depends on

the data that you're looking at, what the form of this function is.

What are the range of gradients we might expect?

Is the value of the function, a plot of the value of the function over iterations?

And you'll tend to see that the value decrease, and

it's basically not changing very much.

And at that point, you know that you've converged.

[MUSIC]