0:01

So recall our diamond data set from before.

Here y was the price and x was the carat.

Consider now trying to fit a line through this data where we have both

an intercept and a slope.

So a two-parameter regression setting.

0:14

And of course, we wouldn't want to fit a line through the origin in this case,

because the intercept looks like it should be somewhere around 200 on the y axis, so

we definitely don't want a line through the origin.

We simply want to find the best-fitting line through the data, like that.

So we want to minimize the squared norm of y minus beta 0

times Jn minus beta 1 times x,

where Jn is the vector of ones, so that is just a vector of n ones.
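As a quick numerical sketch of that objective, here is the residual sum of squares as a function of the two parameters. The numbers below are made-up stand-ins for the diamond data (x playing the role of carat, y of price), not the actual dataset from the lecture.

```python
import numpy as np

# Made-up stand-ins for the diamond data: x plays the role of carat,
# y the role of price. Not the actual dataset from the lecture.
x = np.array([0.2, 0.3, 0.4, 0.5, 0.6])
y = np.array([300.0, 420.0, 500.0, 640.0, 720.0])
Jn = np.ones_like(x)  # the vector of ones

def rss(beta0, beta1):
    """Residual sum of squares ||y - beta0*Jn - beta1*x||^2."""
    resid = y - beta0 * Jn - beta1 * x
    return resid @ resid

# A line with an intercept can fit far better than one forced
# through the origin (beta0 = 0).
print(rss(100.0, 1000.0) < rss(0.0, 1000.0))  # True
```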

1:11

But first before we discuss the two parameter problem,

let's talk about what we're doing relative to projections.

So in this case, y isn't three dimensional; y is however many data points we have

dimensional, 20 or something like that. But imagine if y were three dimensional.

We'd only have three data points; what would we be doing?

Well, our outcome vector y would be a point in three-dimensional space, right?

1:39

And so our surface, all the points that we want to project onto,

would be all the points of that form as beta 0 and beta 1 vary.

So that's a plane in three dimensions, right?

Because it's inherently two dimensional, right?

Only two parameters are varying linearly in that space.

The space that looks like this, the space gamma, which is the collection

of beta 0 times Jn plus beta 1 times x for beta 0, beta 1 in R2.

That space is of course two dimensional, and it's linear, so

it looks like a plane in three dimensions.

So what we're trying to do is,

given our outcome y, which is right here, we want to project it

onto this two-dimensional plane and find a point; let's call that y hat.

The point that lives in that two dimensional plane

that is closest to the observed data y.

The specific values of beta that multiply Jn and x to give us that point

y hat are what we'll call beta 0 hat and beta 1 hat.

So in this lecture, we're gonna talk about how you find beta 0 hat and beta 1 hat.
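The projection picture can be checked numerically: build a design matrix with columns Jn and x, form the projection (hat) matrix, and verify that the residual is orthogonal to the plane. The data here are illustrative stand-ins, not the lecture's diamond data.

```python
import numpy as np

# Illustrative stand-in data: think of y as a point in 3D and the
# plane spanned by Jn and x as the surface we project onto.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.5, 4.0])
X = np.column_stack([np.ones_like(x), x])  # columns: Jn and x

# Hat (projection) matrix H = X (X^T X)^{-1} X^T projects onto the plane.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y  # the point in the plane closest to y

# The residual y - y_hat is orthogonal to the plane, i.e. to Jn and x.
resid = y - y_hat
print(np.allclose(X.T @ resid, 0.0))  # True
```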

Another way to think about this problem, is to think in terms of the scatter plot,

where we have y and x.

Here we have a line of the form y = beta 0 + beta 1 x, and our least

3:17

squares problem amounts to finding the line that minimizes the sum of the squared vertical distances.

So that's another way to think about it.

So there are two ways to think about it.

One is to think about it as minimizing the vertical square distances

from the scatterplot and

another is to think about it as a projection in n-dimensional space.

Of course, we can't visualize the n-dimensional projection;

we have to pretend

it's a three-dimensional projection just for the illustration.

3:48

So, consider the squared norm of y minus beta 1 x

minus beta 0 times Jn.

Imagine if beta 1 were fixed.

So let's just think of y minus beta 1 x as a single vector.

Okay?

If beta 1 is fixed, then the minimizer of this expression over beta 0

is just going to be the average, because this is just regression on a constant.

It's going to be the average of that vector.

Okay, so the average of that vector is just 1 over n times

Jn transpose times y minus beta 1 x.

Okay, but by distributing the product, this works out to be 1 over n

Jn transpose times y, which is y bar,

4:38

minus beta 1, which is a scalar, times 1 over n Jn transpose times x,

which is x bar.

So we know that the minimizing intercept has to be y bar minus beta 1

times x bar, so that's our intercept:

beta 0 hat, as it depends on beta 1, has to be equal to that.
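We can sanity-check that claim numerically: for any fixed slope, the intercept y bar minus beta 1 times x bar does at least as well as nearby intercepts. The data and the fixed slope below are made up for illustration.

```python
import numpy as np

# Illustrative stand-in data; beta1 below is an arbitrary fixed slope.
x = np.array([0.2, 0.3, 0.4, 0.5, 0.6])
y = np.array([300.0, 420.0, 500.0, 640.0, 720.0])
beta1 = 900.0  # any fixed slope

# The claimed optimal intercept given beta1: ybar - beta1 * xbar.
best_beta0 = y.mean() - beta1 * x.mean()

def rss(beta0):
    """||y - beta1*x - beta0*Jn||^2 with beta1 held fixed."""
    resid = y - beta1 * x - beta0
    return resid @ resid

# Nudging the intercept either way can only increase the criterion.
print(rss(best_beta0) <= rss(best_beta0 + 1.0))  # True
print(rss(best_beta0) <= rss(best_beta0 - 1.0))  # True
```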

So let's plug that back into our equation.

So we get y minus beta 1 x, minus, plugging in

y bar minus beta 1 x bar for beta 0,

times Jn, okay, squared.

So we know that the original quantity right here, asterisk,

that asterisk, has to be greater than or equal to this one.

5:26

Okay, so let's just do some reorganization.

Okay, so this works out to be y minus y bar times Jn,

so that takes care of the y term

and the y bar times Jn term,

and then we need minus beta 1 times, in parentheses, x minus

5:57

x bar times Jn.

Quantity squared, okay?

Now before, whenever we centered our random variables,

we were calling them y tilde, or x tilde.

So notice this is just the centered version of y, and

this is just the centered version of x.

So this is now equal to the squared norm of y tilde minus beta 1 x tilde,

where y tilde is simply y minus y bar times Jn and

x tilde is equal to x minus x bar times Jn.

Now, this equation is exactly regression through the origin

with the centered variables, which we talked about a couple of lectures ago.

So we know that beta 1 hat has to be equal to the inner product of y tilde and

x tilde over the inner product of x tilde with itself.

And we also argued that that works out to be the estimated correlation between

y and x times the estimated standard deviation of

y divided by the estimated standard deviation of x.

So the best regression slope works

out to be exactly the same slope as if we centered the variables first and

then did regression through the origin.
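That identity, the centered inner-product slope equaling the correlation times the ratio of standard deviations, can be verified numerically; the data below are illustrative stand-ins.

```python
import numpy as np

# Illustrative stand-in data (not the lecture's diamond data).
x = np.array([0.21, 0.34, 0.45, 0.58, 0.66, 0.72])
y = np.array([310.0, 460.0, 520.0, 650.0, 740.0, 800.0])

# Centered versions: y tilde and x tilde.
y_t = y - y.mean()
x_t = x - x.mean()

# Slope from regression through the origin on the centered variables.
beta1_inner = (y_t @ x_t) / (x_t @ x_t)

# The same slope via cor(y, x) * sd(y) / sd(x).
beta1_cor = np.corrcoef(y, x)[0, 1] * y.std(ddof=1) / x.std(ddof=1)

print(np.isclose(beta1_inner, beta1_cor))  # True
```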

7:23

So that's beta 1 hat. And if we need an intercept,

beta 0 hat is just y bar minus beta 1 hat times x bar.

And so we've proven, by plugging these in, that we've made the objective as small

as it can possibly be in this chain of inequalities, so

these must be the minimizers of the least squares equation.
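Putting the whole derivation together, the closed-form estimates can be checked against a library fit; np.polyfit is used here purely as a reference, and the data are made-up stand-ins.

```python
import numpy as np

# Illustrative stand-in data (not the lecture's diamond data).
x = np.array([0.21, 0.34, 0.45, 0.58, 0.66, 0.72])
y = np.array([310.0, 460.0, 520.0, 650.0, 740.0, 800.0])

# Closed-form least squares estimates from the derivation:
# beta1_hat = <y tilde, x tilde> / <x tilde, x tilde>,
# beta0_hat = ybar - beta1_hat * xbar.
y_t = y - y.mean()
x_t = x - x.mean()
beta1_hat = (y_t @ x_t) / (x_t @ x_t)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Compare against a library fit (np.polyfit returns the slope first).
slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose([beta0_hat, beta1_hat], [intercept, slope]))  # True
```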