0:45

And so, this is a vector y.

Now let's say we want the expected value of a particular value y naught,

a scalar version of y, at the particular value x naught.

Well, that expected value is x naught transpose beta.

Okay?

And so our estimate of that clearly,

is going to be x naught transpose beta hat.

So we can create a confidence interval for

this prediction very easily using the tools that we've developed so

far, because this is, again, just a linear contrast

of the betas, and we've already covered how to create a confidence

interval for that.

So we know that the variance of this, so

we'll call that Y hat naught, okay.

So the variance of Y hat naught is equal to X

naught transpose variance of beta hat X naught

which is equal to X naught transpose (X

transpose X) inverse X naught sigma squared.
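As a quick numerical sketch of that variance formula, here's a minimal Python (NumPy) example; the data, the design matrix, and the point x naught are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: simple linear regression with an intercept
# (think diamond price vs. mass).
n = 50
mass = rng.uniform(0.5, 2.0, n)
X = np.column_stack([np.ones(n), mass])   # design matrix, p = 2
sigma = 3.0
y = 10 + 5 * mass + rng.normal(0, sigma, n)

# Least-squares estimate: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Var(y0_hat) = x0' (X'X)^{-1} x0 * sigma^2 at a new point x0
x0 = np.array([1.0, 1.5])
var_y0_hat = (x0 @ XtX_inv @ x0) * sigma**2
print(var_y0_hat)
```

Because x naught here sits inside the range of the data, its leverage term is small and the variance of the fitted value is much smaller than sigma squared.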

2:25

So the confidence interval is x naught transpose beta hat, plus or minus

the 1 minus alpha over 2 t quantile with n minus p degrees of freedom, times s,

our residual standard deviation estimate, which we're going to use for

that, times the square root of x naught transpose (x

transpose x) inverse x naught, okay?

So that is, say, if we have a linear regression

2:54

What that is, is a confidence interval for

the line at a given value of x naught, okay.
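Putting that together, a minimal sketch of the confidence interval for the line at a given x naught, again with made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data: price vs. mass with an intercept.
n, p = 50, 2
mass = rng.uniform(0.5, 2.0, n)
X = np.column_stack([np.ones(n), mass])
y = 10 + 5 * mass + rng.normal(0, 3.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - p))      # residual sd estimate

# 95% confidence interval for the mean response at x0:
# x0'beta_hat +/- t_{1-alpha/2, n-p} * s * sqrt(x0'(X'X)^{-1} x0)
x0 = np.array([1.0, 1.5])
tq = stats.t.ppf(0.975, n - p)
se_line = s * np.sqrt(x0 @ XtX_inv @ x0)
lo, hi = x0 @ beta_hat - tq * se_line, x0 @ beta_hat + tq * se_line
print(lo, hi)
```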

But that's not the entire story here about prediction intervals because

this talks about how well we've estimated the line, okay.

So if we think about our diamond prices for example,

it talks about how well we've estimated the average cost of a diamond for

that particular weight or that particular mass.

3:29

But if you're selling a diamond, you might be interested in knowing, okay not

if I collected all the diamonds of this particular mass and

took the average price that they were valued at.

Not that, but: if I were to sell this particular diamond,

what's the range of possible values that would be reasonable as a price for this diamond?

That's a different thing,

so there's a difference in this context between a confidence interval for

the mean value, in other words the value of the line or the plane or whatever,

at that particular collection of X values versus a prediction

that incorporates the uncertainty that is included in the Ys themselves, okay.

So imagine we want to predict Y naught,

which is the price of this diamond for this particular mass,

where we haven't actually observed the Y at this particular value of X naught.

Think of it as a new value of Y.

Well, think about the quantity y

naught minus x naught transpose beta-hat, okay?

That's the difference between our actual y naught at that particular

value of x naught, the new realized value of y, and

what we would predict at this value of x naught, x naught transpose beta-hat.

Where, again, our beta-hat hasn't used this y naught in its calculation, okay?

So now the variance of

this is now the variance of y naught +

the variance of, let's say y naught hat.

Okay, and I can split that variance across the difference,

because this beta hat didn't involve that y naught,

this potential new value of y naught, in its calculations, so they're independent.

Well, this variance of y naught is sigma squared, plus the variance of y naught hat,

which we just did a second ago: sigma squared times x naught

transpose (x transpose x) inverse x naught.

So this variance is sigma squared times

1 plus x naught transpose (x transpose x) inverse x naught,

and to estimate it we plug in s squared for sigma squared.

And then what I'm going to ask you to do for

homework, because it should be old hat for you now,

is to prove to yourself that y naught minus x naught transpose beta-hat,

over s times the square root of 1 plus x naught transpose (x transpose x)

inverse x naught, follows the t-distribution

with n minus p degrees of freedom.
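You can also convince yourself of this numerically with a small simulation (a sketch with made-up parameters, not a substitute for the homework proof): the statistic's empirical coverage between the two t cutoffs should come out near 1 minus alpha:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up simulation parameters: small regression, known truth.
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(0.5, 2.0, n)])
beta = np.array([10.0, 5.0])
sigma = 3.0
x0 = np.array([1.0, 1.5])
lev = x0 @ np.linalg.inv(X.T @ X) @ x0      # x0'(X'X)^{-1}x0, fixed design

reps = 4000
t_stats = np.empty(reps)
for i in range(reps):
    y = X @ beta + rng.normal(0, sigma, n)      # training data
    y0 = x0 @ beta + rng.normal(0, sigma)       # new, independent response
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s = np.sqrt(resid @ resid / (n - p))
    t_stats[i] = (y0 - x0 @ beta_hat) / (s * np.sqrt(1 + lev))

tq = stats.t.ppf(0.975, n - p)
coverage = np.mean(np.abs(t_stats) <= tq)   # should be close to 0.95
print(coverage)
```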

6:32

And so we can calculate the probability that,

say, the alpha over 2 t quantile,

with n minus p degrees of freedom,

is less than or equal to y naught minus x naught transpose

beta hat over s times the square root of 1 plus x naught

transpose (x transpose x) inverse x naught, which is less than or

equal to the 1 minus alpha over 2 upper t quantile.

That should be equal to 1-alpha, in other words,

we're looking at our t distribution, we're looking at the probability that it's,

7:26

If we put alpha over 2 there and alpha over 2 of the mass there.

The probability that our statistic lies in between those two cut offs

should be equal to 1 minus alpha.

And we can rearrange that to make the probability statement: the probability that

y naught is in the interval x naught transpose beta-hat,

plus or minus t 1 minus alpha over 2 with n minus p degrees of freedom,

times s times the square root of 1 plus x naught transpose (x transpose x)

inverse x naught.
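Here's a sketch comparing the two intervals at the same x naught, with made-up data; the only difference is the 1 under the square root:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up data again: price vs. mass.
n, p = 50, 2
mass = rng.uniform(0.5, 2.0, n)
X = np.column_stack([np.ones(n), mass])
y = 10 + 5 * mass + rng.normal(0, 3.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - p))

x0 = np.array([1.0, 1.5])
tq = stats.t.ppf(0.975, n - p)
lev = x0 @ XtX_inv @ x0                   # x0'(X'X)^{-1}x0

# Confidence interval for the line vs. prediction interval for a new y0:
ci_half = tq * s * np.sqrt(lev)
pi_half = tq * s * np.sqrt(1 + lev)       # the extra 1 is the new y's own noise
center = x0 @ beta_hat
print(center - pi_half, center + pi_half)
```

The prediction interval is always wider than the confidence interval for the line, because sqrt(1 + lev) is larger than sqrt(lev).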

9:15

Okay? So if we want a confidence interval for

the mean of the regression surface, and we collect an infinite amount of data,

that confidence interval should get narrower and narrower.

It should limit to exactly the mean price.

So if we collected all the diamonds in the world of that specific weight, we should

have a very good estimate of what the line should be like at that particular point.

On the other hand, we may want to know the potential set of

prices we could get for this specific new diamond that we're trying to sell.

Okay. There's some intrinsic variability that no

matter how well we estimate the line, would be surrounding the line.

Okay?

And that's why this one part doesn't go away: no matter how much data we collect,

no matter how much better we estimate the line, it is still there.

It's an intrinsic variability.

So as we collect an infinite amount of data, this part, s, converges to sigma,

10:13

and this part, x naught transpose (x transpose x) inverse x naught, will converge to 0.

And so that 1 will just stay there

and represent the natural variability around the line.

Okay?
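A quick numerical illustration of that limit (made-up data again): the leverage term x naught transpose (x transpose x) inverse x naught shrinks roughly like 1 over n, while the 1 under the square root stays put:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.5])

# As n grows, x0'(X'X)^{-1}x0 -> 0, so the prediction-interval
# half-width limits to t * sigma: the "1" never goes away.
levs = {}
for n in (50, 500, 5000):
    mass = rng.uniform(0.5, 2.0, n)
    X = np.column_stack([np.ones(n), mass])
    levs[n] = x0 @ np.linalg.inv(X.T @ X) @ x0
print(levs)
```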

So that's the distinction between a prediction interval and

a confidence interval.

So, a prediction interval, by the way, is not a confidence interval because

if you look at the actual probability statement we used,

the quantity we're saying is in the interval is a random quantity.

So it's not a confidence interval but we derive it in kind of the same way.

And now, at this point in the class,

I think you should be able to derive the prediction interval and the confidence

interval; that's why I'm going over this quickly, glossing over it a little bit.

But I just want to make sure that everyone understands the distinction

between the two and why it is there's this fixed quantity in the prediction interval.

Okay, it's because of this natural variability that exists around

the regression line, or around the regression surface, that doesn't go away, if

what you want to estimate is the set of potential likely values for

a response at that given value of x, or that given collection of values of x.