0:01

So let's go through an example of calculating a prediction interval.

In this case, we're going to use the mtcars dataset, okay?

And so, I'm going to fit a model that has miles per gallon as the outcome, and horsepower, weight, and an intercept as predictors. There's my fit.

And if I do summary(fit) you can see, there it is, okay.
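The fit being described can be sketched like this (model formula assumed from the narration):

```r
# Fit miles per gallon on horsepower and weight;
# lm() includes an intercept by default
fit <- lm(mpg ~ hp + wt, data = mtcars)
summary(fit)
```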

So if I want to predict for a new car, I'm going to create a new data frame. And I want to predict at a horsepower of 90 and a weight of 2.2, okay?

So then if I do, predict(fit, newdata = newcar), okay,

it gives me my prediction, 25.8 miles per gallon.
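As a sketch, the new-car prediction step (the name newcar comes from the narration):

```r
# New car: horsepower 90, weight 2.2 (in 1000 lbs, as mtcars codes wt)
newcar <- data.frame(hp = 90, wt = 2.2)
predict(fit, newdata = newcar)  # about 25.8 mpg
```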

Note, if I do predict on my linear model fit without any arguments, it predicts at all the existing x values. Basically, it gives you x times beta hat, where x is the observed design matrix, okay? So it gives you the yhat values at all the observed x values.
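In other words, with no newdata, predict() just returns the fitted values:

```r
# predict() with no newdata gives x %*% beta-hat at the observed
# design matrix -- the same values as fitted(fit)
head(predict(fit))
all.equal(as.numeric(predict(fit)), as.numeric(fitted(fit)))
```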

Okay, now, suppose I want to predict at this new data set, but I want a confidence interval for the prediction surface, the two-dimensional prediction surface. Then I pass interval = "confidence", and it gives me the fit, which is 25.8, along with the lower and upper limits of a 95% confidence interval.
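That call looks like this:

```r
# 95% confidence interval for the mean response (the prediction surface)
# at the new car's hp and wt; returns columns fit, lwr, upr
predict(fit, newdata = newcar, interval = "confidence")
```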

Okay, if I do interval = "prediction", then it does the same thing; however, it's giving you a prediction interval. So notice that the lower limit is lower and the upper limit is higher, reflecting the extra "1 +" term that shows up in the prediction interval's standard error.
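And the prediction-interval version:

```r
# 95% prediction interval for a single new car: wider than the
# confidence interval, since it also covers the variability of a
# new observation around the surface
predict(fit, newdata = newcar, interval = "prediction")
```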

Okay, so let's now do this manually, because in this class we like to know exactly what's going on under the hood. Okay, so I'm going to load dplyr.

1:58

Okay, and then, my y is my miles per gallon.

And my x matrix is the intercept, which is a 1.

And I'm just going to grab the horsepower and weight from the mtcars dataset. And I like to do that with select, which is, I think, in the dplyr package.

Okay, my n is the number of observations I have.

And my p is the number of columns of x, which in this case p should be 3.

So 1 for the intercept, 1 for horsepower, and 1 for weight.

Okay, so my x transpose x inverse term is just that: the inverse of x transpose x.

And then, my beta is going to be x transpose x inverse times x transpose y.
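The manual setup being narrated can be sketched like this (variable names assumed):

```r
library(dplyr)

y <- mtcars$mpg
# design matrix: a column of 1s for the intercept, then hp and wt
x <- cbind(1, mtcars %>% select(hp, wt) %>% as.matrix())
n <- nrow(x)
p <- ncol(x)                    # 3: intercept, hp, wt
xtxinv <- solve(t(x) %*% x)     # (x'x)^-1
beta <- xtxinv %*% t(x) %*% y   # least-squares coefficients
```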

2:39

And then, the new value I'd like to predict at, x0, is my intercept, then 90 and 2.2: 90 for the horsepower and 2.2 for the weight. Okay, there we go.

And then, my yhat at that new x value is going to be x0 times beta, but let me get to that in a minute. My yhat at the observed x values is x beta, okay?

And then, my residuals are going to be y - yhat. And then, my residual variance is just my sum of squared residuals divided by n - p rather than n, okay?

Now, my prediction, my yhat0 at this new value of x, is just going to be x0 times beta.

And I just did sum,

just to avoid having to type out the matrix multiplication operator.
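Continuing with the quantities just described (names assumed, building on the x, beta, n, and p defined above):

```r
x0 <- c(1, 90, 2.2)             # intercept, horsepower, weight
yhat <- x %*% beta              # fitted values at the observed x's
e <- y - yhat                   # residuals
s <- sqrt(sum(e^2) / (n - p))   # residual standard deviation, df = n - p
yhat0 <- sum(x0 * beta)         # sum() avoids typing the %*% operator
```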

Okay, so, what's my confidence interval?

It's yhat0 plus or minus the 0.975 quantile of the t distribution with n - p degrees of freedom. Instead of writing plus or minus, I just add both the 0.025 and 0.975 quantiles, because notice, if you do that, it's going to return the negative and the positive version, okay? And then times s, then times the square root of x0 transpose, times x transpose x inverse, times x0.

So there it is, about 24.4 to 27.2. So if we go up to our confidence interval from predict, it's also 24.4 to 27.2. Okay, so it's the same thing.
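The manual confidence interval, as a sketch:

```r
# 95% confidence interval for the mean response at x0;
# the two t quantiles give the minus and plus versions at once
se0 <- s * sqrt(drop(t(x0) %*% xtxinv %*% x0))
yhat0 + qt(c(0.025, 0.975), df = n - p) * se0
```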

Now, let's do our prediction interval.

It's the same thing, only now there's this 1 + inside the square root, right here, okay?

So let's do that again.

And we get 20.356 to 31.31.
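And the manual prediction interval, identical except for the 1 + inside the square root:

```r
# 95% prediction interval for a new observation at x0
se0_pred <- s * sqrt(1 + drop(t(x0) %*% xtxinv %*% x0))
yhat0 + qt(c(0.025, 0.975), df = n - p) * se0_pred
# roughly 20.4 to 31.3, matching predict(..., interval = "prediction")
```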

Okay, so that's what's going on under the hood. It's pretty straightforward logic.

And just keep in mind: if you want to estimate the prediction surface at a particular point, you want a confidence interval. And if you want to evaluate the prediction surface plus the natural variability that exists around that surface, make sure you do a prediction interval rather than a confidence interval.

And it's all pretty easy with the predict function, okay. You should only do these kinds of calculations as part of something like this class, where you're just verifying that you understand how it works. And from then on, you would use the predict function to do this.