0:07

Lecture nine, on Confidence Intervals. In this lecture, we're going to talk about

confidence intervals mostly in the setting where we're going to assume that our data

come from a Gaussian distribution. So we'll talk about confidence intervals, confidence intervals for variances. We'll talk about Gosset's t distribution. And we'll use Gosset's t distribution to create confidence intervals for means.

And we'll touch on the subject of profile likelihoods.

In the last lecture, we talked a little bit about the Central Limit Theorem, and we used the Central Limit Theorem to create a confidence interval. I think in that example we created a confidence interval for a binomial proportion. Now we'll discuss the creation of better confidence intervals for small samples using Gosset's t distribution.

Small samples where we're willing to treat the data as if it's continuous.

So, to get to that point: Gosset's t distribution is often called Student's t distribution, and we'll explain why in a little bit. So to discuss the t distribution, we first have to go through what the Chi-squared distribution is. And so we'll develop that first.

At any rate, what you'll hopefully have noticed is that whenever we create confidence intervals, there seems to be some kind of prevailing logic that we use.

Basically we try to create a probability statement.

And then we, in a sense, manipulate the probability statement to generate an

interval. Well, this strategy is codified here.

So basically, we create a pivot or a statistic that doesn't depend on the parameter of interest. I should say, we create a pivot or a statistic whose distribution doesn't depend on the parameter of interest.

So, for example, if you use the Central Limit Theorem: if you take a sample mean, subtract off the population mean that you're interested in, and divide by the standard error, well, that statistic clearly depends on the parameter of interest. But the distribution of that statistic, at least in the limit, doesn't depend on the parameter that you're interested in, the population mean. And then after we've created that pivot,

we solve the probability that the pivot lies between bounds for the parameter.

And so that's the kind of general strategy we'll go through.

You don't have to really know or really understand the strategy at a very general

level, but just in case you're wondering why does it always seem like we're

generating confidence intervals using basically exactly the same technique, it's

because we're employing the strategy kind of like this.
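The pivot idea can be sketched in a few lines of simulation. The lecture works in R; this Python stand-in, with made-up sample sizes and means, just shows that the t-like statistic depends on the parameter but its distribution does not.

```python
import numpy as np

# Hedged sketch of the pivot idea (hypothetical numbers; the lecture uses R).
# The statistic (Xbar - mu) / (S / sqrt(n)) clearly depends on mu, but its
# distribution does not: it looks the same no matter which mu generated the data.
rng = np.random.default_rng(42)
n, n_sim = 30, 20000
for mu in (0.0, 100.0):                       # two very different means
    x = rng.normal(mu, 5.0, size=(n_sim, n))  # n_sim samples of size n each
    pivot = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    print(round(pivot.mean(), 2), round(pivot.std(), 2))  # near 0 and 1 both times
```

Either choice of mu yields roughly the same (mean-zero, unit-scale) distribution of the pivot, which is exactly what lets us invert a probability statement about the pivot into an interval for the parameter.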

So let's talk about the Chi-squared distribution.

So remember that S^2 is the notation we have been using for the sample variance. And let's further assume that the data that comprise the sample variance are all IID normal with mean mu and variance sigma squared. Well then, n - 1 times the sample variance divided by sigma squared is a random variable that follows a Chi-squared distribution.

And the Chi-squared distribution has an index, something that differentiates between different kinds of Chi-squared distributions, and we call that index the degrees of freedom. So this statement right here will be read: the normalized sample variance follows a Chi-squared distribution with n - 1 degrees of freedom. So the Chi-squared distribution is a skewed distribution. Of course, since the sample variance has to be positive, it has support between zero and infinity.

And the mean of the Chi-squared distribution is its degrees of freedom. And we can see that very directly, because we recall the sample variance is an unbiased estimator; that's why we divide by n - 1 instead of n. So if you look at this equation, when you take the expected value: the expected value of S^2 is sigma squared, so you can see that in the expected value of n - 1 S^2 over sigma squared, the sigma squared will cancel out and you'll get n - 1, the degrees of freedom, as its expected value. The variance of the Chi-squared, by the way, is twice the degrees of freedom. As an aside, we're not actually going to

spend a lot of time doing this, but as an aside, you can use this idea to create a

confidence interval for the variance. So imagine if I were to draw a Chi-squared density, and Chi-squared sub n - 1, alpha is the alpha quantile from that distribution. Then imagine taking the, say, alpha over two quantiles; let's take alpha to be 0.05, for example. Take the 2.5th percentile and the 97.5th percentile from the Chi-squared distribution, and look at the probability that this Chi-squared random variable, n - 1 S^2 over sigma squared, is between those two quantiles. Well, that has to be 1 - alpha, just by the definition of those being the 2.5th and 97.5th quantiles of the Chi-squared distribution. So this equality holds, the equality that 1 - alpha equals this probability. So this statistic, n - 1 S^2 over sigma

squared, is our pivot. Let's solve for the parameter that we're interested in, sigma squared. When you do that, keep track of your inequalities, being sure to flip them if you invert everything, and that sort of thing; and you wind up with the following probability statement.

There's a one - alpha probability that the random interval, n - one S^2 divided by

the upper quantile, and n - one S^2 divided by the lower quantile contains

sigma squared. So we call this interval, n - 1 S^2 divided by the two quantiles, a confidence interval for sigma

squared. And because the probability that the

random interval contains the parameter it's estimating is one - alpha, we call

it, say, a 100 times one minus alpha percent confidence interval.

So, as an example, alpha might be 0.05 and so you would then wind up with a 95%

confidence interval for the parameter sigma squared.
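The Chi-squared facts quoted above (mean equal to the degrees of freedom, variance equal to twice the degrees of freedom) are easy to verify numerically. This is a Python simulation sketch with arbitrary illustrative values for n and sigma, not something from the lecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, n_sim = 10, 3.0, 100000            # illustrative values, not from the lecture
x = rng.normal(0.0, sigma, size=(n_sim, n))
s2 = x.var(axis=1, ddof=1)                   # unbiased sample variances
chisq = (n - 1) * s2 / sigma**2              # the normalized sample variance
print(chisq.mean())   # close to n - 1 = 9
print(chisq.var())    # close to 2 * (n - 1) = 18
```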

Now, we should talk a little bit about what this confidence interval means.

It's the interval that's random, in the paradigm that we're sort of thinking about

here. The interval is random.

And the parameter sigma squared is fixed. So when you actually collect data and you

form this confidence interval, it either contains sigma squared, which you don't

know, or not. There's no probability with that statement

anymore. It's either one or zero, it either

contains sigma squared or not. So what's the actual interpretation of a

confidence interval? Well if you take an Intro Stat class, they

make a lot of hay out of this point. And they basically say, okay, the

confidence interval is a procedure such that, if you were to repeatedly do the experiment and form confidence intervals (say that you're creating 95% confidence intervals), then 95% of the confidence intervals would contain the parameter that you're interested in.

And you could, as an example, do this in R.

You could generate normal data, let's say from a normal distribution with mu equal to zero and variance sigma squared. You could formulate this confidence interval from the sample variance, and you could check whether or not that interval contained the sigma squared that we used for simulation. And you can repeat that process over, and over, and over again. And you will find that about 95% of the intervals that you get, if you construct 95% confidence intervals, will contain the sigma squared that you used for simulation. And that's the logic behind confidence

intervals. And they're a little notoriously hard to interpret, if you go for this sort of hardball interpretation. A much weaker interpretation of the confidence interval, one that's a little less specific, is that you get two numbers out. And these two numbers are an interval estimate of the parameter that you want to estimate, but an interval estimate that incorporates uncertainty.
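The repeated-experiment coverage check described above is straightforward to run. The lecture suggests doing this in R; here is a Python sketch, where the choices of n, sigma squared, and the number of simulations are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, sigma2, alpha, n_sim = 20, 4.0, 0.05, 5000   # arbitrary illustrative values
lower_q = stats.chi2.ppf(alpha / 2, df=n - 1)
upper_q = stats.chi2.ppf(1 - alpha / 2, df=n - 1)

x = rng.normal(0.0, np.sqrt(sigma2), size=(n_sim, n))
s2 = x.var(axis=1, ddof=1)                # sample variance of each simulated dataset
lo = (n - 1) * s2 / upper_q               # dividing by the upper quantile gives the lower end
hi = (n - 1) * s2 / lower_q
coverage = np.mean((lo <= sigma2) & (sigma2 <= hi))
print(coverage)                           # about 0.95
```

About 95% of the random intervals capture the sigma squared used for simulation, which is exactly the frequency interpretation of a 95% confidence interval.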

So let's go through a couple of comments about this interval.

So one thing is this interval is not terribly robust, to departures from

normality. So, if your data is not normal, then this

confidence interval tends to not be that great.

Also, if you want a confidence interval for the standard deviation instead of the confidence interval for the variance, you can just square root the endpoints of the interval. The probability statement, 1 - alpha equal to the probability that the random interval contains sigma squared: well, you can still say that 1 - alpha is equal to the probability that the interval with square-rooted endpoints contains sigma, and you haven't mathematically changed anything. So if you want an interval for sigma, you

just square root the endpoints. So you might be wondering, okay, well if

this heavily requires normality, do we have any other solutions other than this

interval if we want a confidence interval for the variance?

And it turns out the answer is yes, in several ways; but bootstrapping is kinda

the way that I prefer. But we're not going to talk about

bootstrapping in today's lecture. So today we're only going to talk about this confidence interval when you happen to be willing to stomach the assumption that your data is exactly Gaussian, and you are willing to live with the consequence that the interval you attain is not going to be terribly robust to departures

from that assumption. So the other thing I wanted to mention,

it's kind of a nifty little point, is suppose you wanted to create a likelihood

for sigma, and in this case the underlying data is Gaussian, with mean mu and

variance sigma squared. So it's hard because you have two

parameters. The likelihood is a bivariate function,

right? It has mu on one axis, sigma on the other

axis. And then the likelihood on the vertical

axis. So there's a little trick you can use to

create, I guess what I would call a marginal likelihood for sigma squared. It turns out, and we're not gonna cover the mathematics behind this, that if you take n - 1 S^2 and don't divide by sigma squared... well, first of all, that can't be Chi-squared. Let me just logic through that real quick.

That can't be Chi-squared because the Chi-squared density doesn't have any

units. Right?

So S^2 has whatever units the original data has.

Say it's in inches. It has inches squared units.

So you haven't divided by anything that's in the inches squared.

So n - one S^2 has inches squared units. And so it can't follow a distribution

that's unit-less like the Chi-squared distribution.

That's one of the reasons why you have to remember to divide by sigma squared to get

the Chi-squared distribution to get rid of the units.

Let's suppose we don't divide by sigma squared.

Then you end up with a gamma, a so-called gamma distribution. And the gamma's indexed by two parameters, its shape parameter and its scale parameter. In this case, the shape parameter is (n - 1) / 2 and the scale parameter is 2

sigma squared. And, either way, what you have is data: you have a single number, n - 1 S^2, and if you're willing to assume the data

points that comprise that number are Gaussian, then you can take the gamma

density and plug in the data and view it as a function of the parameters and plot a

likelihood function. So I'll go through an example of doing

this. So in our organolead manufacturing workers example that we've looked at before, there was an average total brain

volume of 1,150 cubic centimeters with a standard deviation of 105.977.

And let's assume normality of the underlying measurements, which is not the

case, but let's do it. And let's calculate a confidence interval

for the population variation in total brain

I give the R code here. So I gave the standard deviation, so our variance is, you know, 105.977^2. Our n in this case is 513. We want a 95% confidence interval, so our alpha is 0.05. The quantiles that we want, we can just use the qchisq function to grab.

This function right here just grabs the two quantiles.

And then our interval is just n - 1 times S^2, you know, the S^2, divided by the quantiles. And then this puts it out from bigger to smaller. I want it from smaller to bigger, so I use the rev function to reverse it.

I think if I had just input my quantiles in the reverse direction, I would have

been okay too. And then, here, just take the square root

of that interval for an interval for the standard deviation.

And we get the interval is about 100 to 113.
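The calculation just described can be reproduced outside R as well. Here is a Python equivalent, where scipy's chi2.ppf plays the role of qchisq, using the n and standard deviation quoted in the example.

```python
import numpy as np
from scipy import stats

n, s = 513, 105.977                       # sample size and sd quoted in the example
alpha = 0.05
# chi2.ppf plays the role of R's qchisq; this quantile order gives the
# interval from smaller to bigger without needing a reverse step
qtiles = stats.chi2.ppf([1 - alpha / 2, alpha / 2], df=n - 1)
var_int = (n - 1) * s**2 / qtiles         # confidence interval for the variance
sigma_int = np.sqrt(var_int)              # square-root the endpoints for sigma
print(sigma_int)                          # roughly [100, 113]
```

Note that dividing by the quantiles in this order avoids the reversal the lecture handles with rev.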

So this interval, 100 to 113, is created in a way such that if the assumptions of

the interval are correct, namely that the underlying data are IID, normal, with a

fixed variance, sigma squared, and a fixed mean, mu, then the procedure, if repeated over and over again, would yield intervals of which 95% contain the true standard deviation that we're trying to estimate. Let's actually plot the likelihood as well

using this kind of likelihood trick that I gave.

So, sigma vals is the sequence of sigma values I want to plot over.

And actually, I don't have to guess this because I just created this confidence

interval on the previous page that went from 100 to 113, so let's for good measure

go from 90 to 120. And I want to plot 1,000 points.

In R, you kind of have to be pretty specific about the range that you want to plot and

how many points you want going into your plot.

And then I just give you the code here for evaluating the gamma likelihood.

It says basically: plug in the data, n - 1 S^2, right? And remember, the likelihood views that as fixed. The shape doesn't involve anything other than things we know, (n - 1) / 2. And then the scale is the part that varies, 2 sigma squared. And here, we're going to evaluate it over all the sigma vals that I assigned in the previous line. So this will evaluate that likelihood over

1000 points, and return a vector of length 1000.

I want to normalize my likelihood. And I'll just kind of, you know, mostly

approximately do that by taking this vector, and dividing by its maximum value.

And then I'll plot it; type = "l" means plot it as a line instead of as a bunch of points, and then these two lines commands add the one-eighth and one-sixteenth

reference lines. And then on the next page you actually see

the marginal likelihood for sigma. That's a whirlwind tour of confidence intervals and likelihoods for variances when you're willing to assume your data is exactly Gaussian. I hesitate to say this, but those

slides aren't exactly terribly useful material.

You won't find a lot of people plotting marginal likelihoods for sigma.

I just gave it to you cuz it's kind of a nifty little result.
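As a sketch of that nifty result: under the Gaussian assumption, n - 1 S^2 follows a gamma with shape (n - 1)/2 and scale 2 sigma squared, so the marginal likelihood can be evaluated on a grid just as the lecture's R code does. This Python version reuses the brain-volume numbers from the example.

```python
import numpy as np
from scipy import stats

n, s = 513, 105.977                          # numbers from the brain-volume example
sigma_vals = np.linspace(90, 120, 1000)      # grid of sigma values, as in the lecture
# (n - 1) S^2 ~ Gamma(shape = (n - 1)/2, scale = 2 sigma^2): plug in the data,
# let sigma vary through the scale, and view the density as a function of sigma.
lik = stats.gamma.pdf((n - 1) * s**2, a=(n - 1) / 2, scale=2 * sigma_vals**2)
lik = lik / lik.max()                        # normalize so the maximum is 1
print(sigma_vals[np.argmax(lik)])            # peaks near the sample sd, about 106
```

The normalized likelihood peaks at the sample standard deviation, and the one-eighth and one-sixteenth reference lines the lecture mentions would be drawn as horizontal lines on this curve.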

And, to be honest, the Gaussian confidence intervals for variances, you don't see them as much. People would just tend to do bootstrapping these days instead, or some other more robust technique. So, this material, it's neat, but the primary thing it did was actually introduce the Chi-squared distribution.

So next, we're going to talk about something that's incredibly useful,

probably one of the single most used distributions and techniques in all of