A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

207 ratings


From this lesson

Module 3A: Sampling Variability and Confidence Intervals

Understanding sampling variability is the key to defining the uncertainty in any given sample/samples based estimate from a single study. In this module, sampling variability is explicitly defined and explored through simulations. The resulting patterns from these simulations will give rise to a mathematical result that is the underpinning of all statistical interval estimation and inference: the central limit theorem. This result will be used to create 95% confidence intervals for population means, proportions and rates from the results of a single random sample.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So this next set of lectures we will use the results from the Central

Limit Theorem regarding the theoretical sampling distribution

and our ability to estimate characteristics of this

sampling distribution from a single data sample

to create an interval that incorporates the uncertainty

in our sample based estimate as it

estimates some, the underlying, unknown, true population value.

Sometimes called a parameter, like a population

mean, or proportion.

And these intervals that we'll be creating are called confidence intervals.

Okay, in this section, we'll begin our quest

to estimate confidence intervals for single population parameters.

From single samples from the aforementioned population by

focusing on confidence intervals for a population mean.

And that's what this section will be

about, estimating confidence intervals for population means.

So hopefully upon completion of this lecture section,

you'll be able to explain how the Central Limit

Theorem, aka the CLT, sets the groundwork for

computing a confidence interval for some unknown population parameter.

Be it a mean, proportion or incidence rate, using the

results from a single sample of data from the population.

And then how to estimate a 95% confidence

interval for a population mean based on the

results of the single sample from the population.

And how to estimate other levels of

confidence, other confidence intervals, 99%, 90% for example

for a population mean based on the results of a single sample from the population.

So let's just revisit the CLT, and it can never hurt to

go over this a couple more times, it's such a powerful result.

Recall the CLT states that if all possible random samples of the

same size, we'll call it N, were taken from the same population, and

a summary statistic were computed, either be it a mean, or proportion,

or incidence rate, whatever was appropriate for the type of data we had.

If the summary statistic were computed for each sample and then

the distribution of the summary statistic values across these samples was plotted,

let's say a histogram of all estimates across all samples of the same size.

Recall that this distribution, this

histogram would approximate a normal distribution.

This is my approximation by hand here.

That it would be well-described by a normal distribution, put a curve on top.

The center would be at whatever the truth

was, the true mean proportion or incidence rate.
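The thought experiment just described can be imitated on a computer. What follows is only a sketch under stated assumptions: the population is a made-up, right-skewed (exponential) one with a true mean of 4.0, and 5,000 random samples stand in for "all possible samples". None of these numbers come from the lecture data.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 4.0      # hypothetical population mean (exponential, so its SD is also 4.0)
N = 100              # size of every random sample
NUM_SAMPLES = 5000   # stand-in for "all possible samples"

# Compute the summary statistic (here, the mean) for each sample.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1 / TRUE_MEAN) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"center of the estimates: {center:.2f} (truth: {TRUE_MEAN})")
print(f"spread of the estimates: {spread:.2f} "
      f"(sigma / sqrt(n): {TRUE_MEAN / math.sqrt(N):.2f})")

# Crude text histogram of the estimates across samples.
BINS = 15
low, high = min(sample_means), max(sample_means)
bin_width = (high - low) / BINS
counts = [0] * BINS
for m in sample_means:
    counts[min(int((m - low) / bin_width), BINS - 1)] += 1
for i, count in enumerate(counts):
    print(f"{low + i * bin_width:5.2f} | {'#' * (count // 25)}")
```

Even though the individual values are heavily skewed, the printed histogram of sample means should look roughly bell shaped and be centered near the true mean.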

So what do we know about this then?

If our estimates are distributed in a normal fashion around the truth,

sorry for that curved line, but it, but it gives a little bit of spontaneity to this

presentation, so we know that on a normal curve, most of the values

that fall under a normal distribution fall within plus or minus two standard units.

We'll call them standard errors here to indicate that we're talking

about uncertainty or variability in sample estimates across multiple random samples.

So most of them would fall within plus two

standard errors above, or minus two standard errors below, the truth.

So most of the samples we could get just by chance from a population of interest.

We'll have an estimate that falls within plus

or minus two standard errors of this unknown truth.

Only about 5% of the samples we could get in total, would

have an estimate that falls farther than plus or minus two standard

errors of the truth. So how does that help us, though?

Because in research, we're only going to

take one sample from each population under study.

So how can this Central Limit Theorem help us in research?

Well, let's think about this for a minute.

We're only going to get one chance to estimate this

unknown truth, and we're only going to take one sample.

But most of the samples we could take, not all, but most, about 95% of

them, our estimate will fall within two standard

errors of this truth either above or below.

Now, we're going to take one sample.

We could just by chance get an estimate that falls way outside

the pack, something close to the middle, even closer to the middle,

something within two standard errors but relatively far away from the truth, etc.

We're never going to know where our estimate

lies under this theoretical curve because we

can't view or really draw this theoretical

curve perfectly because we don't know the truth.

But for 95% of the samples we could take,

if we start with our estimate

and put bounds on it of plus or minus two standard errors of our estimate,

the resulting interval

will include our unknown truth.

So we can come up with a range of possibilities for the unknown truth.

Starting with our best guess, our sample based estimate, and

adding in or factoring in the uncertainty in our estimate,

brought about by the fact that we only have an

imperfect subsample of the population that we want to quantify.

Now the rub with this that we saw in the

last section was, we don't know the true standard error.

However, we saw

in detail that this can be estimated, based on the results of a single sample.

So we're pretty much ready to go ahead and

build these intervals based on the results of single samples.

The kind of things we've been looking at in the previous lectures.
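To see why adding and subtracting two estimated standard errors works "for 95% of the samples", here is a small simulation sketch. The population values (a true mean of 120 and a standard deviation of 13) are hypothetical and chosen only for illustration. In real research the truth is unknown, which is exactly why this can only be checked in a simulation.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 120.0    # hypothetical truth, known only because we are simulating
TRUE_SD = 13.0       # hypothetical population standard deviation
N = 100              # size of each sample
NUM_SAMPLES = 4000   # number of simulated studies

hits = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    estimate = statistics.fmean(sample)
    # Standard error estimated from this one sample alone: s / sqrt(n).
    est_se = statistics.stdev(sample) / math.sqrt(N)
    lower, upper = estimate - 2 * est_se, estimate + 2 * est_se
    if lower <= TRUE_MEAN <= upper:
        hits += 1

coverage = hits / NUM_SAMPLES
print(f"share of intervals that include the truth: {coverage:.3f}")
```

The printed share should land close to 0.95: most of the simulated studies produce an interval that includes the truth, but not all of them do.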

So here's one of our favorite data sets that we'll work with again:

systolic blood pressure measurements from a random

sample of 113 men, adult men taken from

a clinical population.

So the sample mean, our best guess, or estimate, for the true population mean

blood pressure for this population from which the 113 men were sampled, is 123.6.

And that's our best working guess for the truth, or estimate for the truth.

But we know it's not necessarily a perfect

estimate, so we want to bring in the uncertainty.

Well, let's create this idea of a 95% confidence interval here.

We take our sample mean, plus or minus two estimated standard errors.

So we know, from previous lectures, we can estimate the standard error of a sample

mean as a function of the sample standard deviation that's 12.9, which estimates

the true variability in all such men, based on estimating it from the 113 we

have in our sample divided by the square root of 113, our sample size.

So, given the results from this sample

we can estimate a 95% confidence interval for the true mean blood pressure for all

men by taking our sample mean, adding and subtracting two estimated standard errors.

So here we're at a 113. And I'll let you verify the math on this,

but if you take the mean and add and subtract, you get an interval that

goes from 121.18 millimeters of mercury, and

we could easily, and properly, round that (there'd be no reason not to)

to 121.2, up to 126.02 millimeters

of mercury.
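For anyone who wants to verify the math, here is a minimal sketch using the numbers quoted above (mean 123.6, standard deviation 12.9, sample size 113). The function name `two_se_interval` is just an illustrative choice.

```python
import math

def two_se_interval(mean, sd, n):
    """Return (lower, upper) for mean +/- 2 estimated standard errors."""
    se = sd / math.sqrt(n)  # estimated standard error of the sample mean
    return mean - 2 * se, mean + 2 * se

lower, upper = two_se_interval(mean=123.6, sd=12.9, n=113)
print(f"{lower:.2f} to {upper:.2f} mmHg")
```

The endpoints agree with the lecture's values to within about 0.01 mmHg. The small difference is presumably because the standard error was rounded to 1.21 before doubling, though that is an assumption on my part.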

So, we'll delve into the interpretation of this in more detail

throughout the rest of this set of lectures here, in lecture seven.

But, this interval gives a range of possibilities for the

unknown true mean blood pressure, for all men in this population.

Is the true mean necessarily in this interval?

Not necessarily.

For 95% of the samples we

could get, this interval should include the truth, but we'll never know whether we got one

of the 95% of samples that were within that two standard error range or not.

And we'll discuss this in detail in the subsequent section.

But for now this quantifies an estimated range of

possibilities for the true mean that we can't directly observe.

Let's look at another example that we've

dealt with extensively so far in the course.

Length of stay data from the

Heritage Health claims.

These are the length of stay data for patients with at least one day in 2011.

This is each patient's cumulative length of stay for the entire

year for as many times as they were admitted to the hospital.

Okay, and we remember that these were heavily right-skewed data where the mean was

4.3 days, and the standard deviation, the variation

in the individual sample values was 4.9 days.

So again, we'll use the same

approach to get a 95% confidence interval. We take our sample mean estimate

and add and subtract two estimated standard errors of the sample mean.

So with these data, we would do something like this.

4.3 days,

plus or minus 2, and our estimated standard

error will be that standard deviation of 4.9.

Individual variation estimated by the 12,298 claims in our

sample, divided by the square root of that sample size.

And again I'll let you do the math and verify what I've said, but this confidence

interval goes from 4.21 days, to 4.39 days.

It's rather tight, and narrower than our previous

interval because the sample size here was so much larger.

So this interval gives us a range of possibilities for the

true mean length of stay of between 4.21 days and 4.39 days.
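Here is a short sketch of that calculation, together with a look at why the large sample size makes the interval so tight. The comparison with a sample of 113 is hypothetical: it simply reuses the same standard deviation with a smaller n to show the effect.

```python
import math

def estimated_se(sd, n):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)

# Length of stay: mean 4.3 days, SD 4.9 days, n = 12,298 (from the lecture).
se_stay = estimated_se(4.9, 12298)
lower, upper = 4.3 - 2 * se_stay, 4.3 + 2 * se_stay
print(f"95% CI: {lower:.2f} to {upper:.2f} days")

# Hypothetical comparison: the same SD, but only 113 observations.
se_small = estimated_se(4.9, 113)
print(f"standard error with n = 12,298: {se_stay:.3f} days")
print(f"standard error with n = 113:    {se_small:.3f} days")
```

Because the standard error shrinks with the square root of the sample size, going from 113 to 12,298 observations cuts it by a factor of roughly ten.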

Let's look at another example.

You might think, well, what good is the, what is

the utility of the single confidence interval for a single parameter.

And to start, it helps us quantify something about

an unknown quantity we want to get a hold on.

So it helps us give an interval to understand what's going on.

But this becomes especially interesting perhaps and useful when we start comparing

populations using some summary statistic as our quantification

of, of the distribution of values in that population.

So let me give you an example.

Here's a study from the New England Journal of Medicine intended to

look at the claims that low carbohydrate

diets were associated with greater weight loss.

That would have been

[INAUDIBLE]

and anecdotally in through books, and people

were jumping on things like Atkins Diet etc,

because of anecdotal evidence, and this is one

of the first studies to take this head-on.

So the study entitled, Low Carbohydrate as Compared

with a Low Fat Diet in Severe Obesity.

So what the researchers did is they took patients that were clinically

obese, severely obese, and randomized them to one of two diet groups.

Either a low fat

or a low carb and the subjects were followed for a six month period and for

each subject in each group, they looked at the, after the

six months on the diet minus before when they started the diet, the weight change.

For each person in each group they computed the change in weight.

So some people ostensibly lost weight.

Maybe some people gained weight.

And then what they did, what they did at the end of the

study is they quantified the average weight change in the two diet groups.

So for the 64 people who were randomized in

the low-carb group, they lost on average 5.7 kilograms, but

there was a fair amount of variation in these individual

weight changes, and so the standard deviation of the 64

individual changes in weight was 8.6 kilograms.

They did the same thing for the low fat group.

There were 68 patients, or subjects, randomized to the low-fat group,

and the average weight change was a decrease of 1.8 kilograms.

So on average, the patients lost weight in this group as well.

There was a little less variability in these individual

weight changes than there were in the low-carb group.

So we might be thinking what

can we conclude about the efficacy, if you will, of low

carb versus low fat diets within, with regards to weight change.

Well, our sample results suggest that those in the low carb diet lost

more on average, on the order of almost four kilograms, more on average.

But that these estimates are based on 64 and 68 subjects,

respectively, and there is, especially in the low carb group, a

fair amount of individual variation in the weight change metrics.

So in order to better examine this and make our conclusions,

we first want to bring in the uncertainty in our estimated

mean weight changes in the two groups, before making a conclusion

about whether one group had better weight loss than the other.

So to start, what we could do is perhaps make confidence intervals for

the weight change in each of the two groups, the average weight change.

So if we do that for the low carb group, if we follow our

formula for the 95% confidence interval we take the observed mean of negative 5.7

kilograms and add and subtract two estimated standard errors,

which again is just our estimated standard

deviation of the individual weight changes divided

by the square root of the sample size.

And if you do out the math we see that the 95%

confidence interval for the average weight change for those in low

carb group was between negative 7.8 kilograms and negative 3.5 kilograms.

Well let's think about this for a minute.

After accounting for the uncertainty in our estimate, our best guess

or estimate for the weight change is negative 5.7 kilograms on average.

But there's some potential uncertainty, and so the

95% confidence interval incorporates that uncertainty to

give us a range for

the true mean weight change were we to

give all severely obese patients this diet.

And so anywhere from negative 7.8 kilograms on average to negative 3.5.

Notice that all values in this interval are

negative, suggesting that after accounting, even after accounting

for the uncertainty in their estimate, there's some

evidence of a real weight loss on average,

with all possibilities for the true change are negative.

Let's do the same thing for the low fat group.

We create a confidence interval for the low

fat group by following the same formula approach.

We take the negative 1.8 plus or minus 2 estimated

standard errors, which again if you go back to that

table, is the standard deviation of the 68 individual weight

change measurements in the low fat group, divided by the square

root of the 68 subjects in the low fat group.

And when all the dust settles, the confidence interval

goes from negative 2.7 kilograms to negative .9 kilograms.

So, as with the low carb group, there’s some evidence that the weight change on

average was negative even after accounting for the uncertainty in our estimates.

All the possibilities in this confidence interval

for the true mean change are negative.

So, it looks like on the whole, there’s evidence that both groups lost weight, and

this weight loss is real when we

account for the uncertainty in our estimates on average.

However, if you look carefully at this, and we're going to get

into comparing populations head on, in the next set of lectures, but

this is the start, you'll notice that the confidence intervals for these

two groups if you were to plot them on a number line.

The confidence intervals

for the two groups do not overlap. So what do I mean by that?

Here's zero, so if we put the end points for the low carb group, it would be

something like, and this is not drawn to scale, negative 3.5, negative 7.8.

So this is the confidence interval for the

low carb group. Do the same thing for the low fat group.

We get something that goes from about negative 2.7 to negative 0.9.

So both groups there's evidence of a mean

weight loss overall, but the amount of loss, on

average, is greater in the low carb group, even

after we've accounted for the uncertainty in these estimates.

So what we'll get to in the next section is

we'll call this difference statistically significant, meaning that even after

we've accounted for the role of chance in our estimates,

there's clear distinction between the

confidence intervals between the two groups.
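The number-line picture can also be written as a short check. In this sketch the low carb interval is recomputed from the reported mean, standard deviation, and sample size. The low fat interval uses the endpoints as quoted in the lecture, because its standard deviation is not stated here. The helper names are illustrative only.

```python
import math

def two_se_interval(mean, sd, n):
    """Return (lower, upper) for mean +/- 2 estimated standard errors."""
    se = sd / math.sqrt(n)
    return mean - 2 * se, mean + 2 * se

def intervals_overlap(a, b):
    """True if intervals a and b, given as (lower, upper), share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Low carb group: mean change -5.7 kg, SD 8.6 kg, n = 64 (from the lecture).
low_carb = two_se_interval(mean=-5.7, sd=8.6, n=64)

# Low fat group: endpoints as quoted in the lecture.
low_fat = (-2.7, -0.9)

print(f"low carb: {low_carb[0]:.2f} to {low_carb[1]:.2f} kg")
print(f"low fat:  {low_fat[0]:.2f} to {low_fat[1]:.2f} kg")
print("intervals overlap:", intervals_overlap(low_carb, low_fat))
```

The low carb interval comes out as about -7.85 to -3.55 kg, which the lecture reports as -7.8 to -3.5, and it does not overlap the low fat interval. Comparing the two groups directly is the subject of the next set of lectures.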

Let's look at another example from the literature.

The Effects of Lower Targets for Blood Pressure and

LDL Cholesterol on Arteriosclerosis in Diabetes: The SANDS Randomized Trial.

And this is from the Journal of the American Medical Association in 2008.

And the objective of this study was

to compare the progression of subclinical arteriosclerosis in

adults with type two diabetes treated to

reach either aggressive targets of low density lipoprotein

cholesterol level of 70 milligrams per deciliter or lower and a

systolic blood pressure of 115 millimeters of mercury or lower.

So that's one group, they were given these aggressive targets.

The other group was given the standard targets.

They were randomized and given the standard

targets of reaching low density lipoprotein cholesterol

levels of 100 milligrams per deciliter or

lower and systolic blood pressure of 130 millimeters

of mercury or lower.

And so the researchers were interested to see

if giving these more aggressive targets would actually

end up resulting in better outcomes on these

measures for this group with the type two diabetes.

So, this was a randomized, open labeled, blinded to endpoint,

three year trial from April 2003 to July 2007,

at four clinical centers in Oklahoma, Arizona, and South Dakota.

And the participants were 499 American Indian men and women, age 40 years or

older with type two diabetes and no, no prior cardiovascular events.

So the population under study here is American Indian men

and women, age 40 years or older with type two diabetes.

And what we have is a sample of 499 to work with.

And so the intervention is that participants were randomized to these

aggressive targets or standardized targets with

stepped treatment algorithms defined for both.

So what were the results of this?

I'll just pull a section from the results section here

with some quotes, so it's pulled directly from the paper.

It was referenced before. So the mean target LDL cholesterol

level and systolic blood pressures for both groups were reached

and maintained, so both of the groups met their targets.

But let's look at what they found.

The means, with 95% confidence intervals, for LDL

cholesterol level in the last 12 months were 72,

with a confidence interval of 69 to 75, versus 104, with

a confidence interval of 101 to 106, for the aggressive group

versus the standard group.

So, this is for LDL.

And, then for the blood pressures, for the aggressive group, the mean blood

pressure at the end of the study was 117 with a confidence interval of 115 to 118

and for the standard group was 129 millimeters of

mercury, with a confidence interval of 128 to 130.

So what they're showing here is that,

while both groups met their targets on average,

these averages were lower for the aggressive group

on both outcomes, LDL and systolic blood pressure.

And even after accounting for the uncertainty,

there's a difference in these averages

because the confidence intervals do not overlap.

And again we'll get to quantifying the difference

between two populations and really focusing on that measure.

But to start, this shows that the aggressive

targeting resulted in lower average LDL and SBP measures

compared to the standard group, and this

difference, we saw, couldn't be explained by

random sampling error alone. That is, even after accounting for

uncertainty, there were clear distinctions between the mean results for both groups.

And if you actually look at this paper you can see that they present a lot of

95% confidence intervals for a lot of different

outcome measures, at different times between these two groups.

And they ultimately quantify these in

terms of differences in means between the two groups at the end of the study.

And that's where we're going, and that's what

we'll get to in the next set of lectures.

Here are the two that I pulled for the results section that we just discussed.

So one thing to think about.

So I keep talking about 95% confidence intervals, and these

are what we might call the industry standard in research.

This is what's generally presented in

journal articles and what's expected from researchers.

It is certainly possible, however, to

estimate intervals with different levels of confidence.

And for the same logic that we discussed before, just changing

the width of estimates that fall, how far they fall from

the truth.

If we only wanted a 90% confidence interval for a population mean, we could

start with our estimate, but only add and subtract 1.65 estimated standard errors.

If we wanted a 99% confidence interval, we'd actually need to go an additional 0.58

standard errors in either direction above and beyond what we'd need for the 95%.

So you can see there is sort of a law of diminishing returns here

because of the bell shape of the normal

curve in order to get four percent more confidence

if you will, increase our chances that any

single sample yields an interval that includes the truth.

We need to, to increase that by four percent.

We need to add a total or increase our width of the interval

by .58 standard errors on either side. So, a hefty price to pay

in some sense for that extra four percent confidence.
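The multipliers for different confidence levels come from the normal curve, and they can be looked up with Python's standard library. This is a sketch, not part of the lecture materials. The function name `multiplier` is my own, and the lecture rounds the 95% value of about 1.96 up to 2.

```python
from statistics import NormalDist

def multiplier(confidence):
    """Standard errors to add and subtract for a given confidence level."""
    tail = (1 - confidence) / 2           # area left in each tail
    return NormalDist().inv_cdf(1 - tail) # standard normal quantile

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: +/- {multiplier(level):.3f} standard errors")

# Extra width for 99% confidence, relative to the lecture's rounded 95% value of 2.
extra = multiplier(0.99) - 2
print(f"extra standard errors on each side for 99%: {extra:.2f}")
```

Going from 90% to 95% costs roughly a third of a standard error on each side, while the last four percentage points, from 95% to 99%, cost about 0.58 more. That is the diminishing return described above.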

So what we've seen in this section thus

far is we first reiterated the logic of the

Central Limit Theorem and how it helps us

when applying to the results of a single sample.

We found based on the logic from the

Central Limit Theorem, most of the sample estimates we

get for some unknown measure that quantifies something about

a population, be it a mean, proportion or incidence

rate will fall within two standard errors of the unknown truth.

So conversely, for most of the samples we could

take if we create an interval where we take

our estimate and add and subtract two standard errors,

of our estimate, then 95% of the time

this interval will include the unknown truth.

And we could change the level of confidence

if we wanted by adding more or less than two standard errors.

The other thing that Central Limit Theorem gave us is well, this

is all good and well, but we need to know the standard error.

The true standard error is something we can't have.

It's a theoretical population level estimate.

Or it's based on population level characteristics but we can estimate this

from our sample.

So for a sample mean, for example, we can take our estimate as the sample mean.

We can add and subtract two estimated standard errors of the sample mean.

Where this is estimated by taking

the standard deviation of the individual values

in our sample and dividing by the square root of the sample size.

So in the next section, we'll show the same sort of

results for proportions that summarize binary

data and incidence rates that summarize

timed event data and in the third section of this lecture set

we'll really delve into how we interpret a confidence interval.

What does this result mean?