A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 3A: Sampling Variability and Confidence Intervals

Understanding sampling variability is the key to defining the uncertainty in any given sample-based estimate from a single study. In this module, sampling variability is explicitly defined and explored through simulations. The resulting patterns from these simulations will give rise to a mathematical result that is the underpinning of all statistical interval estimation and inference: the central limit theorem. This result will be used to create 95% confidence intervals for population means, proportions and rates from the results of a single random sample.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So all of the confidence interval computations we've discussed so

far rely on the results from the central limit theorem.

And as it turns out, the central limit theorem

needs relatively large samples to kick in, to actually apply.

The threshold for large actually depends on the data type being summarized,

and hence on the resulting summary statistic.

There are different thresholds for large for

means versus proportions and incidence rates.

There are some modifications to this central limit

theorem-based approach for smaller samples, and

the type of modification performed depends on the type of data we're looking at.

Now, we point this out so that when you see

references to these in the literature, you'll be aware of them.

This is a detail a computer takes care of,

and the interpretation of the resulting confidence intervals

is the same regardless of the sample sizes that they are based on.

In this section, we'll just discuss

briefly how to handle confidence interval estimation

for population quantities when you have

small samples from the populations of interest.

And so, hopefully, after the conclusion of this lecture section,

you will appreciate and note the role of corrections to the CLT-based methods,

that's the central limit theorem-based methods,

when estimating a confidence interval

for a mean from a small data sample.

And similarly, hopefully you'll appreciate and

note the role of exact, computer-based computations

as an alternative to the central limit theorem-based methods when estimating a

confidence interval for a proportion or incidence rate from a small data sample.

So just to remind you once again of the

CLT (it's worth discussing this multiple times because it's

a pretty critical concept to what we've been doing),

the CLT tells us the following, generically speaking.

When taking a random sample of size n from a population,

consider the theoretical sampling distribution of a sample statistic,

whether it be a mean summarizing continuous data, a proportion summarizing

binary data, or an incidence rate summarizing time-to-event data.

This is the distribution of the statistic across

all possible random samples of this same size.

The central limit theorem tells us that, were

we to take all possible random samples

of the same size and do a histogram

of all estimates of the same underlying population quantity,

so all of our sample means, or all of our sample proportions,

this would be approximately normally distributed.

And it would be centered at the true value of the thing

we're estimating, whether it be a

population mean, proportion, or incidence rate.

And the variation of estimates about this truth, this center of

the distribution, would be called the standard error of our sample statistic.

So, something I didn't talk about before, but I'm just going to note now,

is that the central limit theorem requires quote unquote enough data to kick in.

For example, for sample means, a sample size cutoff often used for being able

to use the approach we've taken

for computing confidence intervals for population means,

the central limit theorem-based approach of x bar

plus or minus 2 estimated standard errors, is 60.

And

we'll talk about where that comes from in a minute.

But why does the central limit theorem require a quote unquote larger sample?

Well, the sample needs to be large enough so that

the influence of any single sample value is relatively small.

So this logic about the influence of any single data point

on a sample statistic being large when we have small samples

applies to any of the summary measures we've discussed: means, proportions,

and incidence rates.

It also applies to the standard deviation estimates.

And although we didn't emphasize the standard deviation

estimates for binary data or time-to-event data,

we know that they play a role in estimating the standard error.

So this influence on the standard

deviation influences our estimated standard error.

So we need to compensate both

for the lack of normality in the sampling distribution, because

some samples will include outliers that throw the distribution of estimates

somewhat outside of that nice normal pattern,

and for the fact that there's a lot of uncertainty

in the standard error estimates used to create the

confidence intervals. So these two forces that play

upon us in smaller samples need to be

attended to with the methods we use to estimate confidence intervals.

How large is large enough varies depending on data type.

For continuous data with means, we've already

established this cutoff of 60 elements in our sample.

But really, that's not something I want you to worry about or keep track of.

The good news is that computers can

compute the correct 95%, and other levels if

you wish, confidence intervals, regardless of the sample size.

And most importantly, the interpretation of the confidence

interval is the same, regardless of how it's computed.

So let's just, hone in on the central limit

theorem again, when I'm talking specifically about sample means.

When taking a random sample of continuous measures of size n from a population

with some true mean of all values in the population, we'll call it mu.

And some true variability of all population values around that overall

mean, consider the theoretical sampling distribution of

all sample means from all possible

random samples of the same size n.

We're just putting names to that generic representation from before.

If we were to do a histogram of all these sample means

across all possible random samples of

size n, it would be approximately normally distributed.

The center of the distribution, the average of

all the sample estimates, would be the true mean.

And the theoretical standard error or

variability in our sample means around the true mean would be

a function of the variation of individual values in the population, sigma.

That's population standard deviation, divided by the square root

of sample size, that each mean was based upon.
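As a rough illustration of this result, the Python sketch below simulates many random samples and compares the spread of the resulting sample means with sigma divided by the square root of n. The exponential population with mean 10, the number of repetitions, and the seed are assumptions chosen purely for illustration; none of them come from the lecture.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=1):
    """Draw `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)
    # Hypothetical right-skewed population: exponential with mean 10,
    # which also has standard deviation 10.
    return [
        statistics.fmean(rng.expovariate(1 / 10) for _ in range(n))
        for _ in range(reps)
    ]

n = 100
means = simulate_sample_means(n)

center = statistics.fmean(means)       # should sit near the true mean, 10
observed_se = statistics.stdev(means)  # spread of the sample means
theoretical_se = 10 / math.sqrt(n)     # sigma / sqrt(n) = 1

print(center, observed_se, theoretical_se)
```

Even though the individual values in this made-up population are skewed, the simulated means cluster around the true mean with a spread close to sigma over root n.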

And technically this is true for "large n".

For this course, we'll say n>60.

But when n is smaller, the sampling distribution is

not quite normal, but follows something called a t-distribution.

And a t-distribution looks like a normal curve: it is symmetric and bell shaped.

And in fact, if you saw a t-distribution walking down the street,

you'd probably think it was a normal curve unless

you saw it standing next to a true normal curve.

Even if the distribution, sampling distribution

changes slightly with smaller samples, the theoretical

standard error, and hence the way we estimate

it from a single sample, will remain the same.

So the t-distribution is what we call

the fatter, flatter cousin of the normal curve.

And there are many t-distributions. Just like a

normal curve is uniquely defined by its mean

and standard deviation, t-distributions are uniquely defined

by one quantity called the degrees of freedom.

And the smaller the degrees

of freedom, the wider the tails of the t-distribution.

So what we see here in this picture is, the blue shape on this slide

represents a true normal curve, centered at mu with some standard error.

And on top of this, superimposed in dotted

lines are various t-curves with different degrees of freedom.

You can see they look very similar to the normal curve, and

they still have that same bell shaped symmetry around the center.

You may be able to see, though it's a little hard to tell, that

some of these curves have longer

tails, in increasing order of the dots.

And that represents smaller and smaller degrees of freedom.

So why do we need to correct for this, in smaller samples?

Well, we've talked about this idea

of the true standard error being a theoretical quantity.

The true standard error of a mean from a sample of size n

is a combination of the variability of individual values in the population,

the true population standard deviation, and the

sample size. But, of course, we don't know

this true standard deviation.

We don't know sigma, so what we've been doing

when we get a single sample is replacing that

with the sample standard deviation to estimate our standard error.

Well, if you think about it, in small

samples there's going to be a lot of sampling

variability in s as well. Were we to

take multiple random samples of the same size

and look at the variability in s as it's estimated across the different samples,

there would be a lot of variability, especially in small samples.

So in smaller samples this estimate is less precise.

And hence, there's added uncertainty when creating our confidence intervals

in how we estimate the standard error of our estimate.

So to account for this additional uncertainty, what we

have to do is compensate and go slightly more than

plus or minus 2 estimated standard errors to get 95% coverage under the sampling

distribution. So how much bigger than 2 this needs to be

depends on the sample size.

And this is where the degrees of freedom will come into play.

We use a t-distribution to figure out how many standard errors

to add and subtract to our sample mean to get a 95% confidence interval.

The proper degrees of freedom is a function of the sample size:

it is the sample size less 1.

And I can describe in more detail

what this quantity means and where it comes from

in one of the live talks if people are interested.

And again, the appropriate number

of standard errors to add and subtract can be looked up in a t-table,

and we'll show an example of one in a minute,

but in reality this is something that the computer will handle.

So, the basic idea is that if we have a smaller sample size, using 60 as our cutoff,

we'll have to go out more than two standard

errors to achieve 95% confidence.

How many standard errors we need to go out in either direction from

the sample mean depends on the degrees of

freedom, which we said was linked to sample size.

And so, one way of generically representing what we need to do

[INAUDIBLE]

with smaller samples

is, instead of blindly adding and subtracting two estimated standard errors,

to look up the number of standard errors for our situation by figuring out

what value cuts off 95% in the middle of a t-distribution

with n minus 1 degrees of freedom.

So here's an example of a t-table just to show you what we're working with.

This particular t-table

gives the number of standard errors needed to cut off 95% under a t-distribution

with the listed degrees of freedom. So, where do we get 60 from as our cutoff?

Well, look, if we're dealing with 60

degrees of freedom, which would be a sample size of 61,

then the number of standard errors we need to use to cut off 95% is two, and that's

what we've been using as the central limit theorem-based approximation.

What really happens, more formally, is that as the degrees of freedom

get larger and larger, this t-distribution

converges to a normal distribution.

But we can,

say that they're pretty much identical when

the t-distribution reaches 60 degrees of freedom.
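For readers who want to see where the entries of such a t-table come from, here is a minimal Python sketch that uses only the standard library. It integrates the t density numerically and searches for the value that cuts off the middle 95%. The function names (`t_pdf`, `central_area`, `t_multiplier`) and the numerical settings are hypothetical choices for this illustration; statistical software uses more refined routines.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t density between -t and +t (Simpson's rule, using symmetry)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_multiplier(df, level=0.95):
    """Number of standard errors that cuts off `level` in the middle (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 11, 15, 60):
    print(df, round(t_multiplier(df), 2))
```

With 15 degrees of freedom this gives roughly 2.13, and with 60 degrees of freedom roughly 2.00, in line with the cutoff described above.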

So what are the implications of this?

Let's just look at an example. Suppose I had a dataset with 16

[INAUDIBLE]

elements, a sample of 16 observations, so that n equals 16.

So my appropriate degrees of freedom would be 16 minus 1 or 15.

Well, in order to compute a confidence interval for the population mean

for the population from which this sample was taken,

what I need to do is take my sample

estimate, my estimated sample mean, and add and subtract

2.13 standard errors.

So we need to go more than 2 to get 95% coverage.

And this method, when applied to samples of size 16,

will yield an interval that includes the unknown truth for

95% of the samples we could take randomly from our population.
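The coverage claim can be checked with a small simulation. The Python sketch below draws many samples of size 16 and counts how often the interval built with 2.13 estimated standard errors, and the one built with only 2, contains the true mean. The normal population with mean 0 and standard deviation 1, the number of repetitions, and the seed are assumptions made only for this illustration.

```python
import math
import random

def coverage(n, multiplier, reps=20000, mu=0.0, sigma=1.0, seed=2):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        se = s / math.sqrt(n)  # estimated standard error of the mean
        if xbar - multiplier * se <= mu <= xbar + multiplier * se:
            hits += 1
    return hits / reps

cov_t = coverage(16, 2.13)  # t-based multiplier for 15 degrees of freedom
cov_2 = coverage(16, 2.0)   # central limit theorem-based multiplier
print(cov_t, cov_2)
```

Because both calls use the same seed, they see the same simulated samples, so the difference in coverage comes only from the multiplier. The interval built with 2.13 should cover the truth in about 95% of samples, and the one built with 2 somewhat less often.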

You can see that as we lose data and our

samples become smaller and smaller, we get increasingly more conservative about

how much we have to add to compensate.

And this might get to the point of

yielding intervals that are hard to interpret substantively.

So, even though we can do inference

and proper confidence interval creation in small samples,

the results may be so wide, the confidence interval may be

so wide, that it doesn't add anything to our knowledge substantively.

And we'll look at an example of that in this lecture set.

So here's the deal.

You can easily find a t-table for other cutoffs, 90% or 99% for

example, in any stats text at the back, or by searching the internet.

But in reality, hopefully, in all your endeavours beyond this

class, you'll be getting your confidence intervals from a computer.

And I'm not going to try and trip you up and

have you look up things in t-tables to do means.

Anything I want you to do by hand will be for larger samples.

So the point is not to spend a lot of time looking up t-values. What's more important

is a basic understanding of why we need to add and subtract slightly more standard errors

to the sample mean in smaller samples to get a valid 95%

confidence interval.

And here's what's most important: regardless

of how this confidence interval is

computed, or whether we have a large or small sample, the interpretation of it

is exactly the same.

At the substantive level, it gives us a

range of possible values for the unknown population mean.

And methodologically, this means that for 95% of the samples we could take

randomly from a population and create this interval,

the resulting interval

will include the true mean between its endpoints.

Let me just give you an example.

Here are the results from a small study on response to

treatment among 12 patients with

hyperlipidemia, with high LDL cholesterol levels.

So they're given a treatment, and what's measured

on each is their

cholesterol level at the beginning of the treatment

and then the cholesterol level at the end.

And for each patient they calculate the post-treatment minus

the baseline or pre-treatment value to look at the change.

And amongst these 12 patients, there was an overall

change on average of negative 1.4 millimoles per liter.

But

[INAUDIBLE]

there was variation in these changes.

Some people's values went up,

but more went down, so the average was negative, and the

standard deviation of these 12 changes was 0.55 millimoles per liter.

So if we wanted to

calculate a confidence interval for the true mean change,

were we to give all patients with hyperlipidemia this treatment,

instead of jumping in and creating the sample mean

plus or minus two standard errors, we'd have to pay attention

to the fact that there are only 12 data points in our sample.

And if we were doing this to get

95% coverage, we'd have to appeal to a t-distribution

with 11 degrees of freedom.

And if you were to go back to that table, or if you were to use a computer to do this,

the proper number of standard errors on this t-distribution to get 95% coverage

is 2.2. So if we do the math here, negative 1.4

plus or minus 2.2 times the estimated standard error, we get an interval

for the true mean change, were we to give the treatment to all

patients with hyperlipidemia, that runs between negative 1.75

millimoles per liter and negative 1.05 millimoles per liter.

So even after accounting for the uncertainty

in our sample and adding and subtracting more than

two standard errors, because we have a small sample,

we get a relatively tight interval that only includes

negative possibilities for the mean change at the population level.

So it suggests that, were we to give this treatment to everyone in the population,

there would be a shift downward on average.

Because we don't have a comparison group here, we can't necessarily

say this is because of the treatment, but we'll get into that issue more

[UNKNOWN],

in more depth, in the next

[INAUDIBLE]

set of lectures.
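The arithmetic behind the cholesterol interval can be reproduced in a few lines of Python. This sketch uses only the summary numbers quoted in the lecture (12 patients, mean change of negative 1.4, standard deviation 0.55, and a multiplier of 2.2 for 11 degrees of freedom); the variable names are just illustrative.

```python
import math

n = 12               # number of patients
mean_change = -1.4   # mean change in mmol/L (post-treatment minus baseline)
sd_change = 0.55     # standard deviation of the 12 changes, mmol/L
t_mult = 2.2         # multiplier quoted in the lecture for 11 degrees of freedom

se = sd_change / math.sqrt(n)  # estimated standard error of the mean change
ci = (mean_change - t_mult * se, mean_change + t_mult * se)

print(round(se, 3), round(ci[0], 2), round(ci[1], 2))
```

Rounded to two decimals, the end points come out at negative 1.75 and negative 1.05 millimoles per liter, matching the interval stated above.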

For small samples of binary and time-to-event data, there is

no adjustment analogous to this

t-correction when creating confidence intervals.

We just need to appeal to the computer.

Exact methods need to be employed when creating

small sample intervals for proportions and incidence rates.

And these are handled by the computer.

Traditionally these were only done with small

samples, and the CLT results were used otherwise,

even by

computers: the computer would take the estimated sample proportion and

add and subtract two estimated standard errors in larger samples.

And that's because the exact computations were really computationally intensive.

They are really computationally intensive.

However, today's computers can handle them at all levels.

So now computers generally report

the results of exact methods, but for

larger samples these will look exactly or nearly

identical to what you would create by hand,

taking the estimate plus or minus two standard errors

when we have large samples.

Here's the rub: the cutoff for small versus large with binary and time-to-event

data is not so "cut and dried" as it is with continuous data.

With continuous data we could make this blanket rule of

60, and it shouldn't really impact you in this class or

in real life anyway, because again, the computer's going to handle the detail.

But it's harder to draw the line for

small versus large with time-to-event and binary data.

For binary data, it's not just a matter of how many observations we

have in our sample, but it's also how the yeses and nos split themselves.

And if the split is

particularly imbalanced, where we have a majority of yeses or a majority

of nos and a small proportion of the other outcome,

even if we have a large sample size n, our overall amount of information might be small.

And in time-to-event data, whether the

sample size, the working sample size, is large or not

is a function of both the number of events we observe and the total exposure time.

So that's just an FYI, for your information.

I'm not going to test you on whether we have a small sample or not.

Anything I'm going to ask you to do with the

central limit theorem-based methods will be for large samples.

But let me show you what can happen

when we apply central limit theorem-based results to small samples with binary data.

So here's the example. Suppose a random sample

of 20 MPH students was taken in February of 2013.

Students at that time were asked if they currently had a cold.

Of the 20 students, three had cold symptoms.

What is the 95% confidence interval for the true prevalence or

proportion of colds among MPH students at this time?

So here our sample proportion is three out of 20, or 15%.

So

what you would be tempted to do, and I would argue that this is the appropriate

thing to do based on what you have learned, is say, hey, we've got a sample proportion.

We can use the central limit theorem-based approach.

Take our estimated proportion plus or minus two estimated standard

errors, and this is just that formula we've been using,

and create a confidence interval.

But if you do this, you actually get a confidence interval for the

true proportion of MPH students at Hopkins that have a cold that includes

negative values.

And we know a proportion can only be between 0

and 1, so this lower end point is a nonsensical result.

Suppose the proportion were large, like on the

order of 90% of the students had a cold.

Well, if we used this method

to create the confidence interval, we might

get an upper end point that exceeds 1.

So this is what can happen in smaller

samples when the central limit theorem doesn't kick in,

and in LiveTalk I'll give a little more

insight as to why it doesn't quite kick in.

Especially when we have small or large sample proportions and not much data.

But if we went to the computer,

it would actually do the exact computations

for us.

And if we do that, we get an interval from 0.03 to

0.38, or 3% to 38%.

So you can notice that the lower end point is legal.

It's valid. It's greater than 0.

And it differs from the negative 0.01 we got before, and even the

upper end point is a lot larger than the 31% we got before.

Maybe the biggest take-home message of this is that, regardless of

whether you use the CLT-based method and get something technically improper

or use the exact method, when you have

a sample this small, our ability to quantify

the prevalence of colds in this population is poor.

If our confidence interval is from 3% to 38%,

we really haven't been able to adequately describe the

burden of colds in the MPH population in 2013 with any level of precision.

If the true proportion could be between 3% and 38%.
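To make the comparison concrete, here is a hedged Python sketch of both intervals for 3 colds out of 20 students. The lecture does not say which exact method the computer uses; this sketch assumes the Clopper-Pearson construction, one common exact method based on the binomial distribution. The helper names are hypothetical and only the standard library is used.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_increasing(f, target, lo, hi, iters=60):
    """Find x in [lo, hi] with f(x) = target, for f increasing in x."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval (an assumed choice of exact method)."""
    # Lower limit: the p at which seeing x or more yeses has probability alpha/2.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which seeing x or fewer yeses has probability alpha/2.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: -binom_cdf(x, n, p), -alpha / 2, 0.0, 1.0)
    return lower, upper

x, n = 3, 20
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
clt_ci = (p_hat - 2 * se, p_hat + 2 * se)  # central limit theorem-based interval
exact_ci = exact_binomial_ci(x, n)

print([round(v, 2) for v in clt_ci], [round(v, 2) for v in exact_ci])
```

Under these assumptions the central limit theorem-based interval dips below zero, while the exact interval runs from about 3% to about 38%, in line with the numbers quoted above.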

Let's look at an example of a small sample of time-to-event data.

Here's a pilot study.

It was performed to evaluate the effect of

a new therapy on allowing cigarette smokers to quit.

So 20 smokers were enrolled and followed up for

up to one month after the start of the therapy.

Three in the sample quit smoking in the follow-up period,

and there was a total of 313 days of follow-up.

So, if we were to compute the incidence rate as we've done

before, we would take the three events we saw, the three people quitting,

divided by the total follow-up time. So it would give us

an incidence rate of 0.0096 events per

day, quittings if you will per day. So if we

actually went and used the central limit theorem-based method, took

our estimate, and added and subtracted two estimated standard errors,

we'd get a 95% confidence interval of negative 0.0014 to

0.021 events per day. So you can see the lower end point

here is implausible for the truth because incidence rates cannot be negative.

And this is because we have a small sample of time-to-event data,

and the central limit theorem does not quite kick

in and allow us to create a valid 95% confidence interval.

If we did it with a computer,

we'd actually get an interval that goes from 0.002 to 0.028 events per day.

And we could scale this any way we wanted.

Events per year, events per month etcetera.
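The same comparison can be sketched in Python for the smoking example, with 3 events in 313 days of follow-up. Two things here are assumptions rather than statements from the lecture: the standard error is taken as the square root of the number of events divided by the follow-up time, and the exact interval is based on treating the event count as Poisson. The helper names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))

def bisect_increasing(f, target, lo, hi, iters=60):
    """Find x in [lo, hi] with f(x) = target, for f increasing in x."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_rate_ci(events, time, alpha=0.05):
    """Exact Poisson-based interval for an incidence rate (assumed method)."""
    ceiling = 10 * events + 50  # generous search ceiling for the expected count
    lower = 0.0 if events == 0 else bisect_increasing(
        lambda mu: 1 - poisson_cdf(events - 1, mu), alpha / 2, 0.0, ceiling)
    upper = bisect_increasing(
        lambda mu: -poisson_cdf(events, mu), -alpha / 2, 0.0, ceiling)
    return lower / time, upper / time

events, days = 3, 313
rate = events / days
se = math.sqrt(events) / days            # assumed standard error formula
clt_ci = (rate - 2 * se, rate + 2 * se)  # central limit theorem-based interval
exact_ci = exact_rate_ci(events, days)

print(round(rate, 4), clt_ci, [round(v, 3) for v in exact_ci])
# Rescaling is just multiplication, e.g. events per 30-day month:
print([round(v * 30, 2) for v in exact_ci])
```

Under these assumptions the central limit theorem-based lower end point is slightly negative, while the exact interval runs from about 0.002 to about 0.028 events per day.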

So, the big picture here is with small samples, adjustments need to be made

to the CLT based approaches to estimating

confidence intervals for means, proportions and incidence rates.

And computers can handle these computations, so

there's no need to learn how to look

things up in a t-table or try to do these exact computations by hand.

I just wanted to make you aware of this in

case you're reading a paper where they give some summary statistics.

For example, the mean, standard deviation, sample size

for something, and then they give a confidence interval.

And you compute a confidence interval the way you've been taught in this

class, and you notice it doesn't correspond exactly to their results.

[UNKNOWN]

In fact, the endpoints are off by a

fair amount and/or don't make substantive sense.

And that may be the case if they've used

the exact method, or if they're reporting the results

from the computer and it differs from the central

limit theorem-based approach because the sample is small.

I just wanted to make you aware of this.

Most importantly, the interpretation of the resulting

confidence intervals is exactly the same regardless

of how the confidence interval is constructed.
