A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


Course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

240 ratings

Johns Hopkins University


From the lesson

Module 3A: Sampling Variability and Confidence Intervals

Understanding sampling variability is the key to defining the uncertainty in any sample-based estimate from a single study. In this module, sampling variability is explicitly defined and explored through simulations. The resulting patterns from these simulations give rise to a mathematical result that is the underpinning of all statistical interval estimation and inference: the central limit theorem. This result will be used to create 95% confidence intervals for population means, proportions, and rates from the results of a single random sample.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in the last section we worked really hard to come up with

single-number summaries both of individual samples and comparisons between

samples, ostensibly to quantify or give a best estimate

for some unknown underlying population quantity or comparison between populations.

But you may be thinking, John, you know,

I would feel more comfortable about some of

these depending on characteristics of the study.

For example, I would feel more comfortable and more confident about a comparison

based on hundreds of people in each of the groups

versus tens of people in each of the groups.

And I might push you and say,

well, why would you feel more comfortable?

And you'd probably end up saying something like,

estimates based on larger samples are more stable.

So I want you to think about what you might mean by more

stable, and that's what we're going to investigate first in this module.

We're going to rigorously define something called

sampling variability, something that measures

the stability of a statistic based on

a single sample as an estimate of some underlying truth.

And we're going to look at sampling variability through the idea

that just by chance in our sampling process we

could get one of many different samples,

all the same size but with different elements, from a population.

And understanding how our estimate would vary across these samples,

these different samples we could have gotten by chance,

gives us some insight as to how stable our sample statistic like a sample mean or

proportion or incidence rate is as an estimate of

the underlying true quantity at the population level.
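This repeated-sampling idea is easy to explore by simulation. The sketch below, using a hypothetical population of simulated blood-pressure values (the population, its mean, and its spread are all illustrative assumptions, not figures from the course), draws many random samples of a given size, computes the mean of each, and shows that the spread of those sample means shrinks as the sample size grows:

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 50,000 simulated "systolic blood pressure" values
# (mean 125, SD 20 are illustrative assumptions).
population = [random.gauss(125, 20) for _ in range(50_000)]

def sample_means(sample_size, num_samples=500):
    """Draw num_samples random samples of the given size; return each sample's mean."""
    return [statistics.mean(random.sample(population, sample_size))
            for _ in range(num_samples)]

# The spread (SD) of the sample means shrinks as the sample size grows:
for n in (10, 100, 1000):
    means = sample_means(n)
    print(f"n={n:4d}: SD of sample means = {statistics.stdev(means):.2f}")
```

Each printed number is a direct measurement of sampling variability: how much the sample mean would bounce around across the many samples we could have drawn by chance.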

And then we're going to address something that, you know,

may seem strange at first:

in real-life research we're only going to take

one sample from each of the populations we're studying.

So how can we have an understanding of how our estimate would vary

across multiple samples given that we only have one sample?

But we're going to show a powerful mathematical result, the central limit theorem, that will

pretty much tell us what would have happened if we had

taken multiple samples and allow us to structure that

and quantify it based on the results of single samples.

And so, we're going to build towards something called intervals, confidence intervals,

that allow us to take our best estimate of a quantity like a

mean or proportion from a single sample and then put uncertainty bounds on

it to come up with an interval that reflects our confidence about

our ability to estimate the underlying truth that we can't directly observe.
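For a sample mean, the standard recipe for those uncertainty bounds is the estimate plus or minus roughly two estimated standard errors. A minimal sketch, again using hypothetical simulated data (the sample values and sizes are assumptions for illustration):

```python
import math
import random
import statistics

random.seed(2)

# Hypothetical single sample of 200 measurements (e.g. blood pressures).
sample = [random.gauss(125, 20) for _ in range(200)]

xbar = statistics.mean(sample)                           # best single-number estimate
se = statistics.stdev(sample) / math.sqrt(len(sample))   # estimated standard error of the mean

# 95% confidence interval: estimate +/- 1.96 estimated standard errors
lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(f"mean = {xbar:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

The width of the interval is driven by the standard error, which shrinks with the square root of the sample size: quadrupling the sample halves the interval's width.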

And this becomes particularly critical when we start

comparing two or more populations through

two or more samples and we want to put uncertainty bounds on the differences

between those populations, so we'll explore that in detail in this section as well.
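For a difference in means between two independent samples, the same recipe applies, with the standard errors of the two group means combining in quadrature. A sketch with two hypothetical simulated groups (group sizes, means, and SDs are illustrative assumptions):

```python
import math
import random
import statistics

random.seed(3)

# Two hypothetical independent samples, e.g. blood pressure under two treatments.
group_a = [random.gauss(125, 20) for _ in range(150)]
group_b = [random.gauss(120, 20) for _ in range(150)]

def mean_and_se(data):
    """Return the sample mean and its estimated standard error."""
    return statistics.mean(data), statistics.stdev(data) / math.sqrt(len(data))

mean_a, se_a = mean_and_se(group_a)
mean_b, se_b = mean_and_se(group_b)

diff = mean_a - mean_b
se_diff = math.sqrt(se_a**2 + se_b**2)   # SEs add in quadrature for independent samples

lower, upper = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"difference = {diff:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

If this interval excludes zero, the data are inconsistent (at the 95% level) with the two population means being equal, which is the bridge from interval estimation to inference that later modules build on.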