A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 4A: Making Group Comparisons: The Hypothesis Testing Approach

Module 4A shows a complementary approach to confidence intervals when comparing a summary measure between two populations via two samples: statistical hypothesis testing. This module will cover some of the most used statistical tests, including the t-test for means, the chi-squared test for proportions, and the log-rank test for time-to-event outcomes.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So when comparing two populations based on data samples, the results from

the central limit theorem allow researchers to start with sample data

and use it to estimate both the population comparison measure, whether it be

a mean difference, a difference of proportions, a relative risk, et cetera,

and the uncertainty in this estimated measure.

With the confidence interval approach, these data could be

used to take the results from our samples to

the truth in the sense of starting

with our sample-based estimate and adding uncertainty bounds.

Another approach for connecting sample data to the underlying unknown population

truth is to start with some competing possibilities for the unknown truths.

And then use the sample results and the estimated uncertainty in

these results to choose between the competing options for the truth.

In this set of lectures, we're going to begin

our journey into talking about statistical hypothesis testing, and in this lecture

set we'll present the general concept and then look at specific hypothesis tests

for comparing means between two populations.

So, in this first section, we're going to give

an overview of hypothesis testing for population comparisons.

So upon completion of this lecture section, you will begin to understand a

conceptual framework for the process of statistical hypothesis tests,

and how confidence intervals and hypothesis testing are related.

So, just to remind you of something we've stated

before: frequently in public health, in medicine, and science, researchers

and practitioners are interested in comparing two or more outcomes

between two or more populations, using data collected from samples,

from these populations.

So it's not only important to estimate the

magnitude of the difference in the outcome of

interest between the two groups being compared, but

also to recognize the uncertainty in the estimate.

So one approach to recognizing the uncertainty

in these estimates is to create confidence intervals.

And we did that in lecture set 8.

But a complementary approach is called hypothesis testing.

And we can do hypothesis tests for comparing populations in

the different scenarios that we discussed

with confidence intervals for comparing populations.

We can handle the situation with continuous

outcomes for paired and unpaired population comparisons.

For binary outcomes we'll focus on the unpaired population comparison situation.

And, similarly for the time-to-event outcomes, we'll focus on the unpaired

population comparison.

And the approaches for the unpaired studies can also be extended,

and we'll look at this in lecture set 11, to

allow for comparisons of more than two populations with one test.

So just to remind you, it turns out that the

difference of two quantities whose distributions

are normal also has a normal distribution.

And we used that to create confidence intervals, based on the difference

in sample means, for the difference in population means,

for the difference in population proportions, and ultimately to create confidence

intervals for things like relative risks, odds ratios, and incidence rate ratios.

So we can extend the basic principles of the

central limit theorem to understand and quantify the sampling variability

of mean differences between two independent populations,

differences in proportions between two independent populations, and

the natural logs of ratios, which are actually differences.

So just to remind you of the logic of

a confidence interval: we use the central limit

theorem, which tells us, you know, there's some

truth out there for the population comparison measure

that you're estimating.

There's some truth out there. So whether it be a true

mean difference, a true difference in proportions,

et cetera, there's some truth out there.

We don't know where it is, and we will never know exactly where it is.

But the Central Limit Theorem tells us that most

of the estimates for the truth we could get

would fall within a set range around that truth,

within plus or minus two standard errors.

And we're going to do one study and get one estimate.

And we don't know where that estimate will fall relative to the truth,

but for most of the estimates we

could get for a difference comparing populations,

like a mean difference or a difference in proportions or the log of a ratio,

the estimate will fall within two standard errors of the truth.

So, our strategy has been to start with our estimate

and go plus or minus two standard errors in either direction, in the hopes of

containing the truth in the interval we get.

So the idea of a confidence interval is letting your data take you to

the truth by adding in the uncertainty of the estimates based on the data.
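As a rough illustration of the "estimate plus or minus two standard errors" strategy, here is a minimal Python sketch for a difference in means between two independent samples. The function name `mean_difference_ci`, the multiplier of 2, and the measurements are all assumptions made up for this example, and the plus-or-minus-two rule is a large-sample approximation, so treat the output as a demonstration of the logic rather than an analysis.

```python
import math
import statistics

def mean_difference_ci(sample_a, sample_b, multiplier=2.0):
    """Return (estimate, lower, upper) for the difference in means, a minus b."""
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Standard error of a difference in means for two independent samples.
    se = math.sqrt(statistics.variance(sample_a) / len(sample_a)
                   + statistics.variance(sample_b) / len(sample_b))
    # Start with the estimate and go the same distance in either direction.
    return diff, diff - multiplier * se, diff + multiplier * se

# Hypothetical measurements, invented purely for illustration.
group_a = [128, 135, 121, 140, 132, 126, 138, 130]
group_b = [122, 129, 118, 131, 125, 120, 127, 124]
estimate, lower, upper = mean_difference_ci(group_a, group_b)
print(f"difference = {estimate:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval is centered on the sample estimate, which is the sense in which the data "take you to the truth."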

Another possible approach, however, for linking the sample results to

the unknown truth is to start with some competing, but exhaustive

and mutually exclusive, possibilities for the unknown

truth about the population comparison measure.

So start with possibilities for the truth.

And then use data from our sample to choose

between these two possibilities for the population level truth.

So, one possibility for the competing truths,

and this is actually the one that we're going to work with from here on in,

is the following. We'll call it Truth 1

to start:

that there's no difference on the measure between

the two populations, no difference in means, for example,

versus the very broad Truth 2, that

there is a difference between the two populations,

that the means are different, their difference is not zero, for example.

And these two possibilities can be phrased in terms

of the null values we discussed in lecture 8.

So for example, when comparing means between two

populations, we could express these two competing truths

in two ways.

We could say either the means are the same at the population

level, and if that's the case then the mean difference is zero.

And remember that's our null value,

for our mean difference.

And if the means are not the same between

the two populations, then the mean difference is not zero.

It is not equal to that null value.

So these two competing truths are called, respectively, the

null hypothesis, sometimes represented by H and a little o,

which is pronounced H-naught, and the alternative hypothesis,

sometimes represented by H and the subscript a, for alternative.

So the null is generally that there's no difference in whatever

metric we're using to compare the populations, versus there is a difference.

And these can be expressed in several

equivalent ways for the data outcomes we have considered.

So for example, in the continuous situation, we've already

laid that out, whether our data are paired, or not.

The null hypothesis would be phrased in terms

of population means and the null would be

that the means are the same, versus the

alternative that they are different at the population level.

Of course,

these can be rephrased or represented

in terms of the difference in population means.

If the two means are equal, their difference is 0.

If they're not equal, their difference is not 0.

For binary outcomes, we would just replace the means in what we did

with the proportions at the population level.

And there's many ways we can re-express this exact same message here.

We could do it in terms of the difference in proportions or the risk difference.

If the null is true, the risk difference would

be 0, versus the alternative, where the risk difference would

not be equal to 0.

We could do it in terms of the relative

risk: if the null is true, the relative risk is 1.

If the alternative is true,

that the proportions are not equal, the relative risk is not 1.

So you see these null values are popping up for the ratios as well.

And similarly for the odds ratio.

If the null is true, the underlying

proportions are the same, and if we construct

an odds ratio, comparing the two populations,

it'd be equal to 1, versus the alternative,

the odds ratio's not equal to 1. And of course,

we could also represent this on the log scale for the ratios.

Another way of saying the same thing is that the log of, for

example, the relative risk is zero, versus that the log is not zero.

All of these representations

mean the same thing in terms of the null and alternative:

the proportions are either equal or not.

With time-to-event outcomes, we can phrase

these as: the null is that the incidence rate for the two populations is the same,

versus the alternative that it's not the same, that they're not equal.

We could rephrase that in terms of the incidence rate ratio

comparing the two populations

being 1 or not 1, or the log

of the true incidence rate ratio being zero or not zero.

So these null and alternative hypotheses can be represented in multiple ways,

depending on the measure of association we choose to compare the populations.
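To see that these are different ways of writing the same null, here is a small Python sketch that computes each measure for a pair of population proportions. The helper name `comparison_measures` and the proportions 0.20 and 0.30 are hypothetical, chosen only to show the null values (0 for differences and logs, 1 for ratios) appearing when the proportions are equal.

```python
import math

def comparison_measures(p1, p2):
    """Express the comparison of two proportions on several scales."""
    odds1, odds2 = p1 / (1 - p1), p2 / (1 - p2)
    return {
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": odds1 / odds2,
        # The log of a ratio is a difference of logs, so its null value is 0.
        "log_relative_risk": math.log(p1 / p2),
        "log_odds_ratio": math.log(odds1 / odds2),
    }

# Hypothetical population proportions, chosen only for illustration.
under_null = comparison_measures(0.20, 0.20)  # equal proportions
under_alt = comparison_measures(0.30, 0.20)   # unequal proportions
for name in under_null:
    print(f"{name}: null={under_null[name]:.3f}, alternative={under_alt[name]:.3f}")
```

Whichever scale is used, the first column shows the null value and the second column moves away from it, so the choice of measure changes the notation and not the hypothesis.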

So the question with hypothesis testing is: we posit these two competing

truths, and then we want to use the study data to choose between

these two truths while accounting for the uncertainty in the

study data. And again we'll appeal

to the theoretical sampling distribution to understand

what we'd expect the variability of our

estimate to be around some specified truth,

the variability that comes from sampling.

So let's just go back and do this again with confidence intervals.

We use the sampling distribution and then exploit

the result that the truth is out there.

We don't know where it is.

But as I laid out before, most of the

results we can get, most of the estimates comparing populations, will

fall within plus or minus two standard errors of the truth.

So most of the time,

for most of the studies we could do, if we start with our estimate and go

plus or minus two standard errors in either direction,

we'll get an interval that includes the truth.

Hypothesis testing starts in the opposite direction.

It says let's start with an assumption about the truth.

Let's put the truth on the table.

Let's put a possibility for the truth at zero

for the difference in the quantities we're comparing, whether it be means,

proportions, or the logs of ratios.

And then we know how the distribution of estimates of this truth

should vary around this assumed value of zero.

And then let's figure out where our result falls,

where our result falls relative to all possible

results when the data samples come from populations

with equal measures. And what hypothesis testing

does is plot where our estimate falls relative to all other possible estimates,

and then translate that into a probability

statement or proportion statement about the proportion or

probability of results that are as far or

farther than what we've observed in our study

if the data came from two populations with equal measures.

So we're going to look at how our results compare

to all results we could have gotten when the null

is true, and see whether our results are likely

or not, when the null is assumed to be true.

So what we'll ultimately get is called a p-value,

which, and we'll go through this carefully

and repeatedly, is going to give us a

measure of how likely our study results are.

So for example, how likely our sample mean difference would be, or results

even less likely, if the underlying populations that generated our samples

have the same means,

in other words, if the true mean difference is zero.

So we'll get some measure of how likely our sample results are

if the null is true, and we'll use that

to make a decision about whether we'll stick with

the null as a reasonable possibility for the truth

or reject it in favor of that broad alternative.
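One way to make "how our results compare to all results we could have gotten when the null is true" concrete is to simulate that null world. The Python sketch below is only an illustration under stated assumptions: both populations are normal with the same mean and a known common standard deviation, and the function name `simulated_p_value`, the sample sizes, and the observed differences are all hypothetical.

```python
import random

def simulated_p_value(observed_diff, n_per_group, sd, n_sims=5000, seed=1):
    """Proportion of null-world mean differences as far or farther from 0 than observed."""
    rng = random.Random(seed)
    as_far_or_farther = 0
    for _ in range(n_sims):
        # Both samples come from populations with the SAME mean (the null is true).
        sample_1 = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        sample_2 = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        null_diff = sum(sample_1) / n_per_group - sum(sample_2) / n_per_group
        if abs(null_diff) >= abs(observed_diff):
            as_far_or_farther += 1
    return as_far_or_farther / n_sims

# Hypothetical study results: an observed mean difference far from 0 and one close to 0.
p_far = simulated_p_value(observed_diff=5.0, n_per_group=50, sd=10.0)
p_near = simulated_p_value(observed_diff=1.0, n_per_group=50, sd=10.0)
print(f"observed difference 5.0: simulated p-value = {p_far:.3f}")
print(f"observed difference 1.0: simulated p-value = {p_near:.3f}")
```

A difference that is rare among the simulated null results gives a small proportion, and a difference that is common among them gives a large one.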

So this is the first of many times that

the definition will be repeated, but soon you will see

it in the context of real study results, which

will help give this technical definition a specific, substantive interpretation.

The reason that I will repeatedly drill down on this definition

throughout lectures 9 through 11, and in other parts of the course,

is because the p-value is often misinterpreted and given

more importance in the research process than perhaps is deserved.

The idea behind getting the p-value is very important, i.e.,

the idea of distinguishing real differences

from sampling noise in the data.

But the p-value alone is only a piece of the process and

only one way to do this.

We've also done this sort of thing with confidence intervals.

So again, what a p-value measures is the probability of getting

a study result, a sample estimated difference between the two samples,

as extreme or more extreme, that is, as far or farther from the null

value, by chance alone if the null hypothesis is the underlying population

level truth.
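Under the central limit theorem results described earlier, this definition can be sketched in a few lines of Python using the standard normal distribution. This is a minimal sketch, assuming the sampling distribution of the estimate is approximately normal; the function name `two_sided_p_value` and the numbers are hypothetical, and the specific tests covered later in the lecture sets have their own mechanics.

```python
from statistics import NormalDist

def two_sided_p_value(estimate, standard_error, null_value=0.0):
    """Two-sided p-value for an estimate under a normal sampling distribution."""
    # Distance between the estimate and the null value, in standard errors.
    z = (estimate - null_value) / standard_error
    # Probability of a result as far or farther from the null, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: an observed mean difference of 5 with a standard error of 2.
p = two_sided_p_value(5.0, 2.0)
print(f"p-value = {p:.4f}")
```

The factor of 2 counts results that are as far from the null value in either direction, which matches the broad alternative of "not equal."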

And what we're going to do is parlay this p-value

into making a decision about the two competing hypotheses.

The p-value gives a probability.

There needs to be a rule about whether the p-value means the study results are

likely, or unlikely, if the null is actually the underlying population truth.

So the general cutoff, and we'll discuss this

in more detail throughout this lecture set,

is called the rejection level or alpha level.

Generally we use a cutoff in research of 0.05, but it doesn't have to be 0.05.

But we'll see that 0.05 corresponds to a 95% confidence interval for the difference

in population measures.

So, in general, if p is less than 0.05, the decision is made to reject

the null hypothesis in favor of the alternative.

In other words, our study results are unlikely, under the null assumption about

the truth, and the result is

called statistically significant at that 0.05 level.

If p is greater than or equal to 0.05,

the decision is to fail to reject the null hypothesis.

Sort of a double negative statement there. It's not a very strong

conclusion, but we will see why this language is used shortly.

So let's think about this.

What is the relationship between the 95% confidence

interval, the appropriate null value and the p-value?

Well, if our p-value is less than 0.05, and our sample results are less than 5%

likely under the assumption of the null, then the 95% confidence interval

for the measure of interest, whether it be a mean difference, a

difference in proportions, a relative risk,

etc, will not include the null value.

If p is greater than or equal to 0.05, then the 95%

confidence interval for the measure of interest will include the null value.

And this is just to say that both of these approaches are using

the same information, to make the same general statement about the truth and

they should concur with regards to that general statement.
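The agreement between the two approaches can be checked numerically. The following Python sketch assumes a normal sampling distribution and uses the same standard error for both the interval and the test; the function name `summarize` and the estimates are hypothetical, and the multiplier is the exact normal value (about 1.96) rather than the rounded 2.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def summarize(estimate, standard_error, null_value=0.0, alpha=0.05):
    """Return the two-sided p-value, the matching interval, and the test decision."""
    z = (estimate - null_value) / standard_error
    p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    # For alpha = 0.05 this multiplier is about 1.96, giving a 95% interval.
    multiplier = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    interval = (estimate - multiplier * standard_error,
                estimate + multiplier * standard_error)
    decision = "reject" if p_value < alpha else "fail to reject"
    return p_value, interval, decision

# Hypothetical estimated differences, all with a standard error of 2.
for estimate in [1.0, 3.0, 4.5, -5.0]:
    p_value, (lower, upper), decision = summarize(estimate, 2.0)
    includes_null = lower <= 0.0 <= upper
    print(f"estimate={estimate:5.1f}  p={p_value:.3f}  "
          f"interval=({lower:.2f}, {upper:.2f})  includes 0: {includes_null}  -> {decision}")
```

In each row, the interval excludes the null value exactly when the decision is to reject, because both are built from the same estimate and the same standard error.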

So we're going to spend more time fleshing out the details

of all of what we laid out here and we'll

show the mechanics for why the confidence intervals and p-values

will agree, for example, in terms of the null value

and rejecting or failing to reject the null hypothesis.

But confidence intervals and hypothesis testing are two complementary

ways of addressing uncertainty in sample-based comparisons and

making statements about the unknown population comparisons.

Both methods operate on the principle that for

most random sample based studies, the sample results

should be close to the truth.

And we will deal with the mechanics of the synchronicity

between 95% confidence intervals and the significance level of 5%, shortly.

The confidence interval approach starts with the study results

and creates an interval around the study results to

create a range of possibilities for the unknown truth.

The hypothesis testing approach starts with an

assumption about the unknown truth, and then measures

how far the study results are from

this assumed truth and figures out whether they're

relatively likely or unlikely to have occurred

if this is the truth that generated the samples of data.

The end result of hypothesis testing is called a p-value.

The p-value quantifies how likely the study results are,

or results even less likely, if the samples being compared

came from populations with equal parameters of interest:

equal population means, equal population proportions, et cetera.

So we'll spend a fair

amount of time fleshing these ideas out in both this lecture set and lecture set 10.

In general, the mechanics of the test and the name of

the test employed depend on the type of data being compared.

And we'll go through several scenarios comparing continuous, binary, and time-to-event

data in this lecture set and the subsequent lecture set 10.

However, we'll see, and we'll really work on understanding, that the conceptual

foundation of all hypothesis tests is the same.

So, onward and upward, as we jump into the world of hypothesis testing.

So throughout this lecture set and lecture sets 10 and 11,

we are going to look at different hypothesis tests

for comparing different data types between two, and ultimately

more than two, populations based on samples from the populations.

You'll be exposed to a lot of different test names for specific situations.

But the important thing to consider when you are learning about

these is, again, the variations-on-a-theme idea in statistics.

The underlying set-up and idea of these tests are

the same, and it's only the mechanics that will differ.