A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Okay, so now continuing with our theme of

hypothesis testing for comparing more than two populations.

We'll look at the results for comparing survival curves between more than two

populations using the log rank test, which we introduced in the last lecture set.

So in this lecture set you will learn to interpret a p-value for

a hypothesis test comparing survival curves,

and hence incidence rates, between more than

two populations with one test.

And the method for getting the p-value is an extension of the log rank

test we showed in lecture 10, and is also called a log rank test.

So this is a family of tests that allows for the comparison

of population level survival curves for two or

more than two populations based on sample data.

So let's go back to our maternal vitamin supplementation infant mortality study,

the one where expectant mothers were randomized in Nepal to receive

either vitamin A, beta carotene, or a placebo in the prenatal period.

And we've already seen with summary measures and confidence intervals,

no strong association, no association

at all really between vitamin supplementation

and increased or reduced mortality. But here are

the estimated incidence rate ratios over the follow-up

period. Comparing children born to mothers given vitamin A

to those born to mothers given placebo, they had

a slightly elevated risk of mortality although

we had seen with the confidence interval

section that this was not statistically significant.

And the incidence rate estimates were equal for the beta

carotene and placebo groups, for an incidence rate ratio of one.
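
As a quick illustration of what sits behind an incidence rate ratio, here is a minimal sketch. The counts and person-time are made up for illustration and are not the Nepal study's data, the function name is hypothetical, and the confidence interval uses the standard large-sample approximation on the log scale.

```python
import math

def incidence_rate_ratio(events_a, person_time_a, events_b, person_time_b):
    """Return (IRR, lower, upper) comparing group A to group B.

    The 95% interval is the usual large-sample approximation:
    exp(log(IRR) +/- 1.96 * sqrt(1/events_a + 1/events_b)).
    """
    rate_a = events_a / person_time_a
    rate_b = events_b / person_time_b
    irr = rate_a / rate_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - 1.96 * se_log)
    upper = math.exp(math.log(irr) + 1.96 * se_log)
    return irr, lower, upper

# Hypothetical numbers: equal event counts over equal person-time,
# so the two estimated rates are equal and the ratio is one.
irr, lower, upper = incidence_rate_ratio(200, 1000.0, 200, 1000.0)
print(irr, lower, upper)
```

With equal rates the ratio is one and the interval straddles one, which is the pattern described for the beta carotene and placebo groups.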

Further we've looked at the Kaplan-Meier curves for these three groups.

And even when we actually scaled the vertical axis to include a

limited range of values we saw very little visual distinction between these curves.

And of course we could also present

it in the opposite, complementary approach where we

take one minus the proportion surviving beyond

a time point, to talk about the cumulative

proportion who have died.

But in either case, the curves are very similar.
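
To make the two presentations concrete, here is a minimal sketch of a Kaplan-Meier estimate and its complement, the cumulative proportion who have had the event. The follow-up times are hypothetical, not data from the study, and the function name is just for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, proportion surviving beyond time)] at each event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surviving *= 1 - deaths / at_risk
            curve.append((t, surviving))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (1 = event observed, 0 = censored)
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
# Complementary presentation: one minus the proportion surviving
cumulative = [(t, 1 - s) for t, s in curve]
print(curve)
print(cumulative)
```

Either list carries the same information; the second simply tracks the proportion with the event instead of the proportion without it.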

If we actually did the log rank test, we could test

the null hypothesis that

the underlying population level survival curves

are the same for the placebo group, the beta-carotene

group, and the vitamin A group.

And the corresponding alternative

is that at least two of these populations

have different survival

curves.

So when this was all said and done, the p-value for this was 0.80.

So, consistent with what we saw in terms of our confidence

intervals for the incidence rate ratios for each two-group comparison,

and what we took in visually from

those Kaplan-Meier curves, the result was not statistically significant.

This was a large study with over 3,000 infants

in each of the three groups that we were comparing.

And so again, we'll get to power in its

formal definition in the next set of lectures, but

the point is that in this type of study there was enough information to find

a statistically significant difference if it really existed at the population level.

So the researchers concluded that the reason they didn't get a statistically

significant result was because there was no benefit or harm

introduced to infants by having the mother take vitamin supplementation during

prenatal care. And they concluded that vitamin A or

beta-carotene supplementation was not associated,

so prenatal vitamin A

or beta-carotene

supplementation was not associated

with infant mortality.

We've already done some of the leg work for this where we got

the incidence rate ratios comparing each

combination of groups and confidence intervals on those.

So we could certainly do that to complete the comparisons.

But this overall comparison tells us there's nothing to really look for.

There's no association.

Another example is this one, "Return to Work Following

Injury: The Role of Economic, Social, and Job-Related Factors."

I just want to show you

how the authors describe their methods.

And partly this is taken directly from the text.

The main dependent variable in the analysis is the time in days

from injury to the first time the study patient returned to work.

Kaplan-Meier estimates of the cumulative proportion

of patients returning to work were computed.

These estimates take into account how long patients were followed as well

as when they returned to work. So that describes what we laid out in an earlier lecture.

And they say a log rank test was used to test the association between the cumulative

probability of return to work and each

of the risk factors considered one at a time.

Here's a graphic they showed in this paper, where they actually chart

the Kaplan-Meier estimate of the cumulative proportion

returned to work, not the proportion who

have not yet had the event, which

would be the traditional Kaplan-Meier presentation.

So this charts the proportion of people who've returned to work by a given

time, and these curves are hard to

distinguish because they're all the same color.

But

these are grouped by impairment scores

based on the injuries they suffered in the workplace.

Group A had an average impairment below a threshold of 0.35, group B

was between 0.35 and 0.45, group C was greater than or equal to 0.45.

Now, we don't know exactly what these scale measures mean, but

essentially, they're breaking these into discrete

groups of low, medium, and high impairment.

And not surprisingly, you can see that the

proportion returning to work over time is higher

for those with the least impairment compared to

those with medium impairment, which is in turn higher

compared to those with the highest impairment.

And the authors show, they didn't show this in this picture,

but they showed that this association

was statistically significant via the log rank test.

And there were other factors they considered

as well, so ultimately they did an

analysis that allowed for the inclusion of

multiple factors in predicting return to work.

But that type of analysis we'll get into in term two.

But this is what we call an

unadjusted comparison, looking only at average impairment as

a predictor of return to work.

Let's look at another study here.

Post partum antiretroviral therapy for children born to HIV positive women.

This is a 2012 study from the New England Journal of Medicine.

And so, I'll just give you some from the abstract here.

Taken from the abstract.

The background for this is that the safety and efficacy of adding antiretroviral drugs to

standard AZT

prophylaxis

in infants of mothers with HIV infection who did not

receive antenatal antiretroviral therapy

because of late identification are unclear.

So the researchers evaluated three antiretroviral

therapy regimens in such infants.

So what they do is, within 48 hours of birth, they randomly assign

formula fed infants born to mothers with HIV-1 infection to one of three regimens.

Either to get AZT, or

zidovudine,

for six weeks, AZT for six weeks plus two doses of

[UNKNOWN]

for the first eight days of life, or AZT

for six weeks plus nelfinavir and lamivudine for two weeks.

So there are three different treatment groups here, and the primary outcome

of interest was HIV-1 infection at three months in infants uninfected at birth.

So what they've shown here is again this Kaplan-Meier presentation that's

one minus the survival curve, the estimated proportion of infants who did

contract HIV over the 40 week period

that the infants were under study.

And

what I'm focusing on

here is a blow-up of the graph,

because the proportions of

children who contract HIV were numerically small, so

if the axis goes from 0 to 100%, it's hard to see what's going on.

But what you can see here is there's sort of a breakout here.

This is tracking the cumulative proportion of

children who contract HIV over the follow

up period. You can see the highest curve here.

And because we're tracking the proportion with

the outcome and the outcome is unfavourable, the

highest curve did the worst, and it is the

curve for the children that got AZT alone.

And then these two curves here are closer to each other and shifted downward.

And these are the two AZT plus supplementation groups.

And what it says, and I've just blown this up on the side here,

this is taken verbatim from the text under the table,

is that the Kaplan-Meier curves, this is labeling

the graph, for intrapartum transmission differed significantly.

And then they report a p-value of 0.03 for the overall comparison.

That p-value is from the log rank test,

where the null being tested is that the

survival curves describing

time to contracting HIV in this population

of infants in the three drug groups

are equivalent at the population level, versus the

alternative that at least two of them are different.

They go on to say transmission rates

were highest in the zidovudine-alone group:

3.4% at 4-6

weeks versus 1.6% in the two-drug group and

1.4% in the three-drug group, and then 4.8% at three months versus

2.2% in the two-drug group and 2.4% in the three-drug group.

So they give us some numerical summaries in

certain places along the follow-up period as well.

So they go on in the paper to

actually quantify, by incidence rate comparisons, the relative incidence

rates of contracting HIV for these three groups

of infants, and they put confidence limits on these.

But this demonstrates overall that there was a difference between the

three groups, and what it looks like,

and what they concluded, was that the zidovudine group,

the AZT

alone group, is statistically significantly worse than the

other two groups when they looked at the

two-group comparisons.

And so finally they state in their conclusions, in neonates

whose mothers did not receive antiretroviral therapy during pregnancy,

prophylaxis with a two- or three-drug ART regimen

is superior to AZT alone for the prevention of intrapartum HIV transmission.

And then they say the two-drug regimen has less toxicity

than the three-drug regimen, so they also looked at that.

So their ultimate conclusions here were

that these two groups did significantly better than AZT alone, but among the two

groups, the two drug regimen was less harmful to the infants.

So, this looks like more of the same conceptually, right?

Well, the way

the log rank

test works is it starts with the null

that the underlying population level survival curves we're comparing

are the same,

versus the alternative that

at least two populations have

different curves.

And the way it works is just like any other hypothesis test.

We start assuming the null.

Then we create a measure of discrepancy between

what we observed, or what the researchers observed,

and what would be expected under the null.

And it goes through, and at any point where there is an event

on any of the curves being compared, it sets up

a contingency table that looks at the number of events

that occurred in each of the groups at that time.

And then it creates another contingency table, which is the expected

number to have occurred in each of the groups under the null.

It does that at each time point across

all the groups being compared, where there's an event.

Then it aggregates the discrepancies across

the entire follow-up period to

create one overall measure of

discrepancy between the multiple groups being compared.

And then it compares that value, based on the study being analyzed,

to the distribution of such values under the null hypothesis, and

figures out whether

the result is likely or unlikely to have occurred under the

null, which is where we get the p-value and we can make our decision.
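
The bookkeeping just described can be sketched in a few lines. This is only an illustration under stated assumptions: the data are hypothetical, the function names are made up, and the statistic is the simple sum of (observed - expected)^2 / expected across groups, which is an approximation. Statistical software typically uses a variance-based version of the log rank statistic, so its p-values can differ somewhat. With three groups the reference distribution is chi-square with two degrees of freedom, whose upper tail probability is exp(-x / 2).

```python
import math

def logrank_observed_expected(times, events, groups):
    """Observed and null-expected event counts per group.

    times[i]  : follow-up time for subject i
    events[i] : 1 if the event was observed, 0 if censored
    groups[i] : group label for subject i
    """
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # One small table per event time: who is at risk, who had the event
        at_risk = {g: 0 for g in labels}
        had_event = {g: 0 for g in labels}
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
            if ti == t and ei == 1:
                had_event[gi] += 1
        total_at_risk = sum(at_risk.values())
        total_events = sum(had_event.values())
        for g in labels:
            observed[g] += had_event[g]
            # Under the null, events are shared in proportion to numbers at risk
            expected[g] += total_events * at_risk[g] / total_at_risk
    return observed, expected

def approx_logrank_statistic(observed, expected):
    """Approximate discrepancy measure: sum of (O - E)^2 / E over groups."""
    return sum(
        (observed[g] - expected[g]) ** 2 / expected[g]
        for g in observed
        if expected[g] > 0
    )

# Hypothetical data for three groups (labels are placeholders, not study data)
times = [5, 8, 12, 3, 6, 9, 2, 4, 7]
events = [1, 0, 1, 1, 1, 0, 1, 1, 1]
groups = ["group 1"] * 3 + ["group 2"] * 3 + ["group 3"] * 3

observed, expected = logrank_observed_expected(times, events, groups)
statistic = approx_logrank_statistic(observed, expected)
# Chi-square with 2 degrees of freedom (three groups): P(X > x) = exp(-x / 2)
p_value = math.exp(-statistic / 2)
print(observed, expected, statistic, p_value)
```

If the groups had identical follow-up patterns, observed and expected counts would match in every group, the statistic would be zero, and the p-value would be one.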

So the mechanics are a little more detailed

and certainly not easy to do by hand.

But the approach is exactly the same as

all hypothesis tests we've done in the course