A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll look at sample size computations, the inputs, and

the results for studies comparing two or more proportions or incidence rates.

We'll actually look at examples for comparing two groups for both measures but

you could extend the idea that we've set up for

means for more than two groups to the same sort of thinking for proportions or

incidence rates where you look at the necessary sample size for

each unique two-group comparison and take the maximum across all of those.

So upon completion of this lecture section, you will be able to describe

the relationship between power and sample size with regard to the size of minimum

detectable difference in proportions or incidence rates between two groups.

And understand the impact of designing studies to have equal versus unequal sizes

on the total sample size necessary to have a certain power.

And it will be the same situation as we saw with means

in terms of the impact that has.

So for comparing two proportions, the inputs we need for the software

are actually simpler than with means. It's the same

idea as with comparing means, except that we don't need any standard deviation

estimate for the values in each of our groups because you may recall the standard

deviation of a proportion is a function of the proportion itself.

So once we specify the expected proportions in the groups we're comparing,

that's taken care of.
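
As a reminder of why no separate standard deviation input is needed, here is the standard result for a 0/1 outcome with true proportion p, along with the resulting standard error of a sample proportion based on n observations. The symbols p and n are just notation for this note, not something the software asks for by name.

```latex
\mathrm{SD} = \sqrt{p\,(1-p)}, \qquad \mathrm{SE}(\hat{p}) = \sqrt{\frac{p\,(1-p)}{n}}
```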

So we can find the necessary sample sizes of a study if we specify the alpha level

of the test, which will again almost uniformly be 0.05;

specific values for the two proportions, and

hence the difference in proportions between the two groups we're comparing,

and that usually represents the minimum scientific difference of interest,

and then the desired power which is generally 80% or sometimes 90%.

So let's go back to our peptic ulcer example where we have the two drugs for

the treatment of peptic ulcer.

This was the situation where we saw a small study done that showed a very

large difference in the percentage of people healed from peptic ulcer in

the two drug groups being compared, 77% in the first group compared

to 58% in the second group, for a difference of 19%.

But we saw that this study had low power. The resulting p-value was 0.17,

so it was not statistically significant; the margin of error was large, and

the resulting confidence interval for the difference in proportions was very wide.

The power to detect a difference as large as the sample result, a 19% risk

difference, with samples of size 30 and 31, respectively, is only 25%.

So this study had a large margin of error and low power.

So perhaps as a clinician, you may find the sample results intriguing.

You might want to do a larger study to better quantify the difference in

proportions healed.

We already showed how to look at that based on the margin of error.

Let's look at the route through power.

You design a new trial using the aforementioned study results to estimate

the population characteristics.

You might start off with the observed sample results and say, if this were

really the truth that was estimated by the study, this risk difference of 19%,

which corresponds to a relative risk of 1.33,

how many people would I need to have in each of the groups to have 80% power,

with a rejection level of 0.05?

So as this is a randomized trial,

to start let's assume equal sample sizes in the two groups.

So based on using statistical software, in order to detect a difference of 19%, which

is rather large, we'd need 105 people in each sample, for a total of 210 persons.

And this actually corresponds to a margin of error on the estimate that's still

relatively large of plus or minus 12% but because our

detectable difference is so large, that margin of error is sufficient for

finding a significant difference if it really exists.
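
As a quick check on that margin of error, here is a minimal sketch assuming the usual large-sample 95% interval for a difference in two proportions. The function name `margin_of_error` is made up for this note, and the lecture's software may compute it slightly differently.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p1, p2, n1, n2, confidence=0.95):
    """Approximate margin of error for the difference p1 - p2."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a difference in two independent sample proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * se


moe = margin_of_error(0.77, 0.58, 105, 105)
print(f"margin of error: +/- {moe:.3f}")
```

With 105 people per group this comes out at roughly 0.12, which is well below the hypothesized difference of 0.19.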

Suppose our funding agency says, look, this is pretty optimistic, and

it would be very helpful even if this drug were effective at a lower level. For

example, if drug A were effective such that

it improved healing by an absolute difference of 10%, or

alternatively presented as a relative risk of 1.15,

increasing an individual's chances of being healed by 15%,

that would be very notable, so could you rerun the numbers for that?

Well, of course, now we're making the minimum detectable difference smaller.

It's going to be harder to see, and so

the necessary sample size to have the same power of 80% will be larger.

If we run the numbers, we need 335 people in each group,

as opposed to the 105 we needed for the very large difference of 19%.

Suppose we thought, well, even a 5% risk difference would be notable clinically and

for treatment purposes.

That would correspond to a relative risk of 1.07, or a

7% increase at the individual level, but since peptic ulcers are so pervasive,

having this kind of impact would be very helpful from a public health and

personal health perspective.

Well, if we make our difference even smaller, down to 5%,

we're really going to ramp up the number of people we need in this study.

We need 1,232 persons in each of the two groups, much larger than what we initially

saw when we had such a large minimal detectable difference of 19%.

So we could make a table in a grant proposal. We might not make

our scenarios this varied, from a difference of 19% down to 5%, but

maybe we do something like 10%, 7%, 5% and show the necessary sample size for

each, and maybe ask for the biggest possible one if we thought that

this detectable difference of 5% were still clinically relevant.
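
A table like that could be produced with a few lines of code. The sketch below is an assumption on my part, since the lecture does not say which software or formula was used: it applies the normal-approximation sample size formula for two proportions with a continuity correction. It also assumes the 77% healing proportion is held fixed and the comparison group moves to 67% and 72%, which is what the quoted relative risks of 1.15 and 1.07 imply. The name `n_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two proportions, equal groups.

    Assumed method: normal approximation with a continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected sample size per group.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction, then round up to a whole person.
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2)


for p1, p2 in [(0.77, 0.58), (0.77, 0.67), (0.77, 0.72)]:
    n = n_per_group(p1, p2)
    print(f"{p1:.0%} vs {p2:.0%}: {n} per group, {2 * n} total")
```

Under these assumptions the per-group sizes come out as 105, 335, and 1,232, matching the figures quoted above. Other software defaults (for example, no continuity correction) would give somewhat smaller numbers.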

Suppose you wanted to design a randomized clinical trial where you had two times as

many people on Drug B as compared to Drug A,

since Drug A is ostensibly perhaps the new drug being tested.

Maybe it's more expensive at this stage in the game, and so

you want to have a smaller number of participants in the Drug A group.

But you still want the power to detect a difference of 19%.

Well, how would this affect our overall sample size

compared to when we had equal sample sizes?

Well if we did this, we'd need 80 people in the Drug A group and

160 people in the Drug B group for a total of 240 subjects,

which is greater than the 210 we need for equal sample sizes.

And if we ran this scenario,

where we have twice as many people on Drug B compared to Drug A, for

the other minimal detectable differences we looked at, 10% and 5%,

the total number we need in this scenario with unequal sample sizes

would surpass the numbers we got with the equal sample sizes assumption.
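
To see the cost of unequal allocation, here is a hedged sketch that generalizes the normal-approximation formula (with continuity correction) to a group size ratio. Again, this is an assumed method rather than the lecture's stated one, and `group_sizes` and its `ratio` parameter are names invented for this note.

```python
from math import ceil, sqrt
from statistics import NormalDist


def group_sizes(p1, p2, ratio=1.0, alpha=0.05, power=0.80):
    """Return (n1, n2) with n2 = ratio * n1 for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    diff = abs(p1 - p2)
    # Pooled proportion, weighted by the relative group sizes.
    p_bar = (p1 + ratio * p2) / (1 + ratio)
    n = (z_alpha * sqrt((1 + ratio) * p_bar * (1 - p_bar))
         + z_power * sqrt(ratio * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n /= ratio * diff ** 2
    # Continuity correction for the smaller group, rounded up.
    n1 = ceil(n / 4 * (1 + sqrt(1 + 2 * (1 + ratio) / (n * ratio * diff))) ** 2)
    return n1, ceil(ratio * n1)


for p_b in (0.58, 0.67, 0.72):
    equal_total = sum(group_sizes(0.77, p_b))
    unequal_total = sum(group_sizes(0.77, p_b, ratio=2))
    print(f"77% vs {p_b:.0%}: equal total {equal_total}, 1:2 total {unequal_total}")
```

For the 19% difference this gives 80 and 160 (240 total) against 210 with equal groups, and the 1:2 design needs the larger total in the other two scenarios as well.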

Now let's just look at an example

comparing two incidence rates between two populations.

It is going to be a much larger study than we had before.

Suppose a randomized trial is being designed to determine if

vitamin A supplementation can reduce the risk of breast cancer.

And the study will follow women between the ages of 45 and 65 for one year.

And women will be randomized between the vitamin A and placebo group.

So what sample sizes are recommended?

Well, to get started, we have to get some estimate of the incidence rate

of breast cancer in the year of follow-up

in the two groups of interest.

So, suppose we want to design this study to have 80% power to detect

a 50% relative reduction in the risk or

incidence rate of breast cancer with vitamin A compared to placebo.

In other words, the study is designed to find an incidence rate ratio of 0.5.

And we want to do this with a significance level of 0.05.

So how are we going to get estimates of the incidence rates of interest in the two

groups being compared?

Well, perhaps using other studies, on breast cancer,

maybe the breast cancer rate in controls can be assumed to

be 150 cases per 100,000 women per year.

So if that's the case, if that's our starting point and

we want to design a study to find an incidence rate ratio of at least 0.5,

then our incidence rate expected in the Vitamin A group

would be half of the incidence rate expected in the placebo group.

So we could do this by taking the expected rate in the placebo group of 150 cases per

100,000 women per year, and multiply it by 0.5 to get, under this scenario,

the expected rate in the Vitamin A group, which is 75 cases per 100,000 women

per year.

So as this is a randomized trial,

to start let's assume equal sample sizes in two groups.

If we actually ran the numbers on this using statistical software,

we would need 33,974 people in each sample,

the vitamin A sample and the placebo sample, for

a total of nearly 68,000 persons to have 80% power to detect

an incidence rate ratio of 0.5 or smaller.

So we would need about 34,000 individuals per group.
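
One way to arrive at a number like this, sketched under assumptions the lecture does not state: treat each one-year incidence rate as a proportion (cases per woman over the year) and apply the same normal-approximation formula with continuity correction used for two proportions. The function name `n_per_group_for_rates` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_for_rates(control_per_100k, rate_ratio, alpha=0.05, power=0.80):
    """Per-group sample size, treating one-year incidence rates as proportions."""
    p_control = control_per_100k / 100_000
    p_treated = p_control * rate_ratio  # expected rate under the target ratio
    diff = abs(p_control - p_treated)
    p_bar = (p_control + p_treated) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_power * sqrt(p_control * (1 - p_control)
                          + p_treated * (1 - p_treated))) ** 2 / diff ** 2
    # Continuity correction, rounded up to a whole person.
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2)


n = n_per_group_for_rates(150, 0.5)
print(f"{n} per group, {2 * n} in total")
```

With 150 cases per 100,000 in the placebo group and a rate ratio of 0.5, this sketch gives 33,974 per group, in line with the figure above.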

Well, why so many?

Well, the difference between the two hypothesized incidence

rates is very small.

That 150 per 100,000 women, minus the 75 per

100,000 women anticipated in the vitamin A group is

a difference of 75 cases per 100,000 women,

which as a proportion is 0.00075.

So the difference we're looking for is very small numerically.

And given these anticipated incidence rates in the two groups,

if we did sample 34,000 women for each of the two groups,

so sampled 68,000 women and randomized them to one of the two groups,

we'd only expect in the year of follow-up to see about 50 cancer

cases among the controls and 25 cancer cases among the vitamin A group.

So we have relatively small proportions of the outcome in both groups.
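
The expected case counts are simple arithmetic: the number of women times the years of follow-up times the rate per woman per year. A small sketch, with `expected_cases` as a made-up helper name:

```python
def expected_cases(n_women, rate_per_100k, years=1):
    """Expected number of cases given a constant rate per 100,000 per year."""
    return n_women * years * rate_per_100k / 100_000


placebo_cases = expected_cases(34_000, 150)
vitamin_a_cases = expected_cases(34_000, 75)
print(placebo_cases, vitamin_a_cases)  # 51.0 25.5
```

So even with 34,000 women per group, only about 50 and 25 cases are expected in the year.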

Suppose the Cancer Association came back and said this is a great idea.

Vitamin A is easily given out.

It's inexpensive.

It's not harmful, so we'd actually be very interested in

vitamin A as a prophylactic for breast cancer even if it had less of an impact,

maybe only a 20% reduction in relative risk, because that would have

a huge impact at the population level given so many women in this age group.

So a 20% relative reduction would imply that the incidence rate ratio we're

trying to detect is on the order of 0.8.

Remember that corresponds to a 20% reduction in the vitamin A group

compared to the placebo.

So again, if we start with our starting estimate

in the placebo group of an incidence rate of 150 cases per 100,000 women per year,

and we multiply it by 0.8, our desired incidence rate ratio,

we'd expect to see an incidence rate of 120 cases per 100,000 women

per year in the Vitamin A group.

And again, as this is a randomized trial,

let's assume equal sample sizes in the two groups.

And based on using statistical software, we would need 241,889 women in each

sample, for a total of over 480,000 persons.

So we really need a lot more because the detectable difference here is a lot

smaller.

And given the large number of women necessary under the scenario of equal

sample sizes and given the fact that the treatment of interest can be randomized,

there's really no reason to consider other sample size computation scenarios where

we'd have unequal sample sizes, because that would make

the number of persons even larger than the very large number we have here.

And so, under this scenario to detect an incidence rate ratio of 0.8,

we need about 242,000 women per group.

And that's because the underlying incidence rates are very small

numerically, and hence the difference is very small numerically and

very hard to see.

So we need a lot of magnification or power to detect that difference.

And even so, with 242,000 women per group, we'd only expect to see about

360 cases in the placebo group and 290 in the vitamin A group

under our assumptions about the incidence rates in each group. So we have to look at

a large number of people just to see enough cases to detect a difference.

Sometimes what's done as an alternative to studies with

short follow-up periods, and this comes with some

difficulties in terms of increasing the likelihood of dropout, et cetera,

is to design a longer study. So instead of doing one-year follow-up,

maybe we could propose to do five-year follow-up on the women we sample and

randomize, in which case our expected incidence rates

over the five-year period would be five times what they are per year.

So in this scenario,

where we want to detect an incidence rate ratio of 0.8,

if we extended our study period to five years

and we prorated the incidence rates for one year,

so 120 per 100,000 women per year in the vitamin A group,

over five years, we'd expect to see more cases

in both groups.

And that would make our underlying incidence rates over the five-year

period larger in both groups, and the detectable difference larger,

if we make the study longer.

And if we did this we'd need about 48,000 women per group but

we need to follow them not for one year but for five years.

And if we had the anticipated incidence rates in both groups, we'd

expect to see 290 cases develop in the vitamin A group over the

five-year period, as compared to 360 cases in the placebo group.
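
The trade-off between one-year and five-year follow-up can be sketched by prorating the annual rates into a cumulative proportion over the follow-up period and reusing the two-proportion formula. This is a rough approximation under assumed methods (normal approximation with continuity correction, constant rates, no dropout), so it should land near the quoted figures of roughly 242,000 and 48,000 per group without necessarily reproducing them exactly. The function name and the `years` parameter are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_followup(control_per_100k, rate_ratio, years=1,
                         alpha=0.05, power=0.80):
    """Per-group sample size with annual rates prorated over the follow-up."""
    p_control = control_per_100k * years / 100_000  # cumulative proportion
    p_treated = p_control * rate_ratio
    diff = abs(p_control - p_treated)
    p_bar = (p_control + p_treated) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_power * sqrt(p_control * (1 - p_control)
                          + p_treated * (1 - p_treated))) ** 2 / diff ** 2
    return ceil(n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2)


one_year = n_per_group_followup(150, 0.8, years=1)
five_year = n_per_group_followup(150, 0.8, years=5)
# Expected cases over five years at the assumed annual rates.
cases_placebo = five_year * 5 * 150 / 100_000
cases_vitamin_a = five_year * 5 * 120 / 100_000
print(one_year, five_year, round(cases_placebo), round(cases_vitamin_a))
```

The point of the comparison is that the number of expected cases stays about the same; the longer study simply collects them from fewer women.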

So in summary, when designing a study to compare proportions or

incidence rates from two or more populations, a researcher must have some

estimate of the expected proportion with the outcome or

the incidence rate of the outcome in each population being compared.

And the sample size necessary to achieve a desired power

to detect a minimal detectable difference in proportions or

incidence rates is a function of the difference and the desired power.

And as I said at the beginning of this video, we didn't look at any examples

of comparing proportions or incidence rates between three or more populations,

but you could extend that example we gave with means where we could

look at all possible two population comparisons for desired power and

then take the maximum sample size necessary across the comparisons.