A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

209 ratings

From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So, in this section I'm going to try

and give you a little insight if you're interested

as to how a power calculation is actually

done by the computer, and what goes into it.

And this section is algebraically messy, but if you're

comfortable with algebra, it shouldn't be a problem to follow.

If you don't like algebra and the messiness

that comes along with the accounting that needs to

be done, then you don't need to worry

about this section, you can stop the recording now.

So consider the results we've looked at from a study

done on 29 women all 35 to 39 years old.

So this is what we kicked off the example of low powered

studies with, where we had a random sample and when we classified

them as to whether they were using oral contraceptives, at the time

of the study we found eight of them were and 21 were not.

And their blood pressures were measured and there

was, at least in the samples, there was

elevated blood pressure by a little over five millimeters of mercury for those

women who were currently on oral contraceptives at the time of the study.

And so we had a mean estimate for each of the two groups, and then we

had standard deviations of the eight and 21

blood pressure measurements in the two groups respectively.

So suppose we want to design a study with 80% power to detect a mean difference.

Let's suppose we decide that the minimal detectable difference of

interest that would raise clinical flags is five millimeters of mercury

between the two groups, and the [UNKNOWN] we're concerned with

the five millimeter increase associated on average with oral contraceptive use.

So, let's think about, let's first envision the end

of our study, after all has been said and done.

We will reject the null and pick up a difference in general, if

the difference in our study results, the sample means between the two study groups

divided by their estimated standard error, if that ratio, if our result is

more than two standard errors away from the expected value under the null of zero.

So if the absolute value of the standard error

distance is greater than or equal to two. So, what

we want is a guarantee, so to speak, that the

probability of rejecting, or getting a distance measure

of greater than or equal to two in absolute value, is at least 80% if

in truth the true mean difference in population mean

blood pressures between oral contraceptive users and those who don't use oral

contraceptives in this population of women is at least 5 millimeters of mercury.

I say greater than or equal to because if we are covered for

five at 80%, then we're covered for a larger distance with even higher power.

So let's just think about this for a minute.

This black curve describes the

sampling behavior of our estimated mean difference from

random study to random study of the same

size under the null hypothesis, when the true

populations being compared by the samples have equivalent means.

We're going to reject this if our result is in these tails

here, is beyond two standard errors away, either above or below zero.

So, just

generically, I'll say, let's assume it's above zero, but we

could draw the same picture for a difference below zero.

This blue curve here

represents, this is for a specific non zero difference.

It could be five, for example, if we

were interested in a difference of at least five.

This is the sampling distribution of our estimate around the

truth when it's equal to some specified difference of d.

And so, we're going to make a decision based

on this curve, but if in reality our populations

are such that the true difference is non zero is equal to d,

then the distribution of our estimates will

actually follow this dotted blue line curve.

So we want to make sure that our chances of having

a result that we would reject under this null hypothesis distribution is

large, if in fact our samples come from populations whose means differ by a

prespecified amount.

So we want to make sure that this area here, this blue area

here, is at least equal to 80% for the study we're designing.
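The blue area described here can be sketched numerically. This is a minimal illustration, assuming the estimate is normally distributed around the true difference with a known standard error; the function names `normal_cdf` and `power_two_sided` are hypothetical and not from the lecture.

```python
import math


def normal_cdf(x: float) -> float:
    """Area under the standard normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_two_sided(true_diff: float, se: float, cutoff: float = 2.0) -> float:
    """Chance that |estimate / se| >= cutoff when the estimate is normally
    distributed around true_diff with standard error se (assumed known)."""
    shift = true_diff / se  # how many standard errors the truth sits from zero
    upper_tail = 1.0 - normal_cdf(cutoff - shift)  # area beyond +cutoff
    lower_tail = normal_cdf(-cutoff - shift)       # area beyond -cutoff
    return upper_tail + lower_tail


# When the true difference is zero, the area is just the two null tails.
print(power_two_sided(0.0, 1.0))
# When the truth sits 2.84 standard errors from zero, the area is close to 0.80.
print(power_two_sided(2.84, 1.0))
```

Power depends only on how many standard errors separate the true difference from zero, which is why shrinking the standard error (by raising n) raises the power.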

How can we do this?

Well let's first consider our standard error.

And we're going to treat it like the true

standard error here since we're hypothesizing about the truth.

But in general the standard error of our estimates is

going to be a function of the individual variability of

the measurements, the standard deviation in each of the two

populations we're comparing, and the size of the samples we have.

So just to make the algebra a little more tenable, and

this is generalizable to situations where the sample sizes

aren't equal, but just to make

this presentation easier to follow, let's assume

from the start that we want the same number of women in each group.

So that gives us a little bit of collapse-ability in terms

of our estimated standard error, in terms of expressing it algebraically because

we can pull out that common denominator of that same n

in the formula, it just helps us a little bit with accounting.

So this shows what the expected

standard error of a mean difference in sample means

is, based on samples of the same size n,

when the underlying standard deviations in the two populations we're

comparing are sigma oral contraceptives and sigma no oral contraceptives.
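As a sketch of the formula being described (the notation is assumed here, since the slide itself is not reproduced in the transcript): with $\sigma_{OC}$ and $\sigma_{\text{no OC}}$ as the two population standard deviations and $n$ as the common per-group sample size, the standard error of the difference in sample means would be:

```latex
SE\left(\bar{x}_{OC} - \bar{x}_{\text{no OC}}\right)
  = \sqrt{\frac{\sigma_{OC}^2}{n} + \frac{\sigma_{\text{no OC}}^2}{n}}
  = \sqrt{\frac{\sigma_{OC}^2 + \sigma_{\text{no OC}}^2}{n}}
```

The second form is the "collapsed" version that equal group sizes make possible, with the shared $n$ pulled out as a common denominator.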

So, we want to design a study with 80% power to detect a

mean difference of at least five

millimeters of mercury between the two groups.

So we want the probability that, when we take our

observed sample mean difference and divide it by its standard

error, the result is greater than two in absolute value,

because that will yield a p value of less than 0.05.

And we want that probability of doing so, making that decision, to be 80%

if in fact the true difference in means is at least five millimeters of mercury.

Again, we're going to compute the p value under the assumption

of no difference in means, but we want our chances of

rejecting the null to be relatively high if instead the truth

is the difference in means is at least five millimeters of mercury.

So we want this ratio, the probability that this ratio, the absolute value

of this ratio is greater than or equal to two, to be 0.8.

So doing a little algebra we can re-express this

as the probability of the numerator, the absolute value of

the mean difference being greater than or equal to two times

the standard error, and we don't have to put absolute

value around standard error because that will always be positive, okay?
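In symbols, the requirement and its re-expression might be written as follows. The notation is an assumption rather than the lecture's own: $\bar{x}_{OC} - \bar{x}_{\text{no OC}}$ is the observed difference in sample means, $\mu_{OC} - \mu_{\text{no OC}}$ is the true difference, and $SE$ is the standard error of the observed difference.

```latex
P\left( \left| \frac{\bar{x}_{OC} - \bar{x}_{\text{no OC}}}{SE} \right| \ge 2
        \,\middle|\, \mu_{OC} - \mu_{\text{no OC}} = 5 \right) = 0.80
\quad\Longleftrightarrow\quad
P\left( \left| \bar{x}_{OC} - \bar{x}_{\text{no OC}} \right| \ge 2 \cdot SE
        \,\middle|\, \mu_{OC} - \mu_{\text{no OC}} = 5 \right) = 0.80
```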

So let's think about how we're going to compute this.

Well, what we really want to compute is the probability that

z is greater than some number a

equals 80%, where z represents a standard normal variable.

So, what we have to do is convert our measurement to

a standard normal variable by subtracting off its mean and dividing by its standard error.

So, under this alternative hypothesis,

the mean of the distribution we're computing this probability under,

is five millimeters of mercury and the standard error is

given by this same formula that we used for the

standard error of the mean difference estimate under the null hypothesis.

So, to put all this together and convert this statement

to a standard normal curve: we want the probability of the observed

sample mean difference being greater than or equal to two standard errors,

the cutoff under the null, computed under the alternative hypothesis, where we assume the sampling

distribution is centered at five and has variability equal to the standard error.

We have to standardize this by the mean and standard error of this alternative

distribution to get something that follows the standard normal distribution.

So we'll take this

value, and we want to see where this value falls relative to other

values under this distribution, but we're going to first have to subtract off the

mean of this alternative distribution, then divide

by the variability in the estimates in

this distribution, which happens to, again, be

the standard error of this mean difference.

So, this looks

messy algebraically but what we're doing here is just computing the probability of

getting a standard normal variable greater than

or equal to whatever this standardized value,

standardized to the alternative hypothesis of five millimeters

of mercury in the mean difference distribution is.
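The standardization step can be sketched as follows, keeping only the upper rejection tail (in line with the choice above to picture a difference above zero; the lower tail contributes almost nothing when the true difference is positive). Here $\bar{d}$ is assumed shorthand for the observed difference in sample means, $SE$ is its standard error, and $Z$ is a standard normal variable.

```latex
P\left( \bar{d} \ge 2 \cdot SE \right)
  = P\left( \frac{\bar{d} - 5}{SE} \ge \frac{2 \cdot SE - 5}{SE} \right)
  = P\left( Z \ge 2 - \frac{5}{SE} \right)
```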

Okay, well it turns out based on a standard normal curve, the

value that cuts off 80% of the area to its right is negative 0.84.

So we need to solve, if you will, the value such

that our resulting, we want our power to be 80%, so we want to have this area

under the alternative distribution, where the true mean

difference is five, the probability of being

beyond this point, which is the rejection cutoff under the null, to be 80%.

Now, it turns out, on the standard normal curve, this point is negative 0.84.

So we need to solve the equation for the value of n, given an estimate of the

standard deviation values in both groups: the value of n that makes

this expression equal to negative 0.84.

So with some more beautiful algebra, we can cross multiply to get

this expression here, do a little more manipulation and when all the dust

settles, if you go through this algebra carefully, we get an expression for n

that looks like this: the necessary n to reject the null with

a probability of at least 80% if in fact the mean

difference between the two populations is at least five millimeters of mercury.
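The algebra can be sketched as below. The notation is assumed, since the slides are not reproduced here: $SE = \sqrt{(\sigma_{OC}^2 + \sigma_{\text{no OC}}^2)/n}$ is the standard error for equal group sizes $n$, 2 is the rejection cutoff, and $-0.84$ is the standard normal value with 80% of the area to its right.

```latex
2 - \frac{5}{SE} = -0.84
\;\Longrightarrow\;
\frac{5}{SE} = 2 + 0.84
\;\Longrightarrow\;
\sqrt{\frac{\sigma_{OC}^2 + \sigma_{\text{no OC}}^2}{n}} = \frac{5}{2 + 0.84}
\;\Longrightarrow\;
n = \frac{(2 + 0.84)^2 \left( \sigma_{OC}^2 + \sigma_{\text{no OC}}^2 \right)}{5^2}
```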

And if you actually solve this based on the estimated

standard deviation inputs we have from our pilot study data, so

this is our estimate of sigma for the oral contraceptive

users, our estimate of sigma for the non-oral contraceptive users, and

you solve this, you get an n equal to 183. And remember we assumed the sample

sizes were equal in both the groups. So we'd need 183 women in both groups.
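The calculation can be sketched in a few lines. The pilot standard deviations used below (15.3 and 18.2 millimeters of mercury) are assumptions, since the transcript does not state them. The function name `n_per_group` is hypothetical, and the result is rounded up because a study cannot enroll a fraction of a person.

```python
import math


def n_per_group(sd_1: float, sd_2: float, diff: float,
                cutoff: float = 2.0, z_power: float = 0.84) -> int:
    """Per-group sample size for equal groups, using a rejection cutoff of
    two standard errors and 0.84 as the standard normal value for 80% power."""
    n_exact = (cutoff + z_power) ** 2 * (sd_1 ** 2 + sd_2 ** 2) / diff ** 2
    return math.ceil(n_exact)  # round up to a whole participant


# ASSUMED pilot estimates (not given in the transcript), in mmHg.
SD_OC = 15.3      # oral contraceptive users
SD_NO_OC = 18.2   # non-users

print(n_per_group(SD_OC, SD_NO_OC, diff=5.0))  # 183 under these assumed inputs
```

Software that uses 1.96 rather than 2 as the cutoff would return a slightly smaller number from the same inputs.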

If you wanted to solve for different sample sizes you would

have to represent one of them as a multiple of the other.

So, for example, if we expected four

times as many women in the non-oral contraceptive

group as compared to the oral contraceptive group, if we were taking an

overall random sample then classifying women as

to their oral contraceptive usage status as opposed

to sampling separately from those two populations to get an equivalent number,

you could do something like represent n2 as

equal to four times n1.

And then put this expression in wherever you see n2 and you'll have the whole thing

written out in terms of n1, you can solve for n1 and then use that to express n2.
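A minimal sketch of that substitution follows. It assumes the same cutoff of 2 and power value of 0.84, and pilot standard deviations of 15.3 (users) and 18.2 (non-users), which the transcript does not state. The name `n_unequal` is hypothetical.

```python
import math


def n_unequal(sd_1: float, sd_2: float, diff: float, ratio: int,
              cutoff: float = 2.0, z_power: float = 0.84) -> tuple:
    """Sample sizes (n1, n2) when n2 = ratio * n1.

    Substituting n2 = ratio * n1 into the squared standard error gives
    sd_1^2 / n1 + sd_2^2 / (ratio * n1) = (sd_1^2 + sd_2^2 / ratio) / n1,
    which can be solved for n1 exactly as in the equal-size case.
    """
    n1_exact = ((cutoff + z_power) ** 2
                * (sd_1 ** 2 + sd_2 ** 2 / ratio) / diff ** 2)
    n1 = math.ceil(n1_exact)  # round up to a whole participant
    return n1, ratio * n1


# ASSUMED pilot SDs in mmHg; group 1 = OC users, group 2 = non-users.
n1, n2 = n_unequal(15.3, 18.2, diff=5.0, ratio=4)
print(n1, n2)  # 103 412 under these assumed inputs
```

Under these assumed inputs the unbalanced design needs more women in total than two equal groups would, which is the usual cost of an unequal allocation.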

So, if you're not into algebra, this

is a particularly unpleasant experience to go through

this, but if you like algebra and the

accounting involved and can follow the logic of

what we are computing under power, this is

kind of a fun, informative way to understand what

the computer is doing when it gives the

necessary sample size to achieve a certain power.