A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

255 ratings

From this lesson

Module 4B: Making Group Comparisons: The Hypothesis Testing Approach

Module 4B extends the hypothesis tests for two-population comparisons to "omnibus" tests for comparing means, proportions, or incidence rates between more than two populations with one test.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Okay, since we've been very idea-heavy in

the past couple of lectures in lecture set 13, I just

wanted to give a very brief wrap-up of what

we've been talking about and some of the main ideas.

When designing a study, there's a tradeoff between the desired power,

the alpha level,

the sample size, and the minimal detectable

difference for the specific alternative hypothesis of interest.

Some of these things are pretty much fixed.

The industry standard for power is 80% or sometimes 90%.

And the rejection level is generally 0.05. So we can't mess with these inputs.

However, we can play around with the

minimum detectable difference, or other parameters like the

standard deviation when comparing means of continuous

data, to change our sample size computation outputs.
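For comparing two means, one common textbook approximation makes this tradeoff explicit. It is only a sketch: it assumes equal group sizes, a common standard deviation σ in both populations, a two-sided test, and the normal approximation, and the software used in these lectures may compute something slightly different. Here n is the size of each group, δ is the minimal detectable difference, α is the rejection level, and 1 − β is the power:

```latex
n \;=\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}}
```

Because δ is squared in the denominator, a larger minimal detectable difference shrinks n quickly, and a smaller σ does the same.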

So what happens?

This is a frequently asked question: what if the

sample size calculation yields group sizes that are too big?

In other words, you would not be able to afford to

do this out of your budget, or could not ask for that much money.

Or sample sizes that make it very difficult

to recruit the number of subjects needed. What

can you play around with?

Well, really the only option you have is to increase

the minimal detectable difference of interest

to reduce your necessary sample size.

And if you're comparing means, you can also reduce the estimated standard

deviations in the populations, and that will have some impact on your sample size.

Theoretically you could increase your alpha level, but nobody's going to

be on board with that, so that's really not an option.

Or you could decrease your desired power.

If you're starting with 90%, you may be able

to take it down to 80% for most funders.

But you certainly can't go below that threshold,

so this isn't a very viable option either.

So it pretty much comes down to

increasing the minimal detectable difference of interest.
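To see how much each input moves the answer, here is a minimal Python sketch for comparing two means. It assumes equal group sizes, a common standard deviation, a two-sided test, and the normal approximation, so it will not exactly match every software package. The function name `n_per_group` and the input values (a standard deviation of 10 units and differences of 2.5, 5, and 10 units) are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two means.

    Assumes equal group sizes, a common standard deviation `sd`,
    a two-sided test, and the normal approximation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the rejection level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical inputs: SD of 10 units, candidate differences of 2.5, 5, and 10.
for delta in (2.5, 5, 10):
    n80 = n_per_group(delta, sd=10, power=0.80)
    n90 = n_per_group(delta, sd=10, power=0.90)
    print(f"delta={delta}: n per group = {n80} (80% power), {n90} (90% power)")
```

Under these assumptions, doubling the minimal detectable difference cuts the required group size to roughly a quarter, while dropping from 90% to 80% power saves only about a quarter of the subjects.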

Sample size calculations are a very important part of a

study proposal, because what study funders want to know is

that the researcher can detect a relationship of interest with

a high degree of certainty, should it really exist as specified.

So, they don't want to fund a small study with a lot of money

because the ability to find a difference of

interest is compromised, given the low power.

When would you calculate the power of a study after it's been done?

Well, maybe if you're using data collected

by someone else, and you know the sample sizes

involved, and you want to do a secondary data analysis.

You could look at your power to detect a difference

with that data before you seek funding to buy the data.

Also, if you're looking at a study that you've done where the data has already

been collected and the sample size is fixed, you can use the software we alluded

to in the previous lecture sections to determine the study's power

to detect a specified minimal detectable difference of interest as well.
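When the sample size is already fixed, the same logic runs in the other direction. The sketch below estimates power for comparing two means, assuming equal group sizes, a common standard deviation, a two-sided test, and the normal approximation. The function name `power_two_means` and the numbers in the example are hypothetical, not results from the lectures.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, delta, sd, alpha=0.05):
    """Approximate power to detect a true mean difference `delta`.

    Assumes equal group sizes, a common standard deviation `sd`,
    a two-sided test, and the normal approximation.
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)     # standard error of the difference in means
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the rejection level
    # Chance of rejecting in the direction of the true difference; the very
    # small chance of rejecting in the wrong direction is ignored.
    return z.cdf(abs(delta) / se - z_alpha)


# Hypothetical example: 20 versus 63 subjects per group, difference 5, SD 10.
for n in (20, 63):
    print(n, round(power_two_means(n, delta=5, sd=10), 2))
```

With these made-up inputs, the smaller study has well under a 50% chance of detecting the difference, which is the kind of result that can help argue for a larger study.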

You might also do this for a pilot study, to illustrate that low power may be

a contributing factor to non-significant results and

that a larger study may be appropriate.

So what is this specific alternative hypothesis?

We've said multiple times now that power or sample size

(and we've really focused on computing sample size for a desired power)

can only be calculated for a specific alternative hypothesis.

So when comparing two groups, this means or

entails estimating the true population means of the two

groups being compared, or the proportions if it's a

binary outcome, or incidence rates if it's time-to-event data,

for each of the groups.

This difference is frequently called the minimum detectable difference,

and often you'll also see it referred to in the

literature, in some software packages, and in some of the

free online sample size calculators as the effect size.

And it refers to the minimum detectable difference of scientific interest.

And again we've discussed how when doing research and

looking for differences, there are differences that could exist

at the population level, but are too small for

them to be clinically relevant, and so we don't

want to put our resources into studies that are powered

to find small but non-substantive differences of interest.

So the minimal detectable difference really gets us started with how far the

groups would have to differ for the

results to be scientifically or clinically useful.
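For a binary outcome, the specific alternative is a pair of proportions rather than a pair of means. As a rough sketch, the function below uses a simple normal-approximation formula with unpooled variances and no continuity correction, for equal group sizes and a two-sided test. Software packages often use slightly different formulas, so their answers may differ by a few subjects. The function name and the proportions in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions.

    Assumes equal group sizes, a two-sided test, and the normal
    approximation with unpooled variances (no continuity correction).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical alternatives: 30% versus 20%, and 30% versus 15%.
print(n_per_group_proportions(0.30, 0.20))
print(n_per_group_proportions(0.30, 0.15))
```

The second alternative specifies groups that differ by more, so it needs far fewer subjects per group than the first.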

So where does this specific alternative hypothesis come from?

Well, hopefully not the statistician, and I say

that because I've been asked to help with sample

size computations before, and the expectation is that

I would come up with the minimal detectable difference.

And I can't do that, not being an expert in the particular scientific area.

I can help talk somebody through some options for getting a

minimal detectable difference, but I can't just come up with one on

my own as a statistician.

So as this is generally a quantity of scientific

interest, it is best estimated by a knowledgeable researcher.

Or even better, from pilot study data, if there is some.

And this is perhaps the most difficult

component of sample size calculations:

there's no magic rule or industry standard, and it involves a lot of educated guesswork.

One last thing I want to note is that in all the computations

we did, I did not address the issue of potential loss to follow-up.

So the estimates I showed you, given by

the software, do not account for potential dropout,

or loss to follow-up, or non-participation by those selected to be part of the study.

So, frequently, what researchers will do is add a buffer of 5 to

10% to the necessary sample sizes to achieve a desired level of power, to allow

for some dropout or non-participation.

So, we can go back and take the results I gave for each of

the scenarios, and a final request for sample

size would be those numbers with 10% added,

to actually account for potential loss to follow-up.
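The buffer itself is simple arithmetic. A minimal sketch, assuming the buffer is expressed as a whole-number percentage and the result is always rounded up to a whole subject; the function name `add_buffer` and the starting sample sizes are hypothetical:

```python
def add_buffer(n_required, buffer_percent=10):
    """Inflate a computed sample size by a percentage buffer, rounding up.

    Uses integer arithmetic so that, for example, 100 plus 10% gives
    exactly 110 rather than a floating-point value just above it.
    """
    total = n_required * (100 + buffer_percent)
    return -(-total // 100)  # ceiling division on integers


# Hypothetical per-group sizes from a sample size computation.
for n in (63, 100, 291):
    print(n, "->", add_buffer(n, buffer_percent=10))
```

The number requested in a proposal would then be the buffered value for each group, not the raw output of the sample size computation.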