A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 2B: Summarization and Measurement

Module 2B includes a single lecture set on summarizing binary outcomes. While at first summarization of binary outcomes may seem simpler than that of continuous outcomes, things get more complicated with group comparisons. Included in the module are examples of, and comparisons between, risk differences, relative risks, and odds ratios. Please see the posted learning objectives for this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we're going to talk about one more way to quantify

the association between a binary outcome and group membership, comparing

two or more populations using sample results.

And we'll be talking about a number that is often reported in journal

articles and used in epidemiology and other sciences called the odds ratio.

So, in this lecture section, what we're going to be able

to do is quantify how a binary

outcome differs between two or more groups using this thing called an odds ratio.

And we're going to compare it to the relative risk.

Both are ratios of some function of the sample proportions, and

we'll compare and contrast them both

conceptually and numerically in this section.

So again, we will start this section by looking at our data

set on the thousand HIV positive patients from a city wide clinical population.

And recall we had broken these into subgroups: the portion of the

sample whose CD4 counts at the start of therapy were less

than 250, as compared to the group whose CD4 counts were

greater than or equal to 250 at the start of therapy.

And what we were doing is summarizing the proportion

who responded in each of these CD4 count groups.

And you'll probably recall from the previous section, that 25%

of those whose starting CD4 counts were less than

250 responded to the therapy, as compared to 16%

in the group whose CD4 counts were greater than

or equal to 250 at the start of therapy.

So we have already shown how to summarize this in two ways.

We're going to go to a third in this section, but just to

refresh your memory, if we took the difference in these proportions, the risk

difference or attributable risk.

The 25% responding in the first group, the group with less

than 250 CD4 counts, minus the 16% who responded in the

group with CD4 counts of greater than or equal to 250,

was a positive .09, or 9%, 9% greater on the absolute scale.

9% greater response for the lower CD4 count group.

We also talked about another measure

using the exact same two numbers, which would be the

ratio of proportions, or the relative risk, or the risk ratio.

Those are synonyms for each other.

And here we take this 25% and, instead

of subtracting the 16%, the response proportion in

the sample with CD4 counts greater than or equal to 250, we divide by it.

So the ratio is the 25% responding in the lower CD4 count,

divided by the 16% responding in the larger CD4 count group.

That gives us the relative risk of 1.56, 56% greater chance of

responding for the group with the lower CD4 count at the start of therapy.

That's a relative comparison.
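
As a minimal sketch of the two comparisons recapped above, the arithmetic can be checked in a few lines of Python. The variable names are illustrative and not part of the lecture; only the 25% and 16% come from the example.

```python
# Sample proportions responding to therapy, as quoted in the lecture.
p_low_cd4 = 0.25   # CD4 count < 250 at the start of therapy
p_high_cd4 = 0.16  # CD4 count >= 250 at the start of therapy

# Absolute comparison: difference in proportions (risk difference).
risk_difference = p_low_cd4 - p_high_cd4

# Relative comparison: ratio of proportions (relative risk).
relative_risk = p_low_cd4 / p_high_cd4

print(f"risk difference: {risk_difference:.2f}")
print(f"relative risk:   {relative_risk:.2f}")
```

Both numbers use exactly the same two inputs; only the operation (subtract versus divide) changes.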

The third measure we're going to look at, is

something called the odds ratio or the relative odds.

So what is odds?

What does it mean?

We've all heard it used colloquially,

probably as a synonym for proportion or probability,

but it actually has an explicit definition that we're going to look at here.

The odds of an event, and we're going to talk about the

estimated odds because that's all we can get at from sample data,

the estimated odds of an event is a function of the risk

or probability of the event occurring, but it's not exactly the risk.

It's the

risk of the event occurring, divided by the risk or probability of it not occurring.

So, it's our sample proportion, the percentage with the outcome, divided by

one minus that sample proportion, the percentage without the outcome.
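
Written as a formula, with p hat standing for the sample proportion with the outcome (the notation here is a sketch following the spoken "p hat", not a slide from the lecture):

```latex
\[
\widehat{\text{odds}} \;=\; \frac{\hat{p}}{1 - \hat{p}}
\]
```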

And, there is a relationship between risk

and odds, they track in the same direction.

So as the risk, the sample proportion, increases,

so does the odds.

So, for example, let's make a little

hand-drawn table here of risk, or proportion, versus odds.

So suppose your risk is zero, in the sample,

nobody has the outcome we're looking at, for example.

What would the odds be?

Well, the proportion who have the outcome is zero.

So p hat is 0. 1 minus p hat is 1.

The odds is 0 over 1, or 0. So they're equal in that case.

And how about, suppose the outcome, the percentage having the outcome is 25%?

So, p hat is 0.25.

Well, in that case, what is the odds? It's 0.25, the

percentage with the outcome divided by 1 - 0.25 or 0.75,

the percentage without the outcome, 75% do not have the outcome.

And so the odds is 0.33, or 1 to 3.

So it's not exactly equal to the risk, or probability or

proportion, of 25%. If the risk is .5, or 50%, then the odds

is .5 divided by one minus .5, which is also .5, so the odds is one.

And you've heard the expression 50:50 odds, well that just means that the risk

of having the event, or outcome, is the same as the risk of not, 50% for both.

If we get into a risk or proportion of over 0.5, 0.75 for example, then the

odds that corresponds to this is 0.75 over 0.25, or 3:1 odds, it's 3.

And we could keep going with this, if we up the probability, or risk of an

outcome to 0.95, the corresponding odds is 0.95 over 0.05 which

is equal to 19, odds of 19:1, of having the outcome.

As the risk gets closer to one, the odds gets larger,

numerically, and as our risk approaches one, the odds approaches infinity.
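
The hand-drawn table of risk versus odds can be reproduced with a short script. This is only a sketch: the helper name `odds` is an assumption for illustration, and the risk values are the ones worked through above.

```python
def odds(p):
    """Odds corresponding to a risk (proportion) p, for 0 <= p < 1."""
    return p / (1 - p)

# Risks used in the lecture's table.
risks = [0.0, 0.25, 0.5, 0.75, 0.95]
table = [(p, odds(p)) for p in risks]

print(f"{'risk':>6} {'odds':>8}")
for p, o in table:
    print(f"{p:>6.2f} {o:>8.2f}")
```

The odds stay close to the risk when the risk is small and grow without bound as the risk approaches one, which is why the function is undefined at a risk of exactly one.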

So now, let's look at the odds of responding

to therapy for each of our two CD4 count groups.

So, for the first group,

the group whose CD4 count at the start of their therapy was

less than 250, the proportion responding in our sample of 503, was 25%.

So, the odds, is the 25%, the chance or probability or proportion of

responding, divided by the chance, or proportion not responding, 75%.

And this is equal to 0.33, or that 1 to 3 we saw on the previous slide.

For the other group, the group with the CD4 count greater than or equal to 250 at

the start of therapy, the proportion responding in our study was 16%.

So the odds of response is 16%,

the proportion or probability responding, divided by

the proportion who did not respond, or the probability of not responding, 84%, which is about 0.19.

So now, now that we have the odds

for the two groups, the odds ratio takes these odds for

the two respective groups and compares them in a ratio format.

So our numerator, if you want to do the odds ratio and the comparison

in the same direction we've been doing of the group with the lesser CD4 count

at the start of therapy compared to the group with the greater CD4 count,

we'd take the odds of response for the group with CD4 count less than 250,

which is that 1 to 3 we talked about.

That would be our numerator for this comparison, and the odds for

the group who had the greater CD4 count, that 0.16 divided by

0.84, would be the denominator for this comparison, this ratio, not of

the proportion or probability responding directly, but of this function, the odds.

And what we get here,

is a number, 1.75.

So, this is a ratio that compares the relative odds

of response for the first group compared to the second.

So, the group with the lower starting CD4 counts had 75% greater odds of responding

as compared to the group with the greater than or equal to 250 CD4 counts.
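
A minimal sketch of this odds ratio calculation follows, assuming only the two sample proportions from the example. The function and variable names are illustrative.

```python
def odds(p):
    """Odds corresponding to a risk (proportion) p, for 0 <= p < 1."""
    return p / (1 - p)

p_low_cd4 = 0.25   # proportion responding, CD4 < 250
p_high_cd4 = 0.16  # proportion responding, CD4 >= 250

odds_low = odds(p_low_cd4)     # numerator: lower CD4 count group
odds_high = odds(p_high_cd4)   # denominator: higher CD4 count group
odds_ratio = odds_low / odds_high

print(f"odds (CD4 < 250):  {odds_low:.2f}")
print(f"odds (CD4 >= 250): {odds_high:.2f}")
print(f"odds ratio:        {odds_ratio:.2f}")
```

Swapping which group sits in the numerator would give the reciprocal of this ratio, so the direction of comparison should always be stated alongside the number.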

So how can we interpret this odds ratio? Well as I said before, one way to say this

would be, to say the group with CD4 counts of less

than 250 at the start of therapy has 1.75 times the

odds of responding to therapy, as compared to the group whose

CD4 counts are greater than or equal to 250 at the start of therapy.

Or rephrased, the group with the lower CD4 count has 75% greater odds of responding

to therapy than the group with the higher CD4 counts at the start of therapy.

Now this odds ratio sounds suspiciously like a relative risk.

But it's actually not.

It's a little more obfuscating than a relative risk.

It's not a direct comparison of risks but a comparison of

this function of risks, the odds for each of the two groups.

So let's just talk about the difference between these two

and we'll reinforce this throughout the rest of the course.

But here's some commonalities.

Relative risk and odds

ratio will always agree in terms of the direction of comparison.

So, if one group has a higher risk than the

other, it will consequently have higher odds, and vice-versa.

But the odds ratio and relative risk will not always be the same value, and we'll

look at some more examples and talk more

about under what conditions these are similar versus not.

But just to start, in this example, we can see that both the

relative risk of 1.56 and the odds ratio

of 1.75 were greater than one, indicating a

greater response to therapy for the group with

the lesser CD4 counts at the start of therapy.

But, by one metric the risk is 56% higher, and by the other metric the odds,

not the risk, but the odds, which is a function of risk, is 75%

higher. And these do not fully agree numerically.

So now let's look at our maternal-infant HIV

transmission example, and add the odds ratio to our measures of association.

So, again, this is the two by two

table characterizing the outcome of HIV maternal infant

transmission amongst the mothers, pregnant mothers with HIV,

who were given AZT during pregnancy, compared to

the mothers who were given a placebo, so were not getting treatment.

We've seen this before,

and we already talked about the risk difference, or difference in proportions.

Again, the underlying proportion of children who contracted HIV

within 18 months, passed on from the mother was

7%, for the mothers who were given AZT, compared

to 22% of children born to mothers given the placebo.

We saw that that was a risk difference, or attributable risk, of -15%.

And then,

we saw that the relative risk, that 7% divided by the 22%, was

0.32. So let's compute the odds ratio.

So to do this, we first need to compute the odds of maternal-infant

HIV transmission for children born to mothers who were given AZT.

So that would be the

7%, the proportion who actually contracted HIV within

18 months, divided by the 93% that didn't.

This is the risk of contracting HIV divided by one

minus that risk, or the risk of not contracting HIV.

That's our odds.

And then we need to do the same thing, compute the odds for the placebo group.

Children born to mothers in the placebo group,

and that was that 22% who contracted HIV,

divided by the 78% who didn't. And we take this ratio of the two

odds, it turns out to be 0.27. So

it is also less than one but it's not exactly the same value as

this relative risk here. So how could we interpret this?

We could say that the AZT group has 0.27 times the odds of mother-to-child

HIV transmission of the placebo group.

The odds for the mothers given AZT, of transmitting to

their child, is .27 times the odds for the other group.

Another way to say this is that the AZT group has 73%

lower odds of mother-to-child HIV transmission relative to the placebo group.

I'll let you confirm that with the raw odds, taking that percent difference, or we could

say that 0.27 is really the ratio of 0.27 to 1.

And, 0.27 is 73% lower than 1, the starting point.
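
The confirmation left to the listener can be sketched as follows, using only the 7% and 22% from the example. The names are illustrative assumptions, and the percent change is taken relative to the placebo odds.

```python
def odds(p):
    """Odds corresponding to a risk (proportion) p, for 0 <= p < 1."""
    return p / (1 - p)

p_azt = 0.07      # proportion of infants infected, mothers given AZT
p_placebo = 0.22  # proportion of infants infected, mothers given placebo

odds_azt = odds(p_azt)
odds_placebo = odds(p_placebo)

relative_risk = p_azt / p_placebo
odds_ratio = odds_azt / odds_placebo

# Percent difference in raw odds, relative to the placebo group.
percent_change_in_odds = (odds_azt - odds_placebo) / odds_placebo * 100

print(f"relative risk: {relative_risk:.2f}")
print(f"odds ratio:    {odds_ratio:.2f}")
print(f"percent change in odds: {percent_change_in_odds:.0f}%")
```

The percent change computed from the raw odds is the same quantity as the odds ratio minus one, expressed as a percentage, which is why both routes give about 73% lower odds.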

So let's compare and contrast the relative risk and odds ratio.

Again, in this example the estimated relative risk and odds ratio

are .32, for the estimated relative risk, and .27, for the

estimated odds ratio. They are numerically slightly

different, but they both indicate a lower numerator than denominator.

Okay.

So, how do we interpret this odds ratio substantively?

Well, as with the relative risk, the odds ratio can be interpreted as the impact,

assuming causation of the exposure, or the

treatment in this case, at the individual level.

An HIV-positive pregnant woman can reduce

her individual odds of passing or transmitting

HIV to her child by 73% if she

takes AZT during pregnancy compared to if she didn't.

Again, this odds ratio though, does not directly compare the probability or risks

or proportions of an outcome, but instead compares this function of risk, the odds.

Both measures use the exact same information again.

So we're taking these two proportions and we've

seen now we can compare them in three different

ways: the risk difference, and then, in terms

of ratios, the relative risk or the relative odds.

So if the relative risk estimate, P1 over P2 generically, P1 hat

over P2 hat, is greater than 1, then the relative odds for the

two groups will be greater than 1.

In other words, if the relative risk estimate is greater than

1, so will the relative odds estimate be greater than 1.

Similarly, if the relative risk estimate

is less than 1, the resulting odds ratio will be less than 1.

And if the relative risk estimate is equal to 1, then the odds ratio will be equal to 1.

So they will concur in terms of the direction and whether

or not there is equality, but they won't necessarily be the same numerically.

The smaller the estimated proportions are in the two samples we're

comparing, the closer in numerical value the relative risk and odds ratio will be.

So the rarer the outcome in the two groups we're

comparing, the closer in value these two things will be.

Why is that?

Why do you think that it is?

Well, recall the odds ratio is equal to the risk in the first

group divided by one minus that risk, that's the odds for the first

group divided by the same comparison but for the second group.

So think about it: if p hat 1

and p hat 2 are both small,

like very close to 0, and

"close" I'm going to put in quotes here,

then the closer they are to zero, the closer 1 - p hat 1 and

1 - p hat 2 are to one. So, when p hat 1 and p hat 2

are close to zero, the odds for both groups are close to

the risks in those two groups, and the odds ratio is close to the relative risk.

So we can have equivalence or near equivalence

with smaller underlying proportions in the groups we're comparing.
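
The argument above can be written out as a short derivation. The notation is a sketch based on the spoken "p hat 1" and "p hat 2"; it is not taken from a lecture slide.

```latex
\[
\widehat{OR}
  \;=\; \frac{\hat{p}_1 / (1 - \hat{p}_1)}{\hat{p}_2 / (1 - \hat{p}_2)}
  \;=\; \frac{\hat{p}_1}{\hat{p}_2} \times \frac{1 - \hat{p}_2}{1 - \hat{p}_1}
  \;=\; \widehat{RR} \times \frac{1 - \hat{p}_2}{1 - \hat{p}_1}
\]
```

When both proportions are close to zero, the factor multiplying the relative risk is close to one, so the two ratios nearly coincide. When the first proportion is larger than the second, that factor is greater than one, and when it is smaller, the factor is less than one, so the odds ratio always falls on the same side of 1 as the relative risk and at least as far from it.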

So let me give you an example of that.

Let's go back to our example with

Aspirin and cardiovascular disease development in women.

This is that randomized trial reported on women 45 years of age or older

who received 100 milligrams of Aspirin on

alternate days or a placebo.

And then they were followed

for ten years.

So again, if we look at the proportion of

women who developed cardiovascular disease in the two groups

over the ten-year follow-up period, we saw that it was 2.4% in

the Aspirin group as compared to 2.6% in the placebo group.

And so when we computed the risk difference, it

was low numerically, was a negative 0.002 or negative 0.2%.

And the relative risk of that 2.4% in the Aspirin

group compared to the 2.6% in the placebo group was 0.92.

Well, in this situation if we compute the odds ratio for CVD,

cardiovascular disease, within ten years, the odds for the Aspirin

group relative to the placebo group, well, these proportions are quote unquote

small, 2.4% and 2.6%. And if we look at the odds ratio, and I'll

let you verify this, it's very close to the relative risk of 0.92 in this example.

And compare this to the previous two examples.

Here the underlying proportions in the two groups we're comparing were smaller, and so

the odds ratio and relative risk were closer in numerical value than what

we saw in the previous two examples. So how would we interpret this?

Well, we could say the Aspirin group has 0.92

times the odds of developing cardiovascular disease as compared

to placebo group, or the Aspirin group has

8% lower odds of developing cardiovascular disease than the placebo group.

So, the relative risk versus the odds ratio in this

example, they're nearly identical in value, both about 0.92, unlike the previous two examples.

And that has to do with the fact that the

proportions who have the outcome were smaller in both groups.
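
The verification left to the listener for the Aspirin example might look like the sketch below. It assumes only the rounded 2.4% and 2.6% quoted in the lecture, so the results can differ slightly from values computed from the trial's raw counts; the names are illustrative.

```python
def odds(p):
    """Odds corresponding to a risk (proportion) p, for 0 <= p < 1."""
    return p / (1 - p)

p_aspirin = 0.024  # proportion developing CVD within ten years, Aspirin
p_placebo = 0.026  # proportion developing CVD within ten years, placebo

relative_risk = p_aspirin / p_placebo
odds_ratio = odds(p_aspirin) / odds(p_placebo)

# Three decimals are printed so the small remaining gap is visible.
print(f"relative risk: {relative_risk:.3f}")
print(f"odds ratio:    {odds_ratio:.3f}")
```

With proportions this small the two ratios agree to two decimal places, although they are still not exactly equal.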

So, a question you've probably been thinking about since the beginning of

this lecture, is why do we even bother with the odds ratio?

You know, it seems sort of out of left field.

It's not a direct comparison of the probabilities or proportions.

It seems

less intuitive than the other two.

In many ways, especially on the ratio scale, the odds ratio is less intuitive,

and a less direct measure of association, than the relative risk.

So, why do we even deal with this odds ratio?

Why am I bringing it up here? Why do we talk about it in public health?

Well, there's two reasons.

In some types of studies, something called a case control

study, which I mentioned in the beginning of this course,

but we'll spend a little more time on in term two, the

odds ratio is the only measure of association that can be estimated,

and we'll talk about why.

That's a little street marketing for, like, term two of

the course, but we'll talk about why when we get there.

In logistic regression, which is a method to extend what

we're doing in these sections here, which is also coming

in term two, the results we get are initially presented

as odds ratios and hence frequently presented as such in publications.

So, we want to be familiar with what this is, but also how

it differs from the other measures of association.

With more than two categories, how could we compare the odds?

This is an extension of what we did with the risks, as well.

A common practice is to designate one of the categories as the

reference group and present comparisons of all other categories to this reference.

And while the chosen reference group or

the choice of reference group is arbitrary, in

many cases it, again, is purposely chosen

to highlight the substantive emphasis of the manuscript

or the presentation.

So, let's look at an example of trying

to measure the association between obesity and depression.

And so from the abstract of this paper that was published

in the American Journal of Epidemiology, they say data from the third

National Health and Nutrition Examination Survey, NHANES III, conducted between

1988 and 1994, were used to examine the relation between obesity and depression.

And past-month depression was defined using criteria from

the Diagnostic and Statistical Manual of Mental Disorders, third edition,

and was measured with a diagnostic interview schedule.

And then they used body mass index to

define obesity using the cutoff of 30 or higher.

And they compared the risks of depression in obese

and normal weight persons as characterized by BMI.

So here's a table that they use to compare depression-related outcomes measured over

different time frames: the past month relative to the

survey, the past year, lifetime, and recurrent.

And they did this for all respondents and they compared obesity categories and

then they did it separately for females and males in this large table.

I'm going to zoom in on one section of this table

and relate it to what we're doing here.

And what they did, I'm going to focus on

the outcome, the binary outcome of past-month major depression.

They asked each of the responders did they have depression in the previous month.

And the answer was yes or no.

It was a binary outcome.

And they actually did, it's a little hard to see from this table, but I'm

going to cordon off the area where they

classified people in terms of different obesity categories.

And what they did here

is they actually classified people as being normal

weight, underweight, overweight or obese, defined by their BMI.

So they had four overall categories of weight.

And what they do here, you see this column

here, back here, where they do an odds ratio,

and then they do a confidence interval with that,

and we'll get to confidence intervals shortly in the course.

This odds ratio column, we have to pay attention to what is going on here.

And, a lot of times, there is only so much information in

the table, and a lot of it is relegated to the footnote,

but if you look at this area here, it

is separate from the previous part; this looks at the obesity categories.

And you notice that the odds ratio they give for the normal

weight category is a one, and they put this symbol next to it.

And the symbol is actually a footnote marker, and I'm showing

you the piece of the footnotes here where we see the symbol.

They say this designates the reference category.

So, what they're getting at is that odds ratio of 1.0 is

the normal weight category compared to the reference category, which is itself.

And of course the odds are equal for normal weight, and normal weight.

And that is sometimes how they designate that as 1.0.

This is the category whose odds of past-month major depression

we're going to compare the other categories of weight to.

So now we've got the underweight category, the BMI

less than 18.5, and the reported odds ratio here is 1.17.

Well, this 1.17 is the odds of depression

in the previous month for the underweight category

divided by the odds of depression in the previous

month for the reference group, the normal weight.

So they suggest that those who were underweight had 1.17 times the odds

of being depressed in the previous month compared to those of normal weight.

They had 17% higher odds. The next ratio here compares

the relative odds of depression in the previous month for the

overweight group compared to the same reference group of normal weight.

So this .86 here is the relative odds for the overweight group.

Odds of previous month depression divided by the odds for the normal weight group.

The same reference group as the previous comparison and it's 0.86.

So that suggests that the relative odds (this is on the odds ratio scale,

this is not

a risk ratio, but the relative odds) of being depressed in the previous

month were 14% lower for the overweight group compared to the normal weight group.

And then they get to the obese group, and present this odds ratio, this 1.88.

This is comparing the odds of depression in the previous month for the

obese group to the same reference that's been used in the other comparisons,

the normal weight group.

So the relative odds of past-month depression for the obese group

is 88% higher than, or 1.88 times, that for the normal weight group.
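
As a sketch of how each reported odds ratio translates into a "percent higher or lower odds" statement, the three table values quoted above (1.17, 0.86, and 1.88, each relative to the normal weight reference) can be converted with one small helper. The function name and dictionary are illustrative, not from the paper.

```python
def percent_difference_in_odds(odds_ratio):
    """Percent by which the odds differ from the reference group's odds."""
    return (odds_ratio - 1) * 100

# Odds ratios as quoted in the lecture; reference group is normal weight.
reported_odds_ratios = {
    "underweight": 1.17,
    "overweight": 0.86,
    "obese": 1.88,
}

for group, ratio in reported_odds_ratios.items():
    pct = percent_difference_in_odds(ratio)
    direction = "higher" if pct > 0 else "lower"
    print(f"{group}: OR {ratio:.2f}, {abs(pct):.0f}% {direction} odds "
          f"than normal weight")
```

The reference group itself would map to an odds ratio of 1.0 and a difference of 0%, which is how the table's entry of 1.0 for normal weight should be read.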

So, now I could ask you and I

will probably do so in the extra exercises section.

But, given that all these odds ratios are comparing the respective

groups to the same reference you could back into, for example,

the relative odds for depression, in the previous month, for

those who were obese compared to those who were overweight.

Just think about that for now.
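
One way to back into that comparison, sketched here under the assumption that only the two reported odds ratios are available: because both share the normal weight odds as their denominator, dividing one by the other cancels the reference group. The variable names are illustrative, and this gives a point estimate only, with no confidence interval.

```python
# Reported odds ratios, each relative to the normal weight group.
or_obese_vs_normal = 1.88
or_overweight_vs_normal = 0.86

# (odds_obese / odds_normal) / (odds_overweight / odds_normal)
#   = odds_obese / odds_overweight, since odds_normal cancels.
or_obese_vs_overweight = or_obese_vs_normal / or_overweight_vs_normal

print(f"obese vs overweight odds ratio: {or_obese_vs_overweight:.2f}")
```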

So, in summary, the odds ratio, the estimated

odds ratio, or OR hat provides an alternative

to the relative risk estimate, RR hat, for

quantifying the association between a binary outcome between groups.

The odds ratio is a ratio of odds between two groups.

Odds is related to risk, or the probability or proportion

of an outcome, but it's not exactly the same thing.

It's a function.

The odds ratio and relative risk both estimate the association

between a binary outcome between groups at the individual level.

And these will agree in terms of direction, but not always magnitude.

The smaller the risk or proportion of the outcome in the

groups being compared, the more similar these two quantities will be.

So something to think about, what is the odds ratio?

How does it differ from the relative risk?

And like I said, we're introducing it here because

it is a legitimate measure of association for binary outcomes across groups.

And it will have more relevance when we get further in the course and we'll see

that there are certain types of situations where

it becomes the only thing we can estimate correctly.

So in the next section we'll actually talk about an interesting property

of ratios that we'll want to pause and think about.

And one of the quirky things about ratios is the

range of possibilities for associations that are negative, meaning that

the group on the top of the ratio, the numerator,

has lower risk or value or odds, than the denominator.

The range of possibilities for that type of association is very

different from the range of possibilities where

the group on top is larger than the bottom.

And that can make for difficulty interpreting things

depending on the direction that we've reported in.
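
To make the asymmetry concrete, here is a sketch that reports the AZT association from this lecture in both directions, using the 7% and 22% quoted earlier. The names are illustrative. A ratio below 1 is confined between 0 and 1, while the same association reported the other way round can be any value above 1.

```python
def odds(p):
    """Odds corresponding to a risk (proportion) p, for 0 <= p < 1."""
    return p / (1 - p)

p_azt = 0.07
p_placebo = 0.22

# Same association, two directions of comparison.
or_azt_vs_placebo = odds(p_azt) / odds(p_placebo)
or_placebo_vs_azt = odds(p_placebo) / odds(p_azt)

print(f"AZT vs placebo: {or_azt_vs_placebo:.2f}")
print(f"placebo vs AZT: {or_placebo_vs_azt:.2f}")

# The two ratios are reciprocals, yet sit at very different
# distances from the null value of 1.
print(f"distance below 1: {1 - or_azt_vs_placebo:.2f}")
print(f"distance above 1: {or_placebo_vs_azt - 1:.2f}")
```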

So in the next section we'll talk about this

property and one of the ways to deal with it.