A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.


Course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings


From this lesson

Module 2C: Summarization and Measurement

This module consists of a single lecture set on time-to-event outcomes. Time-to-event data come primarily from prospective cohort studies with subjects who have not had the outcome of interest at the time of their enrollment. These subjects are followed for a pre-established period of time until they either have the outcome, drop out during the active study period, or make it to the end of the study without having the outcome. The challenge with these data is that the time to the outcome is fully observed for some subjects, but not for those who do not have the outcome during their tenure in the study. Please see the posted learning objectives for each lecture set in this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this lecture set, we're going to see how to quantify differences in a time-to-event outcome between two samples.

And the quantification will involve creating

what's called an Incidence Rate Ratio,

a ratio of the incidence rates of the events between the two groups.

And the interpretation is very similar to that of the relative risk we had for binary outcomes before. Unlike with binary outcomes, however, we won't go on to compute a difference in the incidence rates between the two groups.

We'll see in the next couple of sections that there's a richer way to understand what's going on at any given time across the entire follow-up period, using what are called Kaplan-Meier curves.

Okay, now let's get to comparing time-to-event data between two or more samples numerically.

It probably won't come as any surprise to you, but what we're going to do is compute the incidence rates of a time-to-event outcome in different samples and compare those incidence rates, and we'll do it in a ratio format.

So upon completion of this section, you will be able to estimate a numerical comparison of time-to-event outcomes between two populations using sample rate estimates, and interpret the resulting estimate, the incidence rate ratio, in words and in a public health or scientific context.

You'll also be reminded of something that will recur throughout the course, which we talked about in the last set of lectures: ratios are sometimes presented on the log scale.

So let's go back to our Mayo Clinic data and get into the real data, as opposed to the hypothetical patient profiles that I used before just to illustrate the time-to-event context.

As I noted before, this was a randomized trial among patients with primary biliary cirrhosis (PBC). They were randomized to receive either D-penicillamine (DPCA) or placebo, and the research question of interest was: how does mortality, and therefore survival (survival is the complement of mortality), for PBC patients randomized to receive the drug DPCA compare to that of patients who got the placebo?

Now I'm going to give you some summary numbers that I computed, since I had the dataset available to me, and I'll break them down into the DPCA and placebo subgroups.

Over the ten-year study period, the patients randomized to the DPCA group collectively contributed 872.5 years of follow-up. This includes people who died during the study follow-up, as well as persons who either dropped out or made it to the end of the study without dying and hence were censored. So that's the total cumulative amount of time contributed to the study by everyone randomized to the drug group, and among those patients there were 65 deaths.

So let's take a look at the incidence rate. The incidence rate of death in the DPCA group is 65 deaths per 872.5 years of follow-up time. If we compute this, it can be expressed as 0.075 deaths per year of follow-up time.

Let's compare and contrast this with the placebo group. The persons randomized to the placebo group collectively contributed 842.5 years of follow-up, and there were 60 deaths. So the incidence rate in this group is 60 deaths per 842.5 years of follow-up time, which can be expressed as 0.071 deaths per year.
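A minimal Python sketch of the two incidence rate calculations above, using only the summary counts and person-years quoted in this lecture. The helper name `incidence_rate` is hypothetical, not from any library. Unrounded, the rates are about 0.0745 and 0.0712 deaths per person-year; the lecture quotes them as 0.075 and 0.071.

```python
# Sketch: incidence rates from event counts and total person-time,
# using the Mayo Clinic PBC summary numbers quoted in the lecture.

def incidence_rate(events: int, person_time: float) -> float:
    """Events per unit of follow-up time (here: deaths per person-year)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time

rate_dpca = incidence_rate(65, 872.5)     # DPCA group
rate_placebo = incidence_rate(60, 842.5)  # placebo group

print(f"DPCA:    {rate_dpca:.4f} deaths per person-year")
print(f"Placebo: {rate_placebo:.4f} deaths per person-year")
```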

So immediately you see that these estimated incidence rates are similar, but the estimate is slightly larger for the DPCA group.

Just to give you a little more perspective on this (and I should have noted this before), there were a total of 312 persons enrolled in this trial, of whom 158 were randomized to get DPCA and 154 were randomized to get placebo.

So you can see that among the people who originally enrolled, a high proportion ultimately died: 125 of the 312 (65 + 60), or roughly 40%.

One very common way to take the incidence rates from the two groups and combine them into a single-number summary is to take their ratio. Unlike with proportions, we don't take a difference as often when it comes to incidence rates; we jump straight to the ratio. This is the commonly used summary method.

It's still important to get information on the underlying rates that go into this ratio, so that you can get some context for how risky the outcome is.

So we have an incidence rate ratio of 0.075 deaths per year in the DPCA group divided by 0.071 deaths per year in the placebo group. This ratio is 1.06.
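The ratio can be checked the same way. This plain-Python sketch, using the numbers from this lecture, shows that the 1.06 comes from dividing the rounded rates 0.075/0.071; dividing the unrounded rates gives about 1.05. Either way, the estimated rate is slightly higher in the DPCA group.

```python
# Sketch: incidence rate ratio (DPCA vs. placebo) for the PBC trial numbers.
deaths_dpca, years_dpca = 65, 872.5
deaths_placebo, years_placebo = 60, 842.5

rate_dpca = deaths_dpca / years_dpca
rate_placebo = deaths_placebo / years_placebo

# Ratio of the unrounded rates
irr_exact = rate_dpca / rate_placebo
# Ratio of the rates as quoted (rounded) in the lecture
irr_from_rounded = 0.075 / 0.071

print(f"IRR from unrounded rates: {irr_exact:.3f}")
print(f"IRR from rounded rates:   {irr_from_rounded:.3f}")
```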

This estimate highlights the fact that the rate was slightly higher in the drug group, interestingly enough. So how could we interpret this in words?

We could say the risk of death in the DPCA group within

the study follow-up period is 1.06 times the risk in the placebo group.

Another way to express this is to say

subjects in the drug group, the DPCA group, had

6% higher risk of death in the follow-up

period, when compared to subjects in the placebo group.

So this quantifies the degree of increase we saw

amongst those who were randomized to the treatment group.

Let's look at another example. This is the study of antiretroviral therapy and partner-to-partner HIV transmission among HIV-discordant couples that we talked about before.

Let's just go on and read some of this, and then discuss it.

They say there were 28 linked partner-to-partner transmissions among the [UNKNOWN] couples in the study, and only one occurred in the early therapy group. That means the other 27 occurred in the standard, or delayed, therapy group: the comparison group.

And they report a hazard ratio, something relatively synonymous with an incidence rate ratio; we'll flesh this out in more detail in the second term. For now, we'll treat it as a synonym for the incidence rate ratio and call it an incidence rate ratio.

This ratio is 0.04.

How did they get that? I don't have access to the data, but I can talk through what they did. There were 28 linked transmissions, and only one occurred in the early therapy group. Essentially, they took the incidence rate of linked transmissions in the early therapy group (one linked transmission divided by the total follow-up time among the couples in the early therapy group) and divided it by the incidence rate estimate in the standard therapy group (27 linked transmissions divided by the total follow-up time in the standard therapy group).

I'm calling it standard therapy; it was called delayed therapy in the abstract, but essentially it was synonymous with the standard of care.

And this ratio turned out to be 0.04, as they reported.
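Written out as a formula, the calculation just described is the following, where $T_{\text{early}}$ and $T_{\text{standard}}$ are hypothetical symbols for the total follow-up time in each group. The follow-up totals are not given here, so the reported 0.04 cannot be reproduced exactly.

$$\widehat{\mathrm{IRR}} = \frac{1 / T_{\text{early}}}{27 / T_{\text{standard}}} = \frac{1}{27}\cdot\frac{T_{\text{standard}}}{T_{\text{early}}}$$

As a rough check, and only under the assumption that the two groups had about the same total follow-up time, this reduces to $1/27 \approx 0.037$, which is in line with the reported 0.04.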

So how do we interpret this? Well, we could say that couples who were HIV-discordant at baseline, in which the HIV-positive partner was given early antiretroviral therapy, had 0.04 times the risk of within-couple transmission when compared to couples in which the HIV-positive partner was given standard therapy. The risk for the treated group was 0.04 times the risk for the other group.

Another way to say this is that couples who were HIV-discordant at baseline, in which the HIV-positive partner was given early antiretroviral therapy, had 96% lower risk of within-couple transmission as compared to couples in which the HIV-positive partner was given standard therapy.

So how do we get that? Again, our incidence rate ratio is 0.04. You can think of that implicitly as 0.04 to 1: for every 1 part of risk in the standard therapy group, the early treatment group had 0.04 of that. So to compute the relative change, we take 0.04 minus 1 and divide it by the comparison value of 1: (0.04 - 1)/1 = -0.96, or a 96% reduction.
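The same arithmetic applies to any ratio measure. Here is a small Python sketch (the function name `percent_change_from_ratio` is hypothetical) that converts an incidence rate ratio into a percent difference relative to the reference group, applied to the two IRRs quoted so far in this lecture: 0.04 becomes a 96% reduction and 1.06 becomes 6% higher.

```python
def percent_change_from_ratio(ratio: float) -> float:
    """Percent difference in rate/risk relative to the reference group.

    Computes (ratio - 1) / 1 * 100: positive means higher, negative means lower.
    """
    return (ratio - 1.0) / 1.0 * 100.0

for label, irr in [("early ART vs. standard therapy (HIV)", 0.04),
                   ("DPCA vs. placebo (PBC)", 1.06)]:
    change = percent_change_from_ratio(irr)
    direction = "higher" if change > 0 else "lower"
    print(f"{label}: IRR = {irr:.2f} -> {abs(change):.0f}% {direction}")
```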

We could certainly report this either way, but expressing it as a percent reduction drives home the point that there was substantially lower risk of linked transmission in the early treatment group.

Okay, let's go back to our maternal vitamin supplementation and infant mortality study; here's the abstract we looked at before. Again, we were given a two-thirds random sample of these data on the live births: 10,295 live births with six months of follow-up. Here are the incidence rates of infant mortality in the six-month follow-up period.

So I'll just give you summaries

based on the data that I was given.

Amongst the infants who were born to mothers treated with vitamin A during

pregnancy there were 578,590 days of follow-up, and a total of 236 deaths.

So this turns out to be an incidence rate of 0.00041 deaths per day.
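A quick Python check of the vitamin A rate, using the counts quoted above. The rescaling to deaths per 1,000 child-years is only an illustration of making the rate more user-friendly; it assumes 365.25 days per year and is not a figure reported by the study.

```python
# Sketch: infant mortality incidence rate in the vitamin A arm
# (236 deaths over 578,590 child-days of follow-up, as quoted in the lecture).
deaths = 236
follow_up_days = 578_590

rate_per_day = deaths / follow_up_days
print(f"{rate_per_day:.5f} deaths per child-day")

# Optional rescaling to a friendlier unit. ASSUMPTION: 365.25 days per year;
# the lecture itself keeps the per-day rate.
rate_per_1000_child_years = rate_per_day * 365.25 * 1000
print(f"{rate_per_1000_child_years:.0f} deaths per 1,000 child-years")
```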

We could play around with that to make it more user friendly, but let's

keep it as it is for now, because

we're ultimately going to compare it to the other groups.

If we did the same thing for beta carotene, here are the summary statistics in terms of follow-up time and number of deaths, and the incidence rate in this group was 0.00039 deaths per day.

And interestingly enough, if you do the math for the placebo group, the estimated incidence rate is essentially identical to that in the beta carotene group: 0.00039 deaths per day.

So suppose we want to compute incidence rate ratios. Well, we have three groups. What we can do, as we've talked about in other situations when comparing means or comparing proportions, is make one of these the reference, or comparison, group to which we compare the other two groups. I suggest, although you don't have to, making the placebo group the reference group, since we're really interested in the potential efficacy of vitamin supplementation on infant mortality.

So if we compare the incidence rate for the vitamin A group to that of the placebo group, taking the ratio of those two numbers from before, we get 1.05.

In other words, we see 5% higher risk of death among infants whose mothers got vitamin A. Remember, this is just a sample-based estimate, but it's interesting.

If we compare the beta carotene group to the placebo group, well, as we said before, the incidence rates were numerically identical, so this incidence rate ratio is 1, indicating equality in the estimated risk of mortality among infants born to mothers in these two groups.

So, for the 1.05, we could say that the estimated child mortality rate in the vitamin A group is 5% greater than the estimated child mortality rate in the placebo group. And for the comparison of beta carotene to placebo, we could say the estimated child mortality rate in the beta carotene group is the same as the estimated child mortality rate in the placebo group.
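The reference-group approach can be sketched in a few lines of Python. This sketch assumes the rounded per-day rates quoted in the lecture (0.00041, 0.00039, 0.00039) rather than the underlying counts, which are not all given here; `irrs_vs_reference` is a hypothetical helper name.

```python
# Sketch: IRRs for a three-arm study, each arm compared with a chosen
# reference arm (placebo here). Rates are the rounded per-day estimates
# quoted in the lecture.
rates_per_day = {
    "vitamin A": 0.00041,
    "beta carotene": 0.00039,
    "placebo": 0.00039,
}

def irrs_vs_reference(rates: dict[str, float], reference: str) -> dict[str, float]:
    """Incidence rate ratio of every non-reference group relative to `reference`."""
    ref_rate = rates[reference]
    return {group: rate / ref_rate
            for group, rate in rates.items() if group != reference}

for group, irr in irrs_vs_reference(rates_per_day, "placebo").items():
    print(f"{group} vs. placebo: IRR = {irr:.2f}")
```

Changing the `reference` argument is all it takes to compare against a different arm, which mirrors the point that the choice of reference group is up to the analyst.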

Let's look at one more example to drive home this idea of incidence rate ratios as a summary measure for the association of grouping with a time-to-event outcome.

This is an article published in the Journal of the American Medical Association called "Association of race and age with survival among patients undergoing dialysis."

And so the context for the study (I'll just read it in case it's hard to read here) is: many studies have reported that black individuals undergoing dialysis survive longer than those who are white. This observation is paradoxical, given racial disparities in access to and quality of care, and is inconsistent with the observed lower survival among black patients with chronic kidney disease.

So they go on to say: we hypothesized that age and the competing risk of transplantation modify survival differences by race.

Their main outcome measure is death in black versus white patients who received dialysis, and their comparison of interest is mortality during the follow-up periods in these data for black patients compared to white patients.

And then they go on to tell us where they got these data. This was an observational cohort study; the implied word there that they didn't include is prospective.

There's a lot of data here: it was captured in the United States Renal Data System between January 1, 1995 and September 28, 2009, and the median potential follow-up time was 6.7 years. That gives us a sense of scale, and follow-up ranged anywhere from one day to almost 15 years. So there's variation in how much follow-up time each of the subjects in the cohort they created from this database contributed to the study.

So what they've shown in this graphic are incidence rate ratios. They were interested not only in comparing mortality among black and white patients, but they didn't necessarily want to do that on the whole: they wanted to stratify by age and see whether the association between mortality and race was different by age.

So they were taking on a phenomenon that we'll get to in Statistical Reasoning 2, called interaction, or effect modification. That is, instead of taking the entire sample and comparing all black patients to all white patients, and potentially adjusting for sweeping differences between those two racial groups, they first broke the sample into age categories and then compared black to white patients within those narrow age categories. And they did adjust these estimates; again, we'll get into adjustment in the second term.

But ultimately, what they have in this graphic are the estimates, and the bands around them on the graphic are called confidence intervals, which we'll define shortly. The dots in the middle are the actual estimates we're looking at. Among 18- to 30-year-olds, for example, the estimated incidence rate ratio of death for black patients on dialysis compared to white patients is nearly 2, so there is higher risk for black patients in this group.

When we get to the 31- to 40-year-old group, the incidence rate ratio, which compares the observed rate of death in the follow-up period for black patients 31 to 40 years old to the incidence rate in the follow-up period for white patients 31 to 40 years old, is close to 1.5. So it still shows that black patients have higher risk.

But you see what happens with age: the older the group we're looking at, the lower the relative risk of mortality for black patients compared to white patients, until it changes direction. Among older persons, black patients have lower risk, so the incidence rate ratio dips below 1 once we get to the 51- to 60-year-old age category.

So it's a very interesting finding that the association between mortality and race among dialysis patients depends on how old the patients are. One size doesn't fit all for the race comparison: you actually have to ask, what age group are we looking at?

They use these incidence rate ratios to quantify this, although they call them hazard ratios. For our purposes now, those are synonymous.

And these have been adjusted for other characteristics that may differ between black and white patients and that contribute to mortality. Again, we'll get into adjustment in more detail in the second term, but these are interpreted as incidence rate ratios.

Notice they present the ratios on a log scale. The effect of this scaling is hard to see here, because the estimates above 1 (the positive associations, if you will) are relatively comparable and not too variable.

But if you look carefully, you can see that the scaling of the y-axis is not the traditional arithmetic scale. For example, the distance between 0.75 and 1 is equivalent to the distance between 1 and 1.33. That's because 1.33 is approximately the reciprocal of 0.75, so the log of 1.33 is approximately the negative of the log of 0.75.

If we had even larger positive effects, this would be easier to see. They've labeled the axis with the actual ratio values, but the scaling is not arithmetic: the distances between values are not what they would be on a traditional linear scale. So again, this is just to plant the seed in your head about the issue of log scaling and ratios.
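A small Python sketch of why 0.75 and 1.33 sit the same distance from 1 on a log-scaled axis. It assumes the axis label 1.33 is a rounded 4/3, the exact reciprocal of 0.75.

```python
import math

# Compare distances from the null value 1 on the arithmetic and log scales.
# ASSUMPTION: the axis label 1.33 stands for 4/3, the reciprocal of 0.75.
low, high = 0.75, 4 / 3

print(f"arithmetic: 1 - {low} = {1 - low:.2f}, {high:.2f} - 1 = {high - 1:.2f}")
print(f"log scale:  log({low}) = {math.log(low):.4f}, log({high:.2f}) = {math.log(high):.4f}")
```

On the arithmetic scale the two gaps differ (0.25 versus 0.33), while on the log scale they are equal in size and opposite in sign, which is why a ratio and its reciprocal look symmetric around 1 on such a plot.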

So just so you think about it (and I'll come back to this in the end-of-unit review questions): what could potentially happen here if follow-up time were ignored? We talked about this for a single summary measure, but how about with group comparisons? Instead of incorporating follow-up time via incidence rate ratios between the groups, what if they had just compared the proportions of subjects having the outcome? I want you to think about the implications of that.
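One way to start thinking about this, sketched in Python with the PBC trial numbers from earlier in this lecture: compute the ratio of the proportions who died (ignoring follow-up time) alongside the incidence rate ratio. In this trial the two arms had similar average follow-up (about 5.5 years per person), so the two ratios come out close (about 1.06 and 1.05); they could diverge if one group were followed for much longer than the other.

```python
# Sketch: what changes if follow-up time is ignored? Uses the PBC trial
# numbers from this lecture: deaths, people enrolled, and person-years per arm.
arms = {
    "DPCA":    {"deaths": 65, "n": 158, "person_years": 872.5},
    "placebo": {"deaths": 60, "n": 154, "person_years": 842.5},
}

def proportion_dead(arm):
    return arm["deaths"] / arm["n"]             # ignores follow-up time

def incidence_rate(arm):
    return arm["deaths"] / arm["person_years"]  # accounts for follow-up time

d, p = arms["DPCA"], arms["placebo"]
risk_ratio = proportion_dead(d) / proportion_dead(p)
irr = incidence_rate(d) / incidence_rate(p)
mean_fu = {name: a["person_years"] / a["n"] for name, a in arms.items()}

print(f"ratio of proportions: {risk_ratio:.3f}")
print(f"incidence rate ratio: {irr:.3f}")
print("mean follow-up (years): " + ", ".join(f"{k} {v:.2f}" for k, v in mean_fu.items()))
```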

So, in summary: the incidence rate ratio (IRR), which we can estimate from any sample data and denote IRR with a hat on it, can be used to quantify differences in time-to-event information between two samples. You can really think of this as a relative risk measure like you saw before, but one that recognizes the two dimensions of our data, incorporating both the binary outcome of interest and the differences in subject follow-up time into the comparison.

And these ratios have an interpretation very similar to that of the relative risk comparisons we were looking at before.
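In symbols, with $d_1$ and $T_1$ the number of events and total follow-up time in the comparison group, and $d_0$ and $T_0$ the same quantities in the reference group (notation chosen here for illustration):

$$\widehat{\mathrm{IRR}} = \frac{\widehat{\mathrm{IR}}_1}{\widehat{\mathrm{IR}}_0} = \frac{d_1 / T_1}{d_0 / T_0}$$

A value above 1 means a higher estimated incidence rate in the comparison group, a value below 1 a lower rate, and exactly 1 equal rates.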