A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

238 ratings

From this lesson

Module 2C: Summarization and Measurement

This module consists of a single lecture set on time-to-event outcomes. Time-to-event data come primarily from prospective cohort studies with subjects who have not had the outcome of interest at their time of enrollment. These subjects are followed for a pre-established period of time until they either have the outcome, drop out during the active study period, or make it to the end of the study without having the outcome. The challenge with these data is that the time to the outcome is fully observed on some subjects, but not on those who do not have the outcome during their tenure in the study. Please see the posted learning objectives for each lecture set in this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So, how are we going to graph this?

Well, you've sort of seen from the examples we gave

before that the graphic is somewhat of a step function.

Although we had large samples, so it was a little bit harder to see that.

But with a small sample, you'll see it pretty explicitly.

This curve is only going to change at each observed event time.

There's going to be a jump there.

And nothing is assumed about the curve's shape between each observed event time.

So we will remain

with the most current estimate of the proportion surviving beyond a given time

up until the point where we see another event.
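The step behavior described here can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names `kaplan_meier` and `survival_at` are hypothetical, and the ten follow-up times below are made up for the example (they are not the data from the lecture's table).

```python
def kaplan_meier(times, events):
    """Return (event_time, estimated proportion surviving beyond it) pairs.

    times:  follow-up time for each subject
    events: True if the event was observed, False if the subject was censored
    """
    steps = []
    surviving = 1.0
    # The estimate only changes at times where at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Everyone still under follow-up at time t is at risk, censored or not.
        at_risk = sum(1 for u in times if u >= t)
        had_event = sum(1 for u, e in zip(times, events) if u == t and e)
        surviving *= 1 - had_event / at_risk
        steps.append((t, surviving))
    return steps


def survival_at(steps, t):
    """Step-function lookup: hold the most recent estimate until the next event."""
    current = 1.0
    for event_time, surviving in steps:
        if event_time > t:
            break
        current = surviving
    return current


# Hypothetical follow-up times (months) and event indicators, for illustration only.
times = [2, 4, 4, 5, 7, 8, 8, 10, 12, 12]
events = [True, False, True, True, False, True, True, False, True, False]

steps = kaplan_meier(times, events)
for t in (1, 2, 3, 4):
    print(t, round(survival_at(steps, t), 3))
```

With these made-up values the estimate stays at 100% until the first event at month 2, and the censored subject at month 7 changes nothing on the curve itself; it only shrinks the at-risk group for later event times.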

So here is the Kaplan-Meier estimate graphically presented for these data.

So here on the horizontal axis is the follow up time since the event.

And here is the curve tracking the

proportion of people who remained event free.

So we can see at time zero, 100% of the sample is still around and stays around.

So our estimated proportion remaining event free beyond

time one, for example, would still be 100%.

We don't actually see anything happen until this

first jump at two months, where it drops down to

92% and it stays at 92% through three months, four months, et

cetera until we see the next jump at six months and so on and so forth.

So this is the nice graphical presentation of

the data we just showed in tabular format.

We can use the Kaplan-Meier curve to

estimate percentiles of time to event data.

For example, if we wanted to estimate the median time to event in these data, what we

could do is find the place on the curve that corresponds to 50%.

Now, you can see and I've done this kind of

sloppily because my pen is not very fine tuned here,

that there's no point and you can confirm this with the tabular presentation.

There's no point on this curve that it hits exactly 50% on the vertical axis.

So the convention is to then drop down to the next event time,

where the curve first drops below 50%. So in this case,

it would be at time 16, and you

can verify this with the table, since the curve drops to 39%.

39% of the original sample survived beyond 16 months.

That's the first time the curve dropped below 50%, and so the

convention is to report the value at which the curve hits 50%.

But if it does not hit it exactly, then we would take

the first time point where it drops below 50%, so our estimated median

for these data, the median event time, is 16 months by that standard.
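The median convention just described can be written as a short lookup. This is a sketch under stated assumptions: `km_median` is a hypothetical helper name, and the `steps` values are illustrative numbers, not the estimates from the lecture's table.

```python
def km_median(steps):
    """Return the first event time at which the curve is at or below 50%.

    steps: (event_time, estimated proportion surviving beyond it) pairs,
           ordered by event time.
    Returns None if the curve never gets down to 50%.
    """
    for event_time, surviving in steps:
        if surviving <= 0.5:
            return event_time
    return None


# Hypothetical Kaplan-Meier estimates (time in months), for illustration only.
steps = [(2, 0.90), (4, 0.80), (5, 0.69), (8, 0.41), (12, 0.21)]
median = km_median(steps)
print(median)
```

The curve here never equals 50% exactly; it jumps from 69% to 41%, so the reported median is the time of that jump.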

So here's another example of getting percentiles from the data.

This is

the Mayo Clinic Primary Biliary Cirrhosis curve.

And these are the results from the randomized clinical trial.

And this tracks the proportion of people who

were still alive over the follow up period.

We have a lot more data here, so we can

actually do a pretty good job, at least off

the cuff, of graphically estimating the median event time.

If we want to estimate the median event time for these data, where the

event is death, so the median death time, we find where the curve goes to 50%,

go over, and drop down. And it is roughly equal to nine years.

In these data, the time was in years.

So the median's roughly equal to nine years here.

We could also estimate other percentiles, for example, the

25th percentile. This seems counter-intuitive, but if we go

back to the curve here, this 25th percentile for the

survival time would actually be where the curve hits 75%.

Why is that?

Because this is the point at which 75% of the

sample, the original sample, is still alive beyond this time, and

hence 25% of the sample has had the event. So our estimated 25th percentile based on

these data would be four years after the start of the study.

And you could go on and estimate various other percentiles as well.

Notice that we can't estimate the corresponding 75th percentile

with this Mayo Clinic data because the curve stopped.

The largest observations were censored, and

the curve stopped at a value greater than the

point where 25% of the sample was still alive.
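The same idea generalizes: the p-th percentile of event time is read off where the curve reaches 100 minus p percent, and it cannot be estimated if the curve stops above that level. A minimal sketch follows; `km_percentile` is a hypothetical name and the `steps` values are invented for illustration, not the Mayo Clinic estimates.

```python
def km_percentile(steps, p):
    """Estimate the p-th percentile of time to event from a Kaplan-Meier curve.

    steps: (event_time, estimated proportion surviving beyond it) pairs,
           ordered by event time.
    p:     percentile of the event time, between 0 and 100.
    Returns None when the curve never drops to 1 - p/100, which happens
    when the largest observations are censored.
    """
    target = 1 - p / 100
    for event_time, surviving in steps:
        # Small tolerance guards against floating-point noise at the boundary.
        if surviving <= target + 1e-12:
            return event_time
    return None


# Hypothetical curve (time in years) that stops at 38%, for illustration only.
steps = [(1, 0.95), (2, 0.88), (4, 0.75), (6, 0.62), (9, 0.50), (11, 0.38)]

for p in (25, 50, 75):
    print(p, km_percentile(steps, p))
```

Returning `None` for the 75th percentile is a design choice for this sketch: it signals that the percentile is not estimable from the curve rather than guessing a value beyond the last observed event.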

How about percentiles for the infant mortality rate in

the six months post-birth for our Nepali children data?

Well, this curve actually doesn't drop very low at all.

In fact, thankfully, most of these observations were still alive at the

end of the study or were censored from the study for other reasons.

And the curve only drops to 94% after 180 days, so that means that 94% of

the original sample of infants was still alive beyond 180 days.

We estimate that 6% had died in that period

which is a very, very, very large percentage of death.

But still the curve only gets down to 94%.

So we are stuck when it comes

to estimating certain percentiles, for good reason.

We cannot estimate the median time to death

in the 180 days following birth, because

the curve does not drop to 50%. We don't lose 50% of

the children in this sample. And so we can't estimate the median, and we

can't estimate the 75th percentile, because the curve never drops down to

25%. We can't estimate the 25th percentile for similar reasons.

We can estimate the 5th percentile, for example, by taking the time

at which the curve hits 95%, which is

roughly 40 days since birth.

So there's another way of presenting the same results, an

alternative presentation, which

you'll see in papers frequently.

Instead of presenting the Kaplan-Meier curve, as we've just

done, or more formally the Kaplan-Meier survival curve,

that is, instead of reporting S hat of t, the proportion of the original sample that

still has not had the event, or in other words survived beyond the given time,

researchers will report the complement of that, 1 minus

S hat of t, which shows the cumulative proportion

of the original sample that has had the event

by a certain time in the follow up period.

So the proportion of people who have died by six months, or

the proportion of people who have had the event by 11 years, etc.
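Converting between the two presentations is a one-line transformation. The sketch below assumes the survival estimates are already available as (time, S hat) pairs; the values and the name `cumulative_events` are illustrative only.

```python
# Hypothetical Kaplan-Meier survival estimates, for illustration only.
survival = [(0, 1.00), (2, 0.90), (4, 0.80), (8, 0.40)]

# Complementary presentation: proportion who have had the event by each time.
cumulative_events = [(t, round(1 - s, 2)) for t, s in survival]

for (t, s), (_, c) in zip(survival, cumulative_events):
    print(f"time {t}: surviving {s:.0%}, had event {c:.0%}")
```

The survival version starts at 100% and steps down; the complement starts at 0% and steps up, and the two always sum to 100% at every time point.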

So let's go again back to our Mayo Clinic

data, this is the standard Kaplan-Meier curve presentation we've

been looking at, that tracks the proportion of persons

who are still event free beyond a certain time.

And we can see that easily because it starts at 100%.

So we know that that's what's being tracked by these curves.

But let's look

at another way to present the same data.

Here's the presentation we just looked at, and here's the complement to that, 1 minus

S hat of t. You notice it's like a mirror image of the other curve.

Instead of starting at one, it starts at zero

because it's tracking the proportion of people who have died,

since in this case the event was death,

that is, who have had the event by a given time. And everybody was alive at time zero.

So 0% of the

sample has died at time zero. And then this increases

over time, and the curve stops at about 65%.

So roughly 65% of the sample has died within the 12 year follow-up period.

And that corresponds to the marking on this curve, which shows that roughly

35% of the sample was still alive at the end of the study, at around 12 years.

So, there are two different

ways of presenting the same information, and depending on the

whims of the editors of the paper, you'll see one or the other.

Similarly, if we were looking at the infant

mortality data, the infant mortality rate six

months post birth, this is the presentation we've

been looking at over the course of this lecture.

This is the traditional Kaplan-Meier survival estimate.

But we could present it in the

complementary format, which tracks the proportion of children who

have died by a certain point of time in the follow up period.

And so again, this starts

at zero, because we were looking at live

births in our sample and so everybody was alive at birth.

And then the percentage of children who have

died by a certain point goes up relatively quickly and then flattens out.

So, just like over here, we saw that

roughly 94% of the original sample of infants was still alive

beyond 180 days, which is exactly the same as

saying that 6% of the sample of infants had died by that time point.

So let's just summarize what we've done in this lecture here.

Kaplan-Meier curve estimates add richness and understanding

to time to event data from a sample

by presenting in a graphical format the

two dimensions of the data together,

tracking the proportion of people who have had or haven't had the event over time.

Kaplan-Meier curves use all the data in the sample, both the event and

the censoring times.

And the censored observations provide information about who is at risk of

having the event of interest at given times in the follow up period.

So they inform us about the denominator for our time-specific

survival estimates. Kaplan-Meier curves are summary statistics

based on sample data, and they estimate the underlying, unknown, true population

survival curve.

So these graphics that we've been looking at are kind

of like visual versions of a sample mean or proportion.

It is an estimate that describes the entire

sample, and has some uncertainty associated with it.

But it's estimating some underlying curve that we can't observe directly.

We can estimate event time percentiles, as we've shown, using Kaplan-Meier curves.

And what we finished up on was, there are two

complementary ways to present the results of a Kaplan-Meier curve estimate.

The traditional way, where we track the proportion

who have still not had the event by a certain time.

And the complementary way, where we track the proportion

of people who have had the event up to a certain time.

In the next section, we'll show how to compare the

time to event experience between

multiple samples using separate Kaplan-Meier curves.

And it's a really nice visual tool for doing such comparisons.

So again, something I want you to think about and

we'll discuss in the follow up exercises in this lecture set.

How do you think the survival

curve estimates would compare to the Kaplan-Meier

curve estimates if we actually ignored

the censored observations in the computation process?
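One way to start exploring this question is to run both computations on a small made-up sample. The sketch below is only an illustration under stated assumptions: the data are hypothetical, `kaplan_meier` is an illustrative helper rather than any library's function, and "ignoring the censored observations" is taken here to mean dropping the censored subjects entirely before estimating.

```python
def kaplan_meier(times, events):
    """Return (event_time, estimated proportion surviving beyond it) pairs."""
    steps = []
    surviving = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        had_event = sum(1 for u, e in zip(times, events) if u == t and e)
        surviving *= 1 - had_event / at_risk
        steps.append((t, surviving))
    return steps


# Hypothetical follow-up times (months) and event indicators, for illustration only.
times = [2, 4, 4, 5, 7, 8, 8, 10, 12, 12]
events = [True, False, True, True, False, True, True, False, True, False]

# Estimate using all subjects, censored ones included in the at-risk counts.
with_censoring = kaplan_meier(times, events)

# Estimate after throwing away every censored subject.
event_only_times = [t for t, e in zip(times, events) if e]
without_censoring = kaplan_meier(event_only_times, [True] * len(event_only_times))

for (t, km), (_, naive) in zip(with_censoring, without_censoring):
    print(f"time {t}: with censoring {km:.2f}, censored dropped {naive:.2f}")
```

In this particular made-up sample the estimate that drops the censored subjects is lower at every event time, because those subjects no longer count in the denominator while they were still being followed. Whether that pattern holds in general is the point of the exercise.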
