A practical, example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From this lesson

Module 3B: More Multiple Regression Methods

This set of lectures extends the techniques debuted in lecture set 3 to allow for multiple predictors of a time-to-event outcome using a single, multivariable regression model.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings, and welcome to Lecture Set 8, Section B.

In this section we'll give a brief treatment of the basics of model selection,

and show how the results from multiple Cox regression can be presented in terms of

estimated survival curves or outcomes from the regression model.

So hopefully by the end of this lecture set,

you'll appreciate the linearity assumption as it applies to multiple Cox regression.

You'll also be able to explain different strategies for

picking the final multiple Cox regression model among candidate models,

that is, how a researcher might do this. This process is exactly the same as it was for

linear and logistic regression, but it's worth reiterating.

Use the results of multiple Cox regression models to

compare groups who differ by more than one predictor.

And appreciate that the results from multiple Cox regression can be

used to estimate group specific survival curves,

where the groups are defined by multiple predictor values.

So, let's briefly talk about the estimation process for

Cox regression, what the computer is doing.

The algorithm to estimate the equation of the multiple Cox

regression is called partial maximum likelihood estimation.

It's the same process used with simple Cox regression.

What it does is estimate the slopes for our predictors.

These are the values that make the observed data

most likely among all choices for the slopes.

And this is a complex numerical algorithm that has to take some starting guesses for

the slopes.

And iterate until it finds the choice that

maximizes the likelihood of the observed data.

But before it does this, and this is what was added by Cox and

what makes this method so unique,

there's a separate behind-the-scenes algorithm that

first estimates the shape or the function over time of the baseline hazard.

The one that's going to be used as a starting point at each point in time to

get to the hazard for our other groups defined by our X's.

And this is actually pretty neat because it doesn't force us to make any

assumptions about what the relationship between hazard and time looks like.

We don't get the opportunity to put in time as a continuous predictor,

time as categorical, etc.

Instead, the algorithm figures out the general shape, and

it isn't restricted by constraints such as having to be linear, or

only being able to change at certain points in time.

It can estimate a very dynamic function.

But we have to leave that to the computer to do.

And then after it does that,

it does the maximum likelihood estimation process for the slopes.

But all of this, of course, has to be done by a computer.
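
The iterative slope search described above can be illustrated with a toy example. This is a minimal sketch, not what any particular statistics package does: it assumes a single predictor and no tied event times, the function names are made up for illustration, and the data are invented. It covers only the "start from a guess and iterate until the likelihood is maximized" part of the process.

```python
import math

def risk_set_sums(beta, times, x, t_event):
    """Sums of w, w*x and w*x^2 (w = exp(beta*x)) over subjects still at risk at t_event."""
    s0 = s1 = s2 = 0.0
    for t_j, x_j in zip(times, x):
        if t_j >= t_event:              # still under observation at this event time
            w = math.exp(beta * x_j)
            s0 += w
            s1 += w * x_j
            s2 += w * x_j * x_j
    return s0, s1, s2

def log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood for one predictor, assuming no tied event times."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:                          # only observed events contribute a term
            s0, _, _ = risk_set_sums(beta, times, x, t_i)
            total += beta * x_i - math.log(s0)
    return total

def fit_slope(times, events, x, beta=0.0, tol=1e-8, max_iter=50):
    """Newton-Raphson: start from a guess and iterate until the step is negligible."""
    for _ in range(max_iter):
        score, info = 0.0, 0.0           # first derivative, negative second derivative
        for t_i, d_i, x_i in zip(times, events, x):
            if d_i:
                s0, s1, s2 = risk_set_sums(beta, times, x, t_i)
                score += x_i - s1 / s0
                info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Invented toy data: follow-up time, event indicator (1 = event, 0 = censored), binary predictor
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

beta_hat = fit_slope(times, events, x)   # estimated slope (log hazard ratio)
```

Real software also handles tied event times, several predictors at once, and safeguards for cases where the iteration does not converge.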

So what is the linearity assumption in multiple Cox regression?

Well, the linearity assumption is similar to what we saw in

simple Cox regression, with the additional part about adjustment.

It assumes that the adjusted relationship being estimated between

the log hazard of the binary outcome y (whether it's an event or a censoring)

and each x is linear in nature. This is an issue for

continuous predictors, but not for binary or multi-categorical predictors.

And as with simple Cox regression, there is no graphical way to assess this.

But when fitting models,

researchers can compare the results of treating a predictor as continuous

versus putting it in as categorical, to see if there's evidence in the categorical

formulation of a consistent, same-direction change in the log hazard for

increasing ordinal categories.

And if that's the case, they may opt to treat it as a continuous predictor and

estimate one overall association that exploits that relationship.

Otherwise, they may present it as categorical.
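
The informal check described above can be sketched in code. This is only an illustration under stated assumptions: the helper name `same_direction` is hypothetical, and the slope values are invented rather than taken from any model in this lecture. The list holds the estimated log hazard for each ordinal category relative to the reference category, whose value is 0.

```python
def same_direction(log_hazards):
    """True if the estimates for successive categories all increase or all decrease."""
    diffs = [b - a for a, b in zip(log_hazards, log_hazards[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

# Invented slopes for categories 1 to 4 (category 1 is the reference, so its value is 0.0)
steady_increase = [0.0, 0.3, 0.6, 0.9]
no_clear_trend = [0.0, 0.4, 0.1, 0.9]

treat_first_as_continuous = same_direction(steady_increase)   # consistent direction
treat_second_as_continuous = same_direction(no_clear_trend)   # direction reverses
```

A consistent direction is only a rough signal. In practice researchers would also look at how even the steps are and at the uncertainty in each estimate before settling on a single slope.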

So how do researchers, when they've got data and

a bunch of potential predictors, choose a final model?

Well, model building and selection, as we've said before,

is a combination of science, statistics, and the research goals.

So if the goal is to maximize the precision of the adjusted estimates,

the strategy, as it was with linear and logistic regression, would be to keep only those

predictors that are statistically significant in the final model.

That avoids contributing uncertainty to the model by estimating things that

don't need to be there, which then steal from the precision of

the things that do actually correlate with the hazard.

If the goal is to present results comparable to results of similar

analyses presented by other researchers on similar or different populations,

then in the process of writing this up, researchers would at

least want to present one model that includes the same predictor set

as the other research does, even if some of

the findings are not statistically significant in this particular data set.

Then they can make like-for-like comparisons, in terms of

the factors that were adjusted for, et cetera,

with the other researchers' findings.

If the goal is to show what happens to the magnitude of associations with different

levels of adjustment, then a researcher could present the results from several

models that include different subsets or combinations of adjustment variables.

And if the goal is prediction, well again, this is a slightly more complicated story,

and we will only be able to discuss it briefly, but

we'll touch on it towards the end of the course.

So let's look at the idea of prediction with Cox regression results.

This is different than with linear and

logistic regression: we can't actually do this by hand, because we are not

presented with any output giving the value of the intercept,

the log hazard at baseline as a function of time.

It changes with time, so there's not one value,

an intercept, that describes the starting point for all comparisons.

So this has to be done by a computer, but

let's look at some of the results on predictors of

mortality in primary biliary cirrhosis patients.

I'm going to use the results from this model,

the one that includes treatment, age, Bilirubin, and sex as predictors,

to show (and I did this with the computer)

the estimated survival over the follow-up period for

different groups depending on certain characteristics.

So this can be done, but

it's computationally involved.

What has to be done is that at each point in time, the log hazard for

a particular group has to be computed based on the intercept or

starting hazard at that point in time.

This has to be done across the entire time period.

And then each of those time-specific log hazards has to be converted into

cumulative survival estimates, but this can be done by a computer.
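
The conversion step can be sketched as follows. This is a minimal illustration, not the output of the model discussed in this lecture: the baseline cumulative hazard values and the log hazard ratio of 0.5 are invented, and `survival_curve` is a hypothetical helper name. It relies on two standard relationships: under proportional hazards a group's cumulative hazard is the baseline cumulative hazard multiplied by its hazard ratio, and survival is the exponential of minus the cumulative hazard.

```python
import math

def survival_curve(baseline_cum_hazard, linear_predictor):
    """Turn a baseline cumulative hazard into a group's survival probabilities over time."""
    hazard_ratio = math.exp(linear_predictor)   # group's hazard relative to baseline
    return [math.exp(-h0 * hazard_ratio) for h0 in baseline_cum_hazard]

# Invented baseline cumulative hazard at follow-up years 1 to 5 (not from the lecture data)
H0 = [0.02, 0.05, 0.09, 0.14, 0.20]

baseline_group = survival_curve(H0, 0.0)      # all x's at their reference values
higher_risk_group = survival_curve(H0, 0.5)   # invented log hazard ratio of 0.5
```

In a real analysis the baseline cumulative hazard comes from the fitted model, which is why the lecture leaves this step to the computer.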

So for example, I'm not going to present all possible groupings here.

But this nicely shows, I think, how

the estimated survival curves complement the results from the Cox regression and

put some of the hazard ratios in the context of what they mean in

terms of the cumulative probabilities in the follow-up period.

So what I have here:

these two curves down here show the survival trajectories for males

with Bilirubin equal to one milligram per deciliter who are in the treatment group,

and males with Bilirubin of two milligrams per deciliter. So

we can see that higher Bilirubin was certainly associated with

increased hazard, which results in reduced survival.

So this lower curve is a function of that difference in Bilirubin levels.

The two curves up here are the same comparison:

subjects in the DPCA group who are female with Bilirubins of one and

two milligrams per deciliter, respectively.

And what you can see here, pretty clearly, is the same sort of

differences in the estimated survivals over time as a function of Bilirubin.

But what you also get from this picture is how dramatic the difference is

between males and females if you compare these two sets of curves.

And so I like this because it actually puts a face,

if you will, on what those hazard ratios mean in terms of the decrease in

cumulative survival over the follow up period.

And it also gives a sense of the magnitude of the difference in terms of predicted

survivals, between groups who differ by Bilirubin and groups who differ by sex.

It's certainly not an exhaustive presentation.

But, it helps to contextualize the results from that Cox regression.

We could write out the adjusted model on the regression scale, and

write it out in terms of the generic intercept as a function of time.

And then the slopes for each of our predictors. I got these from the computer,

but you could get these by taking the respective logs of the hazard ratios

presented in the previous table from the second multiple regression model.

So this is an indicator here:

a one if they're in the drug group, a zero if they're in placebo.

Here are our indicators of the three age categories.

Remember the reference is the first quartile.

X2 is an indicator that the person's in the second quartile.

X3 is an indicator that the person's in the third quartile.

And X4 is an indicator that they're in the highest or fourth age quartile.

This is Bilirubin entered continuously in milligrams per deciliter.

And then here is that slope, that negative slope for

sex, where it's a 1 for females and a 0 for males.

So that's a difference on the log hazard scale, which translates into

a hazard ratio

on the order of 0.6, and we can see from the previous picture what that means in

terms of the difference in survival by otherwise comparable men and women.
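
The equation on the slide is not captured in this transcript. Based on the description above and the slope values quoted later in this lecture (0.10 for treatment, 0.87 for the fourth age quartile, 0.15 for bilirubin, and -0.51 for sex), it would take roughly the following form. The labels x_5 and x_6 for bilirubin and sex are assumptions made here for illustration, and the slopes for the second and third age quartiles are left as symbols because they are not quoted in this section.

```latex
\ln\!\big(\hat{h}(t)\big) = \ln\!\big(\hat{h}_0(t)\big)
  + 0.10\,x_1
  + \hat{\beta}_2\,x_2 + \hat{\beta}_3\,x_3 + 0.87\,x_4
  + 0.15\,x_5
  - 0.51\,x_6
```

Here \hat{h}_0(t) is the baseline hazard at time t, x_1 = 1 for the drug group, x_2 to x_4 are the indicators for age quartiles 2 to 4, x_5 is bilirubin in mg/dL, and x_6 = 1 for females. Exponentiating the sex slope gives e^{-0.51}, approximately 0.60, the hazard ratio mentioned above.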

So suppose that we wanted to use these results to estimate the hazard ratio of

mortality for 60-year-old males on

DPCA with Bilirubin levels at the start of one milligram per deciliter,

compared to 40-year-old females on the placebo arm

with Bilirubin at the start equal to 0.5 milligrams per deciliter.

Well if we wanted to do this on the regression scale, we could

simply write out the estimated log hazard of mortality at a given point in time,

for each of these groups by plugging in their X values.

So this is what it looks like.

The log hazard, at any point in time, is a function of the starting hazard

on the log scale, whatever it is at that point in time, plus the slope for DPCA.

Plus the slope for being in the fourth age quartile,

because the fourth quartile is greater than or equal to 57 years and

these males are 60 years old, so they're in the fourth quartile.

Plus the slope for Bilirubin times their starting level,

which is one milligram per deciliter. And then, because they're male,

their x value for sex is 0, so they don't pick up anything for being male.

And when you write this out in terms of the baseline log

hazard at any given time and then the cumulative impact of these other things,

we get that the sum is equal to whatever the baseline hazard is on

the log scale plus 1.12 if you add up these three numbers.

If we do the same thing for females who are 40 years old and

in the placebo group with Bilirubin levels of 0.5,

then the log hazard is equal to the same starting log hazard at the comparable

time that we're making a comparison, which could be any time in the follow-up period.

They're in the placebo group, so their value for the

indicator of treatment group is zero, so they don't get anything for that.

They're in the lowest age quartile, so they don't pick up anything for

age because they're the reference there.

Their Bilirubin level is 0.5, so we take the slope for

Bilirubin, 0.15, times 0.5. And because they're female,

their x value for sex is a 1, and the slope for that is -0.51.

So when we combine the slope values into a sum,

we get that the estimate at any given time, is found by

taking the log of the baseline hazard at that time plus -0.435.

This is the cumulative impact of having the Bilirubin level of 0.5 and

being female.

So suppose we actually took the difference in these estimates: for

males 60 years old on DPCA with a Bilirubin of 1,

that's this part here, and we subtract what we get for females.

If you do this,

the difference in the estimated log hazards at any given time is 1.555.

And if we exponentiate that, we get a hazard ratio of 4.74.

So males, 60 years old in the drug group

with starting Bilirubins of 1 milligram per deciliter,

have 4.74 times the risk of

mortality at any given time in the follow-up period when compared to females,

40 years-old in the placebo group with a Bilirubin level of 0.5.
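
The arithmetic in this comparison can be reproduced in a few lines. This is a sketch using only the four slopes quoted in the lecture; the dictionary and function names are made up for illustration. The baseline log hazard is left out because it is the same for both groups at any given time and cancels in the subtraction.

```python
import math

# Slopes quoted in the lecture, on the log hazard scale
SLOPES = {"dpca": 0.10, "age_q4": 0.87, "bilirubin": 0.15, "female": -0.51}

def linear_predictor(dpca, age_q4, bilirubin, female):
    """Sum of slope times x value; the baseline log hazard is omitted because it cancels."""
    return (SLOPES["dpca"] * dpca
            + SLOPES["age_q4"] * age_q4
            + SLOPES["bilirubin"] * bilirubin
            + SLOPES["female"] * female)

# 60-year-old males on DPCA, bilirubin 1 mg/dL: 0.10 + 0.87 + 0.15 = 1.12
males_60 = linear_predictor(dpca=1, age_q4=1, bilirubin=1.0, female=0)

# 40-year-old females on placebo, bilirubin 0.5 mg/dL: 0.075 - 0.51 = -0.435
females_40 = linear_predictor(dpca=0, age_q4=0, bilirubin=0.5, female=1)

log_hazard_ratio = males_60 - females_40      # 1.555
hazard_ratio = math.exp(log_hazard_ratio)     # roughly 4.74
```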

And so this difference is

the culmination of the increased risk for being male,

the increased risk for being older,

the slightly increased risk for being in the DPCA group, and the increased risk for

having higher Bilirubin, which compound into a hazard ratio of 4.74.

I just want to show you something. Certainly a lot of times in

papers they will not give you the results on the log scale.

And you could certainly take the logs of the respective hazard ratios and

write out the equation.

But I wanted to show you this.

Suppose that, instead of actually mashing these together into one sum,

I keep the components separate in this comparison.

The difference between these two groups because of the difference in

treatment groups is 0.1.

That's because the males were in the treatment group and the females were not.

The difference between these two groups because of the age difference is 0.87.

The males add that additional 0.87 to their log hazard, because they were in the

highest age quartile, compared to females, who had nothing additional above and

beyond the baseline because they were in the reference category, the lowest quartile.

The difference in Bilirubin levels is 1 for the first group, minus 0.5 for

the second group.

And so the Bilirubin contribution to the difference between

the two groups is the slope of Bilirubin times that difference.

And then the first group is male, so they get a value of zero for sex, but

the second group is female, so they get a value of negative 0.51,

because their sex value is 1.

And so these are the components that, if we wrote this out,

sum to that 1.555 we saw before.

But if you exponentiate this in its component parts, and

do a little mathematics, you can see that it's the product of e to the first thing,

the slope for being in the DPCA group,

times e to the slope for

being in the oldest age quartile compared to the youngest,

times e to the slope for Bilirubin, raised to the 0.5 power

because that's the difference in the Bilirubin levels for these groups,

times e to the 0.51, which is the opposite of the slope for being female.

And what this turns out to be on the product scale is

the adjusted hazard ratio for being in the DPCA group compared to

the placebo group, times the adjusted hazard ratio from that table before for

being in the oldest age quartile compared to the youngest,

times the adjusted hazard ratio for a 0.5 difference

in Bilirubin, times 1 over the adjusted hazard ratio for being female,

because we're comparing in the opposite direction from the one that hazard ratio compares,

taken at face value.

In this example we're comparing males to females.

So you can actually get this kind of comparison based on

the adjusted hazard ratios just by multiplication.

And for things that are continuous, where the hazard ratio represents a one-unit

change in the continuous variable, like Bilirubin,

you take that hazard ratio in the multiplication and raise it

to the difference in that continuous value between the groups being compared.

So you don't necessarily have to take logs: if you're reading a paper and

are interested in doing such a thing, you can do it

directly from the results via this type of multiplication.

It's just an FYI.

Just shows how the math works.
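
The multiplication shortcut can be checked numerically. In this sketch the hazard ratios are obtained by exponentiating the slopes quoted in the lecture (0.10, 0.87, 0.15, and -0.51), because the published table itself is not reproduced in this transcript. With rounded hazard ratios from a paper, the product would agree only approximately. The variable names are made up for illustration.

```python
import math

# Adjusted hazard ratios, derived here from the slopes quoted in the lecture
hr_dpca = math.exp(0.10)           # drug versus placebo
hr_age_q4 = math.exp(0.87)         # oldest versus youngest age quartile
hr_bili_per_unit = math.exp(0.15)  # per 1 mg/dL increase in bilirubin
hr_female = math.exp(-0.51)        # females versus males, about 0.60

bilirubin_difference = 1.0 - 0.5   # mg/dL difference between the two groups

# Continuous predictor: raise the per-unit hazard ratio to the difference.
# Sex: invert the hazard ratio, because the comparison here is males to females.
combined_hr = (hr_dpca
               * hr_age_q4
               * hr_bili_per_unit ** bilirubin_difference
               * (1 / hr_female))
```

The product matches exponentiating the summed log-scale difference of 1.555, which is why no logarithms are needed when a paper reports only hazard ratios.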

So let's look at one more example of prediction with Cox regression.

We looked at several models, unadjusted associations, and

then two different adjusted models, looking at predictors of infant mortality.

And we found in the first lecture that pretty much the only two things among our

candidate predictors that were associated with infant mortality,

were gestational age and maternal parity.

And they each contribute independent information.

Neither of their adjusted associations was different than their unadjusted.

But let's just look at what the results would look like if we presented different

estimated survival curves for these infants based on gestational age and

two different parity categories, to look at the impact of parity above and

beyond gestational age.

So I put these curves next to each other,

and they just give some sense of what's going on.

What I have here, in this presentation,

are the five estimated survival curves for the gestational age groups.

This is amongst first-born children,

that is, the group of children whose mothers had not had any previous children.

And we see clearly that big jump: here's the pre-term group, less than 36 weeks.

And then here are the four other groups, and actually one is right on

top of another, so it looks like there are three curves, but

pretty much the story of gestational age shines through here.

That pre-term is really a risk factor for mortality.

And that reduction in hazard shown with full-term translates into

an estimated difference in probabilities of survival on the order of ten or

more percent in the follow up period.

So it's pretty dramatic.

If we look at the same presentation here, but

instead of looking at first born children we look at second born children.

You'll notice (this is on the same scale, so

you can compare these curves to the curves over here) that they all shift up a bit.

There's still quite a disadvantage, in terms of increased risk and

decreased survival,

of being pre-term, and this curve again

is very distant from the other gestational age categories.

But there is a benefit in terms of decreased risk and

increased survival of being the second born child compared to the first.

So you see these curves here on the same scale are shifted up,

at least relative to their counterparts among the firstborn children.

So, this kind of presentation, it's nice when authors do it.

At least for some of the groups, it helps ground our understanding of what

the underlying cumulative survival looks like over the follow-up period.

And that helps us take those hazard ratios and

translate them into understanding about what it means in terms of

cumulative differences of survival between groups across the follow up period.

So in summary, multiple Cox regression results can be used to

estimate cumulative survival curves of time to event outcomes.

For a given subset in a population, given the subset's predictor values.

Obviously we can't do this by hand, and you wouldn't be

expected to be able to recreate any of those given the equation for

Cox regression, but I just want you to be aware that this can be done.

So if you're working in a research group and thinking about

how to present your results in a paper or a report,

you may advocate for publishing some estimated survival curves above and

beyond presenting the results from a multiple Cox regression, to

contextualize what the impact of these predictors is,

in terms of the cumulative probabilities of survival over the follow up period.

And multiple Cox regression results can be used to estimate hazard ratios between

groups who differ by more than one characteristic.

And we looked at one example of that and

it was a very similar approach to what we did with linear regression except that

the results on the regression scale had to be exponentiated.

And it's analogous to what we did with logistic regression,

because we were dealing with log ratios on that scale as well in terms of the slopes.

Confidence intervals for these comparisons can be estimated.

But they need to be done by the computer, because the standard error for a

log hazard ratio that represents a comparison on multiple x's

has to be estimated by the computer and can't easily be done by hand. But

the idea is exactly the same.

If somebody gave you the standard error,

you could get a confidence interval estimate by taking the log hazard ratio, adding and

subtracting two estimated standard errors, and exponentiating the results.
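
The confidence interval recipe can be written as a short function. This is a sketch only: the standard error of 0.40 is invented for illustration, since the lecture does not report one for this comparison, and `hazard_ratio_ci` is a hypothetical helper name. The log hazard ratio of 1.555 is the one from the worked example earlier in the lecture.

```python
import math

def hazard_ratio_ci(log_hazard_ratio, standard_error, multiplier=2.0):
    """Approximate 95% CI: add and subtract two standard errors on the log scale, then exponentiate."""
    lower = math.exp(log_hazard_ratio - multiplier * standard_error)
    upper = math.exp(log_hazard_ratio + multiplier * standard_error)
    return lower, upper

# Log hazard ratio from the worked example; the standard error is an invented value
lower, upper = hazard_ratio_ci(1.555, 0.40)
```

The interval is built on the log scale and then exponentiated, so it is not symmetric around the hazard ratio itself.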

In the next section, we'll look at some examples of Cox regression used in

published articles from the public health and medical domains.