A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 3A: Multiple Regression Methods

This module extends linear and logistic methods to allow for the inclusion of multiple predictors in a single regression model.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Greetings and welcome to Lecture Set 6.

This has a rather long title but it's actually composed of two parts.

In Lecture 6A, we're going to give an overview of the idea of

multiple regression for estimation, adjustment, and basic prediction.

And then we're going to look in sections B through D at the specific case of

Multiple Linear Regression for these purposes.

In Lecture Sets 7 and 8, we'll cover the same purposes of multiple regression for

logistic and Cox proportional hazards regression.

So in this set of lectures we will develop a framework for

multiple linear, logistic, and

Cox proportional hazards regression, as I said before, in the first section.

And then in the remaining sections we'll focus on multiple linear regression, which

is an extension of what we did with simple linear regression, and

it provides a general framework

for estimating the mean of a continuous outcome based on multiple predictors,

each of which may be binary, categorical, or continuous.

So let's first give an overview of multiple regression.

So hopefully, at the end of this section, you'll be able to

identify the group comparisons being made by a multiple regression slope

regardless of the outcome variable type, whether we have a continuous, binary, or

time-to-event outcome being modeled by the regression.

And appreciate that multiple regression allows for an outcome to

be predicted by taking into account multiple predictors with one method.

We don't have to look at associations one at a time anymore; we can better

predict an outcome by bringing multiple predictors into the story with one equation.

Multiple regression also allows for easy adjustment of a relationship or

relationships of interest for potential confounding variables.

And something we will cover after we lay down the basic framework of

multiple regression for these three types is that

these methods can be extended to also look at

effect modification, which we'll take on in Lecture Set 9.

And then I want you to realize, and hopefully

you could almost do this in your sleep now, you're so used to doing it,

that the approach to creating confidence intervals for

multiple regression intercepts and

slopes, regardless of the type of regression, is more of the same.

So let's just go back to taking what we did in Statistical Reasoning 1 and extending it.

So regression provides a general framework for

the estimation and testing procedures that we covered in the first term.

And as we discussed in lectures 1 through 3, many of the methods from

Statistical Reasoning 1 can be framed as simple regression models.

But regression is nice because it allows for extensions: we

can add more predictors to our predictor set. As such, multiple regression

allows for the extension of the methods from Statistical Reasoning 1 to allow for

multiple predictors of an outcome in a single method, and allow for

the estimation of adjusted associations relatively easily.

As we saw in simple regression already, regression allows for

the predictors to be binary, categorical, or continuous.

But the ability of a model to predict an outcome or function of an outcome

estimated by the model can be improved by using more than one predictor at a time.

So here's the basic structure.

It's basically going to look like our simple regression models with more Xs.

So, I actually cued you a little bit in simple regression when we did

the multicategorical predictors: we needed more than one X to model that.

So in the strict definition it was actually a form of

multiple regression because there was more than one X.

But there was only one predictor in that model;

it just required more than one X to model it.

So I think of it conceptually as a simple regression.

But the basic structure of a multiple regression model will be

a linear equation with potentially multiple X's and slopes.

So I'll just say it's equal to some intercept, beta naught hat,

plus beta 1 hat times X1, plus beta 2 hat times X2, onward and upward to

beta P hat times XP, where P is just the number of Xs that we have.

And these Xs are the predictors of interest.

And the only difference is for Cox regression.

It'll look pretty much the same, but just replace that beta naught hat

with the lambda naught hat of t in the above equation, because in Cox regression,

even in the multiple situation,

we'll have an intercept that varies as a function of time.
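
Written out, the structure just described might look like the following. This is a sketch in my own notation rather than the slide itself: LHS stands for whatever is on the left-hand side, and writing the Cox baseline term on the log scale is an assumption that follows from the left-hand side being a log hazard.

```latex
% Generic multiple regression (linear or logistic); p is the number of x's.
\text{LHS} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p

% Cox regression: the constant intercept is swapped for a baseline term that depends on time t.
\ln\big(\text{hazard at time } t\big) = \ln\big(\hat{\lambda}_0(t)\big) + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p
```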

So, as with simple regression, everything stays the same in terms of

the left-hand side.

It's going to depend on what variable type the outcome of interest is.

For continuous outcomes, the left-hand side, the thing that we'll be estimating

as a linear function of our predictors, is the mean of this outcome variable.

And we can estimate the mean as a function of multiple predictors

by the equation we get.

For binary outcomes, the left-hand side is the log odds of the binary outcome.

Just like we saw in simple logistic.

And then for time-to-event outcomes, the left hand side is log of

the hazard rate or incidence rate for the time-to-event outcome.
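
To make the "same right-hand side, different left-hand side" idea concrete, here is a minimal Python sketch. The function names and every number in it are hypothetical, made up for illustration; only the structure comes from the lecture.

```python
import math

def right_hand_side(intercept, slopes, xs):
    """Intercept plus the sum of slope * x over all the x's."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))

def log_odds_to_probability(log_odds):
    """Convert a logistic regression left-hand side back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical estimates and one subject's x values (not from any real study).
intercept = -1.0
slopes = [0.5, 0.02]
xs = [1, 50]

value = right_hand_side(intercept, slopes, xs)

# Linear regression: value is read directly as an estimated mean.
estimated_mean = value

# Logistic regression: value is a log odds, which can be turned into a probability.
estimated_probability = log_odds_to_probability(value)

# Cox regression: there is no single intercept, so the slope part alone gives
# the log hazard relative to the baseline hazard at any given time.
hazard_relative_to_baseline = math.exp(right_hand_side(0.0, slopes, xs))

print(estimated_mean, estimated_probability, hazard_relative_to_baseline)
```

The arithmetic on the right-hand side is identical in all three cases; what changes is how the resulting number is read.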

So the right-hand side, the thing that looks the same regardless of

the regression type, with that slight switch-out for

Cox regression on the intercept, includes our predictors of

interest. Generically I'll call them x1, x2, up to xP.

And these can be binary, categorical or continuous.

So the thing that we're going to see here is, we're going to extend the definition

of the comparison we make with multiple regression a little more,

make it a little more specific than when we had simple regression.

Generically speaking,

each slope estimates the difference in the left-hand side of the equation

for a one-unit difference in the corresponding x.

That's the same thing we talked about in simple linear regression.

The key here is that this comparison, when we have more than one predictor,

is adjusted for the other x variables or other predictors in the model.

So the associations we get between an outcome and

single predictors via the slopes

will automatically have been adjusted for the other variables in the model.

So this gives us a nice framework for

looking at adjusted associations in the presence of potential confounders.

The intercept estimates whatever's on the left-hand side, whether it be a mean for

continuous data, a log odds for binary data,

or a log incidence rate, or

hazard rate, for time-to-event data, when all of our x's are zero.

So let's just look at a generic interpretation example to start.

Suppose we wish to estimate a multiple regression with three predictors. We've

done a study of intravenous drug users

from four cities, so this is sort of an international study with

four different cities, including Baltimore.

We have London to represent Europe.

We have Delhi in India, and then we have Cape Town in South Africa.

So we want to look at some outcome

and see how it's related to three predictors at once.

One is the sex of the person.

The second predictor, which requires three x's because it has four levels and

is nominal categorical, is city.

So I'm going to make Baltimore the reference group and

I'm going to create indicators for each of the other three cities.

So x2 will indicate London, x3 will indicate Delhi,

x4 will indicate Cape Town, and x5 is the age of

the IVDU, the intravenous drug user, in the sample, in years.
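
Here is a small Python sketch of that coding scheme: three predictors turned into five x's, with Baltimore as the reference city and sex coded the way this lecture codes it (one for females, zero for males). The function name and the example subjects are hypothetical.

```python
# Baltimore is listed first because it is the reference group: it gets no indicator.
CITIES = ["Baltimore", "London", "Delhi", "Cape Town"]

def encode_subject(is_female, city, age_years):
    """Return [x1, x2, x3, x4, x5] for one subject."""
    if city not in CITIES:
        raise ValueError(f"unknown city: {city}")
    x1 = 1 if is_female else 0
    # One indicator for each non-reference city: London, Delhi, Cape Town.
    city_indicators = [1 if city == other else 0 for other in CITIES[1:]]
    return [x1] + city_indicators + [age_years]

print(encode_subject(True, "Delhi", 34))       # [1, 0, 1, 0, 34]
print(encode_subject(False, "Baltimore", 50))  # [0, 0, 0, 0, 50]
```

A subject from Baltimore has zeros for x2, x3, and x4, which is what makes Baltimore the reference group.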

So the general model we estimate for any type of outcome would,

depending on the outcome, have this left-hand side.

Again, this could be a mean, a log odds, or a log incidence rate.

And it'll be equal to some intercept plus some slope times x1, and

then our second predictor. We only have three predictors but we have five Xs,

and the way I think of that is our second predictor is city, and that requires three Xs

because there are four cities, four categories. And then our third predictor is age.

So just for example,

beta 1 hat here is the difference in the value of the left-hand side.

Remember sex here is coded as one for females, zero for males.

So it's the difference for a one-unit difference in our x.

The only possible difference, higher to lower, is one to zero,

or females to males.

So the slope for sex is the difference in the estimated value of the left-hand side

for females compared to males, adjusted for city distribution differences

between the sexes and age distribution differences between the sexes.

So in other words, this is

the difference in the value of the left-hand side for

females compared to males of the same city and age.

We've removed any variability in those between the sex groups.

Beta 5, for example, is the difference in the value of the left hand side for

subjects who differ by one year in age, adjusted for sex and city.

So of the same sex and from the same city.

So this compares whatever we have on the left hand side for

a one year difference in age adjusted for those two things.

Beta 2 here, remember x2 is the indicator for

London, a 1 if London and a 0 if not, would be the difference

between London and the reference city, which is Baltimore, after adjusting for

sex differences and age differences between those two sites.
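
These "adjusted for" comparisons can be checked numerically: hold the other x's fixed, change one, and the left-hand side moves by exactly that slope. The coefficients below are hypothetical, invented only to show the mechanics.

```python
# Hypothetical estimates, not from any real study.
INTERCEPT = 2.0
SLOPE_FEMALE = 0.5  # beta 1 hat
# Slopes for the city indicators (beta 2, 3, 4 hat); the reference city contributes 0.
CITY_SLOPES = {"Baltimore": 0.0, "London": -0.3, "Delhi": 0.8, "Cape Town": 0.1}
SLOPE_AGE = 0.04    # beta 5 hat

def left_hand_side(female, city, age_years):
    """Estimated left-hand side for one combination of sex, city, and age."""
    return INTERCEPT + SLOPE_FEMALE * female + CITY_SLOPES[city] + SLOPE_AGE * age_years

# Females vs. males of the same city and age: the difference is beta 1 hat.
sex_difference = left_hand_side(1, "Delhi", 30) - left_hand_side(0, "Delhi", 30)

# One year of age apart, same sex and city: the difference is beta 5 hat.
age_difference = left_hand_side(1, "London", 31) - left_hand_side(1, "London", 30)

# London vs. Baltimore, same sex and age: the difference is beta 2 hat.
city_difference = left_hand_side(0, "London", 40) - left_hand_side(0, "Baltimore", 40)

print(round(sex_difference, 6), round(age_difference, 6), round(city_difference, 6))
```

Whichever city and age are plugged in for the sex comparison, the answer is the same, which is what "of the same city and age" means here.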

So, just a reminder of the metric on which these slopes

compare groups: if the outcome is continuous, the

left-hand side is the mean of some continuous outcome, and

the slopes are adjusted mean differences comparing the groups we just discussed.

If the outcome is a binary y, a one or a zero,

then the slopes are adjusted log odds ratio estimates,

and we can exponentiate them to get adjusted odds ratios.

And if the outcome is time-to-event, where we have some binary y

which indicates whether an event occurred or was censored, and

then there is the time to go with it,

then the left-hand side is the log hazard of having an event

at a given time, and the slopes are the adjusted log hazard ratio estimates.
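
As a quick illustration of moving from the log scale to the ratio scale, here is a short Python sketch. The two slope values are hypothetical; the only operation involved is exponentiation.

```python
import math

def log_ratio_to_ratio(log_ratio):
    """Exponentiate a log odds ratio or log hazard ratio to get the ratio itself."""
    return math.exp(log_ratio)

# Hypothetical slopes from a logistic and a Cox model, for illustration only.
adjusted_log_odds_ratio = 0.7
adjusted_log_hazard_ratio = -0.4

adjusted_odds_ratio = log_ratio_to_ratio(adjusted_log_odds_ratio)
adjusted_hazard_ratio = log_ratio_to_ratio(adjusted_log_hazard_ratio)

print(round(adjusted_odds_ratio, 2), round(adjusted_hazard_ratio, 2))
```

A slope of zero exponentiates to a ratio of one, positive slopes give ratios above one, and negative slopes give ratios below one.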

If we want to get 95% confidence intervals for our slopes and intercept,

if our intercept has relevance to the population from which the sample comes,

if it's not a placeholder quantity and

we'd be interested in using it to actually quantify some aspect of our population,

we can get that by taking the intercept estimate plus or minus two

standard errors of the intercept, which will be given to us by the computer.

And for any slope, if we have the estimate and the standard error,

we can get the confidence interval in the same manner.

And this will generally be done by a regression package in a computer, but

it's exactly the same idea as almost all other inferences we've done.
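
The "estimate plus or minus two standard errors" recipe is short enough to write down. In this Python sketch the slope and standard error are hypothetical, and exponentiating the endpoints to get an interval on the ratio scale is the usual companion step for logistic and Cox slopes rather than something spelled out in this section.

```python
import math

def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin

# Hypothetical adjusted log odds ratio and its standard error.
slope = 0.7
slope_standard_error = 0.2

lower, upper = confidence_interval(slope, slope_standard_error)

# For a logistic or Cox slope, exponentiate the endpoints to get an
# interval for the adjusted odds ratio or hazard ratio.
ratio_interval = (math.exp(lower), math.exp(upper))

print((round(lower, 2), round(upper, 2)))
print(tuple(round(endpoint, 2) for endpoint in ratio_interval))
```

The same function works for an intercept or any slope, in any of the three regression types; only the interpretation of the endpoints changes.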

What about a general approach to hypothesis testing?

Well, we'll think of this in the context of slopes, and the hypotheses for

any single slope in the model.

Of course, the null hypothesis is that the slope at

the population level is equal to zero.

There is no association between the outcome and

this predictor Xi after adjusting for the other predictors in the model.

The alternative hypothesis is that the population level association is not zero.

In other words there is an association after adjusting.

So the general concept of the null hypothesis is after accounting for

the information in the other predictors in the model, this particular x,

xi, is not associated with the outcome.

It does not add information about the outcome above and

beyond the other predictors in the population from which the data were taken.

How would we do this hypothesis test?

Well, the same old approach.

The general approach is to compute a distance measure.

Sometimes it's called T, sometimes Z, but it's always the same computation.

Taking our estimated slope, subtracting what we expect it to be under the null,

which is zero, so we really just take the estimate and

divide by the standard error of our estimate, our estimated standard error,

and that gives us how far in standard errors our estimate is from zero,

what we'd expect it to be under the null, and we can translate that into a p-value.
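
A minimal Python sketch of that distance calculation follows. It assumes a large sample so that the distance can be compared to a standard normal curve; regression software for linear models typically uses a t distribution instead, which gives slightly larger p-values in small samples. The numbers are hypothetical.

```python
import math

def slope_test(estimate, standard_error):
    """Return (distance, two-sided p-value) for the null that the true slope is 0."""
    # How many standard errors the estimate is from the null value of zero.
    distance = (estimate - 0.0) / standard_error
    # Two-sided tail area under a standard normal curve (large-sample assumption).
    p_value = math.erfc(abs(distance) / math.sqrt(2.0))
    return distance, p_value

# Hypothetical slope estimate and standard error.
distance, p_value = slope_test(0.7, 0.2)
print(round(distance, 2), round(p_value, 4))
```

An estimate about two standard errors from zero lands near a p-value of 0.05, which lines up with the plus or minus two standard errors used for the confidence interval.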

Something I didn't discuss in simple linear regression, because I

just wanted to get the idea off the ground, but something to think about, is this.

When we have multicategorical predictors like city, ethnicity, etc.,

that require more than one x to uniquely specify each level of the category,

if we actually want to ask the general question of whether such

a predictor on the whole is associated with the outcome,

we're going to have to test more than one slope at once with one hypothesis test.

So let me just give you an example in this more generic regression model.

Let's assume I have a bunch of x's, but

like we did before, one of the predictors of interest is city.

And if I want to test whether there are city or locational differences

in the outcome at the population level after adjusting for

the other things in the model,

I can't do it by testing any one slope alone.

For example, if I just test beta 2, all I'm going to answer is whether there are any

differences between London and Baltimore, the reference group.

But I'm not going to actually get any information about

differences between Delhi or Cape Town and

the reference group, or differences between them and London.

So if I actually want to formally test whether the predictor city is

statistically associated with the left hand side after adjusting for

other things, the null I really want to test is that all three slopes are zero.

Because if all three slopes are zero, there are no differences between any

of the three cities with x variables and the reference of Baltimore. But

additionally, as we've seen, if I wanted to, for example, compare Delhi

to London, adjusting for other things in the model, I could take

beta 3 minus beta 2 to get the difference between Delhi and

London. Even though neither is the reference group, I can still combine my

slopes to get differences between groups that are not the reference.
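
To see why beta 3 minus beta 2 is the Delhi versus London comparison, write the equation for two subjects of the same sex and age, one in each city, and subtract. This is a sketch using the x numbering from the drug-user example (x1 sex, x2 London, x3 Delhi, x5 age); for Cox regression the intercept is replaced by the baseline term, which cancels in the same way.

```latex
% Same sex (x_1) and age (x_5); Delhi has x_3 = 1, London has x_2 = 1, all other city indicators are 0.
\begin{aligned}
\text{LHS}_{\text{Delhi}}  &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_3 + \hat{\beta}_5 x_5 \\
\text{LHS}_{\text{London}} &= \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 + \hat{\beta}_5 x_5 \\
\text{LHS}_{\text{Delhi}} - \text{LHS}_{\text{London}} &= \hat{\beta}_3 - \hat{\beta}_2
\end{aligned}
```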

So if all slopes are zero, then all differences are zero, and

in other words, there are no differences in the outcome between any of the groups,

the cities, after adjusting for the other things.

And the alternative is that at least one of these three is not zero.

And we're not going to talk about the details of how to do this.

But we will see output that has a P value doing this sort of test for

multicategorical predictors, and I'll explain it substantively in context.
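
For the curious, here is one way such a test can be computed. This is a sketch of a Wald test, which is one common approach and not necessarily what any given package reports (for linear regression an F test is typical). The three city slopes and their covariance matrix are hypothetical, and the large-sample chi-square comparison with three degrees of freedom is an assumption.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def wald_statistic(slopes, covariance):
    """slopes' * inverse(covariance) * slopes, for the null that all slopes are 0."""
    solved = solve_linear_system(covariance, slopes)
    return sum(b * s for b, s in zip(slopes, solved))

def chi_square_3df_upper_tail(x):
    """P(chi-square with 3 df > x); this closed form holds only for 3 df."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# Hypothetical slopes for London, Delhi, Cape Town and their covariance matrix.
city_slopes = [-0.3, 0.8, 0.1]
city_covariance = [
    [0.04, 0.02, 0.02],
    [0.02, 0.05, 0.02],
    [0.02, 0.02, 0.06],
]

statistic = wald_statistic(city_slopes, city_covariance)
p_value = chi_square_3df_upper_tail(statistic)
print(round(statistic, 2), round(p_value, 4))
```

The single p-value answers the question about city as a whole, which no one of the three individual slope tests can do.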

So in summary, multiple regression is a general method for

relating an outcome, whether it be continuous, binary, or

time-to-event, to multiple predictors with one model or one method.

And multiple regression models allow both for better outcome predictions by

using more than one predictor at a time and estimation of adjusted associations.

In the next sections, we'll look at specific examples

where our outcome is continuous and we're using multiple linear regression.
