A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

81 ratings

From this lesson

Module 2A: Confounding and Effect Modification (Interaction)

This module, along with module 2B, introduces two key concepts in statistics/epidemiology: confounding and effect modification. A relation between an outcome and an exposure of interest can be confounded if another variable (or variables) is associated with both the outcome and the exposure. In such cases the crude outcome/exposure association may over- or under-estimate the association of interest. Confounding is an ever-present threat in non-randomized studies, but results of interest can be adjusted for potential confounders.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Welcome back.

In this very short section, we're just going to give a little bit of insight as

to how adjusted estimates come about, the general idea behind the computations.

What we'll see, very shortly, is that multiple regression methods provide

a nice framework for doing adjustments quickly and easily.

So, hopefully, upon completion of this short section you'll gain some insight,

conceptually, as to how adjusted estimates are computed.

So, let's first look at our fictitious study.

You'll recall that this was the fictitious study on a random sample from a

population of male and female adults.

And, there were 210 smokers and 240 non-smokers in this study sample.

And, the crude association between smoking and

this not-so-rare disease outcome (again, this is fictitious)

was such that the relative risk was close to 1, but

in the sample, smokers had a slightly lower risk of the disease than non-smokers.

Then, when we actually broke things out by sex,

looking behind the scenes,

we saw that sex was related to both the probability of having the disease and the probability of smoking.

Females were more likely to have the disease than males.

But, males were more likely to smoke than females.

So, sex was related to both the outcome of disease and the predictor of smoking.

When we removed the variation in sex between the smoking and

non-smoking groups, i.e., we looked at the sex groups separately,

the relative risk among males of having the disease for

smokers compared to non-smokers was 1.8.

And, for females it was 1.5.

Both estimates are greater than one.

And, again, we're not considering statistical significance at this moment,

just using the estimates to illustrate the point.

So, how would we adjust for confounding?

Well, what we did here, when Z was categorical

(our potential confounder Z, sex, was categorical),

was look at the association between our outcome of disease and our predictor

of smoking separately by levels of that potential categorical confounder, sex.

So, our example of separate tables for males and

females is an example of stratifying by a potential confounder.

And, we saw that the estimates,

the estimated relative risks, were both greater than one by differing degrees, but

the difference in the estimates could be because of sampling variability.

Again, we're not at this point considering statistical significance for

this particular section, just talking about the overall concept.

Well, what could we do to take those sex-specific estimates and

aggregate them into one overall association between disease and

smoking that had been adjusted for sex?

Well, one way to do this would be to take a weighted average of

these stratum specific estimates, these sex specific estimates.

So, for example, to get a sex-adjusted relative risk for

the smoking-disease relationship, we could weight the

sex-specific relative risks, for example, by the number of males and females.

And we could take a weighted average by taking the number of

males times the relative risk estimate for males plus the number of females times

the relative risk estimate for females, and divide it by the total sample size.

So, in this example there were 200 males, and

the relative risk of disease for smokers compared to non-smokers was 1.8.

There were 250 females, and the relative risk of disease for smokers compared to non-smokers

was 1.5, and the weighted average using that weighting scheme is about 1.6.

So, this would be what we might call our sex adjusted relative risk of

disease and smoking.
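
The arithmetic just described can be sketched in a few lines of Python. This is only an illustration of the sample-size weighting scheme, using the stratum sizes and relative risks quoted above; the function name `weighted_average` is our own and not part of the course materials.

```python
def weighted_average(estimates, weights):
    """Weighted average: sum(w_k * estimate_k) / sum(w_k)."""
    if len(estimates) != len(weights):
        raise ValueError("estimates and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total_weight


# Figures quoted in the lecture: sex-specific relative risks and stratum sizes.
rr_by_sex = {"male": 1.8, "female": 1.5}
n_by_sex = {"male": 200, "female": 250}

adjusted_rr = weighted_average(
    [rr_by_sex[s] for s in rr_by_sex],
    [n_by_sex[s] for s in rr_by_sex],
)
print(f"sex-adjusted RR: {adjusted_rr:.2f}")  # (360 + 375) / 450 = 1.63
```

Because there are slightly more females than males, the adjusted value sits a little closer to the female estimate of 1.5 than to the male estimate of 1.8.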

There are better ways to do this, to take such a weighted average.

Instead of weighting by the sample size, we might weight using the standard errors

of the relative risk estimates, or rather of the log relative risk estimates, so that more precise estimates get more weight,

doing the weighted average on the log scale and then exponentiating the result.

But, this just illustrates the idea of stratifying by the potential confounder,

computing the stratum-specific estimates of the outcome-exposure

association, and then taking a weighted average across the strata.
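
As a rough sketch of the log-scale alternative mentioned above, the following Python weights each stratum's log relative risk by the inverse of its estimated variance and then exponentiates. The lecture does not give the underlying 2x2 tables in this section, so the counts below are HYPOTHETICAL: they were chosen only so that they reproduce the quoted figures (200 males, 250 females, 210 smokers, 240 non-smokers, relative risks of 1.8 and 1.5). The variance formula is the usual large-sample approximation for a log relative risk.

```python
import math

# HYPOTHETICAL counts (not from the lecture), chosen to match the quoted totals
# and relative risks. Each tuple is:
# (diseased smokers, all smokers, diseased non-smokers, all non-smokers)
strata = {
    "male": (36, 160, 5, 40),
    "female": (30, 50, 80, 200),
}


def log_rr_and_variance(a, n1, c, n0):
    """Log relative risk and its large-sample variance for one 2x2 table."""
    log_rr = math.log((a / n1) / (c / n0))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n0
    return log_rr, variance


weighted_sum = 0.0
total_weight = 0.0
for name, counts in strata.items():
    log_rr, variance = log_rr_and_variance(*counts)
    weight = 1.0 / variance  # more precise strata get more weight
    weighted_sum += weight * log_rr
    total_weight += weight

# Average on the log scale, then exponentiate back to the relative risk scale.
adjusted_rr_iv = math.exp(weighted_sum / total_weight)
print(f"inverse-variance adjusted RR: {adjusted_rr_iv:.2f}")
```

With these made-up counts the female stratum is estimated much more precisely than the male stratum, so the result (roughly 1.53) lands closer to 1.5 than the sample-size-weighted average does.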

We could also compute confidence intervals for these adjusted measures, but

we're going to save that until we get very shortly to multiple regression.

In this case, our outcome of disease was binary.

So, we could do a multiple logistic regression to relate the binary

outcome to smoking.

And, we'll see that multiple logistic regression can be used to adjust that association for

other predictors.

And, this will be a very useful tool for performing adjustments, so

that we don't have to do this stratifying and averaging approach.

We've also looked at the relationship between arm circumference and

height in the sample of Nepalese children less than a year old, and

we found that behind the scenes, not surprisingly, weight was

related to both the outcome of arm circumference and the predictor of height.

So, how could we go about adjusting this?

Well, weight is a little trickier as a potential confounder,

because it's measured on a continuum.

And, the adjusted results we presented in a previous lecture set were adjusted for

weight as a continuous variable.

But, here's the idea.

We could, behind the scenes, look at the relationship between arm circumference and

height within very tight weight ranges.

So, for example, weight between 10 and 11 kilograms.

And, we'd do the same thing for arm circumference and height

in the next weight group, between 11 and 12 kilograms, et cetera.

This is just trying to explain it conceptually.

And, we could keep doing this for

small ranges of weight across the entire range of the sample.

And, what we'd get is separate estimates of the relationship

between arm circumference and height: separate regression

lines of arm circumference on height for each of these small weight strata.

And, then the algorithm for

presenting an overall weight-adjusted

association between

arm circumference and height would involve taking the estimated regression slopes from

the regressions of arm circumference on height in each of these weight groups;

I'll call those beta one for weight group one, beta one for weight group two, and so on.

And, we'd have multiple weight groups.

And, we can take a weighted average of these regression slopes to

get an overall adjusted regression slope of arm circumference and

height, after adjusting for weight.

This is just the idea behind the process.

This would not be feasible to do by hand.

And, this is where multiple regression, again, will be our saving grace,

because the computer will do this effortlessly.
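
To make the idea concrete, here is a minimal sketch in Python with NumPy. The data are SIMULATED, not the Nepalese sample: the coefficients, the 4 to 12 kg weight range, and the 0.5 kg stratum width are all assumptions made purely for illustration. The sketch fits a separate slope of arm circumference on height within each narrow weight stratum, averages the slopes weighted by stratum size, and compares the result with the height coefficient from a multiple regression that includes weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# SIMULATED data (all values are assumptions, not course data).
weight = rng.uniform(4.0, 12.0, n)                       # kg
height = 50 + 2.5 * weight + rng.normal(0, 3, n)         # cm
arm = 8 + 0.6 * weight - 0.05 * height + rng.normal(0, 0.5, n)  # cm

# Crude (unadjusted) slope of arm circumference on height.
crude_slope = np.polyfit(height, arm, 1)[0]

# Stratify on weight using narrow 0.5 kg ranges.
edges = np.arange(4.0, 12.5, 0.5)
stratum = np.digitize(weight, edges)

slopes, sizes = [], []
for k in np.unique(stratum):
    in_stratum = stratum == k
    if in_stratum.sum() < 10:  # skip strata too small to fit a line
        continue
    slopes.append(np.polyfit(height[in_stratum], arm[in_stratum], 1)[0])
    sizes.append(in_stratum.sum())

# Weighted average of the stratum-specific slopes (weights = stratum sizes).
stratified_slope = np.average(slopes, weights=sizes)

# Multiple regression: arm ~ intercept + height + weight.
X = np.column_stack([np.ones(n), height, weight])
beta = np.linalg.lstsq(X, arm, rcond=None)[0]
regression_slope = beta[1]

print(f"crude slope:       {crude_slope:.3f}")
print(f"stratified slope:  {stratified_slope:.3f}")
print(f"regression slope:  {regression_slope:.3f}")
```

In this simulation the crude slope is positive while both adjusted slopes are negative and close to each other, which is the pattern the stratify-then-average description would lead us to expect. The two adjusted values are not identical, since the regression treats weight as continuous rather than cutting it into ranges.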

So, in summary, the adjusted association between an outcome Y and a predictor X,

adjusted for a single potential confounder Z, can be estimated by

stratifying on Z, which is hard to operationalize if Z is continuous.

When Z is binary, like sex, or

multi-categorical, the strata are well defined.

But, if Z is continuous, we couldn't do this easily by hand,

unless we designated small ranges to enumerate the strata.

Then, within each stratum of Z,

we would estimate the Y-X relationship in whatever metric we were using,

whether it be a relative risk, a linear regression slope, etc.

And then we could take some sort of weighted average of all the

Z-stratum-specific Y-X associations.

So, across all strata, we take our measure of association and

average it based on either the sample size of each stratum

or the standard error of the estimate in each stratum, etc.:

some weighting process that would give more weight to

those estimates informed by more information, or more precise information.
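
Written as a formula, the summary above amounts to the following. The notation is our own shorthand rather than the lecture's: $\hat{\theta}_k$ is the estimated Y-X association in stratum $k$ of Z (on the log scale for ratio measures), $w_k$ is that stratum's weight, and $K$ is the number of strata.

```latex
\hat{\theta}_{\text{adj}}
  = \frac{\sum_{k=1}^{K} w_k \, \hat{\theta}_k}{\sum_{k=1}^{K} w_k},
\qquad
w_k = n_k
\quad \text{or} \quad
w_k = \frac{1}{\widehat{\mathrm{SE}}(\hat{\theta}_k)^2}
```

Either choice of $w_k$ gives more weight to strata with more, or more precise, information.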

This idea can be generalized to estimating the adjusted association between Y and X,

adjusted for multiple potential confounders, Z1,

Z2, up to however many potential confounders we have,

but obviously, that would be nearly impossible to do by hand:

we'd be breaking our data up into groups stratified on multiple potential

confounders, by all possible combinations of these confounder values.

So this is where multiple regression methods are going to

make the adjustment process easy and straightforward.

But, at their core, this is essentially what they're doing behind the scenes with

some assumptions built in.

They're separating the data out into different strata based on the adjustment

variables, and then estimating the outcome exposure association, and,

then averaging across all those levels.

And, multiple regression can estimate multiple adjusted associations in

the context of one model.
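
As a preview of what such a model looks like for the smoking example, here is a small NumPy sketch that fits a logistic regression by Newton's method, first with smoking alone and then with smoking and sex together. The cell counts are HYPOTHETICAL (chosen only to match the totals and relative risks quoted earlier), and `fit_logistic` is our own helper, not a library function. Note that logistic regression works on the odds-ratio scale, so the numbers are not directly comparable to the relative risks above.

```python
import numpy as np

# HYPOTHETICAL grouped data (not from the lecture).
# Columns: smoker (1/0), female (1/0), group size, number diseased
cells = np.array([
    [1, 0, 160, 36],
    [0, 0,  40,  5],
    [1, 1,  50, 30],
    [0, 1, 200, 80],
], dtype=float)
smoker, female, n, y = cells.T


def fit_logistic(X, y, n, iterations=25):
    """Maximum-likelihood fit of a grouped-binomial logistic model (Newton's method)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        gradient = X.T @ (y - n * p)                 # score vector
        hessian = X.T @ (X * (n * p * (1 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


ones = np.ones_like(n)
crude_fit = fit_logistic(np.column_stack([ones, smoker]), y, n)
adjusted_fit = fit_logistic(np.column_stack([ones, smoker, female]), y, n)

crude_or = np.exp(crude_fit[1])        # smoking odds ratio, ignoring sex
adjusted_or = np.exp(adjusted_fit[1])  # smoking odds ratio, adjusted for sex

print(f"crude odds ratio:        {crude_or:.2f}")
print(f"sex-adjusted odds ratio: {adjusted_or:.2f}")
```

With these made-up counts the crude odds ratio is below 1, while the sex-adjusted odds ratio is above 1 and falls between the two sex-specific odds ratios, mirroring the stratify-and-average result in a single model.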

So, as we'll see shortly when we expand on what we

did in the first three lectures, one of the natural ways to

interpret the results will be in terms of adjusted estimates.