A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.


A course from Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

81 ratings


From this lesson

Module 4: Additional Topics in Regression

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So, in this section, we'll talk briefly about another use of propensity scores, above and beyond standard adjustment, where the propensity score is included as a predictor in a multiple regression that also includes the predictor of interest. We'll talk about a method called propensity score matching that is sometimes used.

So hopefully, this lecture will give you a basic overview of propensity score matching, its purpose, and situations in which it may be a better alternative to traditional adjustment with multiple regression, or even to the straightforward adjustment with propensity scores like we saw in lectures 10A and 10B.

So, again, the reason we have to adjust, and the reason propensity scores can be helpful, is that in some non-randomized studies there is a very specific outcome-predictor association of interest, but because of the study design, the observational nature, confounding is a threat.

So in such situations, there may be other potential predictors, but the research interest in those other predictors is in terms of using them for adjustment only. The researcher is not concerned about the adjusted associations between the outcome and these predictors after adjusting, if you will, for the main potential predictor of interest. These are used only to get a better, more comparable comparison of the outcome between comparable or equivalent exposed and unexposed groups on the main predictor of interest.

However, in some such studies the exposed and unexposed groups, as determined by the primary predictor levels, are very different with regard to their values, or distributions, of potential confounders; there may be only a subset of the unexposed, perhaps, that have confounder distributions similar to those in the exposed group. In this scenario it may make sense to restrict the comparison of an outcome between the exposed and unexposed groups to this subset.

Let me show you an example where the distribution of propensity scores between the unexposed and exposed leaves some gaps for both groups, but in this example, mostly for the unexposed. You can see that there's a whole portion of the propensity score distribution among the unexposed that has no crossover with the values among the exposed. And there's also a little bit of the distribution here in the exposed group that shares no values with the unexposed.
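The idea of such gaps can be made concrete with a small check of the overlap region, sometimes called common support. A minimal sketch, with made-up propensity scores rather than the slide's data:

```python
# Minimal sketch: checking the overlap ("common support") of propensity
# scores between exposed and unexposed groups. All scores are hypothetical.
def common_support(ps_exposed, ps_unexposed):
    # The overlap region spans from the larger of the two minima
    # to the smaller of the two maxima.
    lo = max(min(ps_exposed), min(ps_unexposed))
    hi = min(max(ps_exposed), max(ps_unexposed))
    return lo, hi

ps_exposed = [0.35, 0.48, 0.52, 0.61, 0.74, 0.88]
ps_unexposed = [0.05, 0.12, 0.18, 0.31, 0.44, 0.57, 0.66]

lo, hi = common_support(ps_exposed, ps_unexposed)
# Unexposed subjects with scores outside [lo, hi] have no exposed
# counterparts, like the gap described in the lecture.
off_support = [p for p in ps_unexposed if p < lo or p > hi]
print(lo, hi)       # → 0.35 0.66
print(off_support)  # → [0.05, 0.12, 0.18, 0.31]
```

Here a sizable chunk of the unexposed group sits below the lowest exposed score, the same situation the lecture's figure illustrates.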

In this situation we could adjust either by the traditional approach, multiple regression of an outcome on the primary predictor of interest, including each potential confounder as a separate predictor in a large multiple regression model, or by what we advocated doing in 10A and 10B, where we do a multiple regression of an outcome on the primary predictor of interest and include our only adjustment variable as the propensity scores, either in quartiles or quintiles or however we choose to do that. But when we have such differing distributions of confounders for some subgroups of each group, either approach may estimate an artificial comparison: comparing those exposed to those unexposed who are the same on all other variables.

By "all other variables" I mean either those used in the traditional multiple regression as adjustment variables, or those used to create the propensity scores. So ostensibly, when adjusting with the propensity score, you're adjusting for those variables that were used to create it. So what do I mean by an artificial comparison of those exposed to unexposed who are the same on all other variables?

Well, in this type of situation, there are at least some confounders for which there do not exist exposed and unexposed subjects with the same values, or there are some members of each group that do not share values with some members of the other group.

So again, in this example, it's more explicit here with the unexposed. But you can see there's a whole group, more than 25% of the unexposed group, that has propensity score values lower than the lowest observed propensity score value in the exposed group.

This may also cause analytic problems. Because if you think about adjustment, the process of adjustment is essentially breaking things up behind the scenes: if we do a multiple regression where we adjust for propensity scores, it behind the scenes breaks the observations up into groups based on their propensity scores, especially if we're adjusting by quartiles or quintiles. And within each of those propensity score groups it estimates the outcome-exposure relationship, and then it averages those across the different propensity score groups.
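That behind-the-scenes procedure can be sketched in a few lines. This is a toy illustration with hypothetical (group, exposed, outcome) records, assuming a simple unweighted average of the within-group differences in means; real software typically weights the groups or does this through the regression model itself:

```python
# Sketch of what adjustment by propensity score groups does conceptually:
# estimate the outcome-exposure difference within each group, then
# average across groups. Data below are hypothetical.
def stratified_difference(records):
    # records: list of (group, exposed, outcome) tuples
    diffs = []
    for q in sorted({r[0] for r in records}):
        exp = [y for g, e, y in records if g == q and e == 1]
        unexp = [y for g, e, y in records if g == q and e == 0]
        if exp and unexp:  # need both groups represented in the stratum
            diffs.append(sum(exp) / len(exp) - sum(unexp) / len(unexp))
    # simple unweighted average of within-stratum differences
    return sum(diffs) / len(diffs)

data = [
    (1, 1, 5.0), (1, 0, 4.0),   # group 1: difference 1.0
    (2, 1, 6.0), (2, 0, 4.5),   # group 2: difference 1.5
    (3, 1, 7.0), (3, 0, 6.0),   # group 3: difference 1.0
]
print(round(stratified_difference(data), 3))  # → 1.167
```

Note the `if exp and unexp` guard: a stratum with no exposed (or no unexposed) members contributes nothing estimable, which is exactly the problem the lecture describes next.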

Well, if there are some propensity score groups where there's no way to estimate the outcome-exposure relationship, because there are no exposed members in that grouping, or very few unexposed relative to exposed.

Then we may actually inflate the uncertainty of the estimate because we're

averaging in quantities that have very little precision to them

based on the imbalance of the observations with that propensity score,

between the two groups we hoped to compare.

So, the idea of propensity score matching is that for each observation in the exposed group, we match it with one or more observations in the unexposed group who have the most similar propensity scores.

And then compare the outcome between the exposed observations and

this subset of matched unexposed observations.

So what's generally done is something like this.

You'll see it in the examples I give you.

And it can be very complicated, depending on the relative number of persons in

the unexposed and exposed groups.

And thinking about what to do when there's more exposed members

than unexposed in a certain area of propensity score values etc.

But the basic idea is this.

What the researchers do is they take the propensity score distributions,

break it up into groups like we were talking about and then

within those groups they match everybody who has that value in the exposed group

to somebody in the unexposed group with the most similar value.

So this is called propensity score caliper matching, where the calipers are the groupings; they could be quintiles of the propensity scores, etc.

The difficulties come in how to approach this when, in some groupings, or calipers, there are fewer unexposed persons than exposed. Then you have to go with plan B and sometimes match multiple exposed persons to the same unexposed person, and then handle the fact that you've duplicated a match in the analysis later on.

But the general idea is that you break your propensity score distribution up into groups and do the matching within those groups.
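The matching step just described might be sketched like this. Everything here is hypothetical: the subject labels, the scores, the caliper width of 0.1, and the greedy, without-replacement matching order are illustrative assumptions, not the algorithm from either paper discussed below:

```python
# Toy sketch of nearest-neighbor matching within a propensity score
# caliper. Subjects, scores, and the caliper width are all made up.
def caliper_match(exposed, unexposed, caliper=0.1):
    # exposed/unexposed: dicts mapping subject id -> propensity score
    matches = {}
    available = dict(unexposed)
    for eid, ps in sorted(exposed.items(), key=lambda kv: kv[1]):
        # candidate controls within the caliper distance of this subject
        cands = {uid: abs(ups - ps) for uid, ups in available.items()
                 if abs(ups - ps) <= caliper}
        if cands:
            best = min(cands, key=cands.get)  # nearest neighbor
            matches[eid] = best
            del available[best]  # match without replacement
    return matches

exposed = {"E1": 0.40, "E2": 0.55, "E3": 0.90}
unexposed = {"U1": 0.38, "U2": 0.52, "U3": 0.60}
print(caliper_match(exposed, unexposed))  # → {'E1': 'U1', 'E2': 'U2'}
# E3 (score 0.90) has no unexposed subject within the caliper and
# goes unmatched, like the off-support subjects in the lecture.
```

Real applications have to decide what this sketch sidesteps: matching order, ties, matching with replacement when controls run short, and how duplicated matches are handled in the analysis.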

So let's just look at two examples of where this is used. In both these articles, actually, if you look them up, if you go to the web and search them out, they spend a large amount of time talking about how they matched.

Because like I said,

it can be complicated depending on what happens to the number of unexposed versus

exposed in the different subgroups of propensity score distributions.

Now let's look at this first one, abstinence pledges and subsequent sexual activity, from an article published in Pediatrics in 2009. In the abstract they state the objective: the US government spends more than $200 million annually on abstinence-promotion programs, including virginity pledges.

This study compares the sexual activity of adolescent virginity pledgers with

matched nonpledgers by using more robust methods than past research.

And they talk about how they recruited the subjects here.

This was based on the National Longitudinal Study of

Adolescent Health respondents.

A nationally representative sample of middle and high school students who,

when surveyed in 1995, had never had sex or

taken a virginity pledge, and who were greater than 15 years of age.

Adolescents who reported taking their virginity pledge on the 1996 survey were

matched with non-pledgers.

About two and a half times as many: 289 of those who took a pledge and 645 who did not, by using exact and nearest-neighbor matching within propensity score calipers, which is just what I was referring to.

Breaking the propensity score distribution up into bins, essentially.

Matching within propensity score calipers on factors including

pre-pledge religiosity and attitudes toward sex and birth control.

Pledgers and matched nonpledgers were compared five years after the pledge on self-reported sexual behaviors and positive test results for chlamydia, gonorrhea, and Trichomonas vaginalis, and on safe sex outside of marriage by use of birth control and condoms in the past year and at last sex.

So the results they found: five years after the pledge, 82% of the pledgers actually denied having ever pledged. That's sort of an interesting finding unto itself.

Pledgers and matched non-pledgers did not differ in pre-marital sex,

sexually transmitted diseases and anal and oral sex variables.

Pledgers had 0.1 fewer past-year partners, but did not differ in lifetime sexual partners or age at first sex.

Fewer pledgers than matched nonpledgers used birth control and

condoms in the past year and birth control at last sex.

And so the researcher concludes that the sexual behavior of virginity pledgers does not differ from that of closely matched nonpledgers, and pledgers are less likely to protect themselves from pregnancy and disease prior to marriage.

Virginity pledges may not affect sexual behavior, but

they may decrease the likelihood of taking precautions during sex.

Clinicians should provide birth control information to all adolescents,

especially virginity pledgers.

So let me just show you what they talked about, because this gives a little bit of insight into what they did. The article is really detailed about how they did the matching.

Matched sampling is a nonparametric method for assessing program outcomes by comparing a program group with similar nonprogram respondents.

We created a group of nonpledgers as similar as possible to pledgers on all pre-pledge factors that may influence sexual behavior, so outcome differences between pledgers and matched nonpledgers cannot be attributed to pre-existing differences. So they're trying to adjust for these.

Past studies compared self-selected virginity pledgers with the general population of non-pledgers and attempted to adjust for the vast pre-pledge differences by using traditional regression models.

The ones I was talking about, where virginity pledge, yes or no, was the primary predictor of interest, and then all other potential control variables were entered individually into the model.

Both matching and regression yield associative rather than causal inference.

But matching creates more valid comparisons in results for three reasons.

First, regression models rely on dubious parametric assumptions. And actually, it's true that they have structural assumptions, and that's something that should be talked about in regression.

But the fact is, quite frankly, because propensity scores are based on

regression estimation, there's some of that in this process as well.

So this doesn't let the researcher off the hook for that.

But regression on the whole cannot adjust, even on average, for

large differences between program and non-program groups.

So that's what I was talking about.

If there's certain subsets of persons in each of the groups where

there's no similar persons in the other group in terms of the propensity scores,

regression can have trouble estimating the difference between the exposed and

unexposed groups after adjustment with precision.

Second, matching computes outcome differences only once after

verification that the matched non program group is similar to the program group.

This separation assures that the model is selected independently of

the study results.

In contrast to regression with which it's impossible to

verify model correctness without seeing results.

And this gets at some of the ideas of model selection we were talking about before. If a researcher were looking at multiple regressions, adjusting for various combinations of confounders, he or she might iterate until finding the subset of confounders that are statistically significant. So they're actually trying to validate their results by seeing the results.

And here what they're saying is we first set up the matching algorithm without

estimating the outcome exposure relationship.

And once we're good with the matching algorithm we then go ahead and

do the adjustment.

Third, matching, and this is a thing about propensity scores as well, allows for

many more variables for adjustment than does straight up

regression where you include each of the variables as individual predictors.

In this study I controlled for

112 variables which would be problematic in a regression with 289 pledgers.

So they took the information in 112 variables and reduced it to a single number through the propensity score.
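To illustrate that reduction, here is a small, self-contained sketch of estimating propensity scores with a logistic model fit by per-sample gradient descent. The two covariates and all the values are invented stand-ins for the study's 112 pre-pledge variables, and a real analysis would use standard statistical software rather than this toy fit:

```python
import math

# Toy fit of a logistic regression propensity model: the propensity
# score is the estimated probability of exposure given the covariates.
def fit_propensity(X, exposed, epochs=2000, lr=0.1):
    # X: list of covariate lists; exposed: 0/1 exposure indicators
    w = [0.0] * (len(X[0]) + 1)  # intercept + one weight per covariate
    for _ in range(epochs):
        for xi, yi in zip(X, exposed):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            w[0] += lr * (yi - p)          # gradient step on intercept
            for j, xj in enumerate(xi):
                w[j + 1] += lr * (yi - p) * xj
    return w

def propensity(w, xi):
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

# two made-up covariates standing in for the 112 pre-pledge variables
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
exposed = [0, 1, 1, 0, 1, 0]
w = fit_propensity(X, exposed)
scores = [propensity(w, xi) for xi in X]
# however many covariates went in, each subject comes out with one
# number, the propensity score, which is what gets matched on
```

However the model is fit, the point stands: many covariates collapse to a single score per subject, which is what makes matching on 112 variables feasible with only 289 pledgers.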

So for these reasons, matched sampling has been advocated for studies in medicine and public health, and is used increasingly often in the medical literature for situations, perhaps, like this, where there's a very different confounder distribution between the groups being compared.

And I'm going to read to you a little bit more. Sorry, I know you can read, but I just think they do a nice job of reinforcing one of the ideas we stated up here. So, in an ordinary regression, virginity pledgers would be compared with all non-pledgers, but these groups differed one year before taking the pledge.

Comparing the 289 pledgers with all 3,151 non-pledgers at wave one, before matching: pledgers were less sexually experienced and expected more negative, and fewer positive, psychosocial effects of sex and birth control use, with lower birth control efficacy and knowledge. Pledgers had greater levels of religious belief, involvement, and born-again affiliation, more religious parents, and fewer substance-using friends, and were more likely to expect marriage before age 25. They were disproportionately female and Asian with foreign-born parents, and had lower Peabody vocabulary scores.

And this next portion is not contiguous with the first. It says: turning to outcomes five years after the pledge, 81.9% of virginity pledgers claimed to have never pledged.

Virginity pledgers and matched non-pledgers did not differ in 12 of 14

sexual behaviors, 3 of 3 STD test results and 4 of 4 marriage related outcomes.

Pledgers reported an average of 1.09 past-year vaginal sex partners, 0.11 fewer than non-pledgers, and 2.31% fewer pledgers reported having been paid for sex than non-pledgers.

Unmarried pledgers were less likely to report using birth control and

condoms in the last year and birth control at last sex but

did not differ in reporting condom use at last sex or in condom breakage.

So here's the table where they actually do the comparison. The idea is that what we're seeing here are the adjusted results, because these are among the matched samples. What they're presenting in this column is the difference, and the 95% confidence interval for the difference, either in the mean or the proportion. And then they call this the t test, though as we know, for comparing proportions we wouldn't call it a t test. But what they're computing here is something we're pretty familiar with, which is the distance between the two groups divided by its standard error. So the idea is, if this is greater than 2 in absolute value, the result is significant with a p-value of less than 0.05.
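For the proportion comparisons, that statistic is really a z statistic: the difference in proportions divided by its standard error. Here is a minimal sketch using an unpooled standard error, which is an assumption on my part; the paper's exact formula for matched data may differ, so the value below need not reproduce the table's entry. The inputs are the sexual intercourse percentages from the study (72.66% of 289 pledgers vs 76.24% of 645 matched non-pledgers):

```python
import math

# Sketch of the table's "t test" column for a proportion: the difference
# between the two groups divided by its standard error (a z statistic).
# Unpooled SE is an assumption; the paper's matched-data formula may differ.
def z_two_proportions(p1, n1, p2, n2):
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff / se

z = z_two_proportions(0.7266, 289, 0.7624, 645)
print(round(z, 2))  # → -1.15
# |z| < 2, so the difference is not significant at the 0.05 level,
# consistent with the lecture's reading of the table
```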

So for example, if we look at sexual intercourse, 72.66% of the pledgers had engaged in sexual intercourse by the time of the follow-up study, versus 76.24% of the non-pledgers. So the pledgers had a lower proportion, on the order of 3.6 percentage points, but it was not statistically significant.

And similarly, there are other measures here, like age at first sex. Those who pledged were about half a year older on average at first sex compared to those who didn't pledge; the average is 20.7 years. But this result was not statistically significant.

And the two things they highlighted as being statistically significant were the number of past-year partners, which was lower by about 0.11 on average and had an effect size, or distance measure, of negative 2.45, and having ever been paid for sex, which was statistically significantly lower in the pledging group.

Let's look at one more example where they use propensity score matching.

Drug use and intimate partner violence.

So the objective, this was done in the American Journal of Public Health,

they say we examined whether frequent drug use increases the likelihood of subsequent

sexual or physical intimate partner violence, and whether intimate partner

violence increases the likelihood of frequent subsequent drug use.

They used a random sample of 416 women on methadone, and

they were assessed at three points in the study.

Baseline, called wave 1; 6 months, called wave 2; and 12 months, wave 3. Propensity score matching and multiple logistic regression were employed.

They found women who reported frequent crack use at the second study period, wave two, were more likely than non-drug-using women to report intimate partner violence at wave three, with an odds ratio of 4.4, which was statistically significant.

And frequent marijuana users of wave two were more likely than non-drug users to

report intimate partner violence at wave three as well, with an odds ratio of 4.5.

In addition, women who reported IPV, intimate partner violence, in wave two,

were more likely than women who did not report intimate partner violence to

indicate frequent heroin use in wave three.

An odds ratio of 2.7.

So let's just look a little bit about how they sampled and matched.

They randomly selected 753 women from the total population of 1700

women enrolled in 14 methadone maintenance clinics in New York City.

And of the 753 women 559 agreed to participate.

And they go on to describe this.

Ultimately they ended up with 416 women who were eligible, agreed to participate, and completed a baseline [INAUDIBLE] criteria.

And their eligibility criteria were: being a female between the ages of 18 and 55; being enrolled in the methadone maintenance program for at least three months; and, during the past year, having had a sexual or dating relationship with someone described as a boyfriend, girlfriend, spouse, regular sexual partner, or father of her children.

We use propensity score matching to reduce the selection bias that can occur in

an observational study.

This heuristic nonparametric technique in effect reconstructs a sample that mimics the results of a random assignment component in a randomized clinical trial, by selecting groups that have similar values of observed confounders and that differ only with respect to a treatment variable of interest. Propensity score matching can eliminate this bias if we are able to balance, across the treatment and control groups, all the covariates that are associated with both the treatment and the outcome.

Propensity scores were calculated using the attributes for observed confounders measured at wave one, treatment variables at wave two, and outcome variables at wave three. This analysis plan ensures that the confounders temporally precede treatment assignment, which in turn precedes the determination of the outcome variable.

And they go on to say the confounders include associated demographics, history of trauma, psychological distress, social support, and HIV risks. For hypothesis one, the treatment variable is frequent drug use measured at wave two and the outcome variable is intimate partner violence at wave three, and we saw that in the results section.

And they go on to describe their other hypotheses.

And then they say: after using propensity score matching procedures to select a final sample of participants for which valid causal effect sizes could be obtained, we used multiple logistic regression to test each hypothesis. For each type of drug, adjusted odds ratios and their associated 95% CIs were examined to test the hypothesis.

And here's the table they report.

So for hypothesis one frequent drug use

increases the likelihood of subsequent IPV.

And what they're showing, their outcome here is whether or

not the woman reported having experienced intimate

partner violence at the last study follow up wave three.

And they use the women's reporting of frequent drug use at wave two as the predictor. So women who reported using cocaine at wave two had 60% higher estimated odds than women who didn't of experiencing intimate partner violence at wave three, but this was not statistically significant.

However, as you can see, and as they reported in the abstract, crack use and marijuana use were statistically significantly associated with a large increase in the relative odds of experiencing intimate partner violence at the third follow-up period.

And then for the other hypothesis they look at, the predictor is whether the woman reported experiencing intimate partner violence at wave two, with regard to her drug use at wave three. So, if the outcome was frequent use of cocaine at wave 3, then women who had experienced intimate partner violence at wave 2 had 2.1 times the odds, compared to women who had not, of using cocaine frequently at wave 3, although that was not statistically significant.

And crack, again, and heroin were the two drugs that were statistically significantly associated with increased use for women who had experienced intimate partner violence at the previous study period.

So, hopefully this whole lecture set has given you a look at some alternative methods for adjustment that can be useful. One thing to keep in mind, though, is that all of these mine the same territory as what we did in multiple regression, and may have some advantages, especially analytically, but nothing can actually adjust for confounders that were never measured. So the utility of these other methods involving propensity scores is limited by the number of potential confounders that are measured by the researcher, just as the other methods we discussed with multiple regression were as well.