A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

A course from Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From this lesson

Module 2B: Effect Modification (Interaction)

Effect modification (Interaction), unlike confounding, is a phenomenon of "nature" and cannot be controlled by study design choice. However, it can be investigated in a manner similar to that of confounding. This set of lectures will define and give examples of effect modification, and compare and contrast it with confounding.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll continue our discussion of effect modification and

we'll look at several examples of studies where one of

the research questions involved investigating effect modification.

So this lecture section will give more examples of effect modification and/or

the processes used where necessary to investigate effect modification.

So let's look at this first article from the American Journal of Epidemiology.

And the title of the article gives some hint as to where it

was going in terms of effect modification.

It says, Similar Relation of Age and

Height to Lung Function Among Whites, African Americans, and Hispanics.

So notice what they talk about in this case.

If you read the article more thoroughly, you'll see their outcomes of

interest have to do with lung function.

And some of their predictors of interest include age and height.

And what they conclude in this article

is that the relationship between lung function variables and age and

height is statistically equivalent among Whites, African Americans, and Hispanics.

So that they can estimate one overall association that applies to

all three ethnic groups as opposed to there being effect modification,

which would necessitate separate measures of the associations between lung function,

age, and height

for each of the three ethnicities.

So let's look at what they say in the abstract.

They say, current guidelines recommend separate spirometry reference

equations for whites, African Americans, and

Mexican Americans, but the justification for this recommendation is controversial.

So what the authors were curious to see is whether the data they

collected supported this practice of having separate

reference equations relating lung function to age, height, etc.,

for the separate ethnic groups.

In other words,

they are investigating whether there is indeed effect modification.

So what they say is the authors examine the statistical justification for race and

ethnic specific reference equations in adults in both

the Third National Health and Nutrition Examination Survey and

the Multi-Ethnic Study of Atherosclerosis Lung Study.

Spirometry was measured following American Thoracic Society guidelines.

And then they go on to describe what they mean by statistical justification.

Statistical justification for estimating separate associations was

defined as the presence of effect modification by race or

ethnicity among never smoking participants without respiratory disease or symptoms.

So they go on to say in the abstract, there was actually no evidence of

effect modification by race/ethnicity for forced expiratory volume in one second,

forced vital capacity, or the ratio of forced expiratory volume in one second

to forced vital capacity in the three different ethnic groups.

So, they went on to do an analysis and use statistical techniques

to test whether or not the relationship between these respiratory outcomes and

predictors such as age and height were statistically different among the three

ethnicity groups, and they found no evidence of a difference.

We'll see how to do such tests shortly in the upcoming sections on

multiple regression.

But what they're concluding here is that, based on this updated, more

modern data, there was no evidence to support the previously held notion

that the reference equations relating lung function

to characteristics such as age and height needed to be different

for white, African-American, and Mexican-American men or women.

They did go on to say though that the mean lung function for a given age, gender and

height was the same for whites and

Mexican Americans, but was lower for African Americans.

So they did conclude that there were some overall differences between

the ethnicities after adjusting for age, gender, and

height, but then ultimately that the relationship between lung function and

age, gender, and height did not depend on what ethnicity the person is.

And in this second article, from the New England Journal of Medicine, they're looking at

statins to prevent vascular events in men and

women with elevated C-reactive protein.

So this is a randomized study where the researchers randomized 17,800 healthy,

in other words those without a history of cardiovascular disease, men and

women with non-elevated LDL cholesterol levels to

receive either 20 milligrams of statins daily, or a placebo.

And the subjects were followed for up to five years.

At the end of the follow-up period, the study results include the following.

Of the 8900 subjects randomized to the statin group,

142 developed cardiovascular disease.

And of the 8900 subjects who were randomized to the placebo group,

251 developed cardiovascular disease.

So the unadjusted incidence rate ratio here is very similar to

the straight comparison of proportions, but the incidence rate ratio accounts for

potentially differing follow-up periods for

each individual. The incidence rate ratio is 0.56.

And this is unadjusted.

This compares the incidence rate of cardiovascular disease development for

those who were randomized to receive statins to those who were randomized to

receive a placebo.

And this estimates a 44% reduced incidence or risk of developing cardiovascular

disease in the follow up period in the statins group relative to the placebo.

And this result is statistically significant and

the 95% confidence interval goes from 0.46 to 0.69.
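
As a quick check on the numbers just quoted, here is a minimal Python sketch that compares the two proportions directly. It uses only the counts stated in the lecture (142 of 8900 and 251 of 8900). The confidence interval is the usual large-sample interval for a risk ratio on the log scale, which is an assumption made for illustration and not the time-to-event analysis the authors ran; the variable names are illustrative. It gives a ratio of roughly 0.57 and an interval of roughly 0.46 to 0.69, in line with the reported 0.56.

```python
import math

# Counts quoted in the lecture for the statin trial.
statin_events, statin_n = 142, 8900
placebo_events, placebo_n = 251, 8900

# Straight comparison of proportions (ignores differing follow-up times).
risk_statin = statin_events / statin_n
risk_placebo = placebo_events / placebo_n
risk_ratio = risk_statin / risk_placebo

# Large-sample standard error of log(risk ratio); this is an approximation
# chosen for illustration, not the authors' hazard-ratio method.
se_log_rr = math.sqrt(
    1 / statin_events - 1 / statin_n + 1 / placebo_events - 1 / placebo_n
)
ci_low = math.exp(math.log(risk_ratio) - 1.96 * se_log_rr)
ci_high = math.exp(math.log(risk_ratio) + 1.96 * se_log_rr)

# Percent reduction in risk for statins relative to placebo.
percent_reduction = 100 * (1 - risk_ratio)

print(f"risk ratio = {risk_ratio:.3f}")
print(f"95% CI     = ({ci_low:.2f}, {ci_high:.2f})")
print(f"reduction  = {percent_reduction:.0f}%")
```

The small gap between this ratio of proportions and the reported 0.56 is what you would expect when the reported figure also accounts for each subject's follow-up time.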

Now the authors did not go ahead and

report an adjusted relative risk or incidence rate ratio,

despite other characteristics that may be associated with

cardiovascular disease development,

such as sex, age, and smoking.

But why do you think they didn't have to go ahead and

report an adjusted incidence rate ratio?

Well, the study was large and randomized.

So ostensibly, if they were to report the adjusted incidence rate ratio,

it should be similar,

if not identical, to this unadjusted incidence rate ratio, 0.56,

because with randomization there was very little potential for any confounding.

However, the authors did investigate interactions or

effect modification between some of these characteristics and statins.

They said, well, the overall result that we just presented is not

confounded by distributional differences between the statin and

the placebo groups in these other measures.

However, it is possible that the association between cardiovascular disease and

statin use differs depending on the level of some of these other characteristics.

So, this is a very common type of table shown in

the results from randomized clinical trials.

Essentially, they look at the association of interest separately for

different levels of other variables.

So for example, they go ahead and show the estimated incidence rate

ratio of cardiovascular disease for those on statins compared to those on

placebo among males only, and they give the estimated hazard ratio here

and its confidence interval, and among females only.

And they give the estimated hazard ratio here and its confidence interval.

This vertical line here, this dotted vertical line is

the 0.56 that they estimated for the overall association.

And this solid line here is one, which would be the null value.

So we can see very quickly that the association is statistically significant

for both males and females, as neither confidence interval includes one, but

you can see the estimates are relatively close to one another, and

the confidence intervals overlap.

So this suggests strongly that the relationship between

cardiovascular disease development and statins does not differ between males and females.

In other words the association is not modified by sex.

They do report something here that we haven't explored yet, but

we will when we get into multiple regression techniques.

There is a way to formally test whether the population level

associations between cardiovascular disease and

statins are statistically different for males and females.

The null is that they are not different and this p-value is quite high,

indicating that we would fail to reject the null, which is consistent with

the fact that the estimates were similar and the confidence intervals overlap.
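
One simple way to see what such a test is doing, sketched here under stated assumptions, is to compare the two subgroup estimates on the log scale using only each estimate and its 95% confidence interval. This is a swapped-in approximation for two independent subgroups, not the authors' method (the lecture notes the formal approach comes later with multiple regression). The function names and the example numbers below are hypothetical and are not the values from the article's table.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of a log ratio from its 95% CI limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_interaction_test(est_a, ci_a, est_b, ci_b):
    """Approximate z-test that two independent ratio estimates are equal.

    est_a, est_b: ratio estimates (e.g. hazard ratios) in two subgroups.
    ci_a, ci_b:   (lower, upper) 95% confidence limits for each estimate.
    Returns the z statistic and a two-sided p-value.
    """
    se_a = log_se_from_ci(*ci_a)
    se_b = log_se_from_ci(*ci_b)
    diff = math.log(est_a) - math.log(est_b)
    se_diff = math.sqrt(se_a**2 + se_b**2)
    z_stat = diff / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value


# Hypothetical subgroup results, made up purely for illustration.
z_stat, p_value = ratio_interaction_test(0.60, (0.45, 0.80), 0.50, (0.35, 0.71))
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

With these made-up inputs the estimates are close and the intervals overlap, so the p-value is large and the null of no difference is not rejected, which mirrors the reasoning in the lecture.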

They went on to do this type of analysis stratifying by age.

They wanted to see if there was a difference in the association for

younger people, defined as those less than or equal to 65 years, and

those greater than 65 years.

And the estimates are different, as you can see, but the confidence intervals

overlap and the interaction was not statistically significant.

And they do this for several other characteristics.

So ultimately, they did not find any evidence of

effect modification, even though they investigated it.

So they go on to actually report the results like this.

They say the rates of the primary end point were 0.77 and

1.36 per 100 person-years of follow-up in the statin and

placebo groups, respectively, with a hazard ratio for statins of 0.56

and a 95% confidence interval of 0.46 to 0.69

(this is what we showed when we started talking about this),

and a very small p-value.

However, they go on to say consistent effects were observed in

all subgroups evaluated.

So, what they're saying, in other words, is there was an overall association and

it did not, in their investigation,

appear to differ for different subgroups of the population.

So the message that they are giving is pretty clear that there's an overall

association of reduced cardiovascular risk associated with statin use.

And this relationship does not vary by sex or by age,

or by any other characteristic they did a subgroup analysis on to look for

effect modification.

So they found no evidence of effect modification by any of

the factors that they examined in their study.

Here's another study.

Plasma Enterolignan Concentrations and

Colorectal Cancer Risk in a Nested Case-Control Study.

Okay. So this is a nested case-control study.

We haven't looked at many case-control studies but

we can still appreciate the associations.

Enterolignans are biphenolic compounds that

possess several biologic activities whereby they may influence carcinogenesis.

The authors investigated the association between plasma enterolactone and

enterodiol and colorectal cancer risk in a Dutch prospective study.

Among more than 35,000 participants aged 20 to 59 years,

160 colorectal cancer cases were diagnosed after more than seven years of follow-up.

So they used this as their starting point, these 160 cases.

And they frequency matched members of the cohort to the cases on age,

sex, and study center.

So they selected about two and a half times as many controls as cases.

This was frequency matching,

not one-to-one matching:

they took a control group that had similar characteristics in terms of

the age distribution, sex distribution, and study center distribution as the cases.

Okay, so they actually show that plasma enterodiol and enterolactone were

not associated with the risk of colorectal cancer after adjustment for

known colorectal cancer risk factors.

And so they estimated an odds ratio comparing the highest quartile versus

the lowest quartile.

So they categorized the enterodiol levels into four quartiles.

The odds ratio is 1.11, and the results are not statistically significant.

And similarly they did the same thing for the enterolactone quartiles.

And while they showed in the sample an elevated odds

of colorectal cancer in the highest quartile compared to the lowest, the results were

not statistically significant as the confidence interval includes one.

However, they go on to say, sex and body mass index

modified the relationship between plasma enterolactone and colorectal cancer risk.

Increased risks were observed among women and subjects with high body mass index.

So what they're saying is on the whole there was no association after accounting

for the sampling variability.

But in certain subgroups they found that there was an association of increased risk

associated with increased levels of plasma enterolactone.

And this was found among women, but not necessarily among men.

And among those with a high body mass index, but

the association didn't hold up for other body mass index levels.

So what they're saying is that the effect of

enterolactone on the risk of colorectal cancer as

measured by the odds ratio was modified by these characteristics.

A different association existed for women than men, for example.

And let's look at one more example,

the association of race and age with survival among patients undergoing dialysis.

And we looked at this several times in Statistical Reasoning 1, but

we'll come back to it.

Now I'll just give you that context.

It says from the abstract here, many studies have reported that black

individuals undergoing dialysis survive longer than those who are white.

This observation is paradoxical given racial disparities in

access to and quality of care,

and is inconsistent with observed lower survival among black patients with

chronic kidney disease.

And one of the things they hypothesized was that age modified survival differences

by race.

So this goes on to talk about the study design. This is just to

say they pulled a large number of medical records from the Centers for Medicare and

Medicaid Services forms.

And then coupled it with data from the United States Renal Data System.

And what they did in order to replicate previous studies was use a Cox

proportional hazards model to estimate the association between mortality and

race, and they actually went on to adjust for

a bunch of other characteristics that may differ

between the black and white subjects receiving dialysis, including age, sex, and

insurance type.

And we'll see how to adjust with multiple Cox regression in our

section on multiple regression.

But they did this first for everyone.

And then they went on to actually look at the association

between mortality and

race adjusted for these other characteristics but separately by age.

And so they said to confirm whether the differences between age groups were

statistically significant, an additional model was built,

including interaction terms for each age category and black race.

And again, we'll get to how to do this in the multiple regression section.

But essentially what they said is they used an approach that allowed them to

estimate the separate relationship between mortality and

race, adjusted for other characteristics, for separate age groups,

to see if the association between mortality and

race differed by age of the subjects.

And so we've looked at this picture before.

But this is a close up of table 2.

As close as I can get it.

And these are the results from the resulting analysis.

What they are presenting here is the adjusted relative hazard of mortality for

black patients to white patients in each of these age groups.

And so they adjust for

a bunch of other characteristics in each analysis that may differ between black and

white patients who are going on dialysis and may affect mortality.

But what you see here is that in the early ages, 18 to 30,

this dot here is the estimated hazard ratio and this is the confidence interval.

And this is actually all scaled, you may remember, on the log scale.

So these intervals are symmetric and comparable in terms of the risk.

But what we see here is that at younger ages the incidence rate ratio

of mortality for black to white patients is above one and

statistically significant.

So black patients on dialysis have a higher risk of

mortality compared to white patients in the 18 to 30-year-old age group.

And this decreases, but continues to be higher and

statistically significant for blacks compared to

white patients who are 31 to 40 years old when receiving dialysis.

But after that age group, the trend goes the other direction,

the relationship between mortality and race changes the other direction.

After age 41 blacks consistently have a lower risk of

mortality compared to white patients after adjustment for other differences.

And so what the authors are showing here effectively is that

there's effect modification by age.

That the relationship between mortality and race, as defined by black or

white in dialysis patients is modified by age.

So in summary, we've looked at a few more examples here.

Effect modification occurs when the relationship between two quantities,

Y and X, depends on the level of a third quantity, Z.

And effect modification cannot be ascertained by comparing unadjusted or

crude associations and estimates adjusted for Z.

We actually need to see separate estimates of the Y/X relationship for separate

levels of Z in order to ascertain whether that association is different or not.
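
To make this point concrete, here is a small Python sketch using hypothetical case-control counts (made up for illustration, not taken from any of the studies above). The crude odds ratio and a Z-adjusted odds ratio (a Mantel-Haenszel summary, used here as one common adjustment method) are both equal to one, yet the odds ratios within the two levels of Z are very different. Comparing crude to adjusted would reveal nothing; only the separate estimates show the effect modification.

```python
def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Odds ratio of disease for exposed versus unexposed from a 2x2 table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# Hypothetical counts per level of Z, in the order:
# (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = {
    "Z = 1": (40, 20, 20, 40),
    "Z = 0": (20, 40, 40, 20),
}

# Separate estimate of the Y/X association within each level of Z.
stratum_or = {level: odds_ratio(*table) for level, table in strata.items()}

# Crude estimate: collapse the tables over Z and ignore it.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*pooled)

# Adjusted estimate: Mantel-Haenszel summary odds ratio across levels of Z.
mh_num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
mh_den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
adjusted_or = mh_num / mh_den

for level, value in stratum_or.items():
    print(f"{level}: OR = {value:.2f}")
print(f"crude OR    = {crude_or:.2f}")
print(f"adjusted OR = {adjusted_or:.2f}")
```

The counts are deliberately extreme so the pattern is easy to see; real data would rarely cancel out this cleanly.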

We will show very shortly how to set this up in a regression context and

how to formally test whether at least some of the associations are different for

some of the levels of Z.
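
As a preview, and only as a sketch of one common setup (the notation is assumed here, not taken from the lecture), a regression with a binary Z can allow the X association to differ by level of Z by including a product term:

```latex
\[
  \text{LHS} = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 (X \cdot Z),
  \qquad Z \in \{0, 1\}
\]
\[
  \text{slope of } X \text{ when } Z = 0:\ \beta_1
  \qquad
  \text{slope of } X \text{ when } Z = 1:\ \beta_1 + \beta_3
\]
\[
  H_0:\ \beta_3 = 0 \quad \text{(no effect modification by } Z\text{)}
\]
```

Here LHS stands for the mean of Y in linear regression, the log odds in logistic regression, or the log hazard in Cox regression. In the latter two cases, exponentiating \(\beta_3\) gives the ratio of the two Z-specific odds ratios or hazard ratios.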

But, the fundamental idea holds that in order to investigate effect

modification the researcher has to consider doing so in advance and

allow for the estimation of the relationship of interest separately for

different levels of the potential effect modifier.