So one way to handle effect modification in the regression context is to
present stratified results.
So here is an example of a study where effect modification by sex was of interest, and results from simple and multiple logistic regression models were presented, estimated separately for the data from males and the data from females.
This was an article looking at suicide outcomes, both having suicidal thoughts and attempting suicide, as a function of a person's self-reported sexual identity, that is, whether they identify as homosexual or not.
And the authors looked both at the unadjusted associations between these
outcomes and homosexuality and the same association adjusted for
other factors like ethnicity, alcohol abuse, et cetera, but
they did so separately for males and females.
A closeup of the top of this table shows these associations estimated for males only, and at the bottom of the table the same analysis is presented for females.
Another example is an article that came out in the New England Journal of
Medicine looking at coffee drinking and mortality.
What the authors did was look at time-to-event data, time to mortality over a follow-up period, as a function of how much coffee, on average, people reported drinking at the start of the follow-up period.
To see whether there were any differences in the associations, either in direction or in magnitude, they did this separately for males and females. They estimated the essentially unadjusted association between mortality and coffee consumption, adjusted only for age, reporting estimated hazard ratios of mortality for the different levels of coffee consumption relative to those who didn't drink coffee.
Then they re-estimated these associations adjusted for a multitude of other factors, and here's a list of those factors: body mass index, race or ethnic group, et cetera. But they did these analyses, both the age-adjusted-only versions and the fully adjusted versions, totally separately for the data on men and again on women. So they never combined the data between the sex groups; they did the analyses completely separately.
But suppose we don't want to do the analysis completely separately for males only and females only. Similarly, we might want to estimate age-specific associations between mortality and race in dialysis patients, after adjusting for other factors, but we want to use the data for all age groups combined in order to estimate the adjusted associations in one model with these other factors.
Well, this can be done by including what's called an interaction term in
a multiple regression model.
So let's look at an example. This is the data set of 534 US workers from 1985, and what I want to look at is the association between hourly wages and years of education. What I am presenting in this table are the unadjusted and adjusted linear regression slopes for years of education only, from models that were adjusted for various other characteristics.
So what we have here is the slope of years of education from a simple linear regression model of wages on years of education. The unadjusted association suggests that two groups who differ by one year of education will have hourly wages that differ by 75 cents on average.
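To make the mechanics concrete, here is a minimal sketch of that unadjusted fit in Python with statsmodels. The DataFrame df and the column names wage, educ, female, and union are my assumptions about how the 534-worker data set might be organized; they are not specified in the lecture.

# Minimal sketch: unadjusted (simple) linear regression of hourly wage on
# years of education. Assumes the 1985 data on the 534 US workers has been
# read into a pandas DataFrame `df` (hypothetical column names).
import statsmodels.formula.api as smf

unadjusted = smf.ols("wage ~ educ", data=df).fit()
print(unadjusted.params["educ"])  # the unadjusted slope, about 0.75 here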
So for example, let's look at the results for model c.
This was the one that estimated the association between hourly wages,
years of education, sex, and union membership.
And in the table I only showed you the resulting slope for
years of education, but
there were also resulting slope estimates for sex and union membership.
So what I'm showing here is a graphic of what this model estimates for the union-membership-adjusted relationship between hourly wages and years of education, drawn separately by sex group. You'll notice that these lines are parallel. The slope of each of these lines is 0.76, that adjusted slope we reported, adjusted for sex and union membership.
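As a rough sketch of what that "model c"-style fit amounts to, using the same hypothetical DataFrame and column names as before:

# Sketch: the model adjusted for sex and union membership, with NO interaction.
import statsmodels.formula.api as smf

adjusted = smf.ols("wage ~ educ + female + union", data=df).fit()
b = adjusted.params

# Without an interaction term there is one common educ slope (about 0.76 here),
# so the sex-specific fitted lines are parallel, holding union membership fixed;
# only the intercepts differ.
common_slope = b["educ"]
intercept_males = b["Intercept"]
intercept_females = b["Intercept"] + b["female"]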
Here's how the interaction term approach works. The interaction term is something I actually have to create myself on the computer; it doesn't come with the data set, and there is no automatic command on the computer that says "create an interaction term." But it's actually elegant in its simplicity, and once we get through the mechanics, hopefully you'll appreciate that this is kind of neat.
So the interaction term can be created, and this sounds kind of strange at first but we'll see why it works, by taking the product of the two variables we want to interact. In other words, if we want to estimate whether the relationship between wages and years of education is modified by sex, we're asking whether there's an interaction between years of education and sex. So we would create this interaction term, which I'll generically call x3, by taking the product of years of education and sex: x3 equals x1 times x2.
So let's look at the new model we're going to estimate. It's going to include years of education and sex, just like the model where they adjusted for each other, plus the other adjustment variables.
But we're actually going to also add in this interaction term, x3.
And then any other x's we want to include to either better predict wages or
adjust the relationship between wages and years of education above and beyond sex.
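In code, creating and using the interaction term might look like the sketch below, again with the assumed DataFrame df and hypothetical column names from the earlier sketches.

# Sketch: the interaction term is literally the product of the two variables
# we want to interact (x3 = x1 * x2).
df["educ_x_female"] = df["educ"] * df["female"]

import statsmodels.formula.api as smf

interaction_model = smf.ols("wage ~ educ + female + educ_x_female + union",
                            data=df).fit()
print(interaction_model.params)  # slopes for educ, female, the interaction, union

Building the interaction column by hand like this mirrors the point that the interaction term is just one variable multiplied by the other.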
So let's see why this works.
So in this coding schema, I've called x1 years of education, x2 is sex, which takes on a value of 1 for females and 0 for males, and x3 is the interaction term.
So what is the value of this interaction term for males?
Well for males this equals the years of education measure
times the sex value for males, x2 is equal to 0 for males.
So this interaction term is nothing, it disappears.
It's equal to 0 for males.
What about for females?
Well, for females, x3, our interaction term, is equal to years of education, x1, times x2, the sex indicator, which is 1 for females. So for females, x3 is ultimately just another copy of years of education.
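A tiny numeric illustration of that point, for a hypothetical person with 12 years of education (the 12 is just an arbitrary example, not from the lecture):

# Sketch: the value of the interaction term x3 = educ * sex for one person.
educ = 12
x3_for_a_male = educ * 0    # sex indicator is 0 for males, so x3 vanishes
x3_for_a_female = educ * 1  # sex indicator is 1 for females, so x3 is
                            # another copy of years of education (12 here)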
So, let's see how this all plays out.
So, what I did here was estimated this regression that includes years of
education and sex, and other adjustment variables.
In this case it's just union membership, but I want to keep this general in its conceptualization.
And I also included the interaction term.
And here are the resulting estimates I got from the computer.
The slope of years of education is 0.7, the slope of sex is negative 3.69,
and the slope of the interaction term is 0.14.
So, let's see how this plays out.
Well, let's look first at what this model estimates the relationship between wages, years of education, and everything else to be for males only. Males are kind of easy given their coding, because their value of x2 is equal to 0, and hence their value of the interaction term is equal to 0. So if we write out what this estimates for males, we get the intercept of 0.4 plus the slope for years of education times years of education. Both sex and the interaction term disappear because they're both equal to 0.
Plus whatever else we have in this model.
In this case it's just union membership, but
I'm not showing that slope because I want to focus on this years of education piece.
So in males, the slope of years of education, the piece that describes the relationship between hourly wages and years of education, is equal to that 0.7. So for males, hourly wages increase by 70 cents on average per additional year of education.
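As a worked sketch using the lecture's estimates (intercept 0.4, educ slope 0.7, sex slope -3.69, interaction slope 0.14, and ignoring the union piece):

# Sketch: the male side of the story with the lecture's fitted numbers.
b0, b_educ, b_sex, b_interact = 0.4, 0.7, -3.69, 0.14

# For males, sex = 0 and the interaction term = 0, so only b0 and b_educ remain:
# predicted wage (males) = 0.4 + 0.7 * educ (union piece ignored here)
slope_males = b_educ  # 0.70, i.e., about 70 cents per extra year of education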
For females we're going to have to do a little more accounting to get the story, but what we're basically going to see is that by generating this interaction term we put in another copy of years of education, and when we combine the two parts involving years of education we get a different slope estimate of years of education for females.
So let's do this out.
So for females we get this, we get the intercept that the males got.
We get everything that the males got.
And then we get the slope of sex times 1, so plus -3.69.
Then we get the coefficient of the interaction term times years of education, because for females the interaction term is years of education times 1, as we said before.
And then plus there's the piece about union membership, but
I'm just leaving that generic.
So if we do a little accounting here, we can bring the negative 3.69 over to the intercept. Then if we group the 0.7 x1 and the 0.14 x1 together and do a little factoring, we see that these two combine to give, if you will, the slope of years of education among females. So among females, the average increase in hourly wages per one-year increase in years of education is that increase for males, 0.7, plus this additional piece, another 14 cents. So in total, the slope, or association, between hourly wages and years of education for females is 0.84.
And this piece for the interaction term, the 0.14, is the estimated difference in the slopes of years of education between females and males.
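The female side of the accounting, continuing the same worked sketch:

# Sketch: for females, sex = 1 and the interaction term is another copy of educ,
# so the two educ pieces combine into one female-specific slope.
b0, b_educ, b_sex, b_interact = 0.4, 0.7, -3.69, 0.14

intercept_females = b0 + b_sex        # 0.4 - 3.69 = -3.29
slope_females = b_educ + b_interact   # 0.7 + 0.14 = 0.84
# predicted wage (females) = -3.29 + 0.84 * educ (union piece ignored here)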
Here is a plot similar to the one we did before, but now you'll notice that these lines, these sex-specific associations, have different slopes.
The slope for males is 0.7.
And the slope for females is larger, 0.84.
So you can see that these lines are starting to
converge with increased years of education.
The other side of this story, and we could go back and rewrite the model to estimate this piece as well, is that, conceptually, if we are estimating an interaction between these two variables, not only are we estimating differing relationships between hourly wages and years of education by sex, but we're also estimating different associations between hourly wages and sex depending on years of education.
So if we look at two groups who have ten years of education,
males compared to females, this vertical distance here is the average difference
in salaries between males and females with ten years of education.
If we did the same thing for those with 15 years of education,
this vertical distance is the average difference between males and females.
So you can see that the average difference between males and females also depends on, or changes with, years of education.
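Sticking with the same fitted numbers, that changing gap is simple arithmetic; here is a short sketch.

# Sketch: the male-minus-female difference in average hourly wage at a given
# number of years of education, using the lecture's estimates.
b_sex, b_interact = -3.69, 0.14

def male_minus_female_gap(educ):
    # Males are the reference group, so the gap is -(b_sex + b_interact * educ).
    return -(b_sex + b_interact * educ)

print(male_minus_female_gap(10))  # 3.69 - 1.40 = 2.29 dollars per hour
print(male_minus_female_gap(15))  # 3.69 - 2.10 = 1.59 dollars per hour
# The gap shrinks with more education, which is why the lines converge.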
If the purpose of our investigation were to either confirm or rule out effect modification, then, after adjustment for union membership status, we would say we've ruled it out, power considerations notwithstanding. And we would probably go back and report that common adjusted association between years of education and hourly wages, adjusted for sex and union membership.
Let's look at another example though, and
again I'm just trying to give you the basic idea here.
Being able to handle the mechanics of this is not essential for this course, but for
some of you this may be interesting and you may want to apply this.
At some point in your data analysis projects, or
you may go on to do further courses in statistics, and this will give you at least a starting point for the mechanics of interaction in regression.
So let's look at an example of mortality in
patients with primary biliary cirrhosis.
This is the Mayo Clinic data we've looked at so often before. It was a randomized trial in which patients were randomized to receive the drug DPCA or placebo, and the outcome of interest is death. The unadjusted hazard ratio of mortality for patients receiving DPCA versus placebo was 1.06, a slight increase in mortality in the sample for those who received the drug.
But this result was not statistically significant.
And as we've seen, if we adjust for something like age, the unadjusted and adjusted hazard ratios of DPCA to placebo are very similar, because this was a randomized trial. However, we may still have a question about age, even knowing that it could not confound the overall relationship between DPCA and mortality.
We might want to ask whether the effect of the drug was modified by the age of the patient. Maybe it doesn't work, or is even harmful, for some age groups but works well for others.
Well, at this level of analysis, all we have is one overall association between DPCA and mortality, which would be used to describe the association for all ages.
So if we want to investigate whether there is effect modification by age,
we have to go a little further.
So I'm going to look at age categorized into quartiles so that we can do this interaction approach, and there are four quartiles. The first quartile is persons less than 42 years old; the second quartile is those 42 to 49.9 years, just less than 50; et cetera. You can see what's going on here.
So to investigate whether age modifies the effect of the drug, we will need to fit a Cox model that includes the drug as a predictor, but also the age quartile indicators (we've got four groups here, so we'll need three binary indicators), and then interaction terms between the drug variable and each of the age quartile indicators.
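Here is a sketch of how such a model might be set up in Python using the lifelines package. The DataFrame pbc and the column names time, death, dpca, and age are my assumptions about how the trial data might be organized; nothing here is the lecture's actual code.

# Sketch: a Cox model with the drug indicator, three age-quartile indicators,
# and three drug-by-quartile interaction terms.
import pandas as pd
from lifelines import CoxPHFitter

# Cut age into quartiles; quartile 1 serves as the reference group.
pbc["age_quartile"] = pd.qcut(pbc["age"], 4, labels=[1, 2, 3, 4])
for q in (2, 3, 4):
    pbc[f"q{q}"] = (pbc["age_quartile"] == q).astype(int)  # quartile indicator
    pbc[f"dpca_x_q{q}"] = pbc["dpca"] * pbc[f"q{q}"]       # interaction term

covariates = ["dpca", "q2", "q3", "q4", "dpca_x_q2", "dpca_x_q3", "dpca_x_q4"]
cph = CoxPHFitter()
cph.fit(pbc[["time", "death"] + covariates],
        duration_col="time", event_col="death")
cph.print_summary()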
So this actually looks a little daunting when it all comes down.
And again, for those of you that are not interested in the mechanics, I just want
you to get the basic idea of what the interaction term or terms allow us to do.
But for those of you who are interested, I'll detail this a little bit.
So this x1 here is an indicator of DPCA or placebo.
And then these indicators here are for the second through fourth age quartiles; the reference group is the first age quartile.
And then what I have here are interaction terms between the drug indicator and
each of the indicators for the second through fourth age quartiles.
So literally we just multiplied those things through.
So let me show you just how this shakes down.
If we're looking at age quartile 1, all the indicators are 0 because age quartile 1 is the reference group, and
all the interaction terms are 0 because they're a product
of each of these indicators.
And so the relationship we get on the log hazard scale,
between mortality and treatment is this slope of negative 0.07.
So this is our log hazard ratio for the relationship between
mortality and DPCA compared to placebo amongst persons in age quartile 1.
If we look at age quartile 2, we pick up the same piece that we have for age quartile 1. We also pick up another piece because they're in age quartile 2, and we pick up the interaction term between the indicator for being in age quartile 2 and the drug variable; the coefficient for that piece is 0.28.
Similarly if we look at age quartile 3 we pick up this first piece, negative 0.07
plus a piece that has to do with the age differential and then a piece that has
to do with the interaction between the drug and the age quartile 3 indicator.
Notice that what we're getting with each of these interactions is just another copy of x1, the drug indicator.
And you can see the same sort of thing applies for age quartile 4.
So let's do a little reorganization here to make this a little more cogent.
In age quartile 1 the only number that has to do
with our drug indicator is that initial slope of negative 0.07.
So this is the log hazard ratio of mortality for those in the DPCA group compared to placebo amongst the lowest age quartile.
If we wanted to get the log hazard ratio comparing patients on the drug to placebo
for age quartile 2, we would take that initial slope for the first quartile and
then add the coefficient for the interaction term, that 0.28.
So the log hazard ratio here is, when all the dust settles, 0.21.
And this 0.28 estimates the difference in the association between mortality and
treatment for those in age quartile 2 compared to age quartile 1.
Similarly for age quartile 3 we start with the estimated association of log hazard
ratio for those in age quartile 1 and then add the coefficient for the interaction
term between the drug indicator and the indicator for age quartile 3.
And the log hazard ratio for this group would be the sum of those two things, 0.03, and this interaction coefficient of 0.10 is the estimated difference in the association, on the log scale, between mortality and treatment for those in the third quartile compared to the first.
And you could do something similar for the fourth quartile.
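In code, that bookkeeping is just adding coefficients and exponentiating. Here is a sketch using the two coefficients quoted in the lecture; the quartile 3 and quartile 4 pieces follow the same pattern.

# Sketch: quartile-specific log hazard ratios are the drug coefficient plus the
# matching interaction coefficient; exponentiating gives hazard ratios.
import numpy as np

b_drug = -0.07      # log HR, DPCA vs. placebo, in age quartile 1 (reference)
b_drug_x_q2 = 0.28  # difference in the log HR for quartile 2 vs. quartile 1

log_hr_q1 = b_drug                 # -0.07
log_hr_q2 = b_drug + b_drug_x_q2   #  0.21
print(np.exp(log_hr_q1))           # hazard ratio of about 0.93 in quartile 1
print(np.exp(log_hr_q2))           # hazard ratio of about 1.23 in quartile 2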
If I were presenting these results to somebody who wasn't as knowledgeable as we are about regression models, I would use the computer to estimate the hazard ratios, exponentiating those log hazard ratios, and then, also with the computer, I can get the confidence intervals.
So this just shows me that there's a slight benefit of the drug for those in age quartile 1, but it's not statistically significant. In age quartiles 2 and 3 the drug is positively associated with mortality in this study, but again it's not statistically significant for either. And then it looks like the results are promising for the oldest group: in older persons there's a notable estimated reduction in mortality, on the order of 27%. But again, unfortunately, this result is not statistically significant either.
But what these interaction terms have allowed us to do is estimate separate hazard ratios between mortality and treatment for these four age quartiles, and then, with the computer's help, put confidence intervals on these.
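Because each quartile-specific log hazard ratio is a sum of two coefficients, its confidence interval needs the covariance of those estimates. Here is a sketch continuing from the hypothetical lifelines fit above; I believe lifelines exposes the fitted coefficients and their covariance as params_ and variance_matrix_, but check your version's documentation, and everything else here is assumed.

# Sketch: 95% confidence interval for the quartile-2 hazard ratio.
import numpy as np
import pandas as pd

beta = cph.params_  # fitted log-hazard coefficients, indexed by covariate name
cov = pd.DataFrame(np.asarray(cph.variance_matrix_),
                   index=beta.index, columns=beta.index)

log_hr_q2 = beta["dpca"] + beta["dpca_x_q2"]
var_q2 = (cov.loc["dpca", "dpca"] + cov.loc["dpca_x_q2", "dpca_x_q2"]
          + 2 * cov.loc["dpca", "dpca_x_q2"])
se_q2 = np.sqrt(var_q2)
lower, upper = np.exp(log_hr_q2 - 1.96 * se_q2), np.exp(log_hr_q2 + 1.96 * se_q2)
print(np.exp(log_hr_q2), lower, upper)  # quartile-2 hazard ratio and its 95% CI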
So hopefully this is a basic introduction to the idea of assessing effect modification with an interaction term.
I want you to get the basic idea.
I'm not going to hold you responsible for parsing models with interaction terms.
That requires a little more practice than we can devote to it in this course, but it really just involves accounting skills, keeping track of what's turned on when, et cetera, and then combining terms where appropriate.
So, if the mechanics were a little daunting, don't worry about it.
But I do want you to appreciate that the inclusion of interaction terms allows us, within the context of a single regression model, to estimate separate outcome-predictor associations for different levels of a potential effect modifier.
For those of you who will go on to take further courses in statistics and/or
are interested in applying this in your own research, this gives you a primer on how to handle interaction terms in the regression modeling process.
In the next section we'll look at the use of interaction terms in some of the published literature, and how the authors report the results and
discuss their approach.