We saw that there was very little gain from adding the second
explanatory variable, because our R squared went up by just a tiny bit.
And in fact, our adjusted R squared did not go up at all.
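To see how R squared can creep up while adjusted R squared does not, here is a minimal sketch of the usual adjusted R squared formula, 1 - (1 - R squared) * (n - 1) / (n - k - 1), where n is the number of observations and k is the number of predictors. The sample size and the two R squared values below are made up for illustration; they are not the numbers from our model output.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for a model with k predictors fit on n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical values, chosen only to illustrate the penalty for an extra predictor.
n = 51
adj_one = adjusted_r_squared(0.28, n, k=1)  # one explanatory variable
adj_two = adjusted_r_squared(0.29, n, k=2)  # second variable adds very little

print(f"adjusted R squared, one predictor:  {adj_one:.4f}")
print(f"adjusted R squared, two predictors: {adj_two:.4f}")
```

With these illustrative numbers, the small bump in R squared is not enough to offset the penalty for the extra predictor, so the adjusted value does not improve.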
So why might that be the case?
Let's take a look to see how the variables
white and female householder are related to each other.
We can see that there isn't much scatter in this
scatter plot displaying the relationship between white and female householder.
And the correlation coefficient between these variables is quite
high in magnitude, indicating a strong negative relationship between the two variables.
What that means is that the variable white is highly correlated with
the variable female householder, and therefore
they're not independent of each other.
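A check like this can be sketched in a few lines. The example below uses synthetic data, not the actual data set: the arrays `female_hh` and `white` are hypothetical stand-ins generated so that they are strongly negatively related, and `numpy.corrcoef` is used to compute the correlation coefficient.

```python
import numpy as np

# Synthetic stand-ins for the two explanatory variables (NOT the real data).
rng = np.random.default_rng(0)
n = 200
female_hh = rng.normal(size=n)
# Build a second variable that is strongly negatively related to the first.
white = -0.9 * female_hh + np.sqrt(1 - 0.9**2) * rng.normal(size=n)

# Off-diagonal entry of the 2x2 correlation matrix is the correlation coefficient.
r = np.corrcoef(female_hh, white)[0, 1]
print(f"correlation between the two predictors: {r:.2f}")
```

A value close to -1 or 1 is a signal that the two explanatory variables carry much of the same information.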
If that is the case, we wouldn't want to add the variable white to our existing
model that already has female householder, because it's
going to bring nothing new to the table.
Any information that could be gleaned from
the variable white is probably already being captured
by the variable female householder, because these
two variables are highly associated with each other.
In addition, using both of these variables in the model is going to result in
multicollinearity, which, as we said, might result in
unreliable estimates of the coefficients from the model.
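As a rough illustration of both points, the sketch below fits least squares models on synthetic data, not the actual data set. The arrays `female_hh`, `white`, and `poverty` are hypothetical stand-ins, and the helper `r_squared` is written for this example only. It compares R squared with one versus two predictors, and computes a variance inflation factor, 1 / (1 - R squared) from regressing one predictor on the other, which is one common way to quantify multicollinearity (it was not used in the analysis above).

```python
import numpy as np

# Synthetic data (NOT the real data): two strongly correlated predictors,
# with the response depending only on the first one.
rng = np.random.default_rng(0)
n = 200
female_hh = rng.normal(size=n)
white = -0.9 * female_hh + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
poverty = 2.0 * female_hh + rng.normal(size=n)

def r_squared(predictors, response):
    """R squared of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    resid = response - X @ coef
    total = np.sum((response - response.mean()) ** 2)
    return 1 - (resid @ resid) / total

r2_one = r_squared([female_hh], poverty)
r2_two = r_squared([female_hh, white], poverty)

# Variance inflation factor for `white`: regress it on the other predictor.
vif = 1 / (1 - r_squared([female_hh], white))

print(f"R squared, one predictor:  {r2_one:.4f}")
print(f"R squared, two predictors: {r2_two:.4f}")
print(f"variance inflation factor: {vif:.2f}")
```

In this setup the second predictor barely changes R squared, while the variance inflation factor is well above 1, meaning the coefficient estimates are less precise than they would be with uncorrelated predictors.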