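This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. For example: is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict a child's test score from certain characteristics of the child's mother? In this course, you will learn the fundamental theory behind linear regression and, by working through data examples in the free statistical software R and RStudio, learn to fit, examine, and use regression models to study relationships between multiple variables.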

A course from Duke University

Linear Regression and Modeling

From this lesson

Multiple Regression

In this week, we’ll explore multiple regression, which allows us to model numerical response variables using multiple predictors (numerical and categorical). We will also cover inference for multiple linear regression, model selection, and model diagnostics. Hope you enjoy!

- Mine Çetinkaya-Rundel, Associate Professor of the Practice

Department of Statistical Science

In this video, we define two terms that

are essential to multiple regression: collinearity and parsimony.

Two predictor variables are said to be

collinear when they are correlated with each other.

Remember: predictors are also called independent variables,

so they should be independent of each other.

In other words, they should not be collinear with each other.

Inclusion of collinear predictors, also

called multicollinearity, complicates model estimation.

And what we mean by complicates model estimation is that the

estimates coming out of the model may no longer be reliable.

Let's take a look back at our pairwise scatter plots

and remind ourselves that earlier we fit a model predicting poverty from

female householder and then we added the variable white or the

percentage of white residents living in that state to the existing model.

We saw that there was very little gain from adding the second

explanatory variable, because our R squared went up by just a tiny bit.

And in fact, our adjusted R squared did not go up at all.
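
To make the contrast between R squared and adjusted R squared concrete, here is a minimal sketch using the standard formula, adjusted R squared = 1 - (1 - R squared) * (n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The sample size and both R squared values below are hypothetical placeholders, not the output of the model discussed in the video.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("n must be larger than k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical numbers for illustration only (not the lecture's results):
# adding a second predictor nudges R^2 from 0.28 up to 0.29.
n = 51
one_predictor = adjusted_r_squared(0.28, n, k=1)
two_predictors = adjusted_r_squared(0.29, n, k=2)

print(f"adjusted R^2, one predictor:  {one_predictor:.4f}")
print(f"adjusted R^2, two predictors: {two_predictors:.4f}")
```

With these made-up inputs, the penalty for the extra predictor outweighs the small gain in R squared, so the adjusted value does not improve.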

So why might that be the case?

Let's take a look to see how the variables

white and female householder are related to each other.

We can see that there isn't much scatter in this

scatter plot displaying the relationship between white and female householder.

And the correlation coefficient between these variables is quite

high, indicating a strong negative relationship between the two variables.
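
As a rough illustration of checking two predictors for collinearity, the sketch below computes the Pearson correlation coefficient by hand. The variable names `female_house` and `white` and all of the values are made up for this example; they are not the state-level data used in the video.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up percentages for six hypothetical states (illustration only).
female_house = [9.0, 10.5, 11.0, 12.5, 14.0, 16.0]
white = [95.0, 90.0, 88.0, 80.0, 72.0, 65.0]

r = pearson_r(female_house, white)
print(f"correlation between the two predictors: {r:.3f}")
```

A value close to -1 or 1 signals that the two predictors carry largely the same information.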

What that means is that the variable white is highly correlated with

the variable female householder, and therefore

they're not independent of each other.

If that is the case, we wouldn't want to add the variable white to our existing

model that already has female householder, because it's

going to bring nothing new to the table.

Any information that could be gleaned from

the variable white is probably already being captured

by the variable female householder because these

two variables are highly associated with each other.

In addition, using both of these variables in the model is going to result in

multicollinearity, which we said might also result in

unreliable estimates of the coefficients from the model.
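
One common way to quantify how collinearity makes coefficient estimates less reliable is the variance inflation factor (VIF). It is not mentioned in the video; it is brought in here only as an illustration. For a model with exactly two predictors whose correlation is r, the sampling variance of each slope estimate is multiplied by 1 / (1 - r^2) compared with the uncorrelated case. The correlation values below are hypothetical.

```python
def variance_inflation_factor(r: float) -> float:
    """VIF for a two-predictor model, given the correlation r between them."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Hypothetical correlations: the stronger the association between the
# predictors, the more the variance of each slope estimate is inflated.
for r in (0.0, -0.5, -0.75, -0.95):
    print(f"r = {r:+.2f}  ->  VIF = {variance_inflation_factor(r):.2f}")
```

Only the magnitude of r matters here, so a strong negative correlation inflates the variance just as much as a strong positive one.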

And that discussion brings us to a new term, parsimony.

We want to avoid adding predictors that are associated with each other, because

oftentimes the addition of such a variable brings nothing new to the table.

We prefer the simplest best model, in other words, the parsimonious model.

This is the model that has the highest

predictive power but the lowest number of variables.

This idea comes from Occam's razor, which states that among

competing hypotheses, the one with the fewest assumptions should be selected.

In other words, among models that are equally good, we

want to select the one that has the fewest variables.
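
The selection rule just described can be sketched as follows. This is only one possible reading: "equally good" is taken to mean "adjusted R squared within a small tolerance of the best", and the tolerance, the `Candidate` type, the `most_parsimonious` helper, and all scores are assumptions made for illustration, not part of the lecture.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    predictors: tuple      # names of the predictors in the model
    adj_r_squared: float   # adjusted R^2 of the fitted model


def most_parsimonious(candidates, tolerance=0.01):
    """Among models within `tolerance` of the best adjusted R^2,
    return the one with the fewest predictors."""
    if not candidates:
        raise ValueError("no candidate models given")
    best = max(c.adj_r_squared for c in candidates)
    near_best = [c for c in candidates if best - c.adj_r_squared <= tolerance]
    # Fewest predictors first; break remaining ties by higher adjusted R^2.
    return min(near_best, key=lambda c: (len(c.predictors), -c.adj_r_squared))


# Hypothetical scores, not results from the lecture's data.
candidates = [
    Candidate(("female_house",), 0.265),
    Candidate(("female_house", "white"), 0.260),
    Candidate(("white",), 0.080),
]

chosen = most_parsimonious(candidates)
print("chosen predictors:", chosen.predictors)
```

The tolerance is a judgment call: a larger value favors smaller models more aggressively, while a value of zero reduces the rule to picking the best adjusted R squared.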

We've also heard that addition of collinear variables

can result in unreliable estimates of the regression parameters.

So not only do we prefer simple parsimonious models, but we also want

to be very careful about adding a bunch of explanatory variables to a model.

Because if those are collinear with each other,

the model estimates may no longer be reliable.

Lastly, while it is impossible to keep collinearity from arising in observational

data, experiments are usually designed to control for correlated predictors.