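This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. For example: is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict the test score of a child based on certain characteristics of his or her mother? In this course, you will learn the fundamental theory behind linear regression and, by analyzing data examples with the free statistical software R and RStudio, learn how to fit, examine, and use regression models to examine relationships between multiple variables.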


A course from Duke University

Linear Regression and Modeling



From this lesson

Linear Regression

In this week we’ll introduce linear regression. Many of you may be familiar with regression from reading the news, where graphs with straight lines are overlaid on scatterplots. Linear models can be used for prediction or to evaluate whether there is a linear relationship between two numerical variables.

- Mine Çetinkaya-Rundel, Associate Professor of the Practice

Department of Statistical Science

Once you check your conditions and you're convinced that a linear model is indeed appropriate for your data, and is appropriate to model the relationship between your response and your explanatory variables, the next step is to check the fit of your model. In other words, how well it fits your data. And for that, we introduce a new measure called R squared.

Strength of the fit of a linear model is most commonly evaluated using R squared. This is calculated as simply the square of the correlation coefficient. The R squared tells us what percent of variability in the response variable is explained by the model. The remainder of the variability is explained by variables not included in the model, or by inherent randomness in the data. And the R squared value, being the square of the correlation coefficient, is going to be a number that's always between zero and one, which corresponds to the percentage of the variability in the response variable that's explained by the model.
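
To make the two descriptions of R squared concrete, here is a minimal Python sketch (the course itself uses R). It computes R squared once as the square of the correlation coefficient and once as one minus the ratio of residual variability to total variability for the least-squares line. The data values and the function names `pearson_r` and `r_squared_from_fit` are made up for illustration and do not come from the lecture.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_from_fit(x, y):
    """R squared as 1 - SS_residual / SS_total for the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after the line has made its predictions.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Total variability of the response around its mean.
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data, chosen only so that the relationship is roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.8, 8.1, 7.4, 5.2, 4.9, 2.7]

r = pearson_r(x, y)
print("r squared from correlation:", r ** 2)
print("r squared from the fit:    ", r_squared_from_fit(x, y))
```

For a model with a single explanatory variable the two printed numbers should agree up to floating-point error, which is why squaring the correlation coefficient is enough here.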

If you know your correlation coefficient, calculating R squared is easy.

But how do we interpret it?

Which of the following is the correct interpretation of the R squared for this model, which predicts percentage living in poverty from high school graduation rate? And remember that R squared was 0.5625.

(A) 56.25% of the time, percentage of high school graduates predicts percentage living in poverty correctly. This is basically implying that 56.25% of the data points are going to be exactly on the regression line, so predicted accurately, versus the remainder not. But note that R squared is not about what percent of the data actually fall on the regression line. So this is not a correct interpretation.

(B) 43.75% of the variability in the percentage of residents living in poverty among the states is explained by the model. This is starting to sound right, but the value is simply not right. Here this is the complement of R squared, whereas what we really want is R squared itself, the percentage of the variability in the response variable that is explained. So this is not the correct interpretation either.

(C) 56.25% of the variability in the percentage of high school graduates among the states is explained by the model. This is not correct because this is about the explanatory variable, not the response variable.

(D) 56.25% of the variability in the percentage of residents living in poverty among the states is explained by the model. This is indeed the correct interpretation of R squared: the percentage of variability in the response variable explained by the model.
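
The arithmetic behind options (B) and (D) can be sketched in a few lines of Python. Only the value 0.5625 comes from the lecture; the variable names are chosen here for illustration.

```python
# R squared quoted in the lecture for the poverty vs. high school graduation model.
r_squared = 0.5625

# Share of the variability in the response (percentage living in poverty)
# that the model explains: this is the number in option (D).
explained_pct = r_squared * 100

# The complement: variability the model does not explain.
# This is the number that option (B) wrongly labels as "explained".
unexplained_pct = (1 - r_squared) * 100

print(f"explained by the model:     {explained_pct:.2f}%")
print(f"not explained by the model: {unexplained_pct:.2f}%")
```

Neither number says anything about how many points sit exactly on the regression line, which is why option (A) fails regardless of the value used.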

Let's take a look at another example.

The R squared for the relationship displayed in this scatter plot is 92.16%.

What is the correlation coefficient?

Since going from R to R squared we simply square the value, going from R squared to R all we need to do is take the square root.

So the square root of 0.9216 actually yields 0.96, but could this be the correlation coefficient for this relationship? The answer is no, because there is clearly a negative relationship between these two variables. Therefore the correct correlation coefficient would be negative 0.96. Remember, if you square negative 0.96, you're still going to get positive 0.9216.

So when you're going from R squared to R, you want to first rely on your calculator or computation to give you the numerical answer, but then you want to look at the relationship between the variables, for example in a scatter plot, in order to determine whether the sign of that correlation coefficient should be positive or negative.
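
The two-step recipe, take the square root and then pick the sign from the direction of the relationship, can be sketched in Python as follows. The helper `correlation_from_r_squared` and its `direction` argument are hypothetical names introduced for this example; the direction has to come from looking at the scatter plot (or the sign of the fitted slope), since R squared alone cannot supply it.

```python
import math

def correlation_from_r_squared(r_squared, direction):
    """Recover r from R squared for a single-predictor linear model.

    direction: +1 if the scatter plot shows a positive relationship,
               -1 if it shows a negative one.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R squared must be between 0 and 1")
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    # The square root gives the magnitude; the plot gives the sign.
    return direction * math.sqrt(r_squared)

# Lecture example: R squared of 92.16% with a clearly negative relationship.
r = correlation_from_r_squared(0.9216, direction=-1)
print(round(r, 4))
```

Passing `direction=1` for the same R squared would return the positive root, which is the mistake the lecture warns against for this scatter plot.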