In multiple regression, we use not one but several quantitative predictors to predict a quantitative response variable. In this video, you'll learn why multiple regression is useful, how we express a multiple regression model at the sample and population level, and how to interpret the regression coefficients and the intercept. Earlier, I tried to predict the popularity of cat videos, measured by number of page views, using cat age. If I were to collect actual data, I would probably find that cat age doesn't predict popularity very well. This is no wonder: the popularity of cat videos is undoubtedly influenced by a lot of other things besides the cat's age. Possible other relevant characteristics are the cat's fluffiness or hairiness in terms of hair length, its attractiveness, how funny its behavior is, or to what extent it mimics human emotions. With multiple regression, I can add such variables as additional predictors, which will hopefully result in a more appropriate, better-fitting model and better predictions. I can also add variables to control for their possibly confounding influence. For example, the time a video has been available online will influence its popularity. It has nothing to do with the attractiveness of the video, but it might explain why some videos of very cute and funny kittens aren't as popular as expected, and some videos of older cats are more popular than expected. Okay, so what does the model look like? Well, it's an extension of the simple linear model. At the sample level, we express it as y hat sub i equals a plus b sub 1 times x sub i 1, plus b sub 2 times x sub i 2, and so on, until we reach the last predictor, denoted m; we end with b sub m times x sub i m. Note that the subscript i's on the y and the x's indicate individual values. At the population level, we express the model as mu sub y equals alpha plus beta sub 1 times x sub 1, plus beta sub 2 times x sub 2, and so on, ending with beta sub m times x sub m.
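The sample-level equation can be sketched as a tiny Python function; the function name and the example numbers here are illustrative, not from the video:

```python
# Minimal sketch of the sample-level model:
# y-hat = a + b1*x1 + b2*x2 + ... + bm*xm
def y_hat(a, b, x):
    """a: intercept; b, x: equal-length sequences of coefficients and predictor values."""
    return a + sum(bj * xj for bj, xj in zip(b, x))

# Example with two predictors: 10 + 2*3 + (-0.5)*4 = 14.0
print(y_hat(10.0, [2.0, -0.5], [3.0, 4.0]))
```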
To understand how to interpret this model, let's consider a simple example with only two predictors. Suppose we add hairiness as a predictor to model video popularity. Hairiness is rated on a scale between 0 and 10, with 0 meaning hairless and 10 meaning long-haired like a Persian cat. Say we find this regression equation: y hat sub i equals 34.372 minus 1.775 times age sub i plus 1.414 times hairiness sub i. We can visualize this model by considering the relation between cat age and video popularity at particular values of hairiness. Say we take hairless cats, with a hairiness score of zero. Given this hairiness score, what is the relation between age and popularity? Well, if we fill in zero in the equation, we simply get y hat sub i equals 34.372 minus 1.775 times age sub i. This can be drawn as a simple regression line. Now consider the relation for a hairiness score of one. If we enter a hairiness score of one in the equation, we get y hat sub i equals 34.372 minus 1.775 times age sub i plus 1.414, which equals 35.786 minus 1.775 times age sub i. If we enter a hairiness score of two, we get y hat sub i equals 34.372 minus 1.775 times age sub i plus 2.828, which equals 37.200 minus 1.775 times age sub i. The regression lines predicting popularity with cat age, at given values of hairiness, are all parallel. From this, we can see that in multiple regression, for a particular predictor, the regression coefficient gives you the change in the response variable per unit increase of that predictor, given the values of the other predictors. It's important to note that the size of each regression coefficient depends on the scale of the predictor. So we can't say that age, whose coefficient is larger, is more influential in predicting popularity than hairiness: age ranges from 0 to about 50, while hairiness ranges between 0 and 10.
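The worked example above can be checked with a short Python sketch; the function name is mine, but the coefficients are the ones from the regression equation above:

```python
# Evaluating y-hat = 34.372 - 1.775*age + 1.414*hairiness
# at a few hairiness levels, to show that only the intercept shifts.

def predict_popularity(age, hairiness):
    """Predicted page views from the example regression equation."""
    return 34.372 - 1.775 * age + 1.414 * hairiness

# At age 0, the prediction is the intercept of the line for that
# hairiness level; each one-point increase in hairiness adds 1.414,
# while the slope for age stays -1.775 (the lines are parallel).
for h in (0, 1, 2):
    print(f"hairiness={h}: y-hat = {predict_popularity(0, h):.3f} - 1.775*age")
```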
Another thing to note is that the value of the regression coefficient for age in our multiple regression equation is different from the value in the simple regression equation, even though the observations are the same. In the simple case, with just age as a predictor, we consider the relation between cat age and popularity while ignoring all other variables. By adding hairiness as a predictor, we control for the effect of hairiness when we consider the relation between cat age and popularity. We consider the relation for each level of hairiness, which might result in a stronger or weaker relation between cat age and popularity. We can visualize the entire model by adding another axis, the z-axis, to represent hairiness. You can see that the parallel lines now form a plane in a three-dimensional graph. This plane represents the predicted values produced by the model. The intercept a is where the plane crosses the y-axis, so the intercept a represents the predicted value when cat age and hairiness are both zero. Just like in simple linear regression, I can calculate the residuals, the vertical distances between the observations and the predicted values, which in this case lie on the regression plane. These residuals are used to find the intercept and regression coefficients that provide the best-fitting plane through the data points. Just like in simple regression, the residuals are minimized using the method of ordinary least squares. Because the resulting formulas for the intercept and regression coefficients are more complicated, we'll use statistical software to calculate them. At the population level, we model the means of the conditional distributions, mu sub y. For every point on the plane, for every combination of cat age and hairiness, we assume there is a distribution of popularity scores. The mean of this distribution lies on the plane.
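As a rough illustration of what the statistical software does, here is a sketch that fits an intercept and two coefficients by ordinary least squares on simulated data; the "true" coefficients, the sample size, and the use of NumPy's least-squares solver are my assumptions, not the actual data from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(0, 15, n)
hairiness = rng.uniform(0, 10, n)
# Simulate popularity from known coefficients plus random noise.
popularity = 34.0 - 1.8 * age + 1.4 * hairiness + rng.normal(0, 2, n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), age, hairiness])

# Ordinary least squares: minimizes the sum of squared residuals,
# i.e. the squared vertical distances to the regression plane.
coef, *_ = np.linalg.lstsq(X, popularity, rcond=None)
a, b_age, b_hair = coef
print(f"a = {a:.2f}, b_age = {b_age:.2f}, b_hairiness = {b_hair:.2f}")

# Residuals: observed values minus the predicted values on the plane.
residuals = popularity - X @ coef
```

With 200 simulated observations, the fitted values should land close to the coefficients used to generate the data.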
The standard deviations of all these conditional distributions are assumed to be identical, so the spread of observations around the plane is assumed to be the same everywhere. With more than two predictors, it's no longer possible to represent the model visually in one graph. But the logic and the interpretation of the intercept and regression coefficients are the same.