Welcome. In this lecture you will learn the most famous formula of econometrics: b is X prime X inverse times X prime y.

We first recall some notation from the previous lecture. The data consists of n observations of the dependent variable y, and of each of k explanatory factors, collected in the n times k matrix X. The marginal effect of each explanatory factor is assumed to be constant, which is expressed by a linear relation from X to y. These marginal effects, collected in the vector beta, are unknown, and our challenge is to estimate beta from the data y and X. More precisely, we search for a k times 1 vector b such that the explained part, X times b, is close to y. As before, we assume that the matrix X has full column rank. I invite you to answer the following test question. This result follows immediately from the property that the rank of a matrix is smaller than or equal to the number of rows.

Our challenge is to find the vector b so that the residuals are small, where the residual vector e is defined as the vector of differences between the actual values of y and the fitted values, X times b. As a criterion to judge whether the residuals are small, we take the sum of squares of the components of this vector. We choose the vector b so that this sum of squares is as small as possible, and this method is therefore called least squares. To distinguish this method from more advanced methods like weighted or non-linear least squares, it is usually called ordinary least squares, or simply OLS.

The sum of squared residuals can be written in vector notation as the product of the transpose of the vector e with the vector e. We use matrix methods to analyze the OLS criterion, as shown on the slide. And now, I invite you to answer the following test question on matrix methods. This result follows from the rules of matrix transposition, where we use the obvious fact that a scalar, that is, a matrix with a single row and column, is equal to its transpose. If you wish, you can consult the Building Blocks on matrix methods.

Now we should minimize the above expression for the sum of squares by choosing the k times 1 vector b. This minimum is found by solving the first order conditions, that is, by finding the value of b for which the derivative of S with respect to b is zero. As b is a vector, we need results on matrix derivatives. If you wish, you can consult the Building Blocks for these results. We apply these results on matrix differentiation to get the first order conditions shown on the slide. And now, I invite you to answer the following test question. The answer follows from the assumption that the matrix X has full column rank, as shown on the slide. The first order conditions therefore have a unique solution for b, which gives the famous OLS formula: b is equal to X prime X inverse times X prime y. Note that this formula can be computed from the observed data X and y.

We obtained the OLS formula by means of matrix calculus. It is sometimes helpful to also have a geometric picture in mind. The data consists of n observations of y, and of each of the k explanatory factors, so that y and each column of X are vectors in n-dimensional space. We define two matrices, H and M, as shown on the slide. And now I invite you to answer the following test question. The answer is obtained by direct matrix calculations. You can consult the Building Blocks if you experience any problems in solving this question. The matrix H transforms the vector of observations y into the vector of fitted values, X times b.
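For reference, the algebra just described can be written out as follows, in standard matrix notation; this is only a written summary of the steps mentioned above, with e denoting y minus X times b:

\[
S(b) = e'e = (y - Xb)'(y - Xb) = y'y - 2\,b'X'y + b'X'Xb ,
\]
\[
\frac{\partial S}{\partial b} = -2\,X'y + 2\,X'Xb = 0
\quad\Longrightarrow\quad
X'Xb = X'y
\quad\Longrightarrow\quad
b = (X'X)^{-1}X'y .
\]

Because X has full column rank k, the k times k matrix X prime X is invertible, so the first order conditions have exactly one solution.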
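In the usual notation, the two matrices H and M are

\[
H = X(X'X)^{-1}X' , \qquad M = I_n - H ,
\]

and direct matrix calculation gives

\[
Hy = Xb , \qquad My = y - Xb = e , \qquad H' = H , \quad M' = M , \quad H^2 = H , \quad M^2 = M , \quad MX = 0 , \quad HM = 0 .
\]

In particular, (Xb)'e = y'H'My = y'HMy = 0, so the fitted values and the residuals are orthogonal.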
And the matrix M transforms the vector of observations y into the vector of residuals e. The results of the above test show that the residuals are orthogonal to the fitted values. And this result is also intuitively evident from a geometric point of view. You can choose b freely to get any linear combination, X times b, of the columns of X. So you are free to choose any point in the plane spanned by the columns of X. The optimal point in this plane is the one that minimizes the distance to y, which is obtained by the orthogonal projection of y onto this plane. The resulting residual vector e is therefore orthogonal to this plane. In the picture, the matrix H is the orthogonal projection onto the X plane, and the matrix M is the orthogonal projection onto the space that is orthogonal to the X plane. The figure shows the geometric interpretation of ordinary least squares.

You now know how to estimate the parameters beta by OLS. The OLS estimates b are such that X times b is close to y, and the residuals e are caused by the unobserved errors, epsilon. We measure the magnitude of these error terms by their variance. As epsilon is not observed, we use the residuals instead. We can estimate the error variance by means of the sample variance of the residuals, but this can be done even better. Now I invite you to answer the following test question. These results follow from the fact that e is orthogonal to the X plane, so that X prime times e is zero, as is also proven on the slide. We therefore divide the sum of squared residuals not by n minus 1, but by the degrees of freedom, n minus k. In the next lecture, we will see that this provides an unbiased estimator of the error variance under standard regression assumptions.

The model provides a good fit when the actual data y are approximated well by the fitted data, X times b, that is, by the predicted values of y obtained from the X factors. A popular measure for this fit is the so-called R squared, defined as the square of the correlation coefficient between the actual and the fitted data. A high value of R squared means that the model provides a good fit. Our standard assumption is that the model contains a constant term. In this case, the R squared can be computed in a simple way from the sum of squared residuals: it equals one minus the ratio of the sum of squared residuals to the sum of squared deviations of y from its sample mean.

Now I invite you to do the training exercise, to train yourself with the topics of this lecture. You can find this exercise on the website. And this concludes our third lecture.
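As a small numerical sketch of the formulas from this lecture, the lines below compute b, the residuals, the variance estimator with n minus k degrees of freedom, and the R squared using NumPy. The simulated data and the variable names are invented here purely for illustration and are not part of the course material.

import numpy as np

# Simulated example data: n observations, k regressors (first column is the constant term).
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])          # "true" marginal effects, invented for this illustration
y = X @ beta + rng.normal(size=n)          # the unobserved errors epsilon enter here

# OLS: b = (X'X)^{-1} X'y, computed by solving the first order conditions X'X b = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

fitted = X @ b        # H y = X b, the fitted values
e = y - fitted        # M y = e, the residual vector

# Estimator of the error variance: divide e'e by the degrees of freedom n - k.
s2 = (e @ e) / (n - k)

# With a constant term in the model, R squared follows from the sum of squared residuals.
R2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

print(b, s2, R2)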