Let X be a column random vector consisting of n random variables, X_1, X_2 up to X_n. We adopt the convention that a boldface letter represents an n-dimensional column vector. The covariance matrix of the random vector X, denoted by K_X, is defined as the expectation of X minus the expectation of X, which is a column n-vector, times the transpose of itself, which is a row n-vector, and so this is actually an n by n matrix. We are going to show that the (i,j)-th element of this n by n matrix is equal to the covariance of X_i and X_j. Note that the i-th diagonal element, namely the covariance of X_i and X_i, is equal to the variance of X_i.

The proof is elementary. First consider the product of X minus the expectation of X and the transpose of itself. The (i,j)-th element of this n by n matrix is equal to X_i minus the expectation of X_i, times X_j minus the expectation of X_j. Then we take the expectation inside the matrix to see that the (i,j)-th element is indeed equal to the covariance of X_i and X_j.

In a similar fashion, we define the correlation matrix K tilde sub X of the random vector X as the expectation of X times X transpose. This is an n by n matrix with the (i,j)-th element equal to the expectation of X_i times X_j. The covariance matrix and the correlation matrix can be related in two ways. First, K_X is equal to K tilde sub X minus the expectation of X times the transpose of the expectation of X. Second, the covariance matrix K_X is equal to the correlation matrix of the random vector X minus the expectation of X. These two relations are vector generalizations of two familiar scalar relations: the variance of X equals the expectation of X square minus the square of the expectation of X, and the variance of X equals the expectation of X minus the expectation of X, squared. We now define the Gaussian distribution, which will be used repeatedly in this chapter and the next chapter.
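The two relations between the covariance and correlation matrices can be verified numerically with a short NumPy sketch. The dimension, sample size, and random seed below are illustrative choices, not from the lecture; the identities hold exactly for empirical moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 1000                       # dimension and number of samples (illustrative)
X = rng.normal(size=(n, N))          # each column is one realization of the random vector

m = X.mean(axis=1, keepdims=True)    # empirical E[X], an n x 1 column vector
K_tilde = (X @ X.T) / N              # empirical correlation matrix, E[X X^T]
K = ((X - m) @ (X - m).T) / N        # empirical covariance matrix

# Relation 1: K_X = K~_X - E[X] E[X]^T
assert np.allclose(K, K_tilde - m @ m.T)

# Relation 2: K_X is the correlation matrix of the centered vector X - E[X]
Xc = X - m
assert np.allclose(K, (Xc @ Xc.T) / N)
```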
The Gaussian distribution of a real random variable with mean mu and variance sigma square is denoted by N of mu comma sigma square, with the probability density function f(x) equals 1 over the square root of two pi sigma square, times e to the power minus, x minus mu square, divided by two sigma square, for all values of x.

The multivariate Gaussian distribution of an n-dimensional random vector with mean mu, where mu is an n-dimensional vector, and covariance matrix K is denoted by N of mu comma K. The joint pdf of the distribution is given by f(x) equals 1 over the square root of, two pi to the power n times the determinant of the covariance matrix K, times e to the power minus one half, x minus mu transpose, times K inverse, times x minus mu, for all n-dimensional vectors x, where the covariance matrix K is a symmetric positive definite matrix. Note that x minus mu transpose is a 1 by n vector, K inverse is an n by n matrix, and x minus mu is an n by 1 vector, and so the product of these three matrices is a scalar.

We have already set up the notations for random vectors. In Section 10.1, we will discuss some basic operations on a random vector. A square matrix K is symmetric if K transpose is equal to K. That is, by taking the transpose of the matrix, the matrix remains unchanged. An n by n matrix K is positive definite if x transpose times K times x is strictly positive for all non-zero column n-vectors x, and is positive semi-definite if x transpose times K times x is greater than or equal to zero for all column n-vectors x. Proposition 10.3 says that a covariance matrix is both symmetric and positive semi-definite. We will leave the proof as an exercise.

We now discuss a very important property of symmetric matrices called diagonalization. A symmetric matrix K can be diagonalized as K equals Q lambda Q transpose, where lambda is a diagonal matrix and Q, and also Q transpose, is an orthogonal matrix. That is, Q inverse is equal to Q transpose.
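The multivariate pdf above can be written out directly from the formula. As a sanity check, for n equal to 1 it should reduce to the scalar N(mu, sigma square) density; the sketch below verifies this at one illustrative point (the function name and the test values are my own, not from the lecture).

```python
import numpy as np

def gaussian_pdf(x, mu, K):
    """Multivariate Gaussian density N(mu, K) at x, per the formula in the text."""
    n = len(mu)
    d = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(K))  # sqrt((2 pi)^n det K)
    return np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm

# For n = 1, compare against the scalar Gaussian density at x = 2.
mu, sigma2, x = 1.0, 4.0, 2.0
scalar = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
assert np.isclose(gaussian_pdf(np.array([x]), np.array([mu]), np.array([[sigma2]])), scalar)
```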
The proof of this fact is elementary. Let q_i be the i-th column of the matrix Q. Because the q_i form an orthonormal basis, q_i transpose times q_j, that is, the dot product of q_i and q_j, is equal to 1 if i is equal to j, and is equal to 0 if i is not equal to j.

Now consider the product Q transpose times Q, where Q consists of the columns q_1, q_2 up to q_n, and Q transpose consists of the rows q_1 transpose, q_2 transpose, up to q_n transpose. We see that the (i,j)-th element of Q transpose Q is equal to q_i transpose times q_j. Because the columns of Q are orthonormal, we see immediately that Q transpose Q is equal to the identity matrix, and from this, we see that Q inverse is equal to Q transpose.

Another property is that the determinant of Q, and also the determinant of Q transpose, is equal to either 1 or minus 1. Consider the square of the determinant of Q. Because the determinant of Q is equal to the determinant of the transpose of Q, this can be written as the determinant of Q times the determinant of Q transpose. This can be further written as the determinant of Q times Q transpose, which is equal to the identity matrix because Q transpose is equal to Q inverse, as we have shown. Now, the determinant of the identity matrix is equal to 1, and so the square of the determinant of Q is equal to 1. This shows that the determinant of Q and the determinant of Q transpose are equal, and they are equal to either plus 1 or minus 1.

Let lambda_i be the i-th diagonal element of the matrix lambda, and q_i be the i-th column of the matrix Q. Then KQ is equal to Q times lambda times Q transpose, times Q. By the associativity of matrix multiplication, this is Q times lambda times, in brackets, Q transpose times Q, where Q transpose times Q is equal to the identity matrix. So we have shown that KQ is equal to Q lambda, which implies that K times q_i is equal to lambda_i times q_i, as we now show.
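The properties just derived can be checked numerically. NumPy's `eigh` routine diagonalizes a symmetric matrix exactly as in the text, returning the eigenvalues and an orthogonal Q; the 2 by 2 matrix below is an illustrative example of my choosing.

```python
import numpy as np

K = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # a symmetric matrix (illustrative)

lam, Q = np.linalg.eigh(K)           # diagonalization: K = Q diag(lam) Q^T

assert np.allclose(Q.T @ Q, np.eye(2))          # Q^T Q = I, so Q^{-1} = Q^T
assert np.isclose(abs(np.linalg.det(Q)), 1.0)   # det Q is +1 or -1
assert np.allclose(K, Q @ np.diag(lam) @ Q.T)   # K = Q lambda Q^T
for i in range(2):
    # each column q_i is an eigenvector: K q_i = lambda_i q_i
    assert np.allclose(K @ Q[:, i], lam[i] * Q[:, i])
```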
Consider KQ, where Q consists of the columns q_1 up to q_n. Upon multiplying K into the matrix, we obtain the columns Kq_1, Kq_2 up to Kq_n. On the other hand, consider Q times lambda, where Q consists of the columns q_1, q_2, up to q_n, and lambda consists of the diagonal elements lambda_1, lambda_2, up to lambda_n. We then obtain the columns lambda_1 q_1, lambda_2 q_2, all the way to lambda_n q_n. Upon equating the columns of KQ and Q lambda, we see that Kq_1 is equal to lambda_1 q_1, and so on and so forth. That is, Kq_i is equal to lambda_i times q_i for all i, so q_i is an eigenvector of K with eigenvalue lambda_i.

Proposition 10.4 says that the eigenvalues of a positive semi-definite matrix are non-negative. Consider a non-zero eigenvector q and the corresponding eigenvalue lambda of a positive semi-definite matrix K. That is, Kq is equal to lambda q. Since K is positive semi-definite, q transpose times K times q is greater than or equal to 0. Now, Kq is equal to lambda q from the above, and so q transpose times K times q is equal to lambda times q transpose times q. Now q transpose times q is the magnitude square of the vector q, and so it is strictly positive. Therefore, we see that lambda is greater than or equal to zero. This proves the proposition. As a remark, since a covariance matrix is both symmetric and positive semi-definite, it is diagonalizable, and its eigenvalues are non-negative.

Proposition 10.5 is about taking a linear transformation of a random vector. Let Y equal A times X, where A is an n by n constant matrix. Then K_Y, that is, the covariance matrix of Y, is equal to A times K_X times A transpose. Similarly, K tilde sub Y, the correlation matrix of Y, is equal to A times K tilde sub X times A transpose. This proposition is proved as follows. Consider K_Y equals the expectation of Y times Y transpose, minus the expectation of Y times the transpose of the expectation of Y.
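Proposition 10.5 and the remark after Proposition 10.4 can both be illustrated with empirical covariance matrices, for which the identity K_Y = A K_X A transpose holds exactly by the same algebra. The matrix A, the seed, and the sample size below are illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 2, 500
X = rng.normal(size=(n, N))          # columns are realizations of the random vector
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])           # an arbitrary constant matrix (illustrative)

def cov(M):
    """Empirical covariance matrix of the rows of M (columns are samples)."""
    m = M.mean(axis=1, keepdims=True)
    return ((M - m) @ (M - m).T) / M.shape[1]

Y = A @ X
# Proposition 10.5: K_Y = A K_X A^T
assert np.allclose(cov(Y), A @ cov(X) @ A.T)
# Remark after Proposition 10.4: eigenvalues of a covariance matrix are non-negative
assert np.all(np.linalg.eigvalsh(cov(X)) >= -1e-12)
```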
To obtain the next line, we only have to note that Y transpose is equal to X transpose times A transpose. Now AX times X transpose A transpose is equal to A times X X transpose times A transpose. For the second term, by the linearity of the expectation operator, we can move the expectation inside the matrix A, and similarly we can move the expectation inside the matrix A transpose. For the first term, again by the linearity of expectation, we can move the expectation inside the matrices A and A transpose. Then by factoring out the matrix A on the left and the matrix A transpose on the right, we obtain A times a square bracket times A transpose. Inside this square bracket, we have the expectation of X times X transpose, minus the expectation of X times the transpose of the expectation of X, which is K_X, the covariance matrix of the vector X. This proves the first part of the proposition. The proof of the second part is similar, and it is omitted.

The next proposition is about decorrelation of a random vector. Let Y equal Q transpose times X, where the matrix Q is obtained from the diagonalization of the matrix K_X; that is, K_X is equal to Q times lambda times Q transpose. Then K_Y is equal to lambda, that is, the random variables in Y are uncorrelated, and the variance of Y_i is equal to lambda_i for all i.

We now prove the proposition. By Proposition 10.5, K_Y is equal to Q transpose times K_X times Q, because Y is equal to Q transpose times X. Now, by the diagonalization of K_X, it can be written as Q times lambda times Q transpose. And so we have Q transpose times Q, which is the identity matrix, times lambda times Q transpose times Q, which again is the identity matrix. So we obtain K_Y is equal to lambda. Since K_Y is equal to lambda, which is a diagonal matrix, the random variables in Y are uncorrelated, because the covariance of Y_i and Y_j is equal to 0 for all i not equal to j.
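The decorrelation proposition can be demonstrated on an empirical covariance matrix, for which the algebra K_Y = Q transpose K_X Q = lambda holds exactly. The correlated data below are constructed artificially for illustration; the seed and sizes are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 400))        # columns are realizations of the random vector
X[1] += 0.5 * X[0]                   # introduce correlation between components 0 and 1

m = X.mean(axis=1, keepdims=True)
K_X = ((X - m) @ (X - m).T) / X.shape[1]
lam, Q = np.linalg.eigh(K_X)         # diagonalization: K_X = Q diag(lam) Q^T

Y = Q.T @ X                          # the decorrelating transform Y = Q^T X
mY = Y.mean(axis=1, keepdims=True)
K_Y = ((Y - mY) @ (Y - mY).T) / Y.shape[1]

# K_Y is the diagonal matrix lambda: the components of Y are uncorrelated,
# and the variance of Y_i is lambda_i.
assert np.allclose(K_Y, np.diag(lam))
# Conversely, X = Q Y writes X as a transformation of the uncorrelated vector Y.
assert np.allclose(Q @ Y, X)
```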
Furthermore, the variance of Y_i is given by the i-th diagonal element of K_Y equals lambda, that is, lambda_i, and so the proposition is proved.

The next corollary says that any random vector X can be written as a linear transformation of an uncorrelated random vector. Specifically, X can be written as Q times Y, where K_X is equal to Q times lambda times Q transpose. The proof is straightforward. In Proposition 10.6, Y equals Q transpose times X implies QY is equal to Q times Q transpose times X, where Q times Q transpose is equal to the identity matrix. And so we have X equals Q times Y.

Proposition 10.8 says the following. Let X and Z be independent, and let Y equal X plus Z. Then the covariance matrix of Y is equal to the covariance matrix of X plus the covariance matrix of Z. Here are some remarks. If Y is equal to the summation of X_i for i from 1 up to n, where X_1, X_2 up to X_n are mutually independent, then the covariance matrix of Y is equal to the sum of the covariance matrices of the X_i, for i from 1 up to n. When the X_i are scalars, this reduces to the variance of Y equals the summation of the variances of the X_i, for i from 1 up to n.

Proposition 10.9 is about preservation of energy under an orthogonal transformation. Let Y equal Q times X, where Q is an orthogonal matrix. Then the expectation of the summation of Y_i square is equal to the expectation of the summation of X_i square. We now prove the proposition. Consider the summation of Y_i square, which equals Y transpose times Y. Now Y is equal to Q times X, and so Y transpose is equal to X transpose times Q transpose. Then we have X transpose times Q transpose times Q times X, where Q transpose times Q is equal to the identity matrix. And so we have X transpose times X, which is equal to the summation of X_i square. The proposition is proved upon taking expectations on both sides.
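The energy-preservation argument holds pointwise, realization by realization, before any expectation is taken, so it can be checked on a single vector. The sketch below builds an orthogonal Q from a QR decomposition; the dimension and seed are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(3)
# Build an orthogonal matrix Q as the Q-factor of a random matrix (illustrative).
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
assert np.allclose(Q.T @ Q, np.eye(4))          # Q is orthogonal

x = rng.normal(size=4)
y = Q @ x
# sum of y_i^2 equals sum of x_i^2: y^T y = x^T Q^T Q x = x^T x
assert np.isclose(np.sum(y**2), np.sum(x**2))
```

Since the identity holds for every realization, it continues to hold after taking expectations on both sides, which is Proposition 10.9.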