Now, after we looked at how things are done in classical finance,

we want to pursue a more data-driven approach.

We can motivate it by asking,

who has ever said that we know what factors drive returns?

How about keeping the same framework but

saying instead that our factors are unobservable?

That is the approach we want to pursue here:

let's keep the same factor framework, but instead let

the data itself decide what factors the model should use.

This will put us squarely in the realm of unsupervised learning, as we will see now.

So, how do we formulate the problem in terms of unsupervised learning?

To this end, the first thing we do is to form the data matrix, X.

For this, we take daily log returns,

then subtract their mean and divide by their standard deviation.

Then we take the dot product of X transposed and X and divide by the number of observations,

N. This gives us the covariance matrix,

C, of the standardized returns,

which is the same as the correlation matrix of the returns.
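A minimal sketch of this construction in numpy, using synthetic returns in place of real data (the array sizes and random data here are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Synthetic daily log returns standing in for real stock data.
rng = np.random.default_rng(0)
N, p = 1000, 5
returns = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))

# Standardize: subtract the mean, divide by the standard deviation.
X = (returns - returns.mean(axis=0)) / returns.std(axis=0)

# C = X^T X / N: the covariance matrix of standardized returns,
# i.e. the correlation matrix of the raw returns.
C = X.T @ X / N

# Sanity checks: unit diagonal, and agreement with numpy's estimator.
print(np.allclose(np.diag(C), 1.0))
print(np.allclose(C, np.corrcoef(returns, rowvar=False)))
```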

Now, the first thing we can notice about the correlation matrix,

C, is that it's not diagonal.

This diagram shows you a histogram of

pairwise correlations for stocks in the Dow Jones index.

As you can see,

the mean value of this distribution is around 33 percent,

which is significantly different from zero.

Now, how can we make the correlation matrix diagonal?

Let's introduce a linear transformation of our data,

X, that we will call Z.

Z is simply the dot product of X and some orthogonal matrix, V, of dimension

p by K. As the matrix V is orthogonal,

it means that V transpose times V equals a unit matrix.

Note that we assume here that K is less than or equal to p. The resulting matrix,

Z, will then have dimension

N by K.

We will call Z the linear encoder of the input,

X, because it's given by a linear transform of X.

Now, assume that we have already found Z from our inputs, X.

If we now multiply Z by the transposed matrix, V,

we again obtain a matrix of dimension

N by P, the same as our original data matrix, X.

We can call this new matrix X-hat.

X-hat can be viewed as a reconstructed or decoded signal.
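This encode/decode step can be sketched as follows (sizes are illustrative; V here is just any matrix with orthonormal columns, obtained from a QR decomposition of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, K = 500, 6, 3   # illustrative sizes

# Standardized data matrix X.
X = rng.standard_normal((N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# A p-by-K matrix V with orthonormal columns, V^T V = I_K.
V, _ = np.linalg.qr(rng.standard_normal((p, K)))

Z = X @ V          # linear encoder, shape (N, K)
X_hat = Z @ V.T    # decoded (reconstructed) signal, shape (N, p)

print(Z.shape, X_hat.shape)
print(np.allclose(V.T @ V, np.eye(K)))  # orthonormality check
```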

Now, let's compare this view of signal encoding and

decoding with classical PCA, or principal component analysis.

The PCA starts with the eigenvalue decomposition of the correlation matrix,

C. It states that C can be represented as the product U times lambda times U transposed.

Here, U is an orthogonal matrix that stores the eigenvectors of C

column-wise, so that U transposed times U equals a unit matrix.

And lambda is a diagonal matrix of the eigenvalues of C. Now,

let's use the eigenvalue decomposition to

compute the covariance matrix of the encoded signal,

Z that we just introduced.

We get this expression that again looks like an eigenvalue decomposition.

That is, if we call the product U transposed times V a new matrix, B, for example,

then the resulting covariance matrix of Z will be B transposed times lambda times B.
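We can verify this identity numerically. A sketch with synthetic data (sizes illustrative), using the convention that eigenvectors are stored column-wise in U:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, K = 2000, 5, 3
X = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
C = X.T @ X / N

# Eigenvalue decomposition C = U Lambda U^T (eigenvectors column-wise).
lam, U = np.linalg.eigh(C)

# Any p-by-K matrix V with orthonormal columns.
V, _ = np.linalg.qr(rng.standard_normal((p, K)))

# Covariance of the encoded signal Z = X V ...
cov_Z = (X @ V).T @ (X @ V) / N
# ... equals B^T Lambda B with B = U^T V.
B = U.T @ V
print(np.allclose(cov_Z, B.T @ np.diag(lam) @ B))
```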

Now, what we get as a result depends on what we take for the matrix,

V. Remember that the matrix,

U is a matrix of all eigenvectors stored column-wise.

On the other hand, the matrix,

V is a matrix of K eigenvectors stored column-wise,

where K is less than or equal to P,

which is the total number of components in the portfolio,

which is the same as the total number of

eigenvectors if N is larger than P. If we now take K equal to P,

then it means that V equals U and therefore,

because U is an orthogonal matrix,

the covariance of Z with this choice will be given by Lambda.

But Lambda is a diagonal matrix,

and this means that the new vector,

Z, is made of uncorrelated components, with a diagonal covariance matrix.
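A quick numerical check of this statement (again on synthetic standardized returns; sizes are illustrative): with V equal to U, the covariance of Z comes out diagonal.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 2000, 5
X = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
C = X.T @ X / N

lam, U = np.linalg.eigh(C)   # C = U diag(lam) U^T

# Choose K = p and V = U: the encoded components are uncorrelated.
Z = X @ U
cov_Z = Z.T @ Z / N
print(np.allclose(cov_Z, np.diag(lam)))
```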

This means that the choice

K equals P preserves the total variation of the data,

which is defined as a trace of the correlation matrix,

C. The trace of a matrix is the sum of all its diagonal elements.

Because the trace is invariant under cyclic permutations of its arguments,

the total variation of X is equal to

the trace of Lambda as shown in the first formula here.
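The cyclic-invariance argument can be checked directly on synthetic standardized returns; note also that for a correlation matrix the total variation is simply the number of stocks, p:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 2000, 5
X = rng.standard_normal((N, p)) @ rng.standard_normal((p, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
C = X.T @ X / N

lam = np.linalg.eigvalsh(C)

# tr(C) = tr(U Lambda U^T) = tr(Lambda U^T U) = tr(Lambda).
print(np.isclose(np.trace(C), lam.sum()))
# For standardized returns, every diagonal entry of C is 1, so tr(C) = p.
print(np.isclose(np.trace(C), p))
```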

But the total variation of Z defined in the same way is also equal to

the trace of Lambda when K equals P. This means that when K equals P,

we preserve all the variance in the data by making the linear encoding,

Z equals X times V. Now, the columns of the matrix

Z are given by the products of the normalized returns,

X, with the eigenvectors stored in U.

Therefore, we can interpret the components of any given eigenvector as the weights of

different stocks in a certain stock portfolio,

a portfolio built from all stocks in the index.

By construction, we will have P such stock portfolios,

where P is the number of stocks,

which is the same as the number of eigenvectors of C when N is larger than P.

They are usually referred to as eigen-portfolios

when discussing applications of the PCA to asset management.
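A sketch of how such eigen-portfolios can be extracted. Synthetic returns with one common "market" factor stand in for real Dow Jones data here, and the factor loadings 0.7 and 0.5 are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 1000, 8

# Synthetic returns driven by one common market factor plus noise.
market = rng.standard_normal((N, 1))
returns = 0.7 * market + 0.5 * rng.standard_normal((N, p))

X = (returns - returns.mean(axis=0)) / returns.std(axis=0)
C = X.T @ X / N
lam, U = np.linalg.eigh(C)

# Sort eigen-pairs by descending eigenvalue; column k of U holds the
# stock weights of the (k+1)-th eigen-portfolio.
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

w1 = U[:, 0]  # first eigen-portfolio: the market-like portfolio
# With a dominant common factor, all weights share the same sign
# (the overall sign of an eigenvector is arbitrary).
print(np.all(w1 > 0) or np.all(w1 < 0))
```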

In this graph, I show you the first eight eigenvectors that are the weights of

different stocks in eight eigen-portfolios

that are found for stocks in the Dow Jones Industrial index.

As you can see, the first portfolio that represents the first principal component

has all weights of the same sign and about the same magnitude.

This first principal component is, in fact,

the market, as recovered by the PCA instead of

being pre-imposed to be the actual value of the Dow Jones index.

It's also interesting to look at other eigenvectors of the correlation matrix.

The second eigenvector is orthogonal to the first one,

which means that the second eigen-portfolio is uncorrelated with the market.

And the same holds for all other eigen-portfolios:

each one of them is uncorrelated with

the market and uncorrelated with the rest of the eigen-portfolios.

Therefore, we can reduce the problem of optimal investment among a universe of

P correlated stocks to the simpler problem of

optimal investment in a set of P uncorrelated eigen-portfolios.

This trick, which in terms of linear algebra is equivalent

to a sort of rotation of the vector basis,

often comes in very handy in different sorts of analysis of portfolio risks.

So, what we saw so far was an example of using the PCA as

a coordinate transformation method that lets you

convert correlated features into uncorrelated ones.

Please note that uncorrelated does not mean independent.

Sometimes random variables can be uncorrelated but

dependent via higher moments of their joint distribution.
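A standard illustration: for a symmetric random variable X, the pair X and X squared is uncorrelated, yet the second variable is a deterministic function of the first.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(100_000)
y = x ** 2   # fully determined by x, hence dependent on it

# Cov(x, x^2) = E[x^3] = 0 for a symmetric distribution,
# so the sample correlation is close to zero.
corr = np.corrcoef(x, y)[0, 1]
print(abs(corr) < 0.05)
```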

Now, let's pause here for a moment to see what we learned and then move forward.