Now, let's talk about Principal Component Analysis,

also known by its short name, PCA.

As I mentioned earlier,

the PCA is probably

the most commonly used data transformation

and dimension reduction method in all of machine learning.

While it has millions of different applications,

I think that this topic will be best

presented when we take some specific example from the start.

So let's consider the problem of analyzing market data,

and more specifically the data for daily returns for a set of stocks.

For our illustration we will use 30 stocks in the Dow Jones Industrial Index.

Let me set my notation first.

We have a set of stock prices S_i(t), where i enumerates stocks,

so i runs from 1 to N.

The analysis is done in terms of log-returns r_i defined by this formula.
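
The formula itself is on a slide that is not part of this transcript. Assuming the usual definition r_i(t) = log(S_i(t) / S_i(t-1)), a minimal sketch of computing daily log-returns could look like this. The function name log_returns and the toy prices are made up for illustration.

```python
import numpy as np

def log_returns(prices):
    """Daily log-returns r_i(t) = log(S_i(t) / S_i(t-1)).

    prices: array of shape (T, N), one column per stock, all values > 0.
    Returns an array of shape (T-1, N).
    """
    prices = np.asarray(prices, dtype=float)
    # Difference of log-prices along the time axis equals the log of the price ratio.
    return np.diff(np.log(prices), axis=0)

# Toy prices for two hypothetical stocks over four days (made-up numbers).
prices = np.array([
    [100.0, 50.0],
    [101.0, 49.5],
    [ 99.0, 50.5],
    [102.0, 51.0],
])
r = log_returns(prices)
print(r.shape)  # one row fewer than the price table
```

One convenient property of log-returns is that they add up over time: the sum of the daily log-returns equals the log-return over the whole period.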

Now, we want to build some sort of a model

for the log-returns of all stocks in the index.

One possible model is

a simple linear regression of all log-returns on a single predictor.

We will call such a predictor the market factor or simply the market for short.

Typically, in such models,

market indices such as the S&P 500 or the Dow Jones Industrial are used as market factors.

And because we deal with stocks from the Dow Jones Industrial index here,

it makes sense to take the Dow Jones index as the market for this problem.

In this case the problem can be formulated as a linear regression with

a single predictor given by the return of the index, as shown in this equation.

This relation is known in finance as the one-factor capital asset pricing model, or CAPM.
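
The equation referred to here is not reproduced in the transcript. Assuming it has the standard form r_i(t) = alpha_i + beta_i * r_M(t) + eps_i(t), where r_M is the market return, a rough sketch of fitting it by ordinary least squares could look as follows. The function name fit_one_factor and all of the data are synthetic placeholders, not actual Dow Jones returns.

```python
import numpy as np

def fit_one_factor(returns, market):
    """OLS fit of each stock's returns on an intercept and the market return.

    returns: (T, N) array of stock log-returns.
    market:  (T,) array of market (index) log-returns.
    Returns (alpha, beta), each of shape (N,).
    """
    returns = np.asarray(returns, dtype=float)
    market = np.asarray(market, dtype=float)
    X = np.column_stack([np.ones_like(market), market])   # design matrix (T, 2)
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)    # coefficients (2, N)
    return coef[0], coef[1]

# Synthetic data: five hypothetical stocks driven by one market factor plus noise.
rng = np.random.default_rng(0)
T, N = 500, 5
market = 0.01 * rng.standard_normal(T)
true_beta = np.array([0.5, 0.8, 1.0, 1.2, 1.5])
returns = market[:, None] * true_beta + 0.002 * rng.standard_normal((T, N))

alpha, beta = fit_one_factor(returns, market)
print(np.round(beta, 2))  # estimates should land near true_beta
```

All N regressions share the same design matrix, so a single least-squares call fits every stock at once.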

Now, let's see how such models are used in quantitative trading.

One popular strategy is statistical arbitrage,

which tries to filter out the impact of the market from stock returns.

The rationale for this is that the market itself is too random to try to predict.

Instead we should subtract its impact and focus on the residuals,

that is, the parts of equity returns that are unexplained by the market returns.

We compute residuals as shown in the second formula here.
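
The second formula is also on a slide that is not in this transcript. Assuming the residual is eps_i(t) = r_i(t) - alpha_i - beta_i * r_M(t), with alpha_i and beta_i estimated by least squares, a small sketch on synthetic data might be the following. The name market_residuals is hypothetical.

```python
import numpy as np

def market_residuals(returns, market):
    """Residuals of an OLS regression of each stock on the market return.

    returns: (T, N) stock log-returns; market: (T,) market log-returns.
    Returns an array of shape (T, N).
    """
    returns = np.asarray(returns, dtype=float)
    market = np.asarray(market, dtype=float)
    X = np.column_stack([np.ones(len(market)), market])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    # Subtract the fitted part alpha_i + beta_i * r_M(t) from the returns.
    return returns - X @ coef

# Synthetic example (made-up betas and noise level).
rng = np.random.default_rng(1)
T = 250
market = 0.01 * rng.standard_normal(T)
betas = np.array([0.7, 1.0, 1.3])
returns = market[:, None] * betas + 0.003 * rng.standard_normal((T, 3))

resid = market_residuals(returns, market)
# In-sample covariance of each residual series with the market.
cov_with_market = (resid * (market - market.mean())[:, None]).mean(axis=0)
print(cov_with_market)  # zero up to floating-point error
```

With an intercept in the regression, the in-sample residuals have zero mean and zero sample covariance with the market. Out of sample this holds only approximately.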

The residuals are then used as features to produce trading signals.

For example, mean-reversion strategies look for deviations of

the residuals from their mean levels in order to produce trading signals.
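
To make the idea concrete, here is a deliberately simplified sketch of a mean-reversion signal: a trailing z-score of the cumulative residual. This is an illustration only and is not the signal construction from any particular paper. The function name zscore_signal, the window of 60 days, and the entry threshold of 1.5 are arbitrary assumptions.

```python
import numpy as np

def zscore_signal(resid, window=60, entry=1.5):
    """Toy mean-reversion signal for one stock.

    resid: (T,) residual returns.
    Returns positions in {-1, 0, +1}: short when the cumulative residual is
    far above its trailing mean, long when far below, flat otherwise.
    The first `window` entries stay 0 while the history warms up.
    """
    resid = np.asarray(resid, dtype=float)
    level = np.cumsum(resid)                 # cumulative residual
    signal = np.zeros(len(level), dtype=int)
    for t in range(window, len(level)):
        hist = level[t - window:t]           # trailing history, excludes day t
        sd = hist.std()
        if sd == 0:
            continue                         # no variation, no signal
        z = (level[t] - hist.mean()) / sd
        if z > entry:
            signal[t] = -1                   # bet on a move back down
        elif z < -entry:
            signal[t] = 1                    # bet on a move back up
    return signal

# Deterministic toy input: small alternating residuals, then one large jump.
resid = np.tile([0.01, -0.01], 45)
resid[80] = 0.1
sig = zscore_signal(resid)
print(sig[78:82])
```

The history window excludes the current day, so the z-score at day t uses only information available before t.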

The resulting portfolios are uncorrelated with the market by construction,

and if they are large and diversified enough,

they will have low volatility due to their diversification effects.

If you're interested in more details on these topics,

the classical paper by Avellaneda and Lee from 2008 would be a good read.

The paper also discusses an extension of

the one-factor approach to a more interesting multi-factor setting.

In the multi-factor setting, the factors F_j

represent some systematic factors that can

be represented, for example, by exchange-traded funds, or ETFs,

if you stick to a classical financial approach.
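
As a hedged illustration of the multi-factor setting, assume the model r_i(t) = alpha_i + sum_j beta_ij * F_j(t) + eps_i(t). The sketch below fits it by least squares on synthetic factor returns. The name fit_multi_factor and the data are hypothetical stand-ins for, say, ETF returns.

```python
import numpy as np

def fit_multi_factor(returns, factors):
    """OLS fit of stock returns on K factor returns plus an intercept.

    returns: (T, N) stock log-returns; factors: (T, K) factor returns.
    Returns alpha (N,), betas (N, K), and residuals (T, N).
    """
    returns = np.asarray(returns, dtype=float)
    factors = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(factors.shape[0]), factors])  # (T, 1 + K)
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)         # (1 + K, N)
    alpha = coef[0]
    betas = coef[1:].T
    resid = returns - X @ coef
    return alpha, betas, resid

# Synthetic data: 4 hypothetical stocks exposed to 3 made-up factors.
rng = np.random.default_rng(2)
T, N, K = 1000, 4, 3
factors = 0.01 * rng.standard_normal((T, K))
true_betas = rng.uniform(0.5, 1.5, size=(N, K))
returns = factors @ true_betas.T + 0.002 * rng.standard_normal((T, N))

alpha, betas, resid = fit_multi_factor(returns, factors)
print(np.round(betas - true_betas, 3))  # estimation errors should be small
```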

But because here we are into machine learning methods in finance,

we want to focus on a more data-driven approach.

Let's do it in the next video,

but before moving there let's see what we have learned.

Okay so, what did we learn about the PCA so far?

We learned that the PCA is

a convenient coordinate transform that

turns your correlated features into uncorrelated ones.
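
A small sketch of this decorrelation property, using an eigendecomposition of the sample covariance matrix on synthetic data. The function name pca_rotate and the toy features are assumptions made for illustration.

```python
import numpy as np

def pca_rotate(X):
    """Rotate centered data onto the eigenvectors of its covariance matrix.

    X: (T, N) data matrix, one column per feature.
    Returns (Z, eigvals, eigvecs): Z holds the rotated features, with
    components ordered from largest to smallest variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # (N, N) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Xc @ eigvecs, eigvals, eigvecs

# Correlated toy features: three series sharing one common driver.
rng = np.random.default_rng(3)
common = rng.standard_normal(1000)
X = np.column_stack([common + 0.3 * rng.standard_normal(1000) for _ in range(3)])

Z, eigvals, eigvecs = pca_rotate(X)
cov_Z = np.cov(Z, rowvar=False)
print(np.round(np.corrcoef(X, rowvar=False), 2))  # strongly correlated inputs
print(np.round(cov_Z, 6))                         # off-diagonal entries vanish
```

The covariance of the rotated features is diagonal, with the eigenvalues on the diagonal, which is exactly what "uncorrelated" means here.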

But what about the dimension reduction task that we said the PCA is good for?

In the next video,

we will take a look at the PCA as a dimension reduction tool.