So we talked about latent variable models, including factor analysis, probabilistic PCA, and Gaussian mixtures. We said that these models are similar in the sense that they all include a hidden state, and for all of them we can use the EM algorithm with maximum likelihood estimation both to infer the hidden state and to estimate model parameters. Now, I want to talk about one more thing that all these models have in common, which is the fact that they all make a very special assumption about the true data-generating distribution. In fact, we can say even more: all models that we have discussed so far rely on the same assumption, which we are going to analyze now.

To take a specific example, let's come back to factor analysis and take another look at its basic equation. It says that the observed vector y equals the product of Lambda times x plus an uncorrelated white noise epsilon. Here, x is a vector of K Gaussian random variables with zero means and unit variances. Note that there are only K of them, and yet they are tuned to represent N points in the data y by tuning the values of the factor loading matrix Lambda and the noise variances of epsilon.

Now, the most critical assumption about the data-generating mechanism assumed here is that all pairs (x, y) are iid, which means independent, identically distributed samples from some true data-generating distribution P_data. The part with x is easy, as x is unobserved anyway, and we can just model samples of x as iid. But what about y, which is the observed signal? Well, it turns out that most data in finance is sequential. By sequential data, I mean any data where the order of observations is important. If sequential data is observed at regular time intervals, for example, minutes, days, months, years, and so on, such data is called time series data. There are probably many more examples of sequential data in finance than of non-sequential data.
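The factor analysis generative equation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's code: the dimensions, the noise variances, and the values of the loading matrix are all hypothetical, chosen only to show that y inherits the covariance structure Lambda Lambda^T + diag(psi) implied by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 3, 10, 20000  # hypothetical: latent factors, observed dim, samples

Lambda = rng.normal(size=(D, K))   # factor loading matrix (illustrative values)
psi = 0.1 * np.ones(D)             # diagonal variances of the white noise

x = rng.normal(size=(N, K))        # iid latent factors: zero mean, unit variance
eps = rng.normal(scale=np.sqrt(psi), size=(N, D))  # uncorrelated noise epsilon
y = x @ Lambda.T + eps             # observed vectors: y = Lambda x + epsilon

# The model implies cov(y) = Lambda Lambda^T + diag(psi); the sample
# covariance of the simulated y should approach it for large N.
cov_model = Lambda @ Lambda.T + np.diag(psi)
cov_empirical = np.cov(y, rowvar=False)
```

Note that the samples of (x, y) generated here are iid by construction, which is exactly the assumption we are about to question for financial data.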
For example, asset prices, such as stocks, bonds, commodities, FX, and so on, are all sequential data, and more specifically time series data. Macroeconomic data and balance sheet data are also sequential data. News is another example of sequential data.

Now, sequential data cannot be iid. To see that, let's say we have two events, E_1 and E_2. If event E_1 happened at time T_1 and then event E_2 happened at a later time T_2, but the data is iid, then this is totally equivalent to event E_2 happening at the earlier time T_1 and then event E_1 happening at the later time T_2. Now, let's say that we talk about news data, and event E_1 is a reported banking fraud, namely the manipulation of the Libor rate, and E_2 is news about the resulting fines that banks paid to regulators. The first sequence, that is E_1 and then E_2, is normal and corresponds to a causal relation between these two events. But the second sequence would be first fines, then fraud. An iid universe where two such sequences would have equal probabilities would be a very interesting one, but fortunately, or hopefully, we do not live in such a universe.

So the conclusion is that for sequential data, the order of observations is important, and sometimes even critically important. But you could notice that in our previous videos, we treated data such as stock returns as iid data. For example, we assumed that when we did linear regression with this data. On the other hand, we just said that stock prices are not iid data, so what is the catch? Which statement is right? Well, the last statement is absolutely right, but the first statement is only approximately correct. Returns are obtained by taking differences of log prices. By taking differences between two consecutive stock prices, we only eliminate the most dominant, leading order of codependence between these observations. Higher-order dependencies persist even after taking these differences, showing up, for example, as autocorrelations in stock returns.
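The point that differencing removes only the dominant, leading-order dependence can be illustrated with a toy simulation. Here log prices follow a simple random walk, which is an assumption made only for this sketch: the increments (log returns) are iid by construction, while the price levels themselves are strongly serially dependent. Real returns, as noted above, would still retain higher-order dependencies that this idealized example does not capture.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: log prices as a random walk, so increments are iid by design.
log_returns = rng.normal(loc=0.0, scale=0.01, size=10000)
log_prices = np.cumsum(log_returns)

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

corr_prices = lag1_autocorr(log_prices)    # near 1: strong serial dependence
corr_returns = lag1_autocorr(log_returns)  # near 0: consistent with iid
```

The price series shows lag-1 autocorrelation near one, while the differenced series shows autocorrelation near zero, which is exactly the sense in which taking differences eliminates the leading order of codependence.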
So, in general, taking differences of sequential data and treating these differences as iid observations is what might be called a poor man's solution to the problem. Even more, for some types of sequential data, differences may not even be defined. For example, for discrete sequences over some finite set of states S_1 to S_K, for example, states describing economic regimes, such differences would not make any sense.

So are there any better ways to model sequential data within a probabilistic framework? Let's assume we have a time series of some variable y over T timesteps. Here y can be vector-valued; for example, it can be the set of prices of the stocks in the S&P 500 index. Now, without any loss of generality, we can write the probability of the total path of y as the product of the probability of its value at each given step, conditional on the values at all previous steps. Note that there are no assumptions involved here whatsoever; this is just the basic chain-rule decomposition of a joint probability and nothing more. But this expression is as general as it is next to useless as it stands. The reason is that the length of the data can be arbitrary, so we need to somehow bound the complexity of the model. In other words, we have to constrain the number of predictors so that the model will not misbehave out of sample.

Now, the simplest way to do something like that is simply to truncate the lookback window in the data to some fixed number of steps. For example, to predict a stock return for the next day, we could look only at the last month of its price history. The shorter the lookback window, the less memory we let our system have, so the system has progressively shorter memory as we shrink this interval. In the limiting case, we can consider a model where the probability of values at the next timestep depends only on the current state. Such models are called Markov models, named after the Russian mathematician Andrey Markov, who studied their properties in the early 20th century.
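The limiting case just described, where the next state depends only on the current one, can be sketched for the discrete setting mentioned above (a finite set of states such as economic regimes). This is a minimal illustration with a hypothetical two-state transition matrix, not any calibrated model: row i of P gives the distribution of the next state given that the current state is i, so that p(s_1, ..., s_T) = p(s_1) * prod_t p(s_t | s_{t-1}).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical transition matrix over two regimes:
# P[i, j] = probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def sample_path(P, T, s0=0):
    """Sample a Markov path: each next state depends only on the current one."""
    path = [s0]
    for _ in range(T - 1):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path)

path = sample_path(P, T=50000)

# Empirical transition frequencies should approach the rows of P.
counts = np.zeros_like(P)
for s, s_next in zip(path[:-1], path[1:]):
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
```

Estimating the model from the sampled path recovers the transition matrix, which is all the structure a first-order Markov model has: no differencing is needed, and the discrete states pose no problem.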
Systems described by such models are called memoryless. So this is the simplest possible model for a sequence, and it's shown in this diagram. The arrows here mean causal relations, so each next state depends only on the previous state and nothing else; it is, so to speak, the least intelligent of all possible models. Note that this means, among other things, that two different increments are mutually independent.

Now, in physics, such models work extremely well for many problems. In finance, though, such models provide only a reasonable approximation in many cases, for example, for stock return analysis in its simplest version. But still, there are many cases in which looking only one step back is simply inadequate to the problem. Let's say we want to look two steps back. How can we do that? The answer is that we can, and to do so, all we need is to extend the basic formulation of the Markov chain, or Markov model. Such extensions are called K-order Markov models, where K is the length of the lookback period; the conventional Markov models are then referred to as first-order Markov models. K-order Markov models are built in the same way as first-order models, but the state vector of predictors now simply includes the K previous observations instead of just one previous value.

This approach is good, but it has limits. If you want to keep, say, 30 past values for modeling, you will need 30 sets of model parameters, and even after that, you will still be confronted by the question of why 30 values and not 40. In other words, increasing the order of the Markov model leads to a proliferation of model parameters, which can be undesirable, and which remains unsatisfactory in certain cases even when feasible. The other potential issue with Markov models is that conditioning on past values to predict future values may also be problematic in certain cases.
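The mechanics of a K-order model, where the predictor state stacks the K previous observations, can be sketched as a lagged design matrix. The series values here are random placeholders, and the choice K = 3 is arbitrary; the point is only the shape of the construction, and how each extra lag adds another column of predictors (and hence another set of parameters).

```python
import numpy as np

rng = np.random.default_rng(7)

K = 3                       # lookback order, chosen arbitrarily for illustration
y = rng.normal(size=100)    # placeholder scalar time series

# Row t of X holds the predictor state (y_{t}, y_{t+1}, ..., y_{t+K-1});
# the corresponding target is the next value y_{t+K}.
X = np.column_stack([y[i:len(y) - K + i] for i in range(K)])
target = y[K:]
```

Any regression of `target` on `X` then plays the role of the K-order model; going from order K to order K + 1 simply appends one more lagged column, which is exactly the parameter proliferation mentioned above.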
This is because y itself is a random variable, and if there is strong noise in the data, conditioning on the observed values themselves would propagate or even magnify errors in predictions. In the next video, we will consider an alternative approach to modeling sequential data. But, as usual, let's do it after a short commercial break. Sorry, I meant to say control questions.