So far we have looked at state space models and hidden Markov models. We said that they are very similar to each other in the sense that they both have a hidden state with Markov dynamics. The main difference is that for state space models the hidden state is continuous, while for hidden Markov models it is discrete. But what makes them more similar to each other, and different from the rest of the model classes in our diagram, is the fact that they both have only a short-term memory. That means they only remember things one step back. In other words, they're pretty dumb. Also, they are usually formulated as parametric models with a fixed and small number of adjustable parameters. This may restrict their predictive power when you have lots of data to train the model. So you might want to look for non-parametric models, where the number of model parameters may vary depending on the amount of available data.

In this video we will talk about the third class of models in our diagram, which are neural network models for sequential data. As we will see, these models are non-parametric, and they are much more intelligent than the first two types of models, as they have a long-term memory in addition to a short-term memory.

Let's start our discussion with the concept of a recurrent neuron. A recurrent neuron is a dynamic neuron, that is, a simple form of a dynamic system. At each time t, such a neuron has some internal state, which we will denote a of t. The system is usually described in discrete time steps, though continuous-time formulations also exist. We will stick to the discrete-time formulation. As the system moves to the next time step from the previous step t minus 1, it receives two sorts of inputs. The first are regular observed inputs x of t for this step. For financial applications, these can be stock returns for this date, macroeconomic factors, news, and so on.
Any quantitative information that is available at time t and is relevant to the problem. The second input received by a recurrent neuron at time t is its own internal state at the previous time t minus 1. So the activation a of t of such a neuron at time t will be equal to a sum of these two inputs, taken with weights Wx and Wa, respectively, plus a constant term b, plus some noise epsilon. This might look similar to a latent variable model for a of t, but it's different, because now a of t also depends on the observed inputs x of t for this step. The observed signal h of t can then be obtained from the internal state a of t of the recurrent neuron by applying some non-linear transformation on top of it, as shown in the second formula.

Now, the dependence on the previous value of the neuron creates a backward-looking feedback loop for the recurrent neuron, and this is shown on the left-hand side of this diagram. We can also unroll this picture in time by drawing it as a sequence of states and inputs for different time steps. In this way we obtain the picture on the right. This means that a recurrent neuron can be viewed as a sequence of neurons for all time stamps, which take the input x of t for each step, as well as the previous value of the neuron itself, to produce the observed output h of t. If there are k steps in a sequence, it would be equivalent to having k such neurons sequentially, but all of them will have the same parameters Wx, Wa, and b.

So the dynamics of a recurrent neuron is Markov in the pair a of t and x of t. That is, the next value of this pair depends only on its previous value. But if you only look at the inputs x of t and outputs h of t, the dynamics may appear non-Markov. So the main idea here is that, again, the hidden state a of t accumulates all relevant information about the system at time t. Unlike state space models or hidden Markov models, a recurrent neuron can remember past information well beyond one time stamp.
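The dynamics just described can be sketched in a few lines of code. This is a minimal illustration, assuming scalar inputs and tanh as the non-linear transformation; the function and parameter names here are illustrative choices, not from the lecture, and the noise term epsilon is omitted for clarity.

```python
import numpy as np

def recurrent_neuron_step(x_t, a_prev, W_x, W_a, b, f=np.tanh):
    """One step of a single recurrent neuron:
    a(t) = W_x * x(t) + W_a * a(t-1) + b   (noise term omitted),
    h(t) = f(a(t))."""
    a_t = W_x * x_t + W_a * a_prev + b
    h_t = f(a_t)
    return a_t, h_t

# Unroll over a short input sequence; note that the SAME parameters
# W_x, W_a, b are reused at every time step, as in the lecture.
x_seq = [0.5, -0.2, 0.1]
a = 0.0
outputs = []
for x in x_seq:
    a, h = recurrent_neuron_step(x, a, W_x=0.8, W_a=0.5, b=0.1)
    outputs.append(h)
```

Because the state a is carried from step to step, the output at any time depends on the whole input history, not only on the current input.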
At least in theory, the number of past steps it can account for is unlimited, though in practice it's hard to work with more than a couple of dozen steps with this simplest recurrent neuron, or with the recurrent neural networks that we will go into next.

So what are Recurrent Neural Networks, or RNNs? Well, as usual, a network is obtained by making layers of artificial neurons, with each layer containing a few such neurons. In a similar way, a Recurrent Neural Network is obtained by making cells composed of recurrent neurons. Let's say we put k such recurrent neurons in a cell. Their activations, or states, a of t can now be labeled by a superscript k. Correspondingly, each recurrent neuron in such a cell will have its own parameters Wx, Wa, and b, so we give superscripts to these parameters as well. An output of such a cell is obtained as usual for neural networks: by taking a linear combination of the states a of t, adding a bias term b, and applying the non-linear transformation f on top of it.

A recurrent neural network obtained by making cells of recurrent neurons is only one possible approach to modeling sequential data with neural networks. In practice it works reasonably well, although only for a sufficiently small number of steps, usually around ten steps, as we just said before. But if you want to capture more steps in the history, you may want to use a more sophisticated neuron called a Long Short-Term Memory, or LSTM, cell. Such a neuron, or cell, is shown on this diagram. It has a cell state, whose current value is determined by its current inputs and by additional non-linear transformations of data and previous cell states called gates. An LSTM neuron has two such gates: an Input Gate and a Forget Gate. Both of them can be thought of as additional neurons inside the LSTM cell that take inputs and previous cell states, apply weights and a bias to these inputs, and then apply non-linear transformations to the result.
The outputs of these gates are used as additional inputs to control the cell state. After that, the produced state of the cell is passed through one more gate called the Output Gate. It transforms the cell's state by applying some non-linear function to it, which also produces the output of the LSTM cell. Such a neuron has a number of adjustable parameters that describe the cell state and its gates. Due to the presence of gates, an LSTM cell can filter, forget, and transform the input data, and is therefore much richer than the plain vanilla recurrent neuron. As a result, LSTM neural networks typically work better than recurrent neural networks in handling longer-term memory effects. In practice, they're often used to take into account tens or even hundreds of steps.

An example of the relevance of RNNs and LSTMs for finance is the problem of predicting future values of some fundamental variables, such as earnings per share, or EPS, from past fundamental data. In our second week, we looked into this problem using linear regression and feedforward neural networks. Both of these approaches allow you to take some fixed and small number of past values as predictors. But an RNN or an LSTM can keep track of a variable number of past steps, depending on their adjustable parameters. So the model can decide on its own how many time steps in the past it should explicitly take into account in order to make good predictions. So we can use an RNN or LSTM to predict, for example, the next-quarter EPS from past fundamental data, by taking the cell output for the current quarter as a predictor of the next-quarter EPS, and training the model accordingly. Another example of such a construction could be an RNN or LSTM model to predict a bank failure within a year from past reported data.
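The gating mechanism described above can be sketched as follows. This is the standard textbook LSTM formulation implemented in numpy; the weight names, shapes, and the tiny random example are illustrative assumptions, not the lecture's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (a common textbook formulation).
    Gates: forget (f), input (i), output (o); g is the candidate update."""
    z = np.concatenate([h_prev, x_t])              # previous output + current input
    f = sigmoid(params["Wf"] @ z + params["bf"])   # Forget Gate: what to keep of c_prev
    i = sigmoid(params["Wi"] @ z + params["bi"])   # Input Gate: what to write to the cell
    g = np.tanh(params["Wg"] @ z + params["bg"])   # candidate cell-state update
    o = sigmoid(params["Wo"] @ z + params["bo"])   # Output Gate: what to expose
    c_t = f * c_prev + i * g                       # new cell state
    h_t = o * np.tanh(c_t)                         # cell output
    return h_t, c_t

# Tiny example: 1 hidden unit, 1 input feature, random weights for illustration
rng = np.random.default_rng(0)
params = {k: rng.normal(size=(1, 2)) for k in ("Wf", "Wi", "Wg", "Wo")}
params.update({k: np.zeros(1) for k in ("bf", "bi", "bg", "bo")})
h, c = np.zeros(1), np.zeros(1)
for x in ([0.5], [-0.3], [0.2]):
    h, c = lstm_cell_step(np.array(x), h, c, params)
```

Note how the Forget Gate multiplies the previous cell state: when f is close to 1 the cell retains old information, which is what lets an LSTM track much longer histories than a plain recurrent neuron.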
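As a rough sketch of the EPS example, the following shows how the output of a single recurrent neuron at the current quarter could serve as the next-quarter prediction. The EPS numbers are synthetic and the parameters are untrained placeholders (in practice they would be fitted, e.g. by backpropagation through time, on historical fundamental data); the function name is mine.

```python
import numpy as np

# Hypothetical quarterly EPS history (synthetic numbers, for illustration only)
eps_history = np.array([1.10, 1.15, 1.08, 1.22, 1.30, 1.27, 1.35])

def rnn_forecast(series, W_x=0.5, W_a=0.5, b=0.0, W_out=1.0, b_out=0.0):
    """Run one recurrent neuron over the whole EPS history and read off
    the cell output at the last observed quarter as the next-quarter
    prediction. All parameters here are untrained placeholders."""
    a = 0.0
    for x in series:
        a = W_x * x + W_a * a + b   # internal state update (noise omitted)
        h = np.tanh(a)              # cell output for this quarter
    return W_out * h + b_out        # linear readout -> next-quarter EPS guess

next_q_eps = rnn_forecast(eps_history)
```

The key point is that the loop runs over however many quarters are available, so the model is not tied to a fixed number of lagged predictors, unlike the linear regression and feedforward setups from week two.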
In the second week of our course, we looked at how such a problem can be solved with logistic regression and a feedforward neural network, both of which can only incorporate a fixed and low number of predictors. In general, the horizon of a prediction with recurrent neural networks can be different from one step. Alternatively, we can have not just one but many outputs in an RNN or LSTM. Different types of modeling in this context, called sequence-to-sequence modeling, are schematically shown in the diagram at the bottom of the slide. I have borrowed this diagram from a blog on recurrent neural networks by Andrej Karpathy, which I recommend you read to understand the actual differences between the neural architectures corresponding to these different types of sequence-to-sequence models.

But for here, this brief overview of sequence models and dynamic latent variable models, including state space models, HMMs, and neural models for sequential data, was all we needed to introduce the second main topic of this week, which is reinforcement learning. This will be the main theme of our second lesson for this week. So let's move on to this final lesson of the course.