So we spoke in general about different approaches to modeling the LOB.
We said that there are a few classes of approaches that can
be classified as economics based single agent models,
statistical models, physics motivated models and machine learning approaches.
Such modeling is needed for
many different things. One is the estimation of different measures of intraday risk,
for example, estimation of daily volatility or VaR.
Another objective of modeling here is
a very short-term prediction of price moves, about a second into the future,
which is used for designing trading strategies.
Yet another objective is
optimal trade execution, which we already mentioned a few times before.
So, in this video,
I would like to talk about statistical approaches.
The main reference for this video is a paper by Rama Cont cited on this slide.
In general terms, statistical models of the LOB incorporate two sorts of information.
The first sort of information is the current state of the LOB.
The second sort of information is various statistics of the order flow.
For example, arrival rates of market and limit orders.
As we just said before,
different types of approaches can be used to this end.
They can vary from purely statistical to machine learning to physics-based models.
But most of them share many features in common.
Most of them view arrivals of different types of orders such as buy/sell,
limit and cancel orders probabilistically.
Those probabilities per unit time are called arrival intensities.
The other thing these models should include
is the execution of market orders via priority rules.
All this produces model-predicted price dynamics that can be compared with the data.
This still leaves lots of freedom regarding how exactly this should be done.
One approach that can be called a microscopic
approach would be to model all price levels simultaneously.
Another possible approach would be to focus instead
on explicit modelling of only the best prices on both sides.
In other words, this approach is more fundamental and logical as it
focuses on the part of the LOB where the trade actually takes place.
Characteristics of farther price levels in
the LOB are used here as a source of different features,
if we use the language of machine learning.
There are also some interesting physics-based approaches of this sort.
For example, you can see papers by Doyne Farmer,
and Jean-Philippe Bouchaud, and their coworkers on these topics that I already mentioned.
So, one possible approach is to model the LOB as a multi-level queuing system.
Remember that we talked about priority rules in price and in time.
They implement the FIFO, or first-in first-out, principle of a queue.
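To make the FIFO idea concrete, here is a minimal sketch of a single price level treated as a queue. The class name `PriceLevelQueue`, its methods, and the order identifiers are all hypothetical and chosen only for illustration; this is not how any particular exchange implements matching.

```python
from collections import deque


class PriceLevelQueue:
    """One price level of the book, with time priority (FIFO)."""

    def __init__(self):
        # Each entry is [order_id, remaining_size]; oldest order sits on the left.
        self._orders = deque()

    def add_limit(self, order_id, size):
        """A new limit order joins the back of the queue."""
        self._orders.append([order_id, size])

    def cancel(self, order_id):
        """Remove a resting order wherever it sits; return True if it was found."""
        for order in self._orders:
            if order[0] == order_id:
                self._orders.remove(order)
                return True
        return False

    def execute_market(self, size):
        """Fill a market order against the queue, oldest resting order first."""
        fills = []
        while size > 0 and self._orders:
            order = self._orders[0]
            traded = min(size, order[1])
            fills.append((order[0], traded))
            order[1] -= traded
            size -= traded
            if order[1] == 0:
                self._orders.popleft()
        return fills

    def depth(self):
        """Total resting size at this price level."""
        return sum(order[1] for order in self._orders)


queue = PriceLevelQueue()
queue.add_limit("a", 100)
queue.add_limit("b", 50)
queue.add_limit("c", 70)
queue.cancel("b")
fills = queue.execute_market(120)
```

In this toy run, order "a" is filled in full before order "c" is touched, which is exactly the time priority rule.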
So the simplest idea for modeling the LOB as
a multi-level queue would be to first break it into independent queues.
If we assume that for
each price level arrival rates for different orders are mutually independent,
we can build simple and tractable models.
For example, we can use Poisson processes to describe
the arrival of events such as limit or market orders.
This produces simple models because for Poisson processes durations
between consecutive events are independent and distributed exponentially.
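As a small sketch of this idea, we can simulate independent Poisson streams for one price level by drawing exponential waiting times. The intensities in `RATES` are made-up numbers, and the function names are hypothetical.

```python
import random


def simulate_poisson_arrivals(rate, horizon, seed=0):
    """Arrival times of a homogeneous Poisson process on [0, horizon].

    Durations between consecutive events are independent exponential
    draws with mean 1 / rate.
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            break
        times.append(t)
    return times


# Assumed intensities (events per second) for one price level; illustrative only.
RATES = {"limit": 5.0, "market": 1.0, "cancel": 3.0}


def simulate_queue_events(rates, horizon, seed=0):
    """Merge independent Poisson streams, one per event type, into one time-ordered list."""
    events = []
    for i, (kind, rate) in enumerate(sorted(rates.items())):
        for t in simulate_poisson_arrivals(rate, horizon, seed=seed + i):
            events.append((t, kind))
    events.sort()
    return events


events = simulate_queue_events(RATES, horizon=10.0)
```

Each event type gets its own independent stream here, which is the mutual independence assumption stated above.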
So, in this picture,
events of crossing the spread would appear as Poisson events.
Their intensity would depend on the state and statistics of
the LOB. Assuming stationarity, after [inaudible] these data,
we can estimate these intensities as parametric functions.
All this will produce a simple and tractable analytical model.
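One very simple way to estimate such intensities, under the stationarity assumption, is to count events and divide by the time spent in each state of the book. The sketch below uses a piecewise-constant parametrization; the state labels and the observation format are assumptions made for illustration.

```python
from collections import defaultdict


def estimate_state_intensities(observations):
    """Estimate an arrival intensity for each book state.

    observations: iterable of (state, duration_seconds, n_events).
    For a Poisson process with constant intensity within a state, the
    maximum likelihood estimate is total events / total time in that state.
    """
    time_in_state = defaultdict(float)
    events_in_state = defaultdict(int)
    for state, duration, n_events in observations:
        time_in_state[state] += duration
        events_in_state[state] += n_events
    return {
        state: events_in_state[state] / time_in_state[state]
        for state in time_in_state
        if time_in_state[state] > 0
    }


# Hypothetical observations: (state label, seconds spent in state, events seen).
observations = [
    ("small_queue", 2.0, 6),
    ("large_queue", 4.0, 4),
    ("small_queue", 1.0, 3),
]
intensities = estimate_state_intensities(observations)
```

A smoother parametric form, for example an intensity that depends on distance from the best price, could replace the per-state buckets, but that is a modeling choice beyond this sketch.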
So far, so good except that these models would be too simplistic.
In reality, the main assumption of Poisson processes
is strongly violated in data for this case.
As we just said,
the key assumption for models that use Poisson processes
is that inter-event durations are independent.
But in reality, arrivals tend to cluster in time.
In this graph, you see changes in the size of
the ask queue for the Citigroup stock on June 26, 2008,
taken from Rama Cont's paper.
Each order or cancellation is a data point.
If the value is positive,
it's a limit order and if the value is negative,
it's a market order or cancellation.
The frequent large spikes and oscillations
in amplitude that you can see here are all signs of clustering in time.
Therefore, the standard Poisson modelling is not adequate here.
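A quick way to check for clustering in your own timestamps is to compare two simple statistics with their Poisson benchmarks. For a Poisson process, the coefficient of variation of inter-event durations is 1, and the variance-to-mean ratio of event counts in fixed windows (the Fano factor) is also 1. Values well above 1 suggest clustering. The sketch below uses synthetic timestamps, not market data, and the function names are hypothetical.

```python
import statistics


def durations(times):
    """Inter-event durations from sorted event times."""
    return [b - a for a, b in zip(times, times[1:])]


def coefficient_of_variation(values):
    """Standard deviation divided by mean; equals 1 for exponential durations."""
    return statistics.pstdev(values) / statistics.fmean(values)


def fano_factor(times, window, horizon):
    """Variance-to-mean ratio of event counts in consecutive windows."""
    n_windows = int(horizon // window)
    counts = [0] * n_windows
    for t in times:
        k = int(t // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    return statistics.pvariance(counts) / statistics.fmean(counts)


# Synthetic clustered stream: a burst of 10 events every 10 seconds.
bursty_times = [10 * b + 0.01 * i for b in range(20) for i in range(10)]
# Synthetic regular stream: one event every half second.
regular_times = [0.5 * i for i in range(400)]

bursty_cv = coefficient_of_variation(durations(bursty_times))
regular_cv = coefficient_of_variation(durations(regular_times))
bursty_fano = fano_factor(bursty_times, window=1.0, horizon=200.0)
```

These are only diagnostics. They tell you that the Poisson assumption fails, not which model should replace it.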
A big question, of course, is what causes this clustering.
[inaudible] believes that such clustering can have endogenous origin.
This means that it would not be driven by some sort of external signals,
but rather by some internal dynamics.
There are many very interesting questions that
can be addressed if you have limit order book data.
One very interesting topic is the problem of market impact.
We already talked about market impact and
mentioned the classical work on this topic by Bertsimas and Lo,
and Almgren and Chriss.
We also talked about how these ideas can be
taken to reinforcement learning in
the last week of our course on reinforcement learning.
Now, these classical papers deal with
the price impact of actual trades, that is, of market orders.
But there are tons of different specifications of market impact models in
the literature, and there are
many different definitions of what exactly market impact is.
There are, for example,
permanent versus transient impact,
linear versus non-linear models and so on.
Some of these approaches seem not to be quite consistent, and this was noted by
Rogers and Singh in their paper from 2006 on Liquidity Modeling.
They remarked that the notion of permanent impact of an individual trade is somewhat
problematic because if there is
an individual permanent impact then other trades should also have permanent impact.
But if this is true,
then it's not clear what is modeled in such approaches in the first place.
But this is just a side remark and the point I
wanted to highlight here is rather different.
The point is that price impact in
the classical models is modeled as a function of trades,
that is, a function of market orders.
But as we already know,
a vast majority of events in a limit order book are limit orders and cancellations.
The rate of cancelled orders often reaches 90 or even 95 percent.
Therefore, if we simply discard these events, we might lose some important information.
Now, another paper by Cont and coworkers drew
attention to the importance of limit order book events for price impact.
So, they showed that price changes are sensitive to a measure of order flow imbalance.
This measure is given by the limit orders minus
market orders and minus cancellations, divided by the depth of the price level.
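As a rough sketch of this verbal definition, we can compute the net order flow for one queue and divide it by the depth. The function name, the event format, and the numbers are hypothetical. The measure in the paper by Cont and coworkers combines both sides of the book with its own sign conventions, which this one-sided sketch does not try to reproduce.

```python
def queue_order_flow_imbalance(events, depth):
    """Net order flow for one queue, scaled by depth.

    events: iterable of (kind, size) with kind in {"limit", "market", "cancel"}.
    Limit orders add to the queue; market orders and cancellations deplete it.
    depth: assumed typical size of the queue at this price level.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    signs = {"limit": 1, "market": -1, "cancel": -1}
    net_flow = sum(signs[kind] * size for kind, size in events)
    return net_flow / depth


# Hypothetical events over one time interval, sizes in shares.
interval_events = [("limit", 300), ("market", 100), ("cancel", 50), ("limit", 50)]
imbalance = queue_order_flow_imbalance(interval_events, depth=1000)
```

Here 350 shares were added and 150 were removed, so the net flow of 200 shares relative to a depth of 1000 gives an imbalance of 0.2.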
So once we identify this metric as a driver of price changes,
we can, for example, formulate simple Langevin-type dynamics for such a system.
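Purely as an illustration, one simple form such dynamics could take is a price driven by the imbalance metric plus noise. The symbols here are assumptions of this sketch and are not taken from the papers: $P_t$ is the price, $\mathrm{OFI}_t$ is the imbalance metric, $\beta$ is a sensitivity coefficient, $\sigma$ is a noise scale, and $W_t$ is a Brownian motion.

```latex
dP_t = \beta \, \mathrm{OFI}_t \, dt + \sigma \, dW_t
```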
For the lack of time and space,
we will not go into details of this and
some other very interesting statistical and physics-based models of the LOB,
but I recommend that you look into these papers on your own,
including papers by [inaudible] Bouchaud,
Farmer, and others if you want to dig deeper into this topic.