Welcome to the third week of our course on reinforcement learning and finance.

This week is going to be very interesting,

as this week we will leave the model-based approach of dynamic programming and move to

data-driven and model-independent ways to

solve our problem of optimal option pricing and hedging.

Now, I need to specify what I mean by model independence here,

given that we still discuss a model for a Markov decision process.

What is meant in this context by model independence is that, in a data-driven setting,

we will only keep the general structure of the model, but we do

not assume that the transition probabilities and reward functions are known.

The traditional paradigm of quantitative finance is that,

in order to optimally price and hedge financial instruments,

we first have to build and estimate a model of the world.

But the reinforcement learning paradigm is different: it allows us to

focus directly on our prime goal of optimal control.

Depending on the particular algorithm that we use,

the task may or may not involve the problem of building the model of the world first.

The methods that we are going to discuss this week

actually let us do things in a model-independent way.

We will start with a look at

batch-mode reinforcement learning, and then we will introduce Q-Learning,

one of the most famous algorithms of reinforcement learning.
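As a quick preview, here is a minimal sketch of the tabular Q-Learning update we will build on. All the names, state and action sizes, and hyperparameters below are illustrative choices, not the course's notation:

```python
import numpy as np

# Illustrative sizes and hyperparameters (not from the lecture).
n_states, n_actions = 5, 3
gamma, alpha = 0.9, 0.1  # discount factor and learning rate

Q = np.zeros((n_states, n_actions))

def q_update(Q, s, a, r, s_next):
    """One Q-Learning step: move Q[s, a] toward r + gamma * max_a' Q[s_next, a']."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Example transition: from state 0, action 1 yields reward 1.0 and lands in state 2.
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

Note that the update uses only observed transitions (s, a, r, s'), never the transition probabilities themselves, which is exactly the model-independence discussed above.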

After presenting the basic version of the algorithm,

we will discuss its more practical version called Fitted Q Iteration.
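To give a flavor of what makes Fitted Q Iteration practical, here is a toy sketch of one possible version with a linear function approximator. The transition data, the feature map, and all hyperparameters are invented for illustration and are not the course's specification of the algorithm:

```python
import numpy as np

# Synthetic batch of transitions (s, a, r, s') -- purely illustrative data.
rng = np.random.default_rng(0)
n, n_actions, gamma = 100, 2, 0.9
S = rng.normal(size=n)                      # states (1-d for simplicity)
A = rng.integers(0, n_actions, size=n)      # actions taken
R = rng.normal(size=n)                      # observed rewards
S_next = rng.normal(size=n)                 # next states

def features(s, a):
    """Per-action polynomial features: one (1, s, s^2) block per action."""
    phi = np.zeros((len(s), 3 * n_actions))
    for i, (si, ai) in enumerate(zip(s, a)):
        phi[i, 3 * ai: 3 * ai + 3] = [1.0, si, si ** 2]
    return phi

w = np.zeros(3 * n_actions)
for _ in range(10):  # Fitted Q Iteration rounds
    # Regression targets: r + gamma * max_a' Q(s', a') under the current fit.
    q_next = np.stack([features(S_next, np.full(n, a)) @ w
                       for a in range(n_actions)], axis=1)
    y = R + gamma * q_next.max(axis=1)
    # Re-fit Q to the targets by least squares over the whole batch.
    w, *_ = np.linalg.lstsq(features(S, A), y, rcond=None)
```

The key idea the sketch tries to convey is that each round replaces the tabular maximization step with a supervised regression on the whole batch of data, which is what makes the method usable with continuous states.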

All this will be presented not in an abstract way but rather

directly within our MDP model of option pricing.

So, we will immediately see how these methods work in the financial setting.

In your homework for this week,

you will implement these solutions and see how they work in practice.

So let's get started.