This model samples the action, moving a paddle up or

down, from a distribution. This is useful when no gradients are

available and uncertainty is high.

We start by importing all of the needed ingredients.

We have included functions implementing the forward and

backpropagation steps shown here.

The forward step generates a policy given a visual

representation of the environment, stored in variable x, while

the backpropagation function updates the layers of the network,

modulating the loss function values with discounted rewards.
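The two steps just described can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: the network shape (one ReLU hidden layer feeding a sigmoid output), the sizes, and all variable names are assumptions.

```python
import numpy as np

# Assumed sizes: H hidden units, D input pixels (hypothetical values).
H, D = 200, 80 * 80
rng = np.random.default_rng(0)
model = {
    "W1": rng.standard_normal((H, D)) / np.sqrt(D),
    "W2": rng.standard_normal(H) / np.sqrt(H),
}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Forward step: map the visual observation x to a policy,
    here the probability of the 'up' action."""
    h = np.maximum(0.0, model["W1"] @ x)   # ReLU hidden layer
    p_up = sigmoid(model["W2"] @ h)        # probability of moving up
    return p_up, h                         # h is cached for backprop

def backward(xs, hs, dlogps, discounted_rewards, lr=1e-3):
    """Backpropagation step: the per-step log-probability gradients
    are modulated (scaled) by the discounted rewards, then the
    layers are updated by gradient ascent on expected reward."""
    dlogps = dlogps * discounted_rewards   # reward-weighted gradients
    dW2 = hs.T @ dlogps                    # gradient for output layer
    dh = np.outer(dlogps, model["W2"])     # backprop into hidden layer
    dh[hs <= 0] = 0.0                      # ReLU gradient
    dW1 = dh.T @ xs                        # gradient for first layer
    model["W1"] += lr * dW1
    model["W2"] += lr * dW2
```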

The function sample_action chooses the action of moving up

with a probability proportional to the predicted probability.
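A hedged sketch of what such a sampling function might look like (the function name matches the narration; the action labels are assumptions):

```python
import numpy as np

rng = np.random.default_rng()

def sample_action(p_up):
    """Sample 'up' with probability p_up, 'down' otherwise.

    Sampling instead of taking the arg-max keeps exploration alive:
    the more probable action is chosen more often, but not always.
    """
    return "up" if rng.random() < p_up else "down"
```

Because `rng.random()` is uniform on [0, 1), the probability of returning "up" is exactly `p_up`.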

This is part of the magic behind reinforcement learning.

We are able to optimize objective functions under very

uncertain conditions using this stochastic process.
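As a concrete piece of this stochastic process, the discounted rewards mentioned earlier are typically computed as a return-to-go and then standardized before modulating the gradients. A minimal sketch, assuming a discount factor gamma and the common normalization trick:

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Each step's value is the discounted sum of all rewards that
    follow it, so early actions get credit for later outcomes."""
    discounted = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    # Standardize so the gradient updates stay well-scaled.
    discounted -= discounted.mean()
    discounted /= discounted.std() + 1e-8
    return discounted
```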

Once you run the program, you will see the algorithm in action.

Note that because this algorithm operates in scenarios with

extremely high uncertainty,

the model will take many hours to produce meaningful results.
