Now let's look more qualitatively at the effect on the predictions for a next

instance, after seeing certain amounts of data.

And for the moment, we're going to assume that the ratio between the number of 1s

and the number of 0s is fixed, so that we have one 1 for every four 0s.

And that's the data that we are getting. And now let's see what happens as a

function of the sample size. So as we get more and more data, all of

which satisfy this particular ratio. So here we're playing around with

different strengths of our equivalent sample size, but we're fixing the ratio of alpha

one to alpha zero to represent, in this case, the 50% level.

So our prior is a uniform prior, but of greater and greater strength.

And so this little green line down at the bottom represents a low alpha,

because we can see that the data pulls our posterior quickly.

The line is drawing the posterior over the parameter, or rather, equivalently,

the prediction for the next data instance over time.

And you can see here that alpha is low, and that means that even for fairly

small amounts of data, say twenty data points, we are fairly close to the data

estimates. On the other hand, for this bluish line here, we can see that the

alpha is high. And that means it takes more time for the data to pull us to the

empirical fraction of heads versus tails.
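The effect of the prior strength can be sketched numerically. Here is a minimal example, assuming the data arrive in the fixed 1:4 ratio described above; the specific alpha values (a weak Beta(1, 1) prior versus a strong Beta(50, 50) prior) are illustrative, not the ones in the plot:

```python
def bayesian_prediction(alpha1, alpha0, m1, m0):
    """Predict P(next toss = 1) given Beta hyperparameters alpha1, alpha0
    and observed counts m1 (number of 1s) and m0 (number of 0s)."""
    return (alpha1 + m1) / (alpha1 + alpha0 + m1 + m0)

# Data arrives in a fixed 1:4 ratio: one 1 for every four 0s (empirical 0.2).
for m in (5, 20, 100, 1000):
    m1, m0 = m // 5, 4 * m // 5
    weak = bayesian_prediction(1, 1, m1, m0)      # low equivalent sample size
    strong = bayesian_prediction(50, 50, m1, m0)  # high equivalent sample size
    print(m, round(weak, 3), round(strong, 3))
```

The weak prior is already close to 0.2 after twenty data points, while the strong prior is still near its 0.5 starting point and needs far more data to be pulled over.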

Now let's look at varying the other parameter.

We're going to now fix the equivalent sample size.

And we're going to just start out with different priors.

And we can see that now we get pulled down to the 0.2 value that we see in the

empirical data. The further away from it we start, though, the longer it takes

to actually get pulled down to the data estimate.
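This second experiment can also be sketched in a few lines, assuming a fixed equivalent sample size and three illustrative prior means (the 0.2 empirical fraction matches the 1:4 data above; the other numbers are assumptions):

```python
def prediction(prior_mean, alpha, m1, m):
    """Bayesian prediction with equivalent sample size alpha and a prior
    whose mean is prior_mean, i.e. alpha1 = prior_mean * alpha."""
    return (prior_mean * alpha + m1) / (alpha + m)

# Fixed equivalent sample size; three different prior means.
# The data is 20% ones, so every prediction is pulled toward 0.2.
alpha = 10
for m in (0, 10, 50, 500):
    m1 = m // 5
    print(m, [round(prediction(p, alpha, m1, m), 3) for p in (0.1, 0.5, 0.9)])
```

All three starting points converge to 0.2; the one that starts at 0.9 simply has further to travel and stays away from the data estimate longer.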

But in all cases, we eventually get convergence to the value in the actual

data set. But, from a pragmatic perspective it

turns out that Bayesian estimates provide us with a smoothness where the random

fluctuations in the data don't cause quite as much random jumping

around as they do for example in maximum likelihood estimates.

So if what we have here is the actual value of the coin toss at different

points in the process, you can see that this light blue line, which corresponds

to maximum likelihood estimation, basically bounces around all over the place,

especially in the low-data regime.

Whereas the estimates that use a prior are considerably

smoother, and less subject to random noise.
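The smoothing effect is easy to see on a concrete sequence. The toss sequence below is a made-up illustration; after each toss we record the maximum likelihood estimate M1/M next to the Bayesian estimate under a uniform Beta(1, 1) prior:

```python
# Hypothetical toss sequence for illustration.
tosses = [1, 0, 0, 1, 0, 1, 0, 0]

mle, bayes = [], []
m1 = 0
for m, x in enumerate(tosses, start=1):
    m1 += x
    mle.append(m1 / m)                # maximum likelihood estimate
    bayes.append((1 + m1) / (2 + m))  # Bayesian, uniform Beta(1, 1) prior

# Largest jump between successive estimates, low-data regime included.
jump = lambda seq: max(abs(a - b) for a, b in zip(seq, seq[1:]))
print(jump(mle), jump(bayes))
```

The maximum likelihood estimate lurches from 1.0 to 0.5 after the second toss, while the Bayesian estimate never moves by more than about 0.17 in one step.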

In summary, Bayesian prediction combines two types of what you might call

sufficient statistics. There are the sufficient statistics from

the real data. But there are also the sufficient statistics

from the imaginary samples that contribute to the Dirichlet

distribution, these alpha hyperparameters. And the Bayesian prediction

effectively makes the prediction about the new data instance by combining both

of these. Now, as the amount of data increases,

that is, at the asymptotic limit of many data instances,

the term that corresponds to the real data samples is going to dominate.

And therefore, the prior is going to become vanishingly small in terms of the

contribution that it makes. So at the limit, the Bayesian prediction

is the same as maximum likelihood estimation.
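This asymptotic claim can be checked with a quick sketch; the specific alpha values and the fixed 0.2 empirical fraction are illustrative assumptions:

```python
# As M grows with the empirical fraction fixed at M1/M = 0.2, the Bayesian
# prediction (alpha1 + M1) / (alpha + M) approaches the maximum likelihood
# estimate M1 / M, because the fixed prior counts are swamped by the data.
alpha1, alpha0 = 10, 10
for m in (10, 100, 10_000, 1_000_000):
    m1 = m // 5
    bayes = (alpha1 + m1) / (alpha1 + alpha0 + m)
    print(m, bayes)  # approaches 0.2
```

At ten data points the prior still dominates (the estimate is 0.4), but by a million data points its contribution is vanishingly small.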