Now let's look more qualitatively at the effect on the predictions for the next instance after seeing certain amounts of data. For the moment, we're going to assume that the ratio between the number of 1s and the number of 0s is fixed, so that we have one 1 for every four 0s. That's the data that we are getting, and now let's see what happens as a function of the sample size, as we get more and more data, all of which satisfies this particular ratio. So here we're playing around with different strengths, our equivalent sample size, but we're fixing the ratio of alpha one to alpha zero to represent, in this case, the 50% level. So our prior is a uniform prior, but of greater and greater strength.
And so this little green line down at the bottom represents a low alpha, and we can see that the data quickly pulls our posterior toward the empirical fraction. To be precise, what each line is drawing is the posterior over the parameter, or rather, equivalently, the prediction for the next data instance, over time. You can see here that when alpha is low, even for fairly small amounts of data, say twenty data points, we are fairly close to the data estimate. On the other hand, for this bluish line here, we can see that alpha is high, and that means it takes more time for the data to pull us to the empirical fraction of heads versus tails.
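To make this concrete, here is a minimal sketch, not the lecture's actual plotting code, of the quantity these curves show: the posterior-predictive probability of a 1, (alpha_1 + M_1) / (alpha + M), when the data arrive in a fixed one-to-four ratio of 1s to 0s. The helper name and the particular alpha values are illustrative assumptions.

    def predictive_prob_of_one(alpha, prior_mean_of_one, m, empirical_frac=0.2):
        """Posterior-predictive P(next = 1) after m samples whose empirical
        fraction of 1s is empirical_frac, under a Beta/Dirichlet prior with
        equivalent sample size alpha and prior mean prior_mean_of_one."""
        alpha_1 = alpha * prior_mean_of_one   # imaginary count of 1s
        m_1 = m * empirical_frac              # real count of 1s
        return (alpha_1 + m_1) / (alpha + m)

    # Uniform prior mean (0.5) with increasing strength: a low alpha gets pulled
    # close to the data estimate of 0.2 quickly; a high alpha takes longer.
    for alpha in (1, 10, 100):
        trajectory = [round(predictive_prob_of_one(alpha, 0.5, m), 3)
                      for m in (0, 20, 100, 1000)]
        print(f"alpha = {alpha:>3}: {trajectory}")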
Now let's look at varying the other parameter. We're now going to fix the equivalent sample size and just start out with different priors. And we can see that we get pulled down to the 0.2 value that we see in the empirical data; the further away from it we start, though, the longer it takes to actually get pulled down to the data estimate.
But in all cases, we eventually get convergence to the value in the actual data set.
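The same illustrative formula shows this second experiment as well; here the equivalent sample size is held fixed at an assumed value of 10 and only the prior mean varies.

    # Equivalent sample size fixed (assumed alpha = 10), prior mean varied:
    # every starting point is pulled toward the empirical value of 0.2, and the
    # farther away it starts, the longer that pull takes.
    alpha = 10.0
    for prior_mean in (0.1, 0.5, 0.9):
        trajectory = [round((alpha * prior_mean + 0.2 * m) / (alpha + m), 3)
                      for m in (0, 20, 100, 1000)]
        print(f"prior mean = {prior_mean}: {trajectory}")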
Now, from a pragmatic perspective, it turns out that Bayesian estimates provide us with a smoothness, where the random fluctuations in the data don't cause quite as much random jumping around as they do, for example, in maximum likelihood estimates.
So if what we have here are actual coin tosses arriving at different points in the process, you can see that this light blue line, which corresponds to maximum likelihood estimation, basically bounces around all over the place, especially in the low-data regime. Whereas the estimates that use a prior are considerably smoother and less subject to random noise.
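Here is a small simulation, again an illustrative sketch rather than the lecture's plot, of why this smoothing happens. The seed, prior counts, and true probability are assumed for illustration.

    # The running maximum-likelihood estimate M_1 / M versus the posterior-
    # predictive estimate (alpha_1 + M_1) / (alpha + M) on simulated tosses:
    # the imaginary counts damp the early fluctuations.
    import random

    random.seed(0)
    alpha_1, alpha_0 = 5.0, 5.0          # assumed uniform prior, alpha = 10
    true_p, m_1 = 0.2, 0

    for m in range(1, 21):
        m_1 += 1 if random.random() < true_p else 0
        mle = m_1 / m
        bayes = (alpha_1 + m_1) / (alpha_1 + alpha_0 + m)
        print(f"M = {m:>2}  MLE = {mle:.2f}  Bayesian = {bayes:.2f}")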
In summary, Bayesian prediction combines two types of, you might call them, sufficient statistics. There are the sufficient statistics from the real data, but there are also the sufficient statistics from the imaginary samples that contribute to the Dirichlet distribution, these alpha hyperparameters, and the Bayesian prediction effectively makes the prediction about the new data instance by combining both of these. Now, as the amount of data increases, that is, at the asymptotic limit of many data instances, the term that corresponds to the real data samples is going to dominate, and therefore the prior is going to become vanishingly small in terms of the contribution that it makes. So at the limit, the Bayesian prediction is the same as maximum likelihood estimation.
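As a quick numeric check of this limit, the sketch below, with assumed counts, combines the real counts with the Dirichlet's imaginary counts and shows the Bayesian prediction approaching the maximum-likelihood estimate as M grows.

    # Bayesian prediction = (alpha_1 + M_1) / (alpha + M); as M grows, the real
    # counts dominate the imaginary ones and the prediction approaches the
    # maximum-likelihood estimate M_1 / M. (Counts below are assumed.)
    alpha_1, alpha_0 = 1.0, 1.0
    for m in (10, 100, 10_000, 1_000_000):
        m_1 = 0.2 * m                    # data with a 20% fraction of 1s
        bayes = (alpha_1 + m_1) / (alpha_1 + alpha_0 + m)
        print(f"M = {m:>9,}  Bayesian = {bayes:.6f}  MLE = {m_1 / m:.6f}")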