So this expression works both for

the case where the ratio is less than than 1 and greater than 1.

You can check it by considering two cases.

And in one case it will be just 1 because it's.

Minimum between 1 and something greater than 1.

In other case it will be wrong.

So this and by choosing this critic we indeed can prove that the overall mark of

defined by this kind of procedure will indeed.

Follow this equation.

So the distribution probably will be stationary for our Markov chain, and

that we will convert to distribution for any starting point.

Last, by choosing this critic, we may sample from the distribution y.

To summarise the metropolis Hastings approach to building

chain that converges to the desired distribution pi.

We have to start with sum of mac chain q, which we don't have anything to

do with the distribution pie and then this mark of chain on each step will

propose symbols and we'll have a correct with sometimes rejects this move.

And ask the mark of chain to stay where it was at the various straight.

And we just proved that if you defined your coreject to be like this,

if your acceptance from bill will be this expression minimum between one and

some ratio.

Then no matter from which Markov chain Q is started with,

you will necessarily converge to the desired distribution pi.

Note that this acceptance probability, it's the only place

where we use our distribution pi, which we want to sample from.

And note that we don't have to know this distribution pi exactly.

We may know it up to normalization constant.

because here, we'll have a ratio between our distribution at two different points.

It doesn't matter how we normalize this distribution.

The ratio will not care about the normalization,

because if we divide each of the parts of this

ratio by the normalization constant z, nothing will change.

So, this method, as well as the Gibbs sampling, can be used when

you don't know the normalization constant of your distribution.

We have discussed that we can choose q to be anything in distribution number one.

Well obviously, the Q should be nonzero anywhere, so

the ratio will be well-defined.

But of course, the efficiency and

the properties of your algorithm will depend on the choice of Q.

There are kind of two opposing forces here.

So first of all, Q should make large steps,

it should spread out and kind of roll freely in the sample space,

to make the samples less correlated and to converge faster.

On the other hand, if you have too large steps, your critic will reject them, too

often, and you will waste your capacity by staying at the same place for too long.

Because if you're now, for example, already converged, and

you're in the high-density region, then if you're Q proposes all this to make

too-large moves, then you will move outside your high density region, and

the will not be happy with that.

It will say I don't want to go there, the probability distribution

curve ball doesn't work, isn't high in that region.

So in the next video,

we'll see some small demo of how this approach can work in One dimensional case.

[SOUND]