
In the previous video, we discussed that it

can be nice to build a Markov chain to sample from some distribution.

But how do we build such a Markov chain?

The simplest method for that is called Gibbs sampling.

It works as follows.

So say you have a distribution, a three-dimensional one,

which we know up to a normalization constant.

So we can compute this probability, or

density in the continuous case, at any given point, but only up to normalization.

So the Gibbs sampling works as follows.

First of all, we start with some initial point, for example, all 0s.

And it doesn't matter where we start.

Well, it does matter for some convergence properties and

the like, but it doesn't affect the correctness of the method.

And then, to build our next iteration point,

we're going to construct it one dimension at a time.

So the first dimension of the next point will be a sample from the

conditional distribution of x1, given the values of x2 and

x3, which we had from our initialization.

And you may ask, how can we sample from this conditional distribution?

It's also a hard problem, right?

But note that the conditional distribution is proportional to the joint distribution.

So we know it, again, up to normalization constant.

And sampling from a one-dimensional distribution

is usually much easier than sampling from a multidimensional one.

So we assume that we can do it.

In some particular practical cases, we can invent a really efficient scheme for

analytically sampling from this kind of one-dimensional distribution.

In others, we have to use general purpose schemes, like rejection sampling.

But anyway, one-dimensional sampling is easy.
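To make this concrete, here is a minimal sketch of one-dimensional rejection sampling from a density known only up to a normalization constant. The function names (`sample_1d`, `unnorm_pdf`) and the triangular example density are illustrative choices, not part of the lecture.

```python
import random

def sample_1d(unnorm_pdf, lo, hi, pdf_max):
    """Rejection sampling from an unnormalized 1-D density on [lo, hi].

    unnorm_pdf need only be known up to a constant; pdf_max is any
    upper bound on its value over [lo, hi]."""
    while True:
        x = random.uniform(lo, hi)        # propose a point uniformly
        u = random.uniform(0.0, pdf_max)  # uniform height under the bound
        if u < unnorm_pdf(x):             # accept if below the curve
            return x

# Example: unnormalized triangular density p(x) proportional to x on [0, 1];
# the normalized density is 2x, whose true mean is 2/3.
samples = [sample_1d(lambda x: x, 0.0, 1.0, 1.0) for _ in range(10000)]
mean = sum(samples) / len(samples)
```

Accepted points are distributed according to the normalized density even though the sampler never computes the normalization constant, which is exactly the property Gibbs sampling relies on for its one-dimensional sub-steps.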

So let's go to the next sub-step.

On the next sub-step,

we're going to generate the second dimension of our first iteration point.

And it's going to be generated from the conditional distribution on x2,

given x1 being the dimension we have just generated,

and x3 being the initialization point dimension.

And finally, we will generate the third dimension in the same fashion.

So we'll generate it from the conditional distribution,

given the stuff we have just generated.

So this was iteration one.

And these three sub-steps together constitute just one

big step of our Markov chain.

So this is how we transition from x0 to x1.

And if you write it down for

general iteration number k, it will look like this.

So the point from iteration k + 1 is generated one dimension at a time,

by using conditional distributions, which are conditioned on the values

we have from the previous iteration and from the current iteration.
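As a sketch of one full iteration, here is a Gibbs sampler for a standard bivariate Gaussian with correlation rho — a case where each conditional is itself a one-dimensional Gaussian and can be sampled analytically. This two-dimensional example and its function name are my own illustration, not the lecture's three-dimensional setup.

```python
import random

def gibbs_gaussian(rho, n_steps, x=0.0, y=0.0):
    """Gibbs sampling from a standard bivariate Gaussian with correlation rho.

    The conditional p(x | y) is Gaussian with mean rho*y and
    variance 1 - rho**2, so each sub-step is an exact 1-D draw."""
    sigma = (1.0 - rho * rho) ** 0.5
    chain = []
    for _ in range(n_steps):
        x = random.gauss(rho * y, sigma)  # sub-step 1: x ~ p(x | y)
        y = random.gauss(rho * x, sigma)  # sub-step 2: y ~ p(y | x)
        chain.append((x, y))              # one big step of the chain
    return chain

chain = gibbs_gaussian(rho=0.8, n_steps=20000)
burned = chain[1000:]                     # discard early burn-in samples
mean_x = sum(p[0] for p in burned) / len(burned)
```

Note how each sub-step conditions on the freshest available value: the draw of y uses the x generated moments earlier in the same iteration, which is exactly the sequential dependence described above.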

And note that this scheme is not parallel at all.

So each step depends on the previous steps having been executed.

And this means that no matter how many computers we have,

if we have a 1 million-dimensional space,

we'll have to sample these dimensions one at a time, which is kind of a downside.

But we'll discuss later how we can avoid it.

Anyway, this procedure gives you a way to move from x0 to x1, to x2, and so on.

And let's prove that this indeed defines a Markov chain

which converges to the distribution p.

So if we sample like this long enough,

we'll get samples from the desired distribution.

And we are going to do it on the blackboard.