0:00

We've already said that there are many different algorithms for inference in graphical models, but the simplest and most fundamental is an algorithm typically known as variable elimination. Let's consider the variable elimination algorithm in the context of a simple example: a graphical model structured as a chain. Here we're interested in computing the distribution over the variable E, that is, P(E). And as we've already said, that probability is proportional to the unnormalized measure P tilde over A, B, C, D, and E, summed over all of the variables except E.

So now let's see how we can do this more efficiently than simply constructing the joint distribution and then summing things out. The first thing we do is write out this unnormalized measure as a product of its constituent factors. For the moment we're going to assume that we only have pairwise factors, one for each edge in this graph. So we have a factor for AB, a factor for BC, one for CD, and one for DE. Those are the factors phi one up to phi four.

Now, what is the first observation that we have when we see the summation over A, B, C, and D of a product of factors? Well, we've already done this exercise previously, when we were doing some proofs related to graphical models: if we have a factor that doesn't mention a particular variable in its scope, we can move it out of the summation over that variable. So specifically, phi two of B, C can be moved out of the summation over A, as can phi three of C, D and phi four of D, E, which leaves us only with the summation over A of phi one of A, B. That gives us the expression over here. Now this is a summation over a pairwise factor, and the result is a factor over the single variable B, which we're going to call tau one of B. And so we end up with an expression that

looks like this. So now let's continue developing this expression further, noting that we now have an expression that doesn't involve A, only the variables B, C, D, and E. Effectively, we have eliminated A from our graphical model. So let's go back to this expression. We now have this product of factors, and once again we can look at which factors involve the variable B and which ones don't. The ones that don't can be moved out of the summation, just as before, giving us this, where we now have a product of these two factors.

Summing that over B is going to give us an expression whose scope is C. So we now have an expression that does not involve the variable B, and so we've eliminated B from this graphical model. We can similarly continue to eliminate C and D, so that ultimately we will end up with an expression that involves only the variable E. And that expression is going to be proportional to the marginal probability of E.
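The chain computation just described can be sketched numerically. The following is a minimal illustrative sketch (the factor values are made up, and all variables are assumed binary): each tau is obtained by multiplying the previous tau into the next pairwise factor and summing out one variable, and the result matches the naive approach of building the full joint and summing it out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary pairwise factors for the chain A - B - C - D - E (binary variables):
# phi1[a, b], phi2[b, c], phi3[c, d], phi4[d, e].
phi1, phi2, phi3, phi4 = (rng.uniform(0.1, 1.0, size=(2, 2)) for _ in range(4))

# Variable elimination: push each sum inward, eliminating one variable at a time.
tau1 = phi1.sum(axis=0)                    # sum_A phi1(A,B)          -> scope {B}
tau2 = (tau1[:, None] * phi2).sum(axis=0)  # sum_B tau1(B) phi2(B,C)  -> scope {C}
tau3 = (tau2[:, None] * phi3).sum(axis=0)  # sum_C tau2(C) phi3(C,D)  -> scope {D}
tau4 = (tau3[:, None] * phi4).sum(axis=0)  # sum_D tau3(D) phi4(D,E)  -> scope {E}

p_E = tau4 / tau4.sum()                    # renormalize to get P(E)

# Sanity check against the naive approach: build the joint, then sum out A..D.
joint = np.einsum('ab,bc,cd,de->abcde', phi1, phi2, phi3, phi4)
p_E_naive = joint.sum(axis=(0, 1, 2, 3))
p_E_naive = p_E_naive / p_E_naive.sum()
assert np.allclose(p_E, p_E_naive)
```

Note that the elimination version only ever manipulates factors over one or two variables, while the naive version materializes a table over all five.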

Now let's do this algorithm in the context of a somewhat more complicated example, which is our enhanced student network that we've played around with before. Let's imagine that our goal is to compute the probability of the variable J, and in order to do that we're going to have to sum out of the joint distribution all of the other variables except J. So this is our expression. Note that we have this product of factors: I've already taken the CPDs in this expression and turned them into factors, so that we have a consistent notation. And now we need to eliminate every one of the variables except J.

So we're going to start by eliminating the variable C first. Once again, we take the summation over C and push it in, leaving inside the summation only the factors that involve C, and those are phi C and phi D. Multiplying them together and eliminating C gives us a factor which we're going to call tau one, whose scope is D.

4:34

Okay? By putting tau one into this expression and removing the factors that we've just multiplied together, we end up with this expression over here. Having eliminated C, we now go ahead and eliminate D. So here we have the variable D. Which factors involve D? Well, tau one of D, and phi G of G, I, and D. Everything else is taken out of the summation, and we just have this expression over here. That's going to give us tau two, whose scope is G and I, after having eliminated the variable D from this product whose scope is G, I, and D. Tau two gets put back into the bucket together with everything else. Moving forward, we're now interested in

eliminating I. The factors that involve I are tau two of G and I, phi S of S and I, and phi I of I, so we multiply them together to give us a factor whose scope is G, I, S, and eliminating I gives us a factor tau three whose scope is S and G. And so the process continues; let's just finish it all the way to the end.

Now our goal is to eliminate H. Well, H is a little bit of an interesting case. The only factor that mentions H is this factor phi H, and if we think about what phi H is, it happens to be P of H given G and J. So if we're summing it over H, we're actually summing up a conditional distribution, and when you sum a conditional distribution over the values of the variable on the left-hand side of the conditioning bar, the result is necessarily equal to one. So in principle we could have taken this entire expression, erased it, and written one instead, and that factor, which doesn't depend on anything, would have just disappeared. But for purposes of demonstration we're

not actually going to do that, because in fact not every algorithm is clever enough to notice these kinds of coincidences; it depends on whether it was designed to look for that. So we're going to do this in the same sort of naive way that we've done before, which is just to turn this into a factor, which is going to be tau four of G and J.

7:38

So now we have that factor, which really is one. But we're not going to pay attention to that particular aspect, for the purposes of demonstrating how the algorithm would work. Okay?
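That observation about phi H can be checked directly: any CPD, summed over the variable on the left of the conditioning bar, gives the all-ones factor over its parents. A tiny sketch with made-up numbers:

```python
import numpy as np

# A made-up CPD phi_H[g, j, h] = P(H = h | G = g, J = j), binary variables.
# Each row over h sums to one, as any CPD must.
phi_H = np.array([[[0.3, 0.7], [0.9, 0.1]],
                  [[0.5, 0.5], [0.2, 0.8]]])

# Eliminating H means summing phi_H over the H axis; the resulting factor
# tau4(G, J) is identically 1 and contributes nothing to the product.
tau4 = phi_H.sum(axis=2)
assert np.allclose(tau4, 1.0)
```

A naive implementation still carries tau4 along as a factor of ones, which is harmless but wasteful.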

Next is eliminating G. We have this expression, which we inherited from the previous line, and G is one of the big ones, because it appears in phi L, tau three, and tau four. So when we think about the variables in the scope, we see that this product over here actually has a scope of L, G, S, J, which is the largest scope that we've encountered so far. Summing out G, we end up with a factor whose scope is L, S, and J. And now we put that into this expression, and out comes a product of two factors. And really, at this point, we may just as well multiply them, sum out the remaining variables L and S, and end up with a factor over J.

9:09

So that gives us variable elimination in its naive form. What about variable elimination with

evidence? Well, we've already basically established how to deal with evidence. If we're interested, for example, in solving the query P(J, I = little i, H = little h), the way in which we do that is by computing the probability of the joint event J, I = little i, H = little h, and the way in which we do that is by reducing the factors to be consistent with this evidence. So we take each of the factors that involves I, and we instantiate it to the particular value little i, and similarly for h. And so we see here, for example, that whereas phi I initially depended on I, now it doesn't depend on anything, because it is simply the value phi I of little i, which is a constant. And whereas phi G, for example, depended on D and I, as we can see in the original diagram, the reduced factor doesn't depend on I, and is really the probability of G given little i and D. The same reduction occurs for H = little h, and so we end up with the following set of reduced factors. Now, once we have that set of reduced factors, we proceed exactly as before.

No changes whatsoever to the algorithm. The only aspect that's a little bit different is that we don't eliminate H or I, because there's no point eliminating a variable that has a single value; there's no need to sum over it. So this gives us a unified framework for dealing with queries, whether they involve evidence or not.

And then how do we get the probability of J given the evidence? Well, this is straightforward: we simply renormalize.

11:28

Because we can take this and simply divide by what turns out to be the normalizing constant, which is the probability of the evidence. Okay.
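Reducing a factor to evidence is just slicing out the entries consistent with the observation. A minimal sketch with an arbitrary made-up factor (binary variables):

```python
import numpy as np

# An arbitrary factor with the scope of phi_G, i.e. phi_G[g, i, d] over {G, I, D}.
phi_G = np.random.default_rng(1).uniform(size=(2, 2, 2))

i_obs = 1  # observed evidence I = little i

# The reduced factor keeps only the entries consistent with I = i_obs; its
# scope shrinks from {G, I, D} to {G, D}, and I is never summed over afterwards.
phi_G_reduced = phi_G[:, i_obs, :]
assert phi_G_reduced.shape == (2, 2)
```

After reducing every factor that mentions an evidence variable, variable elimination runs unchanged on the reduced set, and a final renormalization yields the conditional distribution.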

So, let's see whether the same idea applies to Markov networks.

So here is our simple Markov network with four variables.

Let's imagine that our goal is to compute P of D, and in order to do that we need to eliminate A, B, and C from the unnormalized measure. So we have this as the unnormalized measure, and we're summing over A, B, and C. The process works in exactly the same way. If we want to sum out A first, here are the factors that involve A: phi one of A, B (this one) and phi four of A, D. We multiply them together to get a factor whose scope is A, B, D, and then we sum out A to get a factor whose scope is B, D. That gives us a new set of factors where

A has effectively been eliminated from the graphical model.

And at the end of the process, we get a factor over the single remaining variable D: tau three of D. That factor is not the probability of D; it's proportional to it. It's actually equal to P tilde of D, which is the unnormalized measure, and so in order to get P of D we renormalize. So, to summarize: the main routine in

this algorithm is a routine which we call eliminate variable Z from a set of factors phi, and what it does is the following.

We first look within phi, and we define a set of factors phi prime which are all

factors that involve Z. And that's what this mathematical

expression says: the factors phi i such that Z is in their

scope. We take all those factors, and we

multiply them. And then, we sum out the variable Z,

which is the one that we want to eliminate.

Now, and here is the important point, we've already used up these factors.

These ones over here, have now been used. We don't want to reuse them.

And so we take them out of the set of factors.

And instead we introduce the one we just created by multiplying those factors, and

summing them up. This basic operation is what we use in

the context of the algorithm as a whole. we begin by reducing all factors by the

evidence, in, but which is just eliminating the, rows that don't that are

not consistent with the observations, and that is what gets us are set of factors

phi. Now for each non query variable we need

to eliminate it. And so we run one at a time,

something that eliminates the variable Z from the set of factors phi.

Each subset changes my set of factors. It adds factors and removes.

So actually starts by removing factors which we, phi prime from the previous

one, from the previous line and it adds a new factor tau.

15:33

And then finally, at the very end, when all non-query variables have been eliminated, there may be one or more factors remaining, so at that point we multiply all the remaining factors and then normalize to get a distribution.
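The eliminate-variable routine and the overall loop just described can be sketched in a generic form. This is an illustrative implementation, not the only way to structure it: a factor is represented as a (scope, table) pair, all variables are assumed binary, and the helper names are my own.

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable names, and table
# maps each assignment (a tuple of 0/1 values, one per scope variable) to a value.

def multiply(f1, f2):
    """Factor product over the union of the two scopes."""
    scope = tuple(dict.fromkeys(f1[0] + f2[0]))  # order-preserving union
    table = {}
    for assign in itertools.product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, assign))
        table[assign] = (f1[1][tuple(a[x] for x in f1[0])]
                         * f2[1][tuple(a[x] for x in f2[0])])
    return scope, table

def sum_out(f, z):
    """Factor marginalization: sum the variable z out of factor f."""
    zi = f[0].index(z)
    scope = tuple(x for x in f[0] if x != z)
    table = {}
    for assign, v in f[1].items():
        key = assign[:zi] + assign[zi + 1:]
        table[key] = table.get(key, 0.0) + v
    return scope, table

def eliminate_var(factors, z):
    """One elimination step: multiply all factors mentioning z, sum z out,
    remove the used factors, and add the new tau."""
    used = [f for f in factors if z in f[0]]
    rest = [f for f in factors if z not in f[0]]
    prod = used[0]
    for f in used[1:]:
        prod = multiply(prod, f)
    return rest + [sum_out(prod, z)]

def run_ve(factors, order):
    """Eliminate the non-query variables in the given order, multiply what
    remains, and normalize to get a distribution over the query variables."""
    for z in order:
        factors = eliminate_var(factors, z)
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return result[0], {k: v / total for k, v in result[1].items()}

# Example: the chain A - B - C - D - E with made-up pairwise factors.
phi = [(('A', 'B'), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}),
       (('B', 'C'), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}),
       (('C', 'D'), {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 1.0}),
       (('D', 'E'), {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})]
scope, p_E = run_ve(phi, ['A', 'B', 'C', 'D'])
assert scope == ('E',) and abs(sum(p_E.values()) - 1.0) < 1e-12
```

Each call to `eliminate_var` does exactly what the transcript describes: it partitions the factor set, multiplies the factors that mention Z, sums Z out, and swaps the used factors for the new tau.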

So, to summarize, this is a very simple algorithm.

It works equally well for Bayes nets and Markov nets.

It uses simple factor product and factor summation steps, and it ensures that when you do the summation step over a variable Z, all factors involving Z have been multiplied in first. That is the critical piece of the correctness of this algorithm.