0:01

We previously used the variable elimination algorithm as a way of proving

the correctness of the clique tree algorithm.

It turns out that this relationship can also be used to answer another

fundamental question regarding clique trees, which is the question of where

clique trees come from or rather how we might construct a good clique tree.

In order to do that, what we're going to do is look at a run of variable elimination, and we're going to see how we might reinterpret it as a clique tree.

So, in order to do that let's look at the variable elimination algorithm and try

and take a clique tree view of what it's doing.

So the variable elimination algorithm, as a reminder, takes a series of steps, where in each step it creates a large factor, which for the purposes of this discussion we're going to call lambda i.

It does that by taking a bunch of factors and multiplying them together. When that is done, we eliminate a variable from lambda i, and that gives us a new, smaller-scope factor which we're going to call tau i. And tau i is put into the pool of factors that we have available to us, which means that at some point we're going to use it in the context of computing another large factor, lambda j, when we multiply a bunch of factors together. So now let's think about this process in terms of passing messages between computational entities, in the same way that we had when we were doing message-passing algorithms.

So we're going to look at these intermediate factors, the lambda i's, as the cliques in a clique tree. And a tau i that is produced in the context of one clique and utilized by a different clique, we're going to consider to be a message. So these smaller factors that are produced by one lambda and consumed by another lambda can be viewed as a message that's passed from the clique that corresponds to the first lambda and received by the clique that corresponds to the second lambda. So, summarizing: the variable elimination algorithm defines a graph.

We're going to have a cluster Ci for every factor lambda i. And we're going to draw an edge between Ci and Cj if the factor that was generated in the computation for lambda i, the one in which I multiplied together all the factors to produce lambda i, is used in the computation of lambda j.

And so we can think of it as Ci which corresponds to lambda i, passing a

message to Cj that corresponds to lambda j.

And that message is the tau i that was produced by the lambda i computation.
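To make this construction concrete, here is a minimal Python sketch, not from the lecture itself, that simulates variable elimination over factor scopes only and records the induced clusters and edges. All function and variable names are illustrative assumptions.

```python
def clique_tree_from_ve(factor_scopes, ordering):
    """Simulate a run of variable elimination over factor *scopes* only.

    factor_scopes: list of frozensets, the scopes of the original factors
    ordering:      list of variables, the elimination order
    Returns (clusters, edges): cluster i is the scope of lambda_i;
    edge (i, j) means tau_i was consumed when computing lambda_j.
    """
    # Each pool entry is (scope, producer): original factors have
    # producer None; each tau remembers which lambda produced it.
    pool = [(set(s), None) for s in factor_scopes]
    clusters, edges = [], []
    for var in ordering:
        # Pull out every factor whose scope mentions var.
        used = [(s, p) for (s, p) in pool if var in s]
        pool = [(s, p) for (s, p) in pool if var not in s]
        # lambda_i's scope is the union of the scopes multiplied together.
        lam_scope = set()
        for s, _ in used:
            lam_scope |= s
        i = len(clusters)
        clusters.append(frozenset(lam_scope))
        # Draw an edge from each consumed tau's producer to this cluster.
        for _, producer in used:
            if producer is not None:
                edges.append((producer, i))
        # tau_i is lambda_i with var summed out; return it to the pool.
        tau_scope = lam_scope - {var}
        if tau_scope:
            pool.append((tau_scope, i))
    return clusters, edges
```

Running this on the scopes of the enhanced student network's factors, with the elimination ordering C, D, I, H, G, S, L, reproduces the clusters CD, GID, GSI, HGJ, GJLS, JLS, JL from the example that follows.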

So now let's look at this in the context of an example.

We're going to go through the variable elimination process that we had for our

network that we had earlier, the enhanced student network.

And so the first variable elimination step multiplies together the C factor and the CD factor, and that gives us, this is my lambda 1. And lambda 1 produces a clique whose scope is CD, and so that's this clique over here. The next step eliminates D, and the way in which it does that is it takes tau 1 and multiplies that by one of my original factors, phi G. And so now we have a lambda 2, this is my lambda 2, and that gives me a clique whose scope is GID. And since the GID computation used tau 1, we draw an edge between the CD clique and the GID clique.

4:03

The next step in the analysis that we did before eliminated the variable I, and again used two of the original factors, phi I and phi S. And it also used tau 2. And so now we have a factor lambda 3 whose scope is G, S, and I, because those are the variables in this scope over here. And the message GI that we see here, this edge between GID and GSI, corresponds to this tau 2 that was produced by lambda 2 and consumed by lambda 3. The next elimination step didn't use any of the factors produced by other cliques; it just used one of the original factors, phi H. And so note that it's currently not connected to anything, it's just sitting there all by its lonesome.

And the next step now gets to use factors that were produced in two separate computations. So notice that this computation for lambda 5, which is this guy, used both tau 3 and tau 4, which is why we have both an edge from cluster 3 and an edge from cluster 4. The next step that we took marginalized out S; it multiplied in tau 5 and one of the original factors, the one over J, L, and S. And so now we have a message here that is passed between cluster 5 and cluster 6.

And finally there was the marginalization of L and so we have a JL cluster.

Now if you look at this graph, at this clique tree, you'll notice that it's

bigger than the clique tree that we had for this network to begin with.

And the reason for that is that we actually have three adjacent cliques where one is a subset of the other.

So JL is a subset of JSL which in turn is a subset of GJSL.

Now if you think about it, there's really no point to including three cliques when one is a subset of the other, because you might as well do all of the computation in the first clique to begin with. There is no value to sort of breaking it out into separate data structures. And so a typical post-processing step is to go ahead and remove redundant cliques, and just basically collapse them, and their factors, into the biggest clique that subsumes them.
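As a rough sketch of that post-processing step, assuming clusters are represented as frozensets of variables and edges as index pairs, the following hypothetical helper (not course code) repeatedly merges any clique whose scope is contained in an adjacent clique's scope:

```python
def collapse_redundant(clusters, edges):
    """Merge each clique that is a subset of a neighbouring clique into
    that neighbour, reattaching its other edges to the survivor."""
    clusters = dict(enumerate(clusters))          # index -> scope
    edges = {frozenset(e) for e in edges}         # undirected edges
    changed = True
    while changed:
        changed = False
        for e in list(edges):
            i, j = tuple(e)
            # Orient so that clusters[i] is the potentially redundant one.
            if clusters[j] <= clusters[i]:
                i, j = j, i
            if clusters[i] <= clusters[j]:
                # Drop the merged edge, reattach i's other neighbours to j.
                edges.discard(e)
                new = set()
                for f in edges:
                    if i in f:
                        (k,) = f - {i}
                        if k != j:
                            new.add(frozenset({k, j}))
                    else:
                        new.add(f)
                edges = new
                del clusters[i]
                changed = True
                break
    return list(clusters.values()), [tuple(f) for f in edges]
```

On the tree from the example, this collapses JL into JLS and then JLS into GJSL, leaving five cliques.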

So in this case, since we had phi JLS, which was originally used in lambda 6, we're now going to move that and stick it in clique 5 instead. And so we've seen how we can take a run of variable elimination and use that to produce what looks like a clique tree. But is it a clique tree?

So what properties can we prove about this output of the variable elimination algorithm? Well, first is the fact that it is, in fact, a tree. If we want it to be a clique tree, it had better be a tree. Why is it a tree?

Because in variable elimination, every intermediate factor tau, once you produce it, gets consumed exactly once. Once it gets consumed, remember, in the variable elimination algorithm it gets taken out of the set of factors, and then it disappears because it gets multiplied in.

And so if you think about this as a directed tree, it starts out and it produces factors, and each factor feeds up into only a single factor parent. And so, as a directed tree, it means that every cluster has at most one parent. And in fact, every cluster except the last one has exactly one parent, because eventually everything has to be consumed, because everything is multiplied together.

And so this variable elimination process induces a directed tree where we

subsequently forget about the directionality of the edges.

9:00

The second property that is again not difficult to show is that the tree is

family preserving. And that is because each of the original

factors was in my original factor set. Each of my original factors phi in my

original factor set big phi was somewhere in the variable elimination process.

And that means it must have been used at some point because eventually I'm

multiplying all factors together. And therefore it must be contained in the

scope of one of these. Sorry, this should be lambda i.

And that lambda i by force has a scope that contains the scope of phi,

which is what we need for the family preservation property.
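This property is easy to check mechanically. A small illustrative sketch, assuming factor scopes and cluster scopes are represented as frozensets (the function name is an assumption, not from the lecture):

```python
def family_preserving(factor_scopes, clusters):
    """True iff every original factor's scope is contained in the scope
    of at least one cluster (the family preservation property)."""
    return all(any(phi <= c for c in clusters) for phi in factor_scopes)
```

For the student network, the five collapsed cliques CD, GID, GSI, HGJ, GJLS cover the scopes of all eight original factors.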

10:00

The final property that we would like to show holds is that the tree respects, or obeys, the running intersection property. And so, as a reminder, that property says that if you have two clusters Ci and Cj with a variable X that is present in both, then X needs to be in each cluster and each sepset on the unique path between Ci and Cj.

So first of all, let's convince ourselves that the running intersection property holds here. And we've already seen that, because this is actually the familiar clique tree that we had for this example; in fact, we produced that clique tree by running variable elimination, using exactly the process that I showed you before. Now let's show that, however, more generally. So we want to prove to ourselves that if T is a tree of clusters that was induced by a run of variable elimination, then T obeys the running intersection property. And so now, remember that this tree came from a directed run, and so what I'm showing you here is actually a directed tree where we see how the clusters communicate. This is not the same tree that we had in our example; it's a more complicated one, to make the theorem more interesting to show.

So in one of those clusters, and in exactly one of those clusters, the variable X is eliminated, because this is a run of variable elimination: every variable is eliminated exactly once. So let's imagine that X is eliminated here. So this clique over here is the last one to see X: X is in C4, and X is not in C6.

Now let's take any other cluster that contains X.

So for example assume that cluster 7 contains X.

Well, if cluster 7 contains X, then X wasn't eliminated at cluster 7, because it's eliminated exactly once. That means it needs to be in the sepset, which means it was in the message tau that C7 produced. So X is in the scope of tau 7, which means that X also needs to be in the scope of C3, because tau 7 was multiplied in; if it was multiplied in, then it has to be in that scope.

And so, by continuing exactly the same inductive argument, we know that since X has to be in C3, it also needs to be in the next sepset, and so on. And so we've shown that X needs to be on the path from C7 to C4.

And now let's take a different cluster that contains X, for example C5. We can similarly show that X needs to be on every step along the path, and since this is a tree, that means it also needs to be on the path between C7 and C5.
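An equivalent way to state the property is that, for every variable X, the clusters containing X form a connected subtree of the clique tree. The following hypothetical checker (names are assumptions, not course code) verifies exactly that:

```python
from collections import defaultdict

def running_intersection(clusters, edges):
    """True iff, for every variable, the clusters containing it form a
    connected subtree of the tree defined by `edges`."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    variables = set().union(*clusters)
    for x in variables:
        with_x = {i for i, c in enumerate(clusters) if x in c}
        # Traverse only through clusters that contain x.
        start = next(iter(with_x))
        seen, frontier = {start}, [start]
        while frontier:
            i = frontier.pop()
            for j in adj[i]:
                if j in with_x and j not in seen:
                    seen.add(j)
                    frontier.append(j)
        if seen != with_x:
            return False
    return True
```

Applied to the clique tree from the student-network example, this returns True for every variable, including G, which appears in four clusters.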

13:42

So, to summarize: a run of variable elimination implicitly defines a correct clique tree. Which means that, if you want to get a clique tree that has the required properties, family preservation and running intersection, you can simply simulate a run of variable elimination.

What does it mean to simulate? It doesn't mean that we run variable elimination, because that would be an expensive process. We just figure out what factors we would multiply, what scopes those would have, and therefore what factor gets used where. And that gives me the cliques and the connections between them. And this is the exact process that we did

in the simple example that we had. And that clique tree has a computational

cost which is essentially equivalent to the cost of running the variable

elimination algorithm. Because, if you think about the sizes of the cliques that are produced, they correspond exactly to the sizes of the factors produced by variable elimination. So these costs are directly comparable to

each other. Except that, as we showed before, clique trees use dynamic programming, that is, storing or caching the messages, so that we don't need to recompute the same message more than once. And that allows us to produce marginals over all of the variables in the network at only twice the cost of running variable elimination.

So, taking yet one other step backward, what that means is that if you care to get the values of the marginals for multiple variables, not necessarily even all variables, one efficient way of doing that is to take variable elimination, do the simulation to figure out what clique tree we want to build, or how to build a clique tree, and then use that clique tree to compute all of the marginals by appropriate message passing. And we talked, for variable elimination, about how to use heuristic solutions to construct a good variable elimination ordering, one that gives us relatively small factors, hopefully, along the way. And that exact same set of heuristics is also used for constructing a small, efficient clique tree, to the extent that that's possible given our graph.
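One common such heuristic is greedy min-fill, which at each step eliminates the variable whose elimination would add the fewest fill edges to the induced graph. A rough illustrative sketch, with all names being assumptions rather than course code:

```python
from itertools import combinations

def min_fill_ordering(factor_scopes):
    """Greedy min-fill heuristic for choosing an elimination ordering."""
    # Build the induced graph: variables sharing a factor are adjacent.
    adj = {v: set() for s in factor_scopes for v in s}
    for s in factor_scopes:
        for a, b in combinations(s, 2):
            adj[a].add(b)
            adj[b].add(a)
    ordering = []
    while adj:
        # Count the fill edges eliminating v would introduce.
        def fill_count(v):
            return sum(1 for a, b in combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill_count)
        # Connect v's neighbours (the fill edges), then remove v.
        for a, b in combinations(adj[v], 2):
            adj[a].add(b)
            adj[b].add(a)
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
        ordering.append(v)
    return ordering
```

The resulting ordering can then be fed to the clique-tree simulation described above to obtain the clusters and edges without ever manipulating full factors.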