0:02

So previously we defined the clique tree algorithm, which is just a belief propagation algorithm run over a graph that happens to be a tree. As it turns out, we can exploit the structure of the tree to greatly improve the computational behavior of this algorithm, not just its correctness, which is what we showed before.

So let's go back to message passing in this very simple tree, which has just these four clusters, {A,B}, {B,C}, {C,D}, and {D,E}, and let's see what the messages do. So here is the message delta 1,2, which is the message passed on this edge, and here's the message delta 2,3, and so on. And these are the six messages that we have.

And now we can notice one very important property: delta 1,2, once computed, never changes.

1:10

You get instant convergence of delta 1,2: once you compute it for the very first time, it has converged. Now, what about delta 2,3? Well, that depends on when delta 2,3 is passed. If delta 2,3 is passed first, before cluster 2 gets a message from cluster 1, then once it gets that message, things might change. But if cluster 2 is clever, it waits until that message is passed, and once that message is passed, delta 2,3 doesn't change either, ever, if it's smart enough to wait.

What about delta 3,4? Delta 3,4 needs to wait a little bit longer, because it needs to wait for delta 2,3. But if it waits both for delta 1,2 to be passed to cluster 2 and for delta 2,3 to be passed to cluster 3, then once it sends delta 3,4, delta 3,4 doesn't change either.

So delta 2,3 has to wait for delta 1,2, and delta 3,4 has to wait for delta 2,3, but assuming they wait, then convergence is achieved in a single message passing pass.

Now, there is nothing magical about going left to right. If you go right to left, you get the exact same behavior: here, this one converges instantly.

2:58

And this one needs to wait for the message before it, but again, assuming you wait, the messages converge in a single step. And so we can compute all of the messages in this entire tree in one pass in each direction: one pass from left to right, and one pass from right to left.

In the context of chain structures like this one, this actually has its own name: for chains, you might see this under the name forward-backward algorithm, and it is very commonly used in things like the hidden Markov model and other similar chain-structured representations.

And so the point is, the total effort needed to compute all of these messages is just one message passing sweep in each direction, one from left to right and one from right to left. And the result of that, remember, is that once all messages have been passed, we have beliefs.
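The two-pass schedule on a chain can be sketched concretely. The snippet below is a minimal illustration, not the lecture's own code: it uses three pairwise clusters over binary variables with made-up factor values, and computes cluster beliefs after one forward and one backward sweep.

```python
# Two-pass ("forward-backward") message passing on a chain of three
# pairwise clusters {X1,X2}, {X2,X3}, {X3,X4} over binary variables.
# Factor values are arbitrary, for illustration only.

psi = [
    [[1.0, 2.0], [3.0, 1.0]],   # psi_1(X1, X2)
    [[2.0, 1.0], [1.0, 4.0]],   # psi_2(X2, X3)
    [[1.0, 1.0], [2.0, 3.0]],   # psi_3(X3, X4)
]
K = len(psi)

# Forward pass: fwd[i] is the message from cluster i to cluster i+1,
# a function of their shared (right-hand) variable. Each message is
# computed exactly once and never changes afterwards.
fwd = []
prev = [1.0, 1.0]
for f in psi[:-1]:
    msg = [sum(prev[a] * f[a][b] for a in range(2)) for b in range(2)]
    fwd.append(msg)
    prev = msg

# Backward pass: bwd[i] is the message from cluster i+1 to cluster i.
bwd = []
nxt = [1.0, 1.0]
for f in reversed(psi[1:]):
    msg = [sum(f[a][b] * nxt[b] for b in range(2)) for a in range(2)]
    bwd.append(msg)
    nxt = msg
bwd.reverse()

def belief(i):
    """Cluster i's factor times all of its incoming (converged) messages."""
    left = fwd[i - 1] if i > 0 else [1.0, 1.0]
    right = bwd[i] if i < K - 1 else [1.0, 1.0]
    return [[left[a] * psi[i][a][b] * right[b] for b in range(2)]
            for a in range(2)]
```

Notice that the "be smart enough to wait" rule is implicit in the loops: fwd[i] is built from fwd[i-1], so each message is sent only after the one it depends on, and every message is final on first computation.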

4:54

And so, more generally, as we said: once C_i receives a final message from all of its neighbors except for C_j (that is, if it waits to send the message to C_j until it receives a message from everybody else), then that message delta i,j is also final. So if these are final, this is also final.

Can you always find a message passing order that will achieve that goal? I mean, maybe everybody ends up waiting for everybody else, and they all get stuck and never send anybody any messages. The point is that you can always start, because the message from a leaf is always immediately final. And once the message from a leaf is finalized, then you can send the message from the parent of the leaf, and so on and so forth. And because this is a tree, it's guaranteed that you will be able to find a legal message passing order.

And if we pass the messages in the right order, then we only need to pass 2(K-1) messages, where K is the total number

6:09

of clusters. And why is that? Because if you have a tree over K nodes, in this case K clusters, then there are K minus 1 edges. And since each edge carries a message in each direction, we have 2(K-1) messages, one in each direction.

Let's look at a couple of message passing

orders to see which ones are going to work and which ones aren't.

So we can start with any leaf; any leaf works. So say we start with, oh, I don't know, C2. Once C2 passes its message, what other clique can pass its message? Well, C3 can't pass messages yet, because it's still waiting for messages from all sorts of other guys, so we have to go to another leaf, for example C1; C1 can pass a message now. Neither C3 nor C4 is a candidate for message passing yet, because each of them is still waiting for another message. But C6 can pass a message, and now we have the option of activating C4: C4 can now send a message to C3, because it has everything except the message from C3. C7 can pass a message to C3, and you notice there are a lot of arbitrary decisions in how I ordered this; I could have used C5 next. And now C3 has the option of passing a message to C5.

And notice that at this point, everyone has received all of the messages, or all of the messages but one. And so now we can start passing in the other direction. So C5 can pass a message to C3, C3 can pass a message to any of C2, C7, or C4, and at this point, C4 can pass a message to both C6 and C1.
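The "wait until all other neighbors have reported" rule can be turned into a small scheduling sketch. The seven-clique tree below is my reconstruction of the topology from this walkthrough (with C1, C2, C6, and C7 as leaves), so treat the exact edge list as an assumption.

```python
# A sketch of finding a legal message-passing order on a clique tree:
# the message i -> j may be sent once i has received messages from all
# of its neighbors other than j. The 7-clique tree is reconstructed
# from the walkthrough above; the exact topology is an assumption.
from collections import defaultdict

edges = [(2, 3), (1, 4), (6, 4), (4, 3), (7, 3), (3, 5)]
nbrs = defaultdict(set)
for u, v in edges:
    nbrs[u].add(v)
    nbrs[v].add(u)

sent, order = set(), []
pending = {(i, j) for i in nbrs for j in nbrs[i]}
while pending:
    for i, j in sorted(pending):
        # Legal iff every other neighbor k has already sent k -> i.
        if all((k, i) in sent for k in nbrs[i] if k != j):
            sent.add((i, j))
            order.append((i, j))
            pending.discard((i, j))
            break
    else:
        raise RuntimeError("no sendable message: not a tree?")
```

Because the graph is a tree, the loop never gets stuck: some leaf's outgoing message is always sendable, and the full schedule contains exactly 2(K-1) messages.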

8:36

What is an example of an illegal message passing order? So, for example, if C1 passes the message to C4, and C4 jumps the gun and says, "Yay, I'm going to send the message to C3 next," that is an illegal message passing order, because C4 does not yet have all the information that it needs to pass the message to C3; it's still waiting for a message from C6. So that's not a good ordering.

Clique trees have other very elegant computational properties that make them a useful data structure.

So let's think about the kinds of queries that one can answer using clique trees. Some of these are fairly obvious, but others are a little bit more subtle. So imagine that we wanted to answer a posterior distribution query on either a single variable or a set of variables that appear together in a clique. Well, in this case, you can take any clique that contains all of those variables and sum out the variables that you don't care about to get a posterior over just those variables. Now, remember that that posterior is P tilde Phi, an unnormalized measure, so in order to get the normalized measure you actually need to renormalize.
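As a small illustrative sketch (the belief table is made up, not from the lecture), answering such a query is just a marginalization followed by normalization:

```python
# Answering a posterior query from a calibrated clique belief: sum out
# the variables you don't care about, then renormalize, since the
# belief is an unnormalized measure. Values are illustrative.
belief = {  # beta(X, Y) over two binary variables
    (0, 0): 9.0, (0, 1): 44.0,
    (1, 0): 27.0, (1, 1): 22.0,
}

# Marginalize out Y to get an unnormalized measure over X alone.
unnorm = {x: sum(v for (xx, _y), v in belief.items() if xx == x)
          for x in (0, 1)}
Z = sum(unnorm.values())            # the normalizing constant
posterior = {x: v / Z for x, v in unnorm.items()}
```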

9:58

Now, you could also do something a little bit more clever. You could say you wanted to introduce new evidence regarding some variable and query another variable. So this is a form of incremental

10:18

inference, where you've already calibrated the clique tree and you say, "Oh, wait a minute, I made another observation; let's see what happens now." So it turns out that clique trees are really good for that, and we're going to divide this into two cases: the easy case and the slightly less easy case.

So imagine that we want to query a variable X that appears in the same clique as my new evidence Z. So here is my clique, and here is Z, and here is X, and I want to figure out what the posterior of X would be if I further observe Z. Well, this is actually really easy, because what I have here is P tilde Phi. (Let's assume for the moment it's just Z and X; if there are other variables there, it doesn't change anything, because we can just get rid of them.) We can now condition, or reduce, the clique (this is a clique reduction) by just restricting attention to the entries that are consistent with my evidence Z = z. And then I end up with P tilde Phi

11:44

of little z and X, and in order to get a posterior, I can sum out the remaining variables and then normalize. That completes the incremental inference.
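A minimal sketch of this easy case, with an illustrative belief table over a binary evidence variable Z and binary query variable X:

```python
# Easy-case incremental inference: the evidence variable Z and the
# query variable X live in the same clique. Reduce the belief to the
# entries consistent with Z = z, then renormalize. Values are made up.
belief = {  # beta(Z, X), an unnormalized measure
    (0, 0): 9.0, (0, 1): 44.0,
    (1, 0): 27.0, (1, 1): 22.0,
}
z = 1                                    # the new observation

reduced = {x: v for (zz, x), v in belief.items() if zz == z}
norm = sum(reduced.values())
posterior = {x: v / norm for x, v in reduced.items()}
```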

Well, that's the easy case, where the two variables are together in a clique. What about a somewhat more elaborate case, where I introduce the new evidence and I query a variable that doesn't appear in the same clique as Z? So, using an example, let's imagine that what I did is observe evidence on A, and I want to query, say, E. Actually, let's do a different example: let's imagine that I want to query B.

12:30

Well, in this case, I can do the following. I can multiply this indicator function, or what we call this reduction of the factor, into a cluster that contains the evidence. In this case, we're going to multiply in the indicator function of A equals little a, which corresponds to removing, from the beliefs of cluster one, all entries that are not consistent with the evidence: we multiply it into psi 1, removing all entries that are inconsistent with that.

So now, what happens when we change psi 1? Imagine that we're doing this entire computation from scratch; that is, forget everything that we did before, and think about what would happen to this model in the hypothetical case that we were passing messages with the new psi 1 instead of the old psi 1. Well, some messages would change. Which messages would change? Delta 1,2 would change, because psi 1 changed. Delta 2,3 would change, because delta 1,2 changed. Delta 3,4 would change, but we don't care about delta 3,4 if we're interested in this belief over here.

What about other messages? Delta 4,3 doesn't change; neither does delta 3,2, and neither does delta 2,1. And so only a subset of the messages in the clique tree changes as a result of this change, which means that we can reuse at least half of the messages, if not more, in computing our beliefs about, in this case, cluster three. And the only messages that we need to recompute are the messages along the path from the clique where the evidence was introduced to a clique that contains the variable X that we want to query, which in this case is cluster three.
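A small sketch of this bookkeeping, on the four-cluster chain from this example: the only messages that go stale are the ones directed along the path from the evidence clique to the query clique.

```python
# Harder case: evidence enters clique s, the query lives in clique t.
# Only the messages directed along the unique s -> t path must be
# recomputed; everything else is reusable. Sketched on the chain
# C1 - C2 - C3 - C4 from the example.
from collections import defaultdict

edges = [(1, 2), (2, 3), (3, 4)]
nbrs = defaultdict(list)
for u, v in edges:
    nbrs[u].append(v)
    nbrs[v].append(u)

def tree_path(s, t):
    """The unique path from s to t in a tree, via DFS."""
    stack = [(s, [s])]
    while stack:
        node, p = stack.pop()
        if node == t:
            return p
        for n in nbrs[node]:
            if n not in p:
                stack.append((n, p + [n]))

p = tree_path(1, 3)            # evidence in C1, query about C3
stale = list(zip(p, p[1:]))    # messages that must be recomputed
```

Here `stale` comes out as the directed messages delta 1,2 and delta 2,3, matching the discussion: the reverse-direction messages delta 4,3, delta 3,2, and delta 2,1 are untouched.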

15:07

So to summarize: if we have a clique tree with K cliques and we pass messages in a careful order, which means that we start at the leaves and propagate inward, then we can construct an ordering in which 2(K-1) messages suffice to compute all of the beliefs in the clique tree, which means we get a posterior over every single variable in the model.

Contrast that with the computational cost of running variable elimination N times to compute posteriors over N different variables: here you get a considerable computational savings, in fact a savings on the order of the number of variables. And so what we can do is compute all of the marginals at only twice the cost of variable elimination, as opposed to N times the cost of variable elimination.

And we've also shown that by caching, or storing, the messages, we can reuse inference, for example in the context of incremental queries, which provides us with a useful data structure that we can keep updating as we obtain additional evidence.