0:00

The belief propagation algorithm when run over a general cluster graph is an

iterative algorithm in which messages are defined in terms of previous messages.

And we said before that when run over a general cluster graph the algorithm might

not converge, and that even if it converges it might not get to the right

answers. What we're going to talk about now is how bad these problems are, and what are some of the tricks that we use in practice to improve the behavior of this algorithm in these two regards. So let's get some intuition by looking

back at our misconception example. Well, this is not actually the misconception example; it's one where we've modified the potentials, specifically the potential over A and B, to make the behavior even more extreme, by making these potentials larger so that there is a stronger push to agreement between A and B. So A and B really, really prefer to agree. And what happens in this example?

We can see this beautiful oscillatory behavior of the belief propagation algorithm: the x-axis is iterations of BP, and the y-axis is some notion of distance.

1:20

And why do you have such strong oscillations? Let's look at what these potentials are doing. So this potential over here pushes A and B to agreement. And so, as you pass these messages, A and B want to agree, B and C also want to agree, but C and D really want to disagree. And so, as messages are passed, you're getting conflicting messages from both sides. Take D, for example: on the C side, C keeps pulling at it to disagree with it. On the other hand, when you go all the way around the loop, C is actually urging D to agree with it. And so, from one side you're getting a pull for C and D to be the same, and from the other side you're getting a pull for them to be different. And this conflict over the cycle, over the loop, causes the oscillatory behavior as messages are passed from one side or the other. This type of configuration, specifically tight loops,

2:50

strong potentials, and conflicting directions, is probably the single worst scenario for belief propagation. This is the case where it's going to do the worst, in terms of both convergence and the accuracy of the results that are obtained.

Now, in this example, ultimately you actually do get convergence. You can see it takes about 300 iterations, but ultimately you get convergence; 300 is a lot for a graph this size. Here is one where, up to 500 iterations, you didn't get convergence at all; maybe if you ran it for 10,000 it might converge, but it's hard to say.

And so, how do we improve the convergence, as well as the accuracy, of this algorithm? So first, let's look at what not to do. Here is the most important thing not to do, which is a variant of belief propagation called synchronous belief propagation. Synchronous belief propagation was actually one of the most commonly used variants at the very beginning of the use of the BP algorithm.

All messages are updated in parallel: all of the clusters wake up, they look at all of their incoming messages, and they compute all of their outgoing messages, all at once.

Now, this is a great algorithm from the perspective of simplicity of implementation, and also from the ability to parallelize, because you can assign a processor to each of the clusters, and they're all working in parallel with no dependencies between them. Unfortunately, synchronous belief propagation is actually not a very good algorithm.

And so what you see here is the number of messages that have converged, as a function of the amount of time spent. This is an Ising grid with certain parameters; the details don't matter. And you can see that there is a certain improvement over time, and then it kind of asymptotes at a certain number of messages converged. By contrast, let's compare that behavior to asynchronous belief propagation, where messages are updated one at a time.
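The contrast between the two update schemes can be sketched on a tiny model. Everything below is illustrative: the 4-node cycle, the node potentials, and the weak coupling values are made-up assumptions, not the lecture's example. A synchronous sweep computes every message from the previous iteration's messages; an asynchronous sweep updates messages in place, one at a time:

```python
import math

# Sum-product messages on a binary 4-node cycle A-B-C-D-A. Weak agreement
# potentials are chosen so that loopy BP is a contraction and converges.
psi = {e: [[1.2, 1.0], [1.0, 1.2]] for e in
       [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]}
phi = {"A": [2.0, 1.0], "B": [1.0, 1.0],        # node potentials, to break
       "C": [1.0, 1.5], "D": [1.0, 1.0]}        # the symmetry of the model

edges = list(psi) + [(j, i) for (i, j) in psi]  # one message per direction
nbrs = {v: [u for (u, w) in edges if w == v] for v in "ABCD"}

def pot(i, j, xi, xj):
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

def new_message(i, j, msgs):
    # m_{i->j}(x_j) = sum_{x_i} phi_i(x_i) psi_{ij}(x_i, x_j)
    #                           prod_{k in N(i) \ {j}} m_{k->i}(x_i)
    m = [sum(phi[i][xi] * pot(i, j, xi, xj) *
             math.prod(msgs[(k, i)][xi] for k in nbrs[i] if k != j)
             for xi in (0, 1))
         for xj in (0, 1)]
    z = sum(m)
    return [v / z for v in m]

def sweep(msgs, synchronous):
    src = dict(msgs) if synchronous else msgs   # sync reads only old values
    delta = 0.0
    for e in edges:
        upd = new_message(e[0], e[1], src)
        delta = max(delta, max(abs(a - b) for a, b in zip(upd, msgs[e])))
        msgs[e] = upd                           # async sees this immediately
    return delta

def run(synchronous, max_sweeps=200, tol=1e-10):
    msgs = {e: [0.5, 0.5] for e in edges}
    for it in range(1, max_sweeps + 1):
        if sweep(msgs, synchronous) < tol:
            return it                           # sweeps until convergence
    return None
```

On this weakly coupled model both schedules converge; the lecture's point is about harder models, where the asynchronous schedule converges more often and faster.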

So, this one and that one. Now, notice that this algorithm is underspecified, because I didn't tell you what order we should do the updates in; we'll come back to that in a minute. But even by the simple virtue of passing messages in an asynchronous way, the behavior improves, both in terms of how quickly we get messages to converge and in terms of the number of messages that converge. Now, this is not a particularly good message passing order. Here's a much better message passing order. It takes a little bit longer to get certain messages to converge, because it's trying to make sure that everything is converging. But notice that eventually, and in not that long a time, it actually converges to a 100% convergence rate,

okay? So, here are some important observations regarding this. First of all, convergence is a local property in belief propagation: some messages converge quite quickly; others might never converge. And when you have some messages that don't converge after you run the algorithm for a certain amount of time, one often simply stops the algorithm and says, you know, this is what we have, and we're done. Synchronous belief propagation has considerably worse convergence properties than asynchronous belief propagation, which is why, at this point, very few people actually ever use synchronous belief propagation. And, if we're using asynchronous belief

propagation, the order in which messages are passed makes a difference, both to the rate of convergence and, perhaps even more surprisingly, to the extent to which messages ever converge. And so, how do we pick the message passing order? There are several different scheduling algorithms.

I'm going to present two of the more popular ones.

The first is called tree reparameterization, or TRP for short. And what it does is, it picks a tree and then passes messages in that tree, in the same way that you would do in a clique tree algorithm, to sort of calibrate that tree, keeping all other messages fixed. So, for example, we might start by calibrating the red tree, which means I pass messages this way, and then back in the other direction.

9:27
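The two-pass order that TRP runs over each chosen tree can be sketched as follows. The tree, node names, and traversal details here are illustrative assumptions, not the lecture's code:

```python
# Sketch: generate the message order for calibrating one spanning tree,
# leaves-to-root then root-to-leaves, as in clique tree calibration.
# Messages not on this tree are held fixed while this schedule runs.

def trp_schedule(tree_edges, root):
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, order, stack = {root: None}, [], [root]
    while stack:                              # root the tree with a DFS
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    # upward pass: every non-root node sends to its parent, deepest first
    upward = [(u, parent[u]) for u in reversed(order) if parent[u] is not None]
    # downward pass: the same edges in reverse, with messages flipped
    downward = [(p, u) for (u, p) in reversed(upward)]
    return upward + downward
```

For a star tree rooted at A with edges A-B, B-C, B-D, this yields C→B, D→B, B→A, then A→B, B→D, B→C; each message is sent only after all the messages it depends on.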

That's one message scheduling algorithm. The second message scheduling algorithm is something that's called residual belief propagation, and there are actually several variants of that now. What it does is, it tries to look for good messages, messages that are high-value messages. So it looks for two clusters whose beliefs disagree a lot. And if they disagree, that means that if I pass that message, it's going to potentially have a large impact on the receiving cluster. And so, I'm going to keep a priority queue of edges, and I'm going to pick the messages from the top of the queue, based on how much of an effect I think they're going to have, which is some kind of heuristic notion, potentially, of the disagreement between the two adjacent clusters.
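The priority-queue idea can be sketched on a simpler stand-in for the message equations: a linear fixed-point system x = Ax + b, with made-up numbers, not anything from the lecture. Each coordinate update plays the role of one message; its residual is how much applying the update would change the current value, and we always apply the largest one first:

```python
import heapq

A = [[0.0, 0.3, 0.1],
     [0.2, 0.0, 0.3],
     [0.1, 0.2, 0.0]]
b = [1.0, 2.0, 0.5]
n = len(b)

def update(i, x):
    # the "new message" for coordinate i, given the current state x
    return b[i] + sum(A[i][j] * x[j] for j in range(n))

def residual(i, x):
    # how much coordinate i would change if we applied its update now
    return abs(update(i, x) - x[i])

def residual_iterate(tol=1e-12, max_steps=10000):
    x = [0.0] * n
    heap = [(-residual(i, x), i) for i in range(n)]
    heapq.heapify(heap)
    steps = 0
    while heap and steps < max_steps:
        _, i = heapq.heappop(heap)
        if residual(i, x) < tol:      # stale or already-converged entry
            continue
        x[i] = update(i, x)           # commit the highest-residual update
        steps += 1
        # this changes the residual of every coordinate that depends on i
        for j in range(n):
            if j != i and A[j][i] != 0.0:
                heapq.heappush(heap, (-residual(j, x), j))
    return x
```

The same skeleton applies to BP: replace coordinates with directed edges, `update` with the message computation, and the residual with a distance between the pending and current message.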

10:35

The other big, important trick that people use to improve the convergence of the belief propagation algorithm is what's called smoothing, or sometimes damping, of messages. And this is a general trick that's often used to reduce oscillations in dynamical systems that are based on these kinds of fixed-point equations, where the left-hand side is defined in terms of the right-hand side. So here we define the message δ_{i→j}, in the original belief propagation algorithm, as a function of other deltas. And we've seen that this can give rise to oscillatory behavior. This is just the standard belief propagation message.

And what we're going to do now, instead of that, is a kind of smoothed version: I'm not going to let the message change too drastically. I'm going to take a weighted average, with weight λ on the new message, which is this expression, and weight 1 - λ on the old message. And that turns out to damp oscillations in the system and increase the chances of it converging.
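As a toy illustration of why this helps (the function and numbers here are made up, not from the lecture), consider a one-dimensional fixed-point equation x = f(x) whose direct iteration overshoots the fixed point on every step and oscillates, exactly the failure mode described above. The damped update x ← λ·f(x) + (1 - λ)·x converges to the same fixed point:

```python
# f has a unique fixed point x* = 1/2.05, but |f'(x)| = 1.05 > 1, so the
# direct iteration x <- f(x) oscillates around x* with growing amplitude.
def f(x):
    return -1.05 * x + 1.0

def iterate(lam, steps=100):
    # damped fixed-point iteration; lam = 1 recovers the undamped update
    x = 0.0
    for _ in range(steps):
        x = lam * f(x) + (1.0 - lam) * x
    return x

x_star = 1.0 / 2.05   # the unique solution of x = f(x)
```

With λ = 0.3 the error shrinks by a factor of |1 - 2.05·λ| = 0.385 per step, so the damped iteration converges quickly, while the undamped one (λ = 1) drifts ever further from x*. Damped BP messages behave analogously, which is why damping increases the chance of convergence without changing the fixed points.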

11:51

So let's look at some example behaviors. So here we see synchronous belief propagation in red, over here. And this is the percentage, or fraction, of messages converged, so one would be perfect convergence. We can see that synchronous plateaus at about 20 to 25% of messages converged, which is pathetic enough to be pretty much useless in practice: if only 20% of your messages converge, then really the remaining messages don't mean very much. Asynchronous without smoothing is the green line, this one. We can see that, even without smoothing, the asynchronous algorithm achieves considerably better performance than the synchronous algorithm, and that's despite the fact that, without smoothing, you'd expect it to do even worse. And then, finally, the blue line is asynchronous with smoothing.

13:03

And you can look at the behaviors of individual messages, or individual marginals, rather, individual beliefs. So here, this is still an Ising grid, by the way; an Ising grid is sort of the standard setup for trying out the convergence of various approximate inference algorithms. And what you see here, the black line, is the true marginal. And you can see that synchronous belief propagation basically flip-flops around it indefinitely, whereas asynchronous belief propagation converges pretty quickly, and very close to the right answer.

But not always: this is a different variable in the same model. The black line, again, is the true marginal. And here we can see that, sure enough, asynchronous does improve convergence, but the convergence is still to an answer that's not quite right. And you see both behaviors in real networks. In practice, depending on all of the factors that I mentioned at the very beginning, how tight the loops are, and how strong or peaked the potentials are, those are going to determine how many of the factors have this oscillatory behavior, and how wrong the answers are.

14:37

So, just summarizing: there are two main tricks for achieving BP convergence, damping (or smoothing) and intelligent message passing order. Convergence is often easier to obtain using these tricks than correctness is, and convergence, unfortunately, doesn't guarantee correctness. And the bad cases, the ones that negatively impact both of these, are strong potentials that pull you in different directions, and tight loops.

And we're not going to have time to go into all of these topics, but there are some new algorithms that actually have considerably better behavior, both in terms of convergence and in terms of the accuracy of the results. These are based on a view of inference as optimizing some distance to the true distribution. There are some really cool ideas, as well as math, in this, but much of that is outside the scope of this class.