0:00

When we discussed Variable Elimination, we said that Variable Elimination is correct no matter the elimination ordering that's used. But we also showed that the elimination ordering can make a very significant difference in the computational complexity of the algorithm. So that begs the question of how we find a good elimination ordering.

Fortunately, the computational analysis based on graphs that we recently did gives us an intuition into how to find good elimination orderings. It turns out that the induced graph, and the way in which it characterizes the complexity of Variable Elimination, allows us to construct good methods for finding elimination orderings with good computational properties.

How good an elimination ordering can we construct? Well, here's yet another example of the fact that any interesting problem is NP-hard: the following problem is NP-hard as well.

For a graph H, determining whether there exists any elimination ordering alpha for which the induced width is less than or equal to some fixed value K, that problem is NP-complete. So you might say to yourself, well, that's not surprising; probabilistic inference is NP-complete, so obviously finding a really good elimination ordering is going to be NP-complete. It turns out that this is a separate NP-hardness problem. That is, even if you could solve this problem, even if somebody gave you the best elimination ordering, you're still going to have graphs for which the width is sufficiently large that you can't actually solve the inference problem in polynomial time. So finding the elimination ordering that gives you the best induced width does not give you polynomial-time performance in general, and so these are two separate NP-hardness problems.

1:58

So how do you find a good elimination ordering? Fortunately, simple heuristics actually do pretty well. One standard way of finding a good elimination ordering is to simply do a greedy search, eliminating one variable at a time, where at each point in the elimination you use some kind of heuristic cost function to decide which variable you're going to eliminate next. There are some obvious cost functions to use, and it turns out that, surprisingly, they work pretty well.
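The greedy scheme just described can be sketched in a few lines. This is a minimal sketch, not code from the lecture; it assumes the graph is a dict mapping each node to its set of neighbors, and the names (`greedy_ordering`, `eliminate`, `min_neighbors`) are hypothetical.

```python
from itertools import combinations

def eliminate(graph, node):
    """Remove `node` from the graph, connecting all of its former neighbors."""
    neighbors = graph.pop(node)
    for n in neighbors:
        graph[n].discard(node)
    for a, b in combinations(neighbors, 2):
        graph[a].add(b)
        graph[b].add(a)

def greedy_ordering(graph, cost):
    """Greedily eliminate, at each step, the node the cost function scores lowest."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}  # work on a copy
    order = []
    while graph:
        order.append(min(graph, key=lambda v: cost(graph, v)))
        eliminate(graph, order[-1])
    return order

# One possible cost: the number of neighbors in the current graph (min-neighbors).
min_neighbors = lambda g, v: len(g[v])

# On a chain A-B-C-D, a low-degree endpoint is always eliminated next,
# so no fill edges are ever added.
chain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C'}}
print(greedy_ordering(chain, min_neighbors))  # -> ['A', 'B', 'C', 'D']
```

Any of the cost functions described next can be passed in place of `min_neighbors`.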

So, for example, one cost function is what's called min-neighbors, which is to pick the node that has the minimal number of neighbors in the current graph. You want to pick the variable that's connected to the fewest neighbors, so that it produces the smallest factor: the one whose cardinality, in terms of the number of variables, is smallest. Min-weight accounts for the fact that sometimes you have variables that have 100 values and others that have two values; if you just count variables, you're going to ignore that distinction.

So min-weight computes the weight, that is, the total number of values, in the factor formed. Min-weight is a way of capturing the fact that if you have two variables, each of which has 100 values, you might nevertheless prefer to take a factor that has five variables, each of which has only two values. These heuristics just look at the size of the factors formed, and they ignore the fact that some of those factors might be inevitable. So a different strategy is to look at how much extra work is caused by the elimination ordering.

So min-fill is a strategy that says: if I eliminate this node, how many nodes that weren't neighbors before become neighbors due to this elimination step? Here, basically, I'm counting the added complexity above and beyond the complexity that I would have had from the edges that I previously generated.
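As a sketch, under the same assumed adjacency-dict representation (`min_fill_cost` is a hypothetical name), the min-fill score of a node is just the number of pairs of its neighbors that are not yet connected, since each such pair becomes a fill edge when the node is eliminated:

```python
from itertools import combinations

def min_fill_cost(graph, v):
    """Number of fill edges that eliminating v would add: pairs of v's
    neighbors that are not already connected to each other."""
    return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])

# In the chordless loop A-B-C-D, eliminating C must connect B and D,
# so its fill cost is 1; in a triangle, every node's fill cost is 0.
loop = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
print(min_fill_cost(loop, 'C'))  # -> 1 (the B-D fill edge)
```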

And finally, you again have a weighted version of the fill heuristic, one that takes into account not just the number of fill edges but also how heavy they are.
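Here's a sketch of both weighted variants, under the same assumed representation, with `card[v]` giving the number of values of variable v (all names hypothetical, not from the lecture):

```python
from itertools import combinations
from math import prod

def min_weight_cost(graph, card, v):
    """Weight of the factor created by eliminating v: the product of the
    cardinalities of v and all of its current neighbors."""
    return card[v] * prod(card[n] for n in graph[v])

def weighted_min_fill_cost(graph, card, v):
    """Total weight of the fill edges added by eliminating v, where an
    edge's weight is the product of its endpoints' cardinalities."""
    return sum(card[a] * card[b]
               for a, b in combinations(graph[v], 2) if b not in graph[a])

# The example from above: a factor over two 100-valued variables is far
# heavier than one over five binary variables, though it has fewer variables.
card = {'A': 100, 'B': 100, 'V': 2, 'W': 2, 'X': 2, 'Y': 2, 'Z': 2}
pair = {'A': {'B'}, 'B': {'A'}}
star = {'V': {'W', 'X', 'Y', 'Z'}, 'W': {'V'}, 'X': {'V'}, 'Y': {'V'}, 'Z': {'V'}}
print(min_weight_cost(pair, card, 'A'))  # -> 10000 (two 100-valued variables)
print(min_weight_cost(star, card, 'V'))  # -> 32 (five binary variables)
```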

In other words, the weight of a fill edge reflects the domain sizes of the variables that it connects. And it turns out that min-fill, which is actually quite often used in practice, is a pretty good heuristic.

Now, one important way of understanding the issues in finding elimination orderings is by looking at the following result. The result tells us that the induced graph produced by Variable Elimination, regardless of the elimination ordering, is triangulated. So what does triangulated mean?

Triangulated means that you cannot have a loop of size greater than three that doesn't have a bridge: an edge that cuts across the loop in one direction or another.
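To see this on the smallest possible example, here's a sketch (assumed adjacency-dict representation, hypothetical names) in which eliminating a node of the chordless loop A-B-C-D forces exactly the bridging fill edge:

```python
from itertools import combinations

def eliminate(graph, node):
    """Remove `node`, connect its neighbors, and return the fill edges added."""
    neighbors = graph.pop(node)
    for n in neighbors:
        graph[n].discard(node)
    fill = []
    for a, b in combinations(neighbors, 2):
        if b not in graph[a]:
            graph[a].add(b)
            graph[b].add(a)
            fill.append(tuple(sorted((a, b))))
    return fill

# The loop A-B-C-D has no chord; eliminating C adds the chord B-D,
# which is exactly the bridge the triangulation property requires.
loop = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
print(eliminate(loop, 'C'))  # -> [('B', 'D')]
```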

So there is always a way of crossing that loop without going all the way around. Let's convince ourselves that this is true. Here's a simple proof.

Consider a set of variables and assume, by contradiction, that there is a loop of size greater than three with no bridge. One of these variables has to be the first to be eliminated; so assume that C is the first to be eliminated. Well, once we eliminate C, we end up introducing an edge between B and D, so there has to be an edge between those two neighbors of C. Because when C is eliminated, the C-D edge must exist and the C-B edge must exist, since, as we observed before, you don't add any neighbors to C once you eliminate it. And so at that point a fill edge must be added.

Now, it turns out that one way to find an elimination ordering, an alternative to the heuristics that I mentioned earlier, is to find a low-width triangulation of the original graph. There's been a lot of work in the graph theory literature on triangulating graphs, which we're not going to talk about, but it turns out that you can use all that literature to find good low-width triangulations and then use them to construct an elimination ordering.

So now, let's convince ourselves that this can make a big difference. Let's consider a practical application: Robot Localization & Mapping.

What we're going to see here is a robot moving around, and at each point in time it sees several landmarks; it knows which landmarks it sees, since it can recognize them. And what it senses at each point is some kind of approximate distance to the closest landmark.

So let's write that down as a graphical model. The graphical model looks something like this. We have, at each point in time, a robot pose: the robot pose at time zero, time one, and so on, up to the end of the trajectory, time T. And we have a set of landmarks whose positions are fixed; the landmarks don't move. So notice that these are not random variables that change over time. They're just there, and they're constant.

And what you see here in gray are the observations. This, for example, assumes that at time one the robot saw landmark one. And so what we have here is the observation: the observed distance is a function of the robot pose and the landmark position. And here at time two, we have that the robot saw landmark two, so we have this dependency over here. And then at time three, the robot also saw landmark two, so you have two observations for the same landmark. Here, I didn't draw the fact that you could actually see more than one landmark at a given point in time, but that also is easy to add in,

9:01

okay? So now, if we take the previous robot trajectory that we saw and we write down the Markov network that represents the factors, we see that we have these light blue little dots that represent the robot pose at every point in time. We see that we have these temporal persistence edges that represent the fact that the robot pose at time T is correlated with its pose at time T plus one. And we have these dark blue landmark position variables, where we see that a landmark is connected to all of the poses at which it was visible to the robot. And this is an edge that's induced by the V-structure that we have over here, which we have because Z1 has been observed,

okay? So, does elimination ordering matter? Oh boy. This is what you get when you eliminate the robot poses first, and then the landmarks. And this, what you see here, is the induced graph. You can see that you get this huge spaghetti where most of the landmarks are connected to all other landmarks,

okay? What about a different elimination ordering? This one's already better. This is what happens when you eliminate landmarks first and then poses, and you see the induced graph that you get over the poses. You can see that it's still pretty densely connected, but it's not nearly as bad as the spaghetti that we had before.

And finally, let's look at the result of min-fill elimination. What you are seeing here is the fill edges being constructed, and you can see that the result is actually very sparse. Let's look at it again: very few fill edges are actually added over the course of the trajectory. So the elimination ordering makes a very big difference.

In summary, we've seen that finding the optimal elimination ordering is an NP-hard problem.

And that is an NP-hardness that is different from the intrinsic difficulty of inference in graphical models. However, we've also shown that the graph-based view of Variable Elimination gives us a set of very simple and intuitive heuristics that allow us to construct good elimination orderings. These work by looking at the induced graph as it's constructed in the course of Variable Elimination, and trying to keep it small and sparse. Although simple, and certainly not providing performance guarantees in the worst case, these heuristics are often quite reasonable in practice and are very commonly used.