0:00

We've shown that the clique tree algorithm has some pretty cool properties: it guarantees that we achieve the correct marginal at every single clique, and therefore that these marginals necessarily agree with each other. We've also seen that these marginals can be computed using a single upward and downward pass over the clique tree, so a fairly efficient computation. But we also noted that the problem of inference in probabilistic graphical models is an NP-hard problem, and so somewhere in there there must be a catch; that is, there must be a computational cost that occurs, at least in certain cases, when we run the clique tree algorithm. And what we're going to do now is try to understand where that hidden cost might lie.

So, in order to analyze the computational complexity of a clique tree, we're going to define a little bit of notation. Let's consider an edge (i, j) in the clique tree T. Here's what we have: a clique tree, with C_i and C_j as adjacent cliques, connected to each other via the sepset S_ij; and then there's the whole rest of the clique tree, a whole bunch of cliques on the left side and a whole bunch of cliques on the right side. We're going to divide the variables into three groups: W<(i,j), the W that's on the i side of the (i, j) edge, are the variables that are just in this group here, not counting the sepset, which is going to appear in both sets; W<(j,i), the W that's on the j side, is over there; and S_ij is the stuff that's in the middle. So these are three mutually exclusive and exhaustive groups that partition all the variables in the tree. And now, here is an interesting and quite enlightening theorem.

It says that the clique tree T satisfies the running intersection property if and only if, for every sepset S_ij, the variables on the i side of the edge are independent of the variables on the j side of the edge, given the variables in the middle of the edge. That is, these variables, the sepset, separate the left side from the right side.
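To make the objects in the theorem concrete, here's a small Python sketch (mine, not from the lecture) that checks the running intersection property of a hand-built clique tree and computes the W<(i,j), S_ij, W<(j,i) partition for one edge. The cliques are loosely modeled on the student example; all names are illustrative.

```python
from collections import defaultdict

# A hand-built clique tree: clique index -> set of variables, plus tree edges.
cliques = {
    0: {"C", "D"},
    1: {"D", "I", "G"},
    2: {"G", "S", "I"},
    3: {"G", "J", "S", "L"},
    4: {"H", "G", "J"},
}
tree_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

def _adjacency(edges):
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def satisfies_rip(cliques, edges):
    """Running intersection property: for every variable, the cliques
    containing it form a connected subtree of the clique tree."""
    adj = _adjacency(edges)
    for x in set().union(*cliques.values()):
        holding = {i for i, c in cliques.items() if x in c}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            for v in adj[stack.pop()]:
                if v in holding and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if seen != holding:      # cliques containing x are not connected
            return False
    return True

def edge_partition(cliques, edges, i, j):
    """For tree edge (i, j), return (W<(i,j), S_ij, W<(j,i)): the variables
    strictly on the i side, the sepset, and the variables strictly on the j side."""
    adj = _adjacency(edges)
    seen, stack = {i}, [i]       # cliques reachable from i without crossing j
    while stack:
        for v in adj[stack.pop()]:
            if v != j and v not in seen:
                seen.add(v)
                stack.append(v)
    sepset = cliques[i] & cliques[j]
    side_i = set().union(*(cliques[k] for k in seen)) - sepset
    side_j = set().union(*cliques.values()) - side_i - sepset
    return side_i, sepset, side_j
```

For the edge between cliques 2 and 3, the sepset comes out as {G, S}, with {C, D, I} strictly on one side and {H, J, L} strictly on the other: the three mutually exclusive, exhaustive groups from the theorem.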

2:39

Now, remember that the running intersection property was the critical

property that we used to prove correctness of the clique tree algorithm.

And coming up with this condition, which tells us exactly when running intersection holds, is important, because running intersection is the defining property behind all of those nice behaviors of the clique tree algorithm that we talked about earlier. So let's try to convince ourselves of this by looking at a concrete example first.

Let's look at the clique tree that we have over here, and let's focus, for example, on this sepset, which has the variables G and S. The variables on the left side of that sepset, excluding the variables G and S, are I, D, and C. On the other hand, the variables on the right side of this sepset, again excluding G and S, are H, J, and L. So now let's look at where these variables place themselves in the graph structure over here, which is the induced graph corresponding to the Markov network that we derive from the factors in this Bayesian network: the Bayesian network's CPDs produce factors, and the factors give us this induced Markov network. So where are the variables G and S? Those are over here.

Where are the variables on the left hand side?

These are the blue variables, C, I, D, and they're sitting over there.

And the green variables, H, J, and L, are sitting over here.

And simple inspection can show us that there are no paths, or trails, between the blue variables and the green variables that do not go through the red variables. So we conclude from that that the variables C, I, and D are separated

4:34

from H, J, and L, given G and S.

And from the connection between graph structure and independence in Markov networks, we can conclude that this independence property holds; that is, that C, I, and D are independent of H, J, and L given G and S. This is exactly what we wanted to show: the sepset separates the variables on the left, the blue variables, from the variables on the right, the green variables.

Let's now try to give a slightly more general argument for this, one that isn't just a demonstration in the context of a particular example. So let's ignore for a moment the concrete letters inside this example, and just think about why this is going to be true. Let's imagine that this is now a generic sepset; this is it over here. And we'd like to prove that all the variables on the green side are independent of all the variables on the blue side. Now let's assume otherwise; this is going to be a proof by contradiction. If this were not the case, then the two sides would not be separated in this induced Markov network.

6:26

That means there needs to be a path in the induced Markov network between the blue side and the green side, so between W<(i,j) and W<(j,i). Well, if there exists a path that goes from one side of this graph to the other, it means there eventually has to be an edge where one node is on one side and the other node is on the other side. So that means there needs to be some edge that goes from the blue side to the green side. Now notice, and I forgot to say this, that this path that exists in the induced Markov network doesn't.

7:27

It doesn't go through the sepset, because otherwise the sepset would separate it; so the path doesn't go through S_ij.

So there needs to be an edge that goes, for example, from here to there; and it doesn't matter which node I pick, which pair of nodes I pick. There needs to be some node on the green side and some node on the blue side, this is the green side and this is the blue side, and an edge between them that doesn't go through the sepset. But now, that implies that there needs to be some factor that involves those two variables, so in this case E and H. And because of family preservation, that factor must sit in one of the cliques in this clique tree.

And that clique is either on the green side or the blue side.

It has to be somewhere. So let's assume, without loss of generality, that we put that clique on the green side. Now look at what happens: we have an H that's sitting here, and remember, H is a blue variable, and H is also sitting over here, because it is on the blue side. So now we have an H in one clique and an H in the other, and the running intersection property tells us that H needs to be everywhere in between, and specifically it needs to be in the sepset. That is a violation either of the running intersection property or of the assumption that the variable H is not in the sepset. And so that's a somewhat formal proof outline that can, with a little bit of extra effort and notation, be made into a rigorous proof of why the running intersection property implies this independence assumption. I didn't prove the other direction, because this is actually the direction that we care about more.
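The separation step in the concrete example can be checked mechanically. Below is a small sketch (mine, not from the lecture): a graph search verifying that removing {G, S} disconnects {C, I, D} from {H, J, L}. The edge list is my reconstruction of the moralized extended student network, so treat it as illustrative rather than an exact copy of the slide.

```python
# Illustrative reconstruction of the induced (moralized) Markov network
# for the extended student example; the exact edge set is an assumption.
mn_edges = [
    ("C", "D"),
    ("D", "G"), ("I", "G"), ("D", "I"),   # v-structure at G, moralized
    ("I", "S"),
    ("G", "L"),
    ("L", "J"), ("S", "J"), ("L", "S"),   # v-structure at J, moralized
    ("G", "H"), ("J", "H"), ("G", "J"),   # v-structure at H, moralized
]

def separated(edges, left, right, cut):
    """True if every trail from `left` to `right` passes through `cut`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = set(left), list(left)   # search that never enters the cut
    while stack:
        for v in adj.get(stack.pop(), ()):
            if v not in cut and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen.isdisjoint(right)
```

With cut {G, S} the two sides are separated; dropping S from the cut reopens the trail through I, S, and J, so separation fails.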

So, we started out this whole discussion by saying that these properties have computational implications. But where are those computational implications? What can we conclude from the fact that the sepset needs to separate the graph into two conditionally independent pieces? Well, it turns out that in many graphs, that implies a certain minimal complexity, one that can sometimes be quite large. So let's look at this in the context of two quite simple but very commonly used examples.

So here is an example of what's called a complete bipartite graph, where we have two sets of variables with no edges within either set separately, but where all of the cross edges are present. This is a structure that we've actually seen before. We saw it in the context of the model for course difficulty and student intelligence, where we had a bunch of difficulty variables and a bunch of intelligence variables, and there were no edges between the difficulties or between the intelligences; but for every difficulty-intelligence pair, there was an edge, induced by the v-structure corresponding to an observed student grade, between the corresponding course difficulty and the student's intelligence.

Now, what is the smallest sepset that we could possibly construct for this graph? Can we, for example, just take, say, these two A's and separate the graph into two conditionally independent pieces? Well, no, not really, because if we now look at these two B's, we see that you can connect them via any one of the other A's that I didn't include in the sepset; so this doesn't decouple the graph at all. With a little bit of extra thought, it's not difficult to convince oneself that the smallest set we could construct that actually breaks up the graph into meaningful pieces is either all the variables on the one side or all the variables on the other. Which means the smallest sepset

12:00

in any kind of meaningful clique tree has size greater than or equal to min(k, m), where k is the number of variables on one side and m is the number on the other.
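This lower bound is easy to verify by brute force on a small instance. The sketch below (mine, not from the lecture) builds the complete bipartite graph K_{3,4} and confirms that the smallest vertex set whose removal disconnects the remaining graph has size min(3, 4) = 3.

```python
from itertools import combinations

def num_components(nodes, adj):
    """Number of connected components of the subgraph induced on `nodes`."""
    nodes, comps = set(nodes), 0
    while nodes:
        stack = [nodes.pop()]
        while stack:
            for v in adj[stack.pop()] & nodes:
                nodes.remove(v)
                stack.append(v)
        comps += 1
    return comps

def min_separator_size(vertices, adj):
    """Smallest |S| whose removal leaves at least two connected components."""
    for size in range(len(vertices) + 1):
        for cut in combinations(vertices, size):
            rest = set(vertices) - set(cut)
            if len(rest) >= 2 and num_components(rest, adj) >= 2:
                return size

# Complete bipartite graph K_{3,4}: every cross edge, no within-side edges.
side_a = [f"a{i}" for i in range(3)]
side_b = [f"b{j}" for j in range(4)]
kb_vertices = side_a + side_b
kb_adj = {v: set() for v in kb_vertices}
for a in side_a:
    for b in side_b:
        kb_adj[a].add(b)
        kb_adj[b].add(a)
```

On K_{3,4} the answer is 3: the only way to disconnect the graph is to remove one entire side, exactly as argued in the lecture.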

A slightly less obvious example, but one that also imposes some very significant lower bounds, is a grid, such as we encountered in the context of the Ising model, or when doing image analysis, where the variables correspond to pixels.

Now let's try to think about how we might break up this graph into conditionally independent pieces. We can construct clique trees that have small sepsets; for example, here's a sepset that breaks away A11 from everything else. But notice that that still leaves me a very large "everything else". If we try, for example, to break up the graph so that A11 appears on the one side and A44 appears on the other, then any clique tree that you give me that has this separation, with A11 on the one side and A44 on the other, any such clique tree has to have

13:23

a sepset of size greater than or equal to n, where this is an n-by-n grid.

Which means that if you try to break up the n-by-n grid in a way that puts one corner on one side and the other corner on the other side, then you're going to have a sepset that's at least the dimension of the grid. And breaking it up in other ways doesn't make it any better.
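A closely related version of this bound can be checked directly on the graph: an n-by-n grid contains n vertex-disjoint horizontal paths from the left column to the right column, so any set of interior vertices separating those two columns must have size at least n. The brute-force sketch below (mine, not from the lecture) confirms this for small grids.

```python
from itertools import combinations

def grid_adj(n):
    """Adjacency of an n-by-n grid graph, vertices keyed by (row, col)."""
    adj = {(r, c): set() for r in range(n) for c in range(n)}
    for (r, c) in list(adj):
        for v in ((r, c + 1), (r + 1, c)):   # right and down neighbors
            if v in adj:
                adj[(r, c)].add(v)
                adj[v].add((r, c))
    return adj

def separates(adj, left, right, cut):
    """True if removing `cut` leaves no path from `left` to `right`."""
    seen, stack = set(left), list(left)
    while stack:
        for v in adj[stack.pop()]:
            if v not in cut and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen.isdisjoint(right)

def min_column_cut(n):
    """Smallest interior vertex set separating the left and right columns."""
    adj = grid_adj(n)
    left = [(r, 0) for r in range(n)]
    right = [(r, n - 1) for r in range(n)]
    interior = [v for v in adj if v not in left and v not in right]
    for size in range(len(interior) + 1):
        for cut in combinations(interior, size):
            if separates(adj, left, right, set(cut)):
                return size
```

For n = 3 and n = 4 this returns 3 and 4 respectively, matching the dimension of the grid.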

So, here are two cases where we can place a lower bound on the size of the sepset that is required for running clique tree inference, and that is where we pay the exponential blowup that is, in some sense, required by the fact that the problem is intrinsically NP-hard. So, to summarize: we've shown previously that the correctness of the clique tree inference algorithm relies on having the running intersection property, and we've now shown, in turn, that the running intersection property implies certain separation properties on the original distribution. These separation properties, in turn, can be used to analyze the complexity of inference in different graphs, and to provide certain minimal bounds on the complexity that would have to be incurred by the best possible clique tree on those graphs.

And we've already seen before a notion of minimal complexity, which is the minimal induced width of the graph. This notion is a little bit different, because it talks about sepsets, whereas the induced width really talks more about cliques; but these are both different ways of analyzing a certain minimal complexity that has to be incurred by exact inference, which, again, is related to the NP-hardness of the problem.