0:20

And the actual marginalization. And so what we're going to do now is count the operations used by each of them.

So let's start with the factor product, and first let's remind ourselves of what a factor product is doing. The factor product is taking, in this case for example, a factor whose scope is AB and one whose scope is BC, and producing a factor whose scope is ABC. Now let's think about how each of the numbers in this result is produced. Each of these is a product, in this case of two numbers: one that comes from this table and one that comes from that table. So we have to produce every one of the rows in this new table, this new factor; let's call N_k the number of rows in this new table. And how many operations do we need to produce each such row? Well, we need to multiply in, in this case, m_k different factors, so that's m_k - 1 multiplications for each row.
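This per-row counting can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: factors are assumed to be stored as dicts from assignment tuples to values, all variables are assumed binary, and the names factor_product, phi1, and phi2 are made up for the example.

```python
from itertools import product

def factor_product(f1, scope1, f2, scope2):
    """Multiply two factors stored as {assignment tuple: value}.

    Returns the product factor, its scope, and the number of
    multiplications performed. With m_k factors being multiplied,
    each row would cost (m_k - 1) multiplications instead of one.
    All variables are assumed binary for simplicity."""
    scope = scope1 + [v for v in scope2 if v not in scope1]
    result, mults = {}, 0
    for assignment in product([0, 1], repeat=len(scope)):
        row = dict(zip(scope, assignment))
        a1 = tuple(row[v] for v in scope1)
        a2 = tuple(row[v] for v in scope2)
        result[assignment] = f1[a1] * f2[a2]  # one multiplication per row
        mults += 1
    return result, scope, mults

# illustrative factors phi1(A, B) and phi2(B, C) with all-ones tables
phi1 = {a: 1.0 for a in product([0, 1], repeat=2)}
phi2 = {a: 1.0 for a in product([0, 1], repeat=2)}
tau, scope, mults = factor_product(phi1, ['A', 'B'], phi2, ['B', 'C'])
print(scope, len(tau), mults)  # ['A', 'B', 'C'] 8 8 -- N_k = 2**3 rows
```

Note that the work is driven by the size of the result: one product per row of the new factor, times the number of factors being multiplied in.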

2:28

Each one gets added to only one of the rows in the new factor.
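That observation, that every entry of the original factor lands in exactly one row of the result, can be checked with a small sketch, again assuming a dict representation and binary variables; sum_out is a made-up name, not the course's code.

```python
from itertools import product

def sum_out(factor, scope, var):
    """Sum var out of a factor stored as {assignment tuple: value},
    counting additions. Each input row is folded into exactly one
    output row, so additions never exceed the input size N_k."""
    idx = scope.index(var)
    new_scope = [v for v in scope if v != var]
    result, adds = {}, 0
    for assignment, value in factor.items():
        key = assignment[:idx] + assignment[idx + 1:]
        if key in result:
            result[key] += value  # one addition
            adds += 1
        else:
            result[key] = value
    return result, new_scope, adds

phi = {a: 1.0 for a in product([0, 1], repeat=3)}  # factor over A, B, C
tau, new_scope, adds = sum_out(phi, ['A', 'B', 'C'], 'B')
print(new_scope, adds, len(phi))  # ['A', 'C'] 4 8 -- adds <= N_k = 8
```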

And so a simple upper bound on the number of additions that we need to do is simply the size of this factor, N_k. So now let's total up the computational

complexity of variable elimination. So let's assume that we start with m factors. For Bayesian networks, m is really effectively n, because we have one factor, the CPD, for every variable. And the reason I wrote less than or equal is because of the reduction by evidence. Now, for Markov networks, this number can be larger. If you think of a grid Markov network, or a fully connected pairwise Markov network, the number of factors can actually be larger than the number of variables. So that's why we give the complexity in terms of m as opposed to in terms of n. Now, that's the set of factors that we start out with, and then what happens at an elimination step? An elimination step picks some of those factors and generates another factor, and each elimination step generates exactly one factor.

3:50

How many elimination steps do we have? Well, each elimination step corresponds to the elimination of one variable, so we have at most n elimination steps. So the total number of factors that we ever produce, which we're going to call m*, is going to be, at most, m, the set of initial factors, plus the newly generated factors, which number at most n. And so altogether, m* is less than or equal to m + n. So, now that we've figured that out,

let's look at what the complexity of the algorithm is in terms of various key quantities. So N is the size of the largest factor that I ever create, which is the max of the different N_k's that I had. So now, how many product operations do we have? Well, remember that we had a sum over the different elimination steps, so a sum over k of N_k times (m_k - 1), and this was the number of product operations that we have. But now, here is the critical observation. Each factor is multiplied in at most once. Because as soon as we multiply it in.

5:10

As soon as we multiply it in, it goes away.

Which means that the sum over k of (m_k - 1) is at most the total number of factors. And so, said otherwise, we can write that the number of products is less than or equal to N times the sum over k of (m_k - 1), and this is less than or equal to N times m*, because that sum is at most the total number of factors in my universe of factors. What about the number of summation operations? Well, here, this is less than or equal to the sum over k of N_k, which is less than or equal to N times the number of elimination steps, which is simply less than or equal to N times n. So altogether, between these two steps, over here we have N times m*, and over here we have N times n, which tells us that the total work that we have is linear in N and in m*.

Great. Linear time, aren't we lucky? Well, not quite. Because N_k, which is the contribution to this quantity N, is the total number of values in a factor. And so if we say, for example, just for simplicity, that all variables have d values, then N_k is d raised to the number of variables in the factor's scope.
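That blow-up is easy to see numerically; this is plain arithmetic, with factor_size an illustrative helper name:

```python
def factor_size(scope_size, d):
    """N_k for a factor over scope_size variables, each taking d values."""
    return d ** scope_size

# the total work is linear in N = max_k N_k, but N itself grows
# exponentially with the largest scope that elimination ever creates
print([factor_size(s, 2) for s in (3, 4, 6, 20)])  # [8, 16, 64, 1048576]
```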

7:41

So let's understand how this complexity manifests in the context of a real example. So this is the run of variable elimination that we did before, just written out all on one slide. And so now let's see what the complexity of this is. We see that we have produced several factors here, so let's write down how many variables are in the scope of each of these factors. This one has two. This one has three: G, I, and D. This one has three: S, G, and I. This one has three: H, G, and J. This one has four: L, G, S, and J. And this one has three: J, L, and S.
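Those scope sizes can be reproduced with a scope-only simulation of the run, which ignores the actual table values and tracks only which variables each factor mentions. The sketch below assumes the elimination order C, D, I, H, G, S, L and one CPD scope per variable of the student network, as in the example; ve_scope_sizes is an illustrative name.

```python
def ve_scope_sizes(scopes, order):
    """Scope-only simulation of variable elimination: for each
    eliminated variable, record the scope size of the product
    factor that its elimination step creates."""
    scopes = [set(s) for s in scopes]
    sizes = []
    for var in order:
        involved = [s for s in scopes if var in s]
        product_scope = set().union(*involved)
        sizes.append(len(product_scope))
        scopes = [s for s in scopes if var not in s]
        scopes.append(product_scope - {var})  # sum out var
    return sizes

# assumed CPD scopes, one per variable of the student network
student = [{'C'}, {'C', 'D'}, {'I'}, {'D', 'I', 'G'}, {'I', 'S'},
           {'G', 'L'}, {'L', 'S', 'J'}, {'G', 'J', 'H'}]
sizes = ve_scope_sizes(student, ['C', 'D', 'I', 'H', 'G', 'S', 'L'])
print(sizes, max(sizes))  # [2, 3, 3, 3, 4, 3, 2] -- largest has 4 variables
```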

And so the size of the largest factor is this one, with four variables in it. And that is what's going to, in general, drive the complexity of the algorithm; not in an example as simple as this, but in more realistic examples. So now let's understand how elimination

ordering plays into this. We previously said that variables can be eliminated in any order, so long as we're careful about multiplying things in at the appropriate time. But now let's see how the elimination order might affect the complexity. So assume that in this example I'm going to make a not very judicious decision, and I'm going to start by eliminating G. So which factors do I need to multiply in, in order to eliminate G? Well, phi_L of L and G; phi_G of G, I, and D; and phi_H of H, G, and J. And so if I multiply all of these together, it turns out that I now end up with a factor whose scope is, let's see, G, L, I, D, H, and J, so a total of six variables, whereas before, the largest factor that we ever generated had four variables. So maybe you might say, six versus four, not a big deal; how much difference does it make? So let's convince ourselves that in other

graphs it might make a bigger difference. So here is a graph: a simple pairwise Markov network with A and C, and then a bunch of variables in the middle, B1 up to Bk. And then imagine that I start by eliminating A first. Well, what are the factors that involve A? We have a factor over A and B1, one over A and B2, one over A and B3, up to A and Bk. And the scope of the factor that we generate is going to be A, B1, up to Bk. So the size of this factor is going to be exponential in k. Maybe this is inevitable?

Well, no. So let's imagine that, instead, we're going to eliminate the Bi's first. So let's say, for example, that we're going to start by eliminating B1. Well, B1 is in a factor with A, and in a factor with C. So we're going to end up with a product, say, phi_A1 of A and B1 times phi_C1 of C and B1, and that's going to give me a factor whose scope is A, B1, and C. And the result of summing out B1 is going to be a factor, tau_1 of A and C. We're going to get the exact same behavior when we now eliminate B2, and that's going to give me a factor, tau_2 of A and C. And so on and so forth, until, at the very end, I'm going to have a bunch of factors tau_i of A and C that are all multiplied together. And I've done this without ever generating a factor whose size is bigger than three.
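A scope-only simulation makes the contrast concrete. This is a sketch with illustrative names; it tracks only factor scopes, not table values, for k = 8 middle variables:

```python
def max_factor_scope(scopes, order):
    """Largest scope of any factor created during a scope-only
    run of variable elimination over the given ordering."""
    scopes = [set(s) for s in scopes]
    largest = 0
    for var in order:
        involved = [s for s in scopes if var in s]
        product_scope = set().union(*involved)
        largest = max(largest, len(product_scope))
        scopes = [s for s in scopes if var not in s]
        scopes.append(product_scope - {var})
    return largest

k = 8
bs = [f'B{i}' for i in range(1, k + 1)]
chain = [{'A', b} for b in bs] + [{'C', b} for b in bs]  # pairwise network
print(max_factor_scope(chain, ['A'] + bs + ['C']))  # 9: a factor over A, B1..Bk
print(max_factor_scope(chain, bs + ['A', 'C']))     # 3: never beyond A, Bi, C
```

Eliminating A first creates a factor over all k+1 remaining neighbors, while eliminating the Bi's first keeps every intermediate factor at scope size three, exactly as argued above.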