We previously used the variable elimination algorithm as a way of proving the correctness of the clique tree algorithm. It turns out that this relationship can also be used to answer another fundamental question regarding clique trees, which is where clique trees come from, or rather, how we might construct a good clique tree. In order to do that, we're going to look at a run of variable elimination and see how we might reinterpret it as a clique tree. So let's look at the variable elimination algorithm and try to take a clique tree view of what it's doing. The variable elimination algorithm, as a reminder, takes a series of steps, where in each step it creates a large factor, which for the purposes of this discussion we're going to call lambda i, by taking a bunch of factors and multiplying them together. When that is done, we eliminate a variable from lambda i, and that gives us a new, smaller-scope factor which we're going to call tau i. And tau i is put into the pool of factors that we have available to us, which means that at some point we're going to use it in the context of computing another large factor lambda j, when we multiply a bunch of factors together. So now let's think about this process in terms of passing messages between computational entities, in the same way that we had when we were doing message passing algorithms. We're going to look at these intermediate factors, the lambda i's, as the cliques in a clique tree. And a tau i that is produced in the context of one clique and utilized by a different clique, we're going to consider that to be a message.
So these factors that are produced by one lambda and consumed by another lambda can be viewed as a message that's passed from the clique that corresponds to the first lambda and received by the clique that corresponds to the second. Summarizing, the variable elimination algorithm defines a graph. We're going to have a cluster Ci for every factor lambda i, and we're going to draw an edge between Ci and Cj if the factor that was generated by the computation in which I multiplied together all the factors to produce lambda i is used in the computation of lambda j. And so we can think of it as Ci, which corresponds to lambda i, passing a message to Cj, which corresponds to lambda j. And that message is the tau i that was produced by the lambda i computation. So now let's look at this in the context of an example. We're going to go through the variable elimination process that we had earlier for our network, the enhanced student network. The first variable elimination step multiplies together the C factor and the CD factor; this is my lambda 1. And lambda 1 produces a clique whose scope is CD, and so that's this clique over here. The next step eliminates D, and the way in which it does that is it takes tau 1 and multiplies it by one of my original factors, phi G. So now we have a lambda 2, and that gives me a clique whose scope is GID. And since GID used tau 1, we have an edge from CD to GID, labeled with the variable D. The next step in the analysis that we did before eliminated the variable I, and again used two of the original factors, phi I and phi S. And it also used tau 2. So now we have a factor lambda 3 whose scope is G, S and I, because those are the variables in this scope over here. And the message GI that we see here, on this edge between GID and GSI, corresponds to the tau 2 that was produced by lambda 2 and consumed by lambda 3.
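The bookkeeping just described can be sketched in a few lines of code. This is my own minimal illustration, not code from the lecture: factors are represented only by their scopes, as frozensets of single-letter variable names, and for each elimination step we record the scope of lambda i (a cluster) and, whenever a tau produced at an earlier step is multiplied in, an edge labeled with that tau's scope. The factor list and the elimination ordering C, D, I, H, G, S, L follow the enhanced student example.

```python
# A minimal sketch (my own illustration, not code from the lecture) of how a
# run of variable elimination induces clusters and edges.  Factors are
# represented only by their scopes, as frozensets of variable names.
def simulate_ve(factor_scopes, elimination_order):
    """Simulate a VE run; return cluster scopes and tau-edges (src, dst, sepset)."""
    # pool holds (scope, producing_step): producing_step is None for an
    # original factor, or the index of the step whose tau this is
    pool = [(scope, None) for scope in factor_scopes]
    clusters, edges = [], []
    for step, var in enumerate(elimination_order):
        used = [f for f in pool if var in f[0]]
        pool = [f for f in pool if var not in f[0]]
        lam = frozenset().union(*(scope for scope, _ in used))  # scope of lambda i
        clusters.append(lam)
        for scope, src in used:              # a consumed tau becomes an edge
            if src is not None:
                edges.append((src, step, scope))
        pool.append((lam - {var}, step))     # tau i goes back into the pool
    return clusters, edges

# Enhanced student network: phi_C(C), phi_D(C,D), phi_I(I), phi_G(G,I,D),
# phi_S(S,I), phi_L(L,G), phi_J(J,L,S), phi_H(H,G,J)
factors = [frozenset(s) for s in ["C", "CD", "I", "GID", "SI", "LG", "JLS", "HGJ"]]
clusters, edges = simulate_ve(factors, "CDIHGSL")
# clusters come out as CD, GID, GSI, HGJ, GJSL, JLS, JL -- matching the lecture's run
```

Note that only scopes are manipulated; no factor values are ever multiplied, which is what makes this a simulation rather than an actual run.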
The next elimination step didn't use any of the factors produced by other cliques; it just used one of the original factors, phi H. And so note that it's currently not connected to anything; it's just sitting there all by its lonesome. The next step now gets to use factors that were produced in two separate computations. Notice that this computation for lambda 5, which is this guy, used both tau 3 and tau 4, which is why we have both an edge from cluster 3 and an edge from cluster 4. The next step that we took marginalized out S; it multiplied in tau 5 and one of the original factors, JLS. And so now we have a message here that is passed between cluster 5 and cluster 6. And finally there was the marginalization of L, and so we have a JL cluster. Now if you look at this graph, at this clique tree, you'll notice that it's bigger than the clique tree that we had for this network to begin with. And the reason for that is that we actually have three adjacent cliques where one is a subset of the next: JL is a subset of JLS, which in turn is a subset of GJSL. Now if you think about it, there's really no point to including three cliques when one is a subset of the other, because you might as well do all of the computation in the biggest clique to begin with. There is no value to breaking it out into separate data structures. And so a typical post-processing step is to go ahead and remove redundant cliques and basically collapse them, and their factors, into the biggest clique that subsumes them. So in this case, since we had phi JLS, which was originally used in lambda 6, we're now going to move that and put it instead in clique 5. And so we've seen how we can take a run of variable elimination and use it to produce what looks like a clique tree. But is it a clique tree?
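The redundant-clique collapse just described can be sketched as well. Again, this is a hypothetical helper of my own under the same scope-only representation, not the lecture's code: whenever a clique's scope is contained in a neighboring clique's scope, absorb it into that neighbor and reroute its remaining edges.

```python
# A hedged sketch (my own helper, not from the lecture) of the post-processing
# step: absorb any clique whose scope is a subset of a neighbor's scope.
def collapse_subset_cliques(clique_scopes, edge_pairs):
    """Merge subset cliques into neighbors; return surviving cliques and edges."""
    cliques = dict(enumerate(clique_scopes))     # id -> scope (frozenset)
    edges = {frozenset(e) for e in edge_pairs}   # undirected edges between ids
    changed = True
    while changed:
        changed = False
        for e in list(edges):
            a, b = tuple(e)
            if cliques[a] <= cliques[b]:
                a, b = b, a                      # make b the redundant endpoint
            if cliques[b] <= cliques[a]:
                edges.discard(e)
                for f in list(edges):            # reroute b's other edges to a
                    if b in f:
                        edges.discard(f)
                        edges.add(frozenset({a} | (f - {b})))
                del cliques[b]
                changed = True
                break                            # edge set changed; rescan
    return cliques, edges

# The tree from the lecture's run: JL and JLS both collapse into GJSL
scopes = [frozenset(s) for s in ["CD", "GID", "GSI", "HGJ", "GJSL", "JLS", "JL"]]
pruned, tree_edges = collapse_subset_cliques(
    scopes, [(0, 1), (1, 2), (2, 4), (3, 4), (4, 5), (5, 6)])
```

On the example, the chain GJSL, JLS, JL collapses into the single clique GJSL, leaving the five-clique tree the lecture refers to.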
So what properties can we prove about this output of a run of variable elimination? Well, first is the fact that it is, in fact, a tree. If we want it to be a clique tree, it had better be a tree. Why is it a tree? Because in variable elimination, every intermediate factor tau, once you produce it, gets consumed exactly once. When it gets consumed, remember, in the variable elimination algorithm it gets taken out of the set of factors, and then it disappears, because it gets multiplied in. And so if you think about this as a directed tree, each factor feeds into only a single parent, which means that every cluster has at most one parent. And in fact every cluster except the last one has exactly one parent, because eventually everything has to be consumed, since everything is multiplied together. And so this variable elimination process induces a directed tree, where we subsequently forget about the directionality of the edges. The second property, which is again not difficult to show, is that the tree is family preserving. And that is because each of my original factors phi in my original factor set big Phi was somewhere in the variable elimination process. That means it must have been used at some point, because eventually I'm multiplying all factors together, and therefore it must be contained in the scope of one of the lambda i's. And that lambda i by force has a scope that contains the scope of phi, which is what we need for the family preservation property. The final property that we would like to show holds is that the tree respects, or obeys, the running intersection property.
And so as a reminder, that property says that if you have two clusters Ci and Cj with a variable X that is present in both, then X needs to be in each cluster and each sepset on the unique path between Ci and Cj. So first of all, let's convince ourselves that the running intersection property holds here. And we've already seen that, because this is actually the familiar clique tree that we had for this example; in fact, we produced that clique tree by running variable elimination, using exactly the process that I showed you before. Now let's show that more generally. So we want to prove to ourselves that if T is a tree of clusters that was induced by a run of variable elimination, then T obeys the running intersection property. And remember that this tree came from a directed run, so what I'm showing you here is actually a directed tree, where we see how the clusters communicate. This is not the same tree that we had in our example; it's a more complicated one, to make the theorem more interesting to show. Now, in exactly one of those clusters the variable X is eliminated, because in a run of variable elimination every variable is eliminated exactly once. So let's imagine that X is eliminated here. This clique over here, C4, is then the last one to contain X: X is in C4, and X is not in C6. Now let's take any other cluster that contains X. For example, assume that cluster 7 contains X. Well, if cluster 7 contains X, and X wasn't eliminated there, because it's eliminated exactly once, it means that X needs to be in the sepset, which means it was in the message tau that C7 produced. So X is in the scope of tau 7. Which means that X also needs to be in the scope of C3, because tau 7 was multiplied in there; and if it was multiplied in, then it has to be in that scope.
And so by continuing exactly the same inductive argument, we know that since X has to be in C3, it also needs to be in this sepset, and so on. So we've shown that X needs to be on the path from C7 to C4. And now let's take a different cluster that contains X, for example C5. We can similarly show that X needs to be at every step along the path from C5 to C4. And since this is a tree, that means X is on the entire path between C7 and C5. And so that argument shows that for any pair of clusters or cliques that contain X, X has to be at every step of the path between them. So to summarize: a run of variable elimination implicitly defines a correct clique tree. Which means that if you want to get a clique tree that has the required properties, family preservation and running intersection, you can simply simulate a run of variable elimination. What does it mean to simulate? It doesn't mean that we actually run variable elimination, because that would be an expensive process. We just figure out which factors we would multiply, what scopes those would have, and which factor gets used where. And that gives me the cliques and the connections between them. This is the exact process that we did in the simple example that we had. And that clique tree has a computational cost which is essentially equivalent to the cost of running the variable elimination algorithm, because if you think about the sizes of the cliques that are produced, they correspond exactly to the sizes of the factors produced by variable elimination. So these costs are directly comparable to each other. Except that, as we showed before, clique trees use dynamic programming, that is, storing or caching of the messages, so that we don't need to recompute the same message more than once. And that allows us to produce marginals over all of the variables in the network at only twice the cost of running variable elimination.
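The running intersection property that the argument above establishes is easy to check mechanically. Here is a small checker of my own devising (not from the lecture): for every variable X, the cliques whose scopes contain X must form a connected subtree of the clique tree.

```python
from collections import defaultdict, deque

# A small sketch of my own (not from the lecture): verify the running
# intersection property by checking, for every variable, that the cliques
# containing it form a connected subtree of the clique tree.
def satisfies_rip(cliques, edge_pairs):
    """cliques: id -> scope (set of variables); edge_pairs: (id, id) tuples."""
    adjacent = defaultdict(set)
    for a, b in edge_pairs:
        adjacent[a].add(b)
        adjacent[b].add(a)
    for x in set().union(*cliques.values()):
        holders = {i for i, scope in cliques.items() if x in scope}
        # BFS from one holder, moving only through cliques that contain x
        seen, frontier = set(), deque([next(iter(holders))])
        while frontier:
            u = frontier.popleft()
            if u in seen:
                continue
            seen.add(u)
            frontier.extend(v for v in adjacent[u] if v in holders)
        if seen != holders:        # the cliques holding x are not connected
            return False
    return True

# The pruned clique tree from the lecture's student-network example
tree = {0: frozenset("CD"), 1: frozenset("GID"), 2: frozenset("GSI"),
        3: frozenset("HGJ"), 4: frozenset("GJSL")}
ok = satisfies_rip(tree, [(0, 1), (1, 2), (2, 4), (3, 4)])   # True
```

A tree that violated the property, say one where a variable appears in two cliques separated by a clique that lacks it, would make the checker return False.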
So, taking one more step back, what that means is that if you care to get the values of the marginals for multiple variables, not necessarily even all of them, one efficient way of doing that is to take variable elimination, do the simulation to figure out how to build a clique tree, and then use that clique tree to compute all of the marginals by appropriate message passing. And we talked, in the context of variable elimination, about how to use heuristic solutions to construct a good variable elimination ordering, one that hopefully gives us relatively small factors along the way. That exact same set of heuristics is also used for constructing a small, efficient clique tree, to the extent that that's possible given our graph.