The second challenge associated with these template-based

models is computational.

So sure we can produce an unrolled Bayesian network and compute a posterior

over any subset of the variables using standard inference techniques.

But one of the consequences of being able to unroll a fairly small

template is that we can end up with very large probabilistic graphical models.

And large probabilistic graphical models can pose new inference challenges

in terms of the ability to scale inference to models of that size.

So, specifically,

if we look at the model that arises from unrolling a DBN, and

think back to some of the analysis that we did regarding the complexity

of probabilistic inference for a particular probabilistic graphical model,

we remember, for example, what happens if we want to run

exact inference, say, using a clique tree, over this unrolled network.

Say we want a clique tree where

the time 0 variables are in one clique and

the time t variables, for some future t, are in some other clique,

so that they're not all together in one big clique.

Then the sepset between those cliques needs to separate these variables over here,

say the blue variables, the time 0 variables, from the green variables.

So the sepset must separate them.

And if you think about what separation imposes on us in this setting,

we can see that the only way to separate, for example,

these blue variables over here

from the green variables over here is to put in the sepset at least

the persistent variables, that is, the ones where there is an edge

from the variable at time t to its incarnation at time t + 1.

And so the minimal sepset in this context, the smallest sepset that we can

construct that would allow us to have different cliques for, say, time 0 and

time 2, is something that involves all of these variables in the middle.

And so, that can potentially involve a significant computational cost for

exact inference, especially when we have a large number of persistent variables.
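This separation argument can be checked directly on a small graph. The sketch below, using a hypothetical unrolled network with k independent persistent chains X{i}_0 → X{i}_1 → X{i}_2 (the chain structure and variable names are illustrative assumptions, not the lecture's exact network), verifies that cutting all the middle-slice persistent variables disconnects slice 0 from slice 2, while leaving out even one of them leaves a path through its chain, so any separating set must contain all k of them:

```python
from collections import deque

def neighbors_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connected(adj, src, dst, removed):
    """BFS: is dst still reachable from src once `removed` vertices are cut?"""
    if src in removed or dst in removed:
        return False
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj.get(u, ()):
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical unrolled-DBN skeleton: k persistent chains X{i}_0 -> X{i}_1 -> X{i}_2.
k = 4
edges = [(f"X{i}_{t}", f"X{i}_{t+1}") for i in range(k) for t in (0, 1)]
adj = neighbors_graph(edges)

slice1 = {f"X{i}_1" for i in range(k)}

# Removing ALL slice-1 persistent variables separates slice 0 from slice 2 ...
assert all(not connected(adj, f"X{i}_0", f"X{i}_2", slice1) for i in range(k))

# ... but dropping any one of them leaves a path through its chain,
# so every separating set must contain all k persistent variables.
for j in range(k):
    assert connected(adj, f"X{j}_0", f"X{j}_2", slice1 - {f"X{j}_1"})
```

Since the sepset must hold all k persistent variables jointly, the cliques adjacent to it grow with k, which is exactly where the computational cost of exact inference comes from.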

A different way of understanding this is via the concept of entanglement,

if we're thinking of belief state tracking.

So if our goal is to maintain a belief state over, say, the variables at time 3,

and we're trying to think about how we can maintain this probability distribution,

the sigma of time 3, in a way that doesn't involve

maintaining a full explicit joint distribution over the time 3 variables,

we quickly realize that, really, we have no choice, because there are no

conditional independencies within time 3.

Now you might say, how is that?

This is a nicely structured model.

Why wouldn't there be conditional independencies?

Well, look at what's going on here.

Can we say that weather at time 3

is conditionally independent of failure at time 3?

Well, locally, they're not connected to each other within the time slice,

but there's certainly an active trail between them that goes,

for example, like this: from weather time 3, to weather time 2,

to weather time 1, to failure time 2, to failure time 3.

And so, this is an active trail between these two variables, which means that they

are, when you consider only the variables at time 3, not conditionally independent

of each other, given any of the other variables at time 3.

And so, this entanglement process occurs very rapidly over

the course of tracking a belief state in a dynamic Bayesian network,

which eventually means that the belief state, if you want to maintain

the exact belief state, is just fully correlated, in most cases.
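The entanglement argument can also be verified mechanically with a d-separation test. The sketch below assumes a toy two-variable DBN unrolled for three slices, with hypothetical edges W_t → W_{t+1}, F_t → F_{t+1}, and W_t → F_{t+1} chosen so that the trail W3 ← W2 ← W1 → F2 → F3 from the lecture exists; it checks d-separation by the standard moralized-ancestral-graph construction:

```python
from itertools import combinations

# Hypothetical unrolled DBN: weather (W) and failure (F) over slices 1-3.
edges = [("W1", "W2"), ("W2", "W3"),
         ("F1", "F2"), ("F2", "F3"),
         ("W1", "F2"), ("W2", "F3")]

def d_separated(edges, x, y, z):
    """d-separation test: restrict to ancestors of {x, y} union z,
    moralize (marry co-parents, drop directions), delete z, and
    check whether x can still reach y."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # 1. Ancestral closure of the query variables.
    keep, frontier = set(), {x, y} | set(z)
    while frontier:
        n = frontier.pop()
        if n not in keep:
            keep.add(n)
            frontier |= parents.get(n, set())
    # 2. Moralize: undirect all edges, link co-parents of each child.
    adj = {n: set() for n in keep}
    for u, v in edges:
        if u in keep and v in keep:
            adj[u].add(v); adj[v].add(u)
    for v in keep:
        for p, q in combinations(sorted(parents.get(v, set()) & keep), 2):
            adj[p].add(q); adj[q].add(p)
    # 3. Delete observed nodes and test reachability by DFS.
    stack, seen = [x], set()
    while stack:
        n = stack.pop()
        if n == y:
            return False          # path survives -> not d-separated
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(adj[n] - seen)
    return True

# Entanglement: with nothing else observed at time 3, the trail
# W3 <- W2 <- W1 -> F2 -> F3 is active, so W3 and F3 are dependent.
assert not d_separated(edges, "W3", "F3", set())

# Conditioning on the slice-2 persistent variables blocks every trail.
assert d_separated(edges, "W3", "F3", {"W2", "F2"})
```

The second assertion also ties back to the sepset point: observing (or, for inference, carrying in the sepset) the persistent variables of an intermediate slice is what cuts the slices apart; within a single slice there is nothing available to block the cross-slice trails, which is why the exact belief state becomes fully correlated.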