So far we've talked a lot about the representation of probabilistic graphical models: we defined different classes of probabilistic graphical models, Bayesian networks and Markov networks, and talked about their independence assumptions.

Now let's operationalize probabilistic graphical models and figure out how to use this declarative representation to answer actual queries.

It turns out there are many queries that one can use a PGM to answer, but perhaps the most commonly used is what's called a conditional probability query. So let's define a conditional probability query.

In a conditional probability query, we have some set of observations, little e, about a set of variables, big E. These are the variables that we happened to observe. We also have a particular query that we care about, which we are going to denote as a set of variables Y. So our goal here is to compute the conditional probability of the variables Y given the evidence E = little e. This type of query is useful for a

variety of different applications. For example, in the medical or fault diagnosis domains that we've spoken about, we might have observations about certain symptoms or test results, and we're interested in predicting the probability of different failure modes or different diseases. In the pedigree analysis example that we also talked about, we might have observations about the phenotype, or maybe even the genotype, of some individuals in the family, and we are interested in reaching conclusions about other individuals. So, unfortunately, like most interesting

problems, the problem of inference in probabilistic graphical models is NP-hard. So, before we talk about that, let's first remind ourselves what NP-hardness is. I'm not going to define it formally in this course, but as a rough intuition, if a problem has been shown to be NP-hard, it means that it is extremely unlikely to have an efficient solution.

Slightly more formally, it means that all algorithms that people have devised for this problem, and for a bunch of problems that are equally hard, require at the very least time exponential in the size of the representation of the problem, which means that it's unlikely to be solvable for any problem that is bigger than a small handful of, say, variables in our context.

So which particular problems in the context of PGM inference are NP-hard? Well, first, the basic problem of exact inference in a PGM is NP-hard. So, given a PGM P sub Phi defined by a set of factors Phi, a particular target variable X, and a value, little x: computing the probability of the event X = little x is NP-hard.

In fact, even a simpler special case of this problem, one where we just want to figure out whether this probability is even positive, that too is NP-hard. So one might then say, well, okay, exact

inference is intractable, but what if we compromise a little bit on accuracy? What if we're just looking for an approximate answer? After all, we don't necessarily care about the tenth significant digit in this probability. Does that make the problem easier? Unfortunately, it turns out that the answer is no. And there are many different variants of NP-hardness results for approximate inference.

This is just one of them. This one says that, again, given a PGM, an event X = little x, and some set of observations, the problem of finding an approximate answer p, for which we are guaranteed that this approximate answer is within epsilon of the truth, is NP-hard. And this is true for any epsilon that's less than 0.5. And note that epsilon equal to 0.5 can be obtained by simply guessing at random, or by guessing the value 0.5.

So any non-trivial approximation for this kind of conditional probability query is also intractable. This is a rather depressing

observation, and it might make us want to give up on using PGMs for inference. But it turns out that one shouldn't get depressed too quickly, because, importantly, this is a worst-case result, which means that we can construct bizarre PGMs for which exponential time is the best we can do. But that doesn't mean that in the general

case one can't do better. And the rest of the inference section of this course will be devoted to showing a variety of algorithms, whether for exact inference or for approximate inference, that can do considerably better than this worst-case result would suggest.

So first, to understand where these algorithms might be getting their power, let's drill down into the inference question a little bit more.

And here is a slightly elaborated version of our student network, where we now have additional variables. For example, the coherence of a course influences its difficulty, and there are a variety of other variables that we're not going to talk about.

So, inference in this model, and in general inference for probabilistic graphical models, uses the notion of factors that we've talked about before. It turns out that it's convenient to use factors because it means that the algorithms apply equally well to Bayesian networks and Markov networks, and it's a very useful abstraction. So let's think about this Bayesian network

in the context of a set of factors. So here, for example, we had, initially, a prior probability over the C variable, P of C, and that translates into a factor whose scope is C. We have, for example, here P of G given I and D, and that translates into this factor, whose scope is G, I, and D. And in general, each one of the CPDs in this network converts to a factor over the scope of the family, that is, the variable and its parents.
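As a minimal sketch of this correspondence, a factor can be stored as a scope plus a table mapping assignments of that scope to numbers. The variable names follow the network in the lecture, but all numeric values below are invented for illustration, not taken from the actual slides:

```python
# A factor: a scope (tuple of variable names) plus a table mapping
# assignments of that scope to non-negative numbers.
# All numeric values are invented for illustration.

# From the prior CPD P(C): a factor whose scope is just C.
factor_C = {
    "scope": ("C",),
    "table": {("c0",): 0.5, ("c1",): 0.5},
}

# From the CPD P(G | I, D): a factor over the family of G,
# that is, G together with its parents I and D.
factor_G = {
    "scope": ("G", "I", "D"),
    "table": {
        ("g0", "i0", "d0"): 0.30, ("g1", "i0", "d0"): 0.70,
        ("g0", "i0", "d1"): 0.05, ("g1", "i0", "d1"): 0.95,
        ("g0", "i1", "d0"): 0.90, ("g1", "i1", "d0"): 0.10,
        ("g0", "i1", "d1"): 0.50, ("g1", "i1", "d1"): 0.50,
    },
}

# Because these factors come from CPDs, each conditioning context
# sums to 1 (up to floating point).
for i, d in [("i0", "d0"), ("i0", "d1"), ("i1", "d0"), ("i1", "d1")]:
    total = factor_G["table"][("g0", i, d)] + factor_G["table"][("g1", i, d)]
    assert abs(total - 1.0) < 1e-9
```

The point of this uniform representation is exactly the abstraction mentioned above: once everything is a factor, the same operations apply whether the factors came from a Bayesian network's CPDs or a Markov network's potentials.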

So now let's assume that our goal is to compute the probability of the variable J, and so we'd like to compute P of J. And what we see here is the joint distribution, which we have defined using the chain rule for Bayesian networks. In order to compute P of J, all we need to do is eliminate, or marginalize out, all of the variables except for J, and so we end up with a summation that looks like this. This is why this problem of conditional probability inference is often called sum-product: we have a sum over a product of factors.
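This sum-product computation can be sketched by brute force on a tiny made-up chain network D → G → L; the structure and CPD numbers here are invented for illustration, and real inference algorithms exploit the factorization rather than enumerating every assignment like this:

```python
from itertools import product

# A tiny made-up chain network D -> G -> L; all numbers are invented.
# Each factor maps a full assignment of its scope to a number.
phi_D = {(0,): 0.6, (1,): 0.4}                  # P(D)
phi_G = {(0, 0): 0.3, (0, 1): 0.7,              # P(G | D), keyed by (d, g)
         (1, 0): 0.8, (1, 1): 0.2}
phi_L = {(0, 0): 0.9, (0, 1): 0.1,              # P(L | G), keyed by (g, l)
         (1, 0): 0.4, (1, 1): 0.6}

def marginal_L(l):
    """P(L = l): a sum (over D and G) of a product of all the factors."""
    return sum(phi_D[(d,)] * phi_G[(d, g)] * phi_L[(g, l)]
               for d, g in product([0, 1], repeat=2))

print(marginal_L(0), marginal_L(1))  # approximately 0.65 and 0.35; sums to 1
```

The body of `marginal_L` is literally a sum over a product of factors, which is where the name sum-product comes from; the cost here is exponential in the number of eliminated variables, which is what the smarter algorithms later in this section avoid.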