As we said, there are many different kinds of queries that one can use a graphical

model to answer. Conditional probability queries are very

commonly used, but another commonly used type of inference is called MAP

Inference. So, what is MAP inference?

MAP is shorthand for Maximum a Posteriori, and it's defined as

follows. Here, we have a set of

observations, little e, over some variables E, and we have a query Y.

And it turns out that, for computational reasons that we're not going to discuss

for the moment, it's important that the set Y is everything, all of the

variables other than E. So now, our task is to compute

what's called the MAP assignment, which is the assignment, little y, to the

variables Y that maximizes the conditional probability of the variables

Y given the evidence. Now, for those of you who haven't seen

the notation argmax: argmax is the y that provides the

maximal value to this expression, over here.
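As a concrete sketch of this definition, the following Python snippet computes a MAP assignment by brute force over a small joint distribution; the table values, variable names, and the `map_assignment` helper are all hypothetical, chosen just to illustrate the argmax.

```python
# Hypothetical joint distribution over two binary variables (V0, V1);
# in practice this would come from a graphical model, not a raw table.
joint = {
    (0, 0): 0.04, (0, 1): 0.30,
    (1, 0): 0.36, (1, 1): 0.30,
}

def map_assignment(joint, evidence):
    """Return argmax_y P(y | evidence), where y assigns every variable
    not fixed by the evidence.

    Since P(y | e) = P(y, e) / P(e), and P(e) does not depend on y,
    it is enough to maximize the joint entry over assignments that
    agree with the evidence.
    """
    consistent = [a for a in joint
                  if all(a[i] == v for i, v in evidence.items())]
    return max(consistent, key=lambda a: joint[a])

print(map_assignment(joint, {}))      # no evidence: (1, 0)
print(map_assignment(joint, {0: 0}))  # observing V0 = 0: (0, 1)
```

Note that `max` silently breaks ties by picking the first maximizer it encounters, which connects to the point below that the MAP assignment need not be unique.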

Now, note that in some cases, this maximizing value might not be unique.

That is, there might be several different assignments, say, y1 and y2, that give

the exact same probability. And so, the MAP is not necessarily a

unique assignment. This has many applications, some of which

we've discussed before. So, in the context of message decoding,

for example, where we have a set of noisy bits that are passed over

a noisy communication channel, what we often want to recover is the most

likely message that was transmitted. That is, an assignment to the transmitted

bits that is the most likely given our evidence.

In the context of image segmentation, we would like to take the pixels and figure

out the most likely assignment of pixels to semantic categories.

So both of these can be viewed as MAP assignment problems.

And an important thing to understand about MAP is that it really is a

different problem than conditional probability queries.

So to understand that, let's look at the following very simple

example over just two random variables. So here, we have a Bayesian network over

the variables A and B. And if you multiply the two CPDs, you

get the joint distribution shown over here.

And it's fairly immediate to see that the MAP assignment is this one,

because it has the highest probability. Can we get at the MAP assignment by

looking separately at the variable A and at the variable B?

If we look separately at the variable A, we see that

the most likely assignment to the variable A is A1.

Whereas in the MAP assignment, the variable A took the value A0.

And so, one can't look separately at the marginal over A and over B, and use that

to infer the MAP assignment. And the reason is that we're looking for

a single assignment over all of the variables that together has the

highest probability. Unfortunately, just like conditional

probability inference, this problem too is NP-hard.
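The earlier two-variable point can be checked numerically. The CPD numbers below are hypothetical (the lecture's exact table isn't reproduced here), but they exhibit the same effect: the marginally most likely value of A disagrees with the value A takes in the MAP assignment.

```python
# Hypothetical prior P(A): A1 is individually more likely than A0.
p_a = {"a0": 0.4, "a1": 0.6}
# Hypothetical CPD P(B | A).
p_b_given_a = {
    "a0": {"b0": 0.1, "b1": 0.9},
    "a1": {"b0": 0.5, "b1": 0.5},
}

# Joint distribution P(A, B) = P(A) * P(B | A).
joint = {(a, b): p_a[a] * p_b_given_a[a][b]
         for a in p_a for b in ("b0", "b1")}

# The MAP assignment maximizes the *joint* probability.
map_a, map_b = max(joint, key=joint.get)
print(map_a, map_b)                # a0 b1

# Maximizing the marginal of A separately gives a different answer:
marg_a = {a: sum(joint[(a, b)] for b in ("b0", "b1")) for a in p_a}
print(max(marg_a, key=marg_a.get)) # a1
```

Here the joint entry (a0, b1) has probability 0.36, larger than any other, even though the marginal P(A = a1) = 0.6 beats P(A = a0) = 0.4.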

So, again, let's formalize what exact problem is NP-hard in this context.

And so here is one example of an NP-hard problem in the

context of MAP, which is just to find the joint assignment with the highest

probability. It's not the only NP-hard problem.

Here is another one: figuring out, for a given

probabilistic graphical model and some threshold little p,

whether there exists an assignment, little x, whose probability is greater

than p. That problem, too, is NP-hard.

So, should we give up? Well, just like in the context of

conditional probability queries, the answer is no. And there are algorithms

that can solve this problem very efficiently, in fact, for an even broader

set of problems than for conditional probability queries.

So, let's again look deeper into this problem and understand the foundations of

what might make it tractable. So, going back to our example of a Bayesian

network, once again we're going to view CPDs as

factors. So here, P of C again translates into a

factor over C, just like before. And whereas, in the case of conditional

probability queries, we wanted to sum out, or marginalize, some of the

variables, now we're going to want to find the arg

max, which is the assignment to these variables that maximizes the product.

And so, here we have a max of a product, which is why this is often called a max

product problem. Let's break down the max product problem.
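A brute-force version of this max-product computation can be sketched as follows; the chain structure and factor values are hypothetical, and real max-product algorithms would exploit the graph structure rather than enumerating every assignment.

```python
from itertools import product

# Factors of a toy chain A - B - C, stored as dicts over value tuples.
# The values are made up for illustration.
phi_ab = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
phi_bc = {(0, 0): 3.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 6.0}

def max_product(domains):
    """Enumerate all assignments (a, b, c) and maximize the factor product.

    This is exponential in the number of variables, so it only serves
    to make the max-product objective concrete.
    """
    best, best_score = None, float("-inf")
    for a, b, c in product(*domains):
        score = phi_ab[(a, b)] * phi_bc[(b, c)]
        if score > best_score:
            best, best_score = (a, b, c), score
    return best, best_score

assignment, score = max_product([(0, 1)] * 3)
print(assignment, score)   # (1, 1, 1) 24.0
```

The returned assignment is the one whose product of factor values is largest, which is exactly the argmax of the product described above.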

So, imagine that, in the more general case, we have the

probability of Y given E equals little e, and let's remind ourselves that Y

is the set of all variables other than the ones in E.

And so, by the definition of conditional probability, we have this ratio,

where we're trying to find the y that maximizes this ratio.

And notice that the denominator is constant with respect to Y.

Which means that, for the purpose of

finding the maximal y, we don't really care about the denominator, only the

numerator, which is the probability of Y, E = e, the unnormalized

measure. This numerator, in the general

case is a product of factors normalized by the partition function, where the

factors here are the reduced factors, reduced relative to the evidence.
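This argument can be verified directly: maximizing the conditional probability of Y given E = e picks out the same assignment as maximizing the unnormalized product of reduced factors, since normalizing only divides every entry by the same constant. A small sketch with hypothetical factors:

```python
from itertools import product

# Toy Gibbs distribution over (A, B, E) with hypothetical pairwise factors.
phi_ae = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_be = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

e = 1  # observed evidence E = e

# Reduced factors: fix E = e in every factor that mentions E.
red_a = {a: phi_ae[(a, e)] for a in (0, 1)}
red_b = {b: phi_be[(b, e)] for b in (0, 1)}

# Unnormalized measure over Y = (A, B): the product of reduced factors.
unnorm = {(a, b): red_a[a] * red_b[b] for a, b in product((0, 1), repeat=2)}

# The proper conditional P(A, B | E = e) divides by a constant (its sum),
# so the argmax is unchanged.
z = sum(unnorm.values())
cond = {y: v / z for y, v in unnorm.items()}

assert max(unnorm, key=unnorm.get) == max(cond, key=cond.get)
print(max(unnorm, key=unnorm.get))   # (1, 0)
```

This is why MAP inference can work directly with the product of reduced factors and never needs to compute the partition function.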