So far we focused solely on the global structure of the distribution: the fact that you can take it and factorize it as a product of factors that correspond to subsets of variables. But it turns out that there are also other types of structure that you might want to encode, and that is actually really important for real-world applications. So to motivate that, let's look at the tabular representation of conditional probability distributions, which is what we've used universally until now in the examples that we've given.

A tabular representation is one of these examples where we have a row for each assignment. Just as a reminder, this is G with parents I and D, and we have a row for each assignment of the parents, explicitly enumerating all of the entries that correspond to the probabilities of the variable G. So what's the problem? It sounds like a perfectly reasonable and very understandable representation.

Well, that's great, but now let's consider a more realistic example. For instance, in a medical application we might have a variable corresponding to cough. There are lots and lots of things that might make you cough: you might have pneumonia, or the flu, or tuberculosis, or bronchitis, or just the common cold. And by the time you finish enumerating all of the many things that might make you cough, you realize that such a variable might easily have as many as ten, fifteen, or even twenty parents, and similarly for a variable such as fever.

So when you think about the implications of that, relative to a tabular CPD, you realize that if we have k parents, and let's assume for the moment that they're all binary just for simplicity, then the number of entries in the CPD grows as 2^k, or on the order of 2^k, depending on the number of values of the child variable. Unfortunately, these situations are more common than not, which means that tabular representations are really not suitable for a lot of real-world applications.
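To see the blow-up concretely, here is a minimal sketch (with hypothetical numbers) of how the size of a tabular CPD grows with the number of binary parents:

```python
# Size of a tabular CPD P(X | Y1, ..., Yk): one row per joint
# assignment of k binary parents, one column per value of the child.
def tabular_cpd_size(k, d=2):
    """Number of entries for k binary parents and a d-valued child."""
    return (2 ** k) * d

# Illustrative growth (hypothetical parent counts):
for k in (2, 5, 10, 15, 20):
    print(k, tabular_cpd_size(k))
```

With twenty binary parents and a binary child, the table already needs over two million entries.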

So we have to think beyond that. Fortunately, there is nothing in the definition of a Bayesian network that imposes on us a fully specified table as the only representation of a conditional probability distribution. The only thing we need is that the CPD, P(X | Y1, ..., Yk), specifies a distribution over X for each assignment y1, ..., yk. And it can do that completely implicitly: it could be a little bit of C code that looks at y1, ..., yk and prints out a distribution over X.

In fact, we can use any function, parameterized or written as a C routine or anything else, to specify a factor over the scope X, Y1, ..., Yk, as long as that factor is a probability distribution over X: when you sum over all the values of X for a given assignment y1, ..., yk, the result has to be one. Anything that satisfies these constraints is a legitimate representation of a CPD.
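As an illustration of that point, here is a sketch of a CPD specified purely as code rather than as a table; the particular parameterization (each active cause independently failing to produce a cough with probability 0.9) is entirely made up for the example:

```python
# A CPD need not be a table: any routine that maps a parent assignment
# to a distribution over the child that sums to one will do.
def cough_cpd(parents):
    """Return (P(cough=false), P(cough=true)) given a tuple of 0/1
    causes. Hypothetical parameterization, for illustration only."""
    active = sum(parents)
    p_cough = 1.0 - 0.9 ** active  # more active causes -> more likely
    return (1.0 - p_cough, p_cough)

dist = cough_cpd((1, 0, 1))
assert abs(sum(dist) - 1.0) < 1e-12  # sums to one, as required
```

The routine never materializes a 2^k-row table, yet for every parent assignment it returns a valid distribution over the child.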

Like I said, fortunately we don't usually have to resort to C code to specify CPDs. The framework of statistics has defined for us a multitude of different representations of a conditional probability distribution given a set of conditioning variables. Some examples include deterministic CPDs, where X is a deterministic function of Y1, ..., Yk; we've already seen a couple of examples of that. We can, and will, define CPDs that have the form of what's called a decision tree or regression tree, a framework that some of you have seen before. You can think of CPDs that are logistic functions or, more generally, log-linear models. We're going to talk about noisy-or and noisy-and models, which are noisy versions of deterministic CPDs. And then, in the continuous case, we also have frameworks that allow us to represent the probability distribution of a continuous variable given a set of continuous or discrete parents.

And that is really critical because, obviously, when you have a variable that takes on a continuum of values, you can't possibly write down a table that lists every single one of them.

Now, one of the things that is intertwined with the notion of structure within a CPD is a useful notion called context-specific independence, and it turns out that this notion arises in some of the representations of CPDs that we're going to talk about.

Context-specific independence is a type of conditional independence where we have a set of variables X, a set of variables Y, a set of variables Z, and then a particular assignment c, which is an assignment to some set of variables C. So this is conditioning on a specific context: an independence statement that only holds for particular values of the conditioning variables C, as opposed to for all values of the conditioning variables C. Now, the definition is exactly as we have seen before, except that where before, when we were doing conditional independence given Z, the Z stayed on the right-hand side of the conditioning bar everywhere, now Z and c stay on the right-hand side of the conditioning bar in all of these independence statements.

So let's look at why context-specific independence might arise when we have particular internal structure within a CPD. Let's consider the case where X is a deterministic OR of Y1 and Y2.

The question is: what form of context-specific independence holds when X is this deterministic OR? Let's look at these statements one at a time. Here we have Y2 being false. What happens when Y2 is false? Well, when Y2 is false, X is the same as Y1, in which case obviously they're not going to be independent. On the other hand, what if Y2 is true? If Y2 is true, I don't care about Y1 anymore, because if Y2 is true, X is true. So here we have a notion of context-specific independence: X is independent of Y1 in the context Y2 = true. What about this one?

I'm now asking whether Y1 is independent of Y2 given the two possible values of X. What happens when X is false? What do I know about Y1 and Y2? They're both false. And if they're both false, then they're independent of each other, because if you tell me that one of them is false, I already know the other one is false too. So this is another context-specific independence that holds here. Does it hold if I tell you that X is true? No, because then I don't know which of Y1 and Y2 made X true. So this last one doesn't hold, and here are two context-specific independencies that hold for this deterministic CPD but wouldn't necessarily hold for a general-purpose CPD where X depends on Y1 and Y2.
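The whole argument can be checked by brute-force enumeration. Here is a sketch with made-up priors on Y1 and Y2 (the deterministic-OR CPD itself is the one from the example above):

```python
import itertools

# Hypothetical priors; Y1 and Y2 are assumed independent a priori.
P_Y1, P_Y2 = 0.3, 0.6

def joint(y1, y2, x):
    """P(Y1=y1, Y2=y2, X=x) under the deterministic-OR CPD."""
    p = (P_Y1 if y1 else 1 - P_Y1) * (P_Y2 if y2 else 1 - P_Y2)
    return p if x == (y1 | y2) else 0.0

def indep_in_context(a, b, ctx_var, ctx_val):
    """Check whether variables a and b (0=Y1, 1=Y2, 2=X) are
    independent in the context ctx_var = ctx_val, by enumeration."""
    worlds = [w for w in itertools.product((0, 1), repeat=3)
              if w[ctx_var] == ctx_val]
    z = sum(joint(*w) for w in worlds)  # probability of the context
    for va, vb in itertools.product((0, 1), repeat=2):
        p_ab = sum(joint(*w) for w in worlds
                   if w[a] == va and w[b] == vb) / z
        p_a = sum(joint(*w) for w in worlds if w[a] == va) / z
        p_b = sum(joint(*w) for w in worlds if w[b] == vb) / z
        if abs(p_ab - p_a * p_b) > 1e-9:
            return False
    return True

Y1, Y2, X = 0, 1, 2
print(indep_in_context(X, Y1, Y2, 1))  # True:  Y2 true forces X true
print(indep_in_context(X, Y1, Y2, 0))  # False: X equals Y1 here
print(indep_in_context(Y1, Y2, X, 0))  # True:  X false forces both false
print(indep_in_context(Y1, Y2, X, 1))  # False: ambiguous which cause fired
```

Exactly the two context-specific independencies from the discussion hold, and the other two fail, regardless of the particular priors chosen.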