Yeah, so that's the noisy OR CPD.

And you can generalize this to a much broader notion of independence of causal influence. It's called independence of causal influence because it assumes that you have a bunch of causes for a variable, and each of them acts independently to affect the truth of that variable. So there are no interactions between the different causes: they each have their own separate mechanism, and ultimately it's all aggregated together in a single variable Z, from which the truth of Y is then determined from the aggregate effect of all of the individual effects, the Zi's, of the different causes.

So, one example of this is the noisy OR, which we've already seen. But this easily generalizes to a broad range of other cases. There's the noisy AND, where the aggregation function is an AND. There's the noisy MAX, which applies in the non-binary case, when causes might not just be turned on or off but rather have different extents of being turned on, and Z is then the maximal extent of the independent effect of each cause. And so on. So there's a large range of different models, all of which fit into this family. Noisy OR is probably the one that's most commonly used, but the other ones have also been used in other settings as well.
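To make the aggregation concrete, here's a minimal sketch of a noisy OR and a noisy MAX; the leak and failure parameters below are hypothetical, chosen purely for illustration and not taken from the lecture:

```python
import numpy as np

def noisy_or_prob(x, leak, fail):
    """P(Y = 1 | x) under a noisy OR: each active cause Xi independently
    fails to turn Y on with probability fail[i]; leak is the probability
    that Y turns on by itself with no active causes."""
    x = np.asarray(x)
    # Y stays false only if the leak and every active cause all fail.
    p_all_fail = (1.0 - leak) * np.prod(np.where(x == 1, fail, 1.0))
    return 1.0 - p_all_fail

def noisy_max(z_effects):
    """Noisy MAX aggregation: Z is the maximal extent of the
    independent effects Zi of the individual causes."""
    return max(z_effects)

# Two active causes that fail to trigger Y 10% and 30% of the time.
print(noisy_or_prob([1, 1], leak=0.05, fail=np.array([0.1, 0.3])))  # ~0.97
```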

One model that might not immediately be seen to fit into this framework, but actually does, is the model that corresponds to the sigmoid CPD. So what's a sigmoid CPD?

A sigmoid CPD says that each Xi induces a continuous value Zi = Wi·Xi, so even when each Xi is discrete, Zi is a continuous value. The weight Wi parameterizes this edge, and it tells us, sort of, how much force Xi is going to exert on making Y true. So if Wi is zero, it tells us that Xi exerts no influence whatsoever. If Wi is positive, Xi is going to make Y more likely to be true, and if Wi is negative, it's going to make Y less likely to be true. All of these influences are aggregated together in the expression for the variable Z, which effectively adds up all of these different influences plus an additional bias term W0: Z = W0 + Σi Wi·Xi.

And now we need to turn this ultimately into the probability of the variable Y, which is the variable that we care about. In order to do that, we're going to pass this continuous quantity Z, which is a real number between negative infinity and infinity, through a sigmoid function. The sigmoid function is defined as follows, and it's a function that some of you have seen before in the context of machine learning, for example: sigmoid(Z) = e^Z / (1 + e^Z). That is, the sigmoid takes the continuous value Z, exponentiates it, and then divides by one plus that exponential.

And since e^Z is a positive number, this gives us a number that is always in the interval (0, 1). And if we look at what this function looks like, it looks like this.

So, this is the sigmoid function. The x axis here is the value Z, and the y axis is the sigmoid function. And you can see that as Z gets very negative, the probability goes to zero; as Z gets very high, the probability gets close to one; and then there's an interval in the middle where intermediate values are taken. So this is kind of like a squelching function that sort of squashes the function at both ends.
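Putting the pieces together, here's a minimal sketch of the sigmoid CPD just described; the weights are hypothetical, chosen only to illustrate one positive and one negative influence:

```python
import math

def sigmoid(z):
    # sigmoid(Z) = e^Z / (1 + e^Z), always strictly between 0 and 1
    return math.exp(z) / (1.0 + math.exp(z))

def sigmoid_cpd(x, w, w0):
    """P(Y = 1 | x): aggregate Z = W0 + sum_i Wi * Xi, then squash it."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

# X1 pushes Y toward true (W1 > 0), X2 pushes it toward false (W2 < 0).
print(sigmoid_cpd(x=[1, 1], w=[2.0, -1.0], w0=-0.5))  # sigmoid(0.5) ≈ 0.62
```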

Let's look at the behavior of the sigmoid CPD as a function of different parameters.

So here is a case where all of the Xi's have the same weight W. What we see here is the value of this parameter W along one axis, and over here is the number of Xi's that are true. So let's look first at this axis over here: the more parents that are true, the more parents that are on, the higher the probability of Y being true. And this holds for any value of W, because these are all positive influences. So the more parents are true, the more things are pushing Y to take the value true. This axis over here is the axis of the weight, and we can see that for low weights you need an awful lot of X's to get Y to be true, but as W increases, Y becomes true with far fewer positive influences.

This graph on the right is what we get when we basically just increase the amplitude of the whole system: we multiply both W and W0 by a factor of ten. What happens is that the exponent gets pushed up to extreme values much quicker, because Z effectively gets multiplied by a factor of ten, and that means that the transition becomes considerably sharper.
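To check this behavior numerically, here's a small sketch (with made-up weights) showing how P(Y = 1) climbs with the number of active parents, and how multiplying W and W0 by ten sharpens the transition:

```python
import math

def p_true(n_on, w, w0):
    """P(Y = 1) when n_on parents are on and all share the same weight w."""
    z = w0 + n_on * w
    return math.exp(z) / (1.0 + math.exp(z))

# Left column: gentle slope (w = 0.5). Right column: w and w0 scaled by ten,
# giving a much sharper on/off transition around the same crossover point.
for n in range(6):
    print(n, round(p_true(n, 0.5, -1.5), 3), round(p_true(n, 5.0, -15.0), 3))
```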

That gives us a little bit of an intuition for how the sigmoid function and the sigmoid CPD behave. So what are some examples of an application of this? I showed this network in an earlier part of this course: it's the CPCS network, and it was developed here at Stanford Medical School for diagnosis of internal diseases.

And so, up here we have things that represent predisposing factors.