0:00

We've shown how probabilistic graphical models can be used for a variety of inference tasks, like computing conditional probabilities or finding the MAP assignment. But often the thing that you actually

want to do with a probability distribution is make decisions in the

world. So, for example, if you're a doctor

encountering a patient, it's not just enough to figure out what disease the

patient has. Ultimately you need to decide what

treatment to give the patient. How do we use a probability distribution, and specifically a probabilistic graphical model, in order to make these decisions? It turns out that the theoretical foundations for doing this kind of inference task were actually established long before probabilistic graphical models came to the forefront, and that framework is called the framework of maximum expected utility. So let's formulate the problem that we are

trying to solve, and then we can define the principle of MEU, or maximum expected utility. So what is a simple decision-making situation comprised of? Well, we have a set of elements; it's no longer a purely probabilistic model, we have other elements in it.

We have an action A which has several different choices that we might pick, for

example different drugs that we might choose to give the patient.

We have a set of states of the world, X1 through XN.

And the action can affect which of those states comes about.

So this set of states might include those things that I can't affect, like the patient's predisposing factors, which disease they have, or which test results came out which way. But it can also include things like the outcome that the patient will have after I administer the drug.

So this set of variables that defines the state can involve things that I can affect and things that I cannot affect.

And then, finally, the last piece of the puzzle here is something that defines the agent's preferences. That is, the agent cannot make decisions that consider different actions without weighing the relative merits of those actions. And my way of evaluating the merits of different actions is to have some kind of numerical utility that tells me, if I take action A and I end up in this particular state of the world, how happy am I? Am I very happy, moderately happy, or extremely unhappy? So this numerical utility function is going to be the thing that allows me to evaluate different actions in terms of how much I prefer them.

So now I can go ahead and write down the notion of an expected utility for a decision problem D, given an action a. And the way in which that's done is a simple expectation: EU[D[a]] = sum over x of P(x | a) U(x, a). I sum up, over all possible states of the world, the probability that I get that state given the action (and remember, not everything in that state might be affected by the action), and then I multiply that by the utility of the state-action pair. And I sum up over all possible states that I might end up with. And that summation, that expectation, is the overall happiness that I have, on average, in this decision-making situation if I take the action a.
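This expectation, and the choice of a maximizing action, can be sketched in a few lines of code. The numbers below are purely hypothetical placeholders (two made-up drugs and outcomes), not anything from the lecture:

```python
# Expected utility of an action a: EU[D[a]] = sum over states x of P(x | a) * U(x, a).
def expected_utility(p_state_given_action, utility, action):
    return sum(p * utility[(state, action)]
               for state, p in p_state_given_action[action].items())

# The MEU principle: choose the action that maximizes expected utility.
def meu_action(p_state_given_action, utility, actions):
    return max(actions,
               key=lambda a: expected_utility(p_state_given_action, utility, a))

# Hypothetical toy numbers: two treatments, two patient outcomes.
P = {"drug1": {"cured": 0.7, "sick": 0.3},
     "drug2": {"cured": 0.4, "sick": 0.6}}
U = {("cured", "drug1"): 10, ("sick", "drug1"): -2,
     ("cured", "drug2"): 12, ("sick", "drug2"): 0}
```

Here `expected_utility(P, U, "drug1")` averages the utility over outcomes, weighted by their probabilities under that action, which is exactly the summation just described.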

Clearly, I want to maximize my overall happiness.

And so the principle of maximum expected utility is that I want to choose the action a that maximizes this expected utility. We can use the ideas that we developed in the context of graphical models to represent decision-making situations in a very interpretable way. So this is a structure called an

influence diagram and it's an extension of a Bayesian network with two additional

pieces. We have in addition to random variables

which are denoted as usual as ovals. So here's a random variable.

And a random variable is part of my state X.

4:27

We also have two other kinds of nodes in this diagram. We have action variables, which are places where it's not nature that gets to make a choice over the value of the variable, but rather the agent that gets to pick. And then we have the green diamonds at the bottom, which represent utilities. So let's look at this particular example, which is probably one of the simplest things that you can construct. This is a budding entrepreneur who just graduated from college and wants to decide whether to found a widget-making company or not.

So he has two possible actions: F0, which is do not found the company and go take a more conventional job, and F1, which is found the company, which of course is a much more risky strategy. We have a random variable M, which is the market for widgets, and it can be a poor market, a moderate market, or a great market. And we have a probability distribution over those different values.

The utility of the agent obviously depends on both whether he decides to

found the company, and on the market. And we're denoting that by having both of

these be parents of the utility variable. So the parents of the utility variable,

or the utility function, are the attributes or the variables, on which the

agent's utility depends. So in this case, our utility has two arguments, F and M. And we can see that if the agent decides not to found the company, then the utility is zero, regardless of the market conditions for widgets, because at that point you don't really care. And if the agent does decide to found the company and the market for widgets is poor, then his utility is -7; say he invests a bunch of money and ends up losing all of it. If the market is moderate, his utility is 5.

6:51

And if the market is great, his utility's twenty.
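These payoffs, together with the market distribution P(M) = (0.5, 0.3, 0.2) used in the computation that follows, can be checked with a short sketch:

```python
# Market states with P(M) = (0.5, 0.3, 0.2), as used in the computation below.
p_market = {"poor": 0.5, "moderate": 0.3, "great": 0.2}

# Utilities U(F, M): not founding (f0) always gives 0; founding (f1) gives -7 / 5 / 20.
utility = {
    ("f0", "poor"): 0, ("f0", "moderate"): 0, ("f0", "great"): 0,
    ("f1", "poor"): -7, ("f1", "moderate"): 5, ("f1", "great"): 20,
}

# EU(a) = sum over market states m of P(m) * U(a, m).
def expected_utility(action):
    return sum(p * utility[(action, m)] for m, p in p_market.items())
```

This reproduces the expected utilities of 0 and 2 that are computed next.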

Okay? So now we can go ahead and compute the expected utility for the two action choices. So in this case, the expected utility of F0 is 0, and the expected utility of F1 is going to be equal to 0.5 * -7 + 0.3 * 5 + 0.2 * 20, which, if you add it all up, is equal to 2. So obviously, in this case, the optimal action is to found the company. Let's look at a more complicated

influence diagram that involves more random variables.

And also other, richer structures. So this is an elaboration of our student network. It has the variables Difficulty and Intelligence, the Grade, and the Quality of the letter, and in this case also a variable that represents the student's job prospects, which we're going to assume depends both on the quality of the letter as well as on the student's grade, because the recruiter has access to the student's transcript. The student has one decision in this example, which is whether to study in the course or not, which is probably going to affect their grade. And so we have the Study variable; the Study action affects the distribution that the student has over grades.

have three different utility diamonds VG, VS, and VQ.

So what do these represent?

They represent different components of the utility function, so in this case

I've broken down the utility function into constituent pieces.

This one represents the happiness with the grade by itself.

This is the utility that you get from saying, Yeah, I got an A in the course,

or ooh not so good, I got a C. So that's one piece of your utility

function. A second piece of your utility function

is your overall career aspirations which is the value of getting a good job,

versus not getting a good job. So that's this piece, over here.

And the final one is the component that is ascribed to the quality of life, that

you have during your studies. So if you don't study maybe you can go

and play ultimate Frisbee and go to the movies every night.

And so that might give you more happiness than sitting there hitting the books

every evening. And so we're going to assume that

particular utility depends on how much you study and on how difficult the course

is. Because maybe it's more fun to study if

it's easy. And we're going to assume that the agent's overall utility is the sum of these different components. This is called a decomposed utility function, and we'll come back to that later.
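To see what this decomposition buys us, we can count the table entries we would have to elicit; the variable cardinalities below are assumptions for illustration (they aren't stated in the lecture):

```python
# Assumed cardinalities: Grade (3 values), Job (2), Study (2), Difficulty (2).
card = {"G": 3, "J": 2, "S": 2, "D": 2}

# Decomposed utility: a grade component V(G), a job component V(J),
# and a quality-of-life component V(S, D); three small tables.
decomposed_entries = card["G"] + card["J"] + card["S"] * card["D"]

# Monolithic utility U(G, J, S, D): one table over all four arguments.
monolithic_entries = card["G"] * card["J"] * card["S"] * card["D"]
```

Even in this tiny example the monolithic table has 24 entries versus 9 for the decomposed form, and the gap grows exponentially with the number of variables the utility depends on.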

10:22

Now, you might wonder why I would bother decomposing the utility function instead of having just a single big monolithic utility function that is affected by all of these different factors. Well, the reason is exactly that it is, in fact, affected by all of these different factors. So whereas here I have one utility function that has one argument, another that has one argument, and one that has two arguments, if I had put them all together I would have a single monolithic utility function that has four arguments. And that would be much harder to elicit, because of the exponential, combinatorial growth in the number of possible combinations that I need to consider. So this is a decomposition of a function, in this case the utility function, into factors, in exactly the same way that we, for example, like to decompose a probability distribution as a product of factors. Another interesting extension that the influence diagram representation allows us to capture is the notion of the information available to the agent when they make their decision.

So let's look at this example over here, which elaborates our entrepreneur example. And here we come in with the assumption that the entrepreneur has the opportunity to conduct a survey regarding the overall market demand for widgets before deciding whether to found a company. Now, the survey is not entirely reliable, and so over here we have the CPD that tells us how likely different survey results are to come up, given different values of the true market demand. Now the point is that, having conducted the survey, the agent can base his decision on the value of the survey. We denote that with the presence of an edge such as this one, which indicates that the entrepreneur can make different decisions, depending on how the value of the survey turns out.

So from a formal perspective, that means that what the agent is allowed to do is to choose a decision rule, which we're going to denote delta, at the action node. And what a decision rule is, is effectively a CPD: it tells the agent, given an assignment of values to the parents of A, which are the variables that the agent can observe prior to making the decision, the probability distribution over the action that it takes in that particular situation.

So here, for example, we would have a CPD which tells us the probability of founding the company, which in this case is a binary-valued variable, given the three possible values of the survey variable. So you might say, why would I need to have a CPD? Why would the agent ever want to make a random choice between two different actions? And it turns out, in fact, that in a single-agent decision-making situation there's no value to allowing that extra expressive power, because a deterministic CPD would work just as well. Nevertheless, for reasons that will become clear on the next slide, it's actually useful to think about this as a CPD, even though in most cases it'll actually be a deterministic one that we're talking about. So now that we've given the agent that expressive power, how do we formulate the decision problem that they need to solve?

That is, what do they get to choose, and how do they choose it?
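Concretely, a decision rule delta(F | S) can be stored like any other CPD; here is a minimal sketch (the probabilities are made up for illustration), together with the collapse to a deterministic rule, which loses nothing in the single-agent case:

```python
# A decision rule delta(F | S): for each observed survey value,
# a probability distribution over the agent's actions.
delta = {
    "s0": {"f0": 0.9, "f1": 0.1},
    "s1": {"f0": 0.5, "f1": 0.5},
    "s2": {"f0": 0.0, "f1": 1.0},
}

# A deterministic rule puts all probability mass on a single action
# for each parent assignment; in a single-agent setting this suffices.
def make_deterministic(rule):
    return {obs: {a: (1.0 if a == max(dist, key=dist.get) else 0.0)
                  for a in dist}
            for obs, dist in rule.items()}
```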

So given a decision rule delta for an action variable A, if we inject that into the decision network, now all of the variables in our network, both A and the remaining variables X, have a CPD associated with them. So now we have effectively defined a joint probability distribution over all the variables X union A, because each of them has a CPD, and so this is the probability distribution.

And the agent's expected utility, once they've selected the decision rule delta A, is simply this expectation over here: we average out over all possible scenarios, where a scenario is an assignment to the chance variables X as well as to the action A, and what we're averaging is the agent's utility in that scenario. And so that's the overall expectation that the agent, prior to anything going on, can expect to gain, given a decision rule delta A. So obviously, the agent wants to choose the decision rule delta A that is going to give him the maximum value for this expression, that denotes his expected utility. And so this is the optimization problem

that the agent is trying to solve. So how do we go about finding the

decision rule that maximizes the agent's expected utility?

Well, let's look at this first in the context of the simple example, and then

we'll do the general case. So over here, we see, once more, the

expected utility equation for a given decision rule.

And now we're trying to optimize this delta A.

So let's write down the expected utility in this particular example.

So the Bayesian network in this case has two original CPDs: there is P(M), that comes in from the M variable, and there's P(S | M), which comes from the survey variable. And we have the CPD that is yet to be selected, which specifies the decision rule at the action F. And then, multiplied with all of that, is the utility, which depends on the decision to found the company and on the true market value. Well, this should look awfully familiar.

It is simply a product of factors. Some of these factors are probabilistic

factors. And one of them, the U, is a different

numerical factor. But it's still a factor which depends on

a scope of two variables, in this case F and M.

So now we can go ahead and apply the same kind of analysis that we've done in previous computations. Specifically, since we're trying to optimize over this expression over here, delta F of F given S, we're going to push in everything that we don't currently need to manipulate; we keep out just F and S, which are the two things that appear in the factor for the decision rule. And so, specifically, we're going to push into the summation over M all of the factors that depend on M. And so that gives us the sum over M of P(M) P(S | M) U(F, M). And if we multiply these three factors and marginalize out M, what we end up with is a factor that we're going to call mu of F, S. And the reason that it's called mu is to suggest that it has a utility component U in it, so mu and U.

And now we have a fairly simple expression that the agent is trying to optimize: it's a summation, over all possible values of S and F, of delta F, the decision rule that the agent takes over F given the observation, multiplied by this factor mu that I just computed. And now, if our goal is to optimize this expression, what the agent ought to do is to pick, in each circumstance S, the value of F that gives mu(F, S) the highest value. So this is all a little bit abstract, so let's look at it in the context of an actual numerical example.

So here are the three factors that we have in this network to begin with.

This is the CPD for M over here. The CPD for S given M in the middle, and

here is the utility factor U, it comes from here.

And now, if we go through this computation and compute this expression over here, this is what we get: a factor that gives us the value for each value of the action variable F and each value of the survey variable S. And now we want to maximize the summation.

19:29

And if we look at this factor over here, it seems clear that what's going to maximize the summation is to put full weight on F0 in the case S0, which, if you look at the CPDs, is the case where the survey suggests that the market demand is bad; and at the same time, in the other two cases, to choose the action to found the company. And if you look at this, you can see that no other choice of delta is going to give us a higher value for this summation than this deterministic choice: in the case S0 we choose F0, and in the cases S1 and S2 we choose F1. And so this is the optimal decision rule.

And if you ask what the agent's expected utility is with this strategy, before it does anything, before the survey is conducted: well, it's just going to be the sum 0 + 1.15 + 2.1, which sums up to 3.25, as the agent's overall expected utility in this case. Now, think back to the agent's expected utility when we didn't have the information from the survey. That expected utility, if you recall, was 2; so without the survey, the agent's maximum expected utility was equal to 2. So by conducting the survey, the agent has gained 1.25 utility points. So let's generalize this to arbitrary

decision diagrams. We're going to focus, for the moment, on single utility nodes and single action nodes. The more general case is a little more complicated, and we're not going to go through it, but the principles are very similar.

So, once again, going back to this example, we see that delta A, the decision rule at A, defines a joint distribution over all of the chance variables and the action, and this is multiplied by the utility under each scenario. And so we can open up the probability distribution piece, the distribution defined by delta A, as a product of two types of CPDs. We have the original CPDs, which is simply the product over i of P(Xi | parents of Xi), and then we multiply that by the decision rule delta A. And we're going to use Z just as a shorthand to denote the parents of A, that is, the observations that the agent makes prior to deciding on A.
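Putting all of this together on the entrepreneur-with-survey example, the sketch below eliminates M to produce the factor mu(F, S), reads off the optimal deterministic rule, and recovers the values quoted in this lecture (the mu entries 1.15 and 2.1, and a total of 3.25). One caveat: the survey CPD P(S | M) is shown on the slide but not read out loud, so the numbers below are assumed values, chosen to be consistent with the quoted mu entries:

```python
# Market prior P(M), and assumed survey CPD P(S | M) (not read out in the audio).
p_m = {"m0": 0.5, "m1": 0.3, "m2": 0.2}
p_s_given_m = {
    "m0": {"s0": 0.6, "s1": 0.3, "s2": 0.1},
    "m1": {"s0": 0.3, "s1": 0.4, "s2": 0.3},
    "m2": {"s0": 0.1, "s1": 0.4, "s2": 0.5},
}

# Utility U(F, M): do not found (f0) gives 0; found (f1) gives -7 / 5 / 20.
u = {("f0", m): 0 for m in p_m}
u.update({("f1", "m0"): -7, ("f1", "m1"): 5, ("f1", "m2"): 20})

# Eliminate M: mu(F, S) = sum over M of P(M) * P(S | M) * U(F, M).
mu = {(f, s): sum(p_m[m] * p_s_given_m[m][s] * u[(f, m)] for m in p_m)
      for f in ("f0", "f1") for s in ("s0", "s1", "s2")}

# Optimal deterministic decision rule: for each s, pick the f maximizing mu(f, s).
rule = {s: max(("f0", "f1"), key=lambda f: mu[(f, s)])
        for s in ("s0", "s1", "s2")}

# Maximum expected utility with the survey: sum of the chosen entries.
meu_with_survey = sum(mu[(rule[s], s)] for s in ("s0", "s1", "s2"))
```

With these numbers the rule comes out as F0 on s0 and F1 on s1 and s2, and `meu_with_survey` is 3.25, matching the 1.25-point gain over the no-survey value of 2 quoted earlier.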

23:58

So, summarizing how this algorithm goes: to compute the maximum expected utility and find the optimal decision rule at an action A, we take the following steps. We treat A as a random variable with, at this point, an unknown CPD. We introduce an additional factor beyond the CPDs, which is the utility factor, whose scope is the variables that the utility depends on; so now we have, as usual, a collection of factors. We now do, effectively, a variable elimination algorithm, plain, good old variable elimination, to eliminate all variables except A and A's parents, which produces a factor mu of A, Z. And then, for each assignment z to A's parents, we choose the action that maximizes over A within that factor.
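These steps can be sketched generically with dictionary-based factors. This is plain sum-product variable elimination in a deliberately naive form; the representation and names are my own, not from the lecture:

```python
from itertools import product

# A factor is a pair (scope, table): scope is a tuple of variable names, and
# table maps each assignment (a tuple of values aligned with scope) to a number.

def multiply(f1, f2, domains):
    """Factor product over the union of the two scopes."""
    scope1, t1 = f1
    scope2, t2 = f2
    scope = scope1 + tuple(v for v in scope2 if v not in scope1)
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (t1[tuple(asg[v] for v in scope1)]
                       * t2[tuple(asg[v] for v in scope2)])
    return scope, table

def marginalize(factor, var):
    """Sum out one variable from a factor."""
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    out = {}
    for vals, p in table.items():
        key = tuple(val for v, val in zip(scope, vals) if v != var)
        out[key] = out.get(key, 0.0) + p
    return new_scope, out

def optimal_decision_rule(factors, action, parents, domains):
    """Eliminate everything except the action and its parents, then maximize."""
    f = factors[0]
    for g in factors[1:]:
        f = multiply(f, g, domains)
    for v in f[0]:                      # eliminate all other variables
        if v != action and v not in parents:
            f = marginalize(f, v)
    scope, mu = f                       # this is the factor mu(A, Z)
    rule = {}
    for z in product(*(domains[p] for p in parents)):
        asg = dict(zip(parents, z))
        rule[z] = max(domains[action],
                      key=lambda a: mu[tuple(asg.get(v, a) for v in scope)])
    return rule
```

For instance, feeding in the factors P(M), P(S | M), and U(F, M) from a small decision network, with the action F observing S, returns the map from each survey value to the best action.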

So, summarizing: the MEU principle provides us with a rigorous foundation for decision making under uncertainty. Building on that, the framework of probabilistic graphical models allows us to provide a compact representation for complicated, high-dimensional decision-making settings, where we have multiple chance variables, multiple actions, and utilities that depend on multiple variables. We can now build on inference methods in probabilistic graphical models, specifically in this case the variable elimination algorithm, to both find the optimal strategy and figure out the overall value to the agent of a particular decision situation. And although we didn't talk about this, these methods can also be extended to richer scenarios, where we have multiple utility components, as in the example we gave, as well as multiple decisions that an agent makes in a sequence.