0:00

[COUGH] Today's topic is an important extension of the language of graphical models. It's intended to deal with a very large class of cases where what we'd like to do is not just write down one graphical model for a particular application, but rather come up with a general-purpose representation that allows us to solve multiple problems using the same exact model.

So, to understand what that means in a somewhat more concrete setting, let's go back to the genetic inheritance example that we've discussed previously. It's arguably the earliest example of Bayesian network reasoning, from before Bayesian networks were even invented.

Here we have as input a pedigree, which is a family tree, and we're interested in reasoning about a particular trait. For each pedigree and each trait, we can construct a Bayesian network, and that Bayesian network might look like this. But clearly, if you had a somewhat different family tree (say, another three cousins and a great-grandfather suddenly joined the family tree), or if you had a different family altogether, you would still want to use the same sort of ideas, the same pieces that we used in the first network, to construct this other network, because there is clearly a lot of commonality between them. We have what you might call sharing between models.

But in addition to that, in this example you also have fairly obvious sharing within the model. So, for example, the CPD that tells us how Selma's genotype affects Selma's blood type is presumably the same process by which Marge's genotype affects Marge's blood type, and the same for Maggie and Lisa and Bart and Homer and everybody else. So we have this tremendous amount of sharing of both dependency models and parameters.

Similarly, looking at this, you might realize that the genetic inheritance model by which Bart's genotype is determined by the genotypes of his two parents is the same inheritance model that applies to Lisa, to Maggie, to Marge, to Selma, and so on. So once again we have a lot of parameters that are shared, not just between models, but also within a model. So we'd like to have some way of constructing models that have this large amount of shared structure, which allows us both to construct very large models from a sparse parameterization, and to construct entire families of models from a single concise representation.
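To make that concrete, here is a minimal Python sketch (not from the lecture; the CPD tables, allele names, and function names are all illustrative) of how two shared template CPDs, one for inheritance and one for blood type, can be replicated to produce a ground network for any pedigree:

```python
from itertools import product

# Shared "template" CPD: P(bloodtype | genotype), illustrative numbers only.
# The same table is reused for every person in every pedigree.
BLOODTYPE_CPD = {
    "AA": {"A": 1.0},
    "AO": {"A": 1.0},
    "OO": {"O": 1.0},
}

def inheritance_cpd(gm, gf):
    """P(child genotype | mother's and father's genotypes):
    one allele is drawn uniformly from each parent."""
    dist = {}
    for a, b in product(gm, gf):            # pick one allele from each parent
        g = "".join(sorted(a + b))
        dist[g] = dist.get(g, 0.0) + 0.25
    return dist

def ground_network(pedigree):
    """Instantiate one genotype and one bloodtype node per person; every
    person reuses the same two template CPDs above."""
    nodes = {}
    for person, parents in pedigree.items():
        if parents is None:                  # founder: no parents in the pedigree
            nodes[f"Genotype({person})"] = ("founder_prior", [])
        else:
            nodes[f"Genotype({person})"] = (
                "inheritance", [f"Genotype({p})" for p in parents])
        nodes[f"Bloodtype({person})"] = ("bloodtype", [f"Genotype({person})"])
    return nodes

# A different pedigree just yields a different ground network, same parameters.
pedigree = {"Selma": None, "Marge": None, "Homer": None,
            "Bart": ("Marge", "Homer"), "Lisa": ("Marge", "Homer")}
net = ground_network(pedigree)
```

The point of the sketch is that `ground_network` can be applied to any family tree whatsoever, while the parameters live in just two places.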

This is not the only such application. In fact, probably the most commonly used types of graphical models are those that have shared structure and shared parameters. So here is another example that we've seen previously, from natural language processing: a sequence model, in this case for named entity recognition, a very common task for which graphical models have been used. Here also we have a sequence model, and again we have shared pieces, for example the parameters that relate the latent variable, in this case what type of entity a word refers to. Is it a person? Is it a location? And so on.
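As a minimal sketch of this kind of sharing (not from the lecture; the tags, words, and probabilities are made up), here is an HMM-style chain in which one shared transition table and one shared emission table are reused at every position, so the same parameters score sequences of any length:

```python
# One transition table and one emission table, shared across all positions.
TRANS = {("PER", "PER"): 0.6, ("PER", "O"): 0.4,
         ("O", "PER"): 0.2, ("O", "O"): 0.8}
EMIT = {("PER", "marge"): 0.7, ("PER", "plant"): 0.3,
        ("O", "marge"): 0.1, ("O", "plant"): 0.9}

def joint_prob(tags, words, start=None):
    """P(tags, words) for a chain of any length, reusing the same tables
    at every position in the sequence."""
    if start is None:
        start = {"PER": 0.5, "O": 0.5}
    p = start[tags[0]] * EMIT[(tags[0], words[0])]
    for i in range(1, len(tags)):
        p *= TRANS[(tags[i - 1], tags[i])] * EMIT[(tags[i], words[i])]
    return p
```

Note that `joint_prob` never asks how long the sequence is; a fifteen-word sentence and an eight-word sentence go through exactly the same parameters.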

There is a set of parameters here, and they're going to be independent of the place in the sequence in which we find the word. Not because that's necessarily an exactly correct model (one could clearly imagine cases where the position in the sequence might make a difference), but because it's often a very useful simplifying assumption, specifically because it, A, allows us to re-use parameters, and B, allows us to apply the same model to sequences of varying length, without having to worry about what my fifteen-word model is versus my eight-word model. So this is another case where we have a tremendous amount of reuse of parameters.

We've already seen, similarly, the example of image segmentation. Clearly, we don't want to have a separate model for every superpixel in the image; there are hundreds of superpixels, so we have sharing across superpixels (or pixels). So, for example, the model here that relates the class label of a superpixel to the image features of that superpixel is generally going to be shared, as are the parameters that involve adjacent superpixels. So these edge potentials are also going to be shared across, in this case, pairs of adjacent superpixels.
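Here is a small Python sketch of that idea (not from the lecture; the potentials and labels are illustrative): one node potential and one edge potential are reused for every superpixel and every adjacent pair, regardless of how many superpixels the image has:

```python
import math

def node_potential(label, feature):
    """Shared across all superpixels: scores a class label against a feature."""
    return math.exp(1.0 if label == feature else -1.0)

def edge_potential(l1, l2):
    """Shared across all adjacent pairs: encourages neighbors to agree."""
    return math.exp(0.5 if l1 == l2 else -0.5)

def unnormalized_score(labels, features, edges):
    """Product of the shared potentials over all nodes and adjacent pairs."""
    s = 1.0
    for i, lab in enumerate(labels):
        s *= node_potential(lab, features[i])
    for i, j in edges:
        s *= edge_potential(labels[i], labels[j])
    return s
```

An image with a hundred superpixels and one with a thousand use exactly the same two functions; only the list of nodes and the adjacency structure change.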

And once again, we have sharing across models as well, because we are going to have one such model for image A, and obviously we don't want to construct a separate model for every image. So, once again, we have sharing both between and within a model.

5:45

So let's look at one example a little bit more concretely, because it's an example that we're going to use in some of the later analysis. Let's return to our university example, where we have a student who takes a class and gets a grade, and that grade depends on the difficulty of the course and on the intelligence of the student. Now, this is all fine if we're interested in reasoning just about an individual student, but now let's imagine that we want to think about an entire university. So now we have difficulty variables; these are all difficulty variables, where the different courses, in this case C1 up to Cn, are the different courses that exist in our university. And conversely, on the other side, we have multiple students, and this is a set of intelligence variables that are indexed by the different students. So we have the intelligence of student 1 up to the intelligence of student m.

Now, note that these are different random variables; they can, and generally will, take different values from each other, but they all share a probabilistic model. And that's exactly the kind of sharing that we have in mind. What we see here is that the grade of a student in a course, which are these variables down here, depends on the difficulty of the relevant course and the intelligence of the relevant student. So, for example, the grade of student 1 in course 1 depends on the difficulty of course 1 and on the intelligence of student 1.
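The grounding step just described can be sketched in a few lines of Python (not the lecture's notation; the variable-naming scheme is illustrative): a single grade template is replicated once per (student, course) pair, and every replica points at the corresponding difficulty and intelligence variables:

```python
def ground_grade_network(takes):
    """Replicate the shared Grade dependency for every (student, course)
    pair in `takes`. Every grade variable gets the same parent structure
    (and, in a full model, would also share the same CPD parameters)."""
    parents = {}
    for s, c in takes:
        parents[f"Grade({s},{c})"] = [f"Intelligence({s})", f"Difficulty({c})"]
    return parents

# Which students take which courses determines which ground variables exist.
net = ground_grade_network([("s1", "c1"), ("s1", "c2"), ("s2", "c1")])
```

A different enrollment list yields a different ground network, but the dependency template is written down only once.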

And, once again, we have sharing of both the structure and the parameters across these different grade variables, so that they all have the same kind of dependency structure and the same CPD. Another example is that of robot localization, which is, in this case, a time series.

7:49

The robot moves through time from one position to another, and although the position at time t changes over time, we expect the dynamics of the robot to be fixed. We will talk more about that later. So that gives us a graphical model that again looks a little like this; this is one example of such a graphical model, where we see that the robot pose, these X variables over here, depends on, for example, the previous pose and on whatever control action the robot took. And we're assuming, once again, that we have sharing of these parameters across each instantiation of this variable. So, what this then gives rise to is a class of models that are represented in terms of template variables, where a template variable is something that we end up replicating, in many cases again and again, within a single model as well as across models. The replication is indexed by the fact that you can think of the variable as a function that takes arguments, and the arguments, for example, might be time points, as in this example over here.

So here we have a location variable that's indexed by time point, or a sonar reading that's indexed by time point. We have, in this case, a genotype variable and a phenotype variable that are indexed by a particular person; a label, a class label, that's indexed by a pixel; and similarly the difficulty, the intelligence, and the grades, which are indexed, in this case, by different combinations of indices: course, student, or course-student pairs. And a template model is a language that tells us the dependency models for template variables, and how concrete instantiations of variables, what are called ground variables, like the ones that are actually indexed by a particular time point or person, inherit the dependency model from the template.

languages that have been developed in the special purpose way for different

applications so dynamic Bayesian networks are intended for dealing with temporal

processes, for example. Where we have replication over time.

We have a whole range of language for object relation models, both directed

models and undirected models where we have multiple objects such as people or

students or courses such as pixels so people.

Horses, pixels, and lots of other things that can be related to each other in

different ways, and how do you represent the dependency model over that ensemble

in a coherent way?
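One very stripped-down way to picture what all of these languages have in common (purely my illustration, not any particular formalism from the lecture) is a template variable as a name plus an argument signature, where each ground variable is the template applied to concrete objects, and all groundings of one template share its dependency model:

```python
class TemplateVariable:
    """A named variable with typed argument slots, e.g. Grade(Student, Course)."""
    def __init__(self, name, args):
        self.name = name
        self.args = args                     # argument signature, e.g. ("Time",)

    def ground(self, *objects):
        """Apply the template to concrete objects to get a ground variable."""
        assert len(objects) == len(self.args)
        return f"{self.name}({','.join(objects)})"

# Templates from the examples above: indexed by person, by time, by pairs.
Grade = TemplateVariable("Grade", ("Student", "Course"))
Location = TemplateVariable("Location", ("Time",))
```

Every call to `ground` produces a distinct random variable, but the template, and hence the shared dependency model, is written down only once.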