0:34

Let’s start by modeling repetition.

So in this case,

imagine that we're repeatedly tossing the same coin again and again.

So we have an outcome variable, and

what we'd like to model is the repetition of multiple tosses.

And so we're going to put a little box around that outcome variable,

and this box is called a plate.

2:00

of the CPD explicitly into the model.

So this random variable theta is the actual CPD parameterization.

And I'm putting it explicitly, so

that I can show how different variables depend on it.

And so if we have the parameters here,

we can see that theta is outside of the plate.
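To make the coin example concrete, here is a minimal sketch of what unrolling the plate amounts to: one shared theta outside the plate, and one outcome variable per toss inside it. The function names and the value 0.7 for theta are illustrative assumptions, not something given in the lecture.

```python
import random

def sample_unrolled_coin_plate(theta, n_tosses, seed=0):
    """Unroll a plate of n_tosses outcome variables that all share theta."""
    rng = random.Random(seed)
    # theta sits outside the plate: a single value reused by every copy inside it.
    return [1 if rng.random() < theta else 0 for _ in range(n_tosses)]

def likelihood(theta, tosses):
    """P(tosses | theta): a product of the same CPD applied to every copy."""
    p = 1.0
    for x in tosses:
        p *= theta if x == 1 else (1.0 - theta)
    return p

# theta = 0.7 is an assumed value, chosen only for illustration.
tosses = sample_unrolled_coin_plate(theta=0.7, n_tosses=5)
print(tosses, likelihood(0.7, tosses))
```

The point of the sketch is only that every toss consults the same theta, which is what drawing theta outside the plate expresses.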

3:04

Let's look at a slightly more interesting example.

Going back to our university with multiple students,

we now have a two variable model where we have intelligence and grade.

And we now index that by different students s, which again indicates

that we have a repetition, a copying of this template model.

In this case, I only made two copies, one for student 1 and the other one for

student 2.

3:30

And once again, we might want to encode dependence on the parameters.

So we might have theta i, which represents the CPD for i.

And we might have theta g, which represents the CPD for g.

And we would have exactly the same idea for theta i and theta g,

where theta i influences the two i variables and theta g influences the two g variables,

and again, they're out of the plate.

Now, importantly, in many models,

we will include those parameters explicitly within the model.

But often when you have a parameter that's outside of all plates.

4:46

And courses we're going to call a little c and

students we're going to call a little s.

And so now let's think about how you might replicate variables that correspond

to properties of courses and variables that correspond to properties of students.

So the difficulty variable belongs in the course plate because it's

a property of the course.

So it's going to be the difficulty of course c. And

now let's think about how we are going to put the students in.

One possibility is that we're going to nest.

5:20

Now what that means is that each of the student variables here,

that is, both of these variables, is indexed by both s and c.

Because when a variable is nested in a plate,

it means it has the indices of all plates that it's nested in.

So if the intelligence variable is in both the s plate and

c plate, it's going to be indexed by both.

5:48

So let's build that model and see what it looks like when we sort of unravel

the courses and unravel the students.

It can look like this. We're going to have the difficulty of,

let's say this is a two-course and two-student model.

So we have the difficulty of course one and the difficulty of course two.

And now we have the variables in the nested plate I and G.

And we can see that they're both parametrized by both student and course.
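As a rough sketch of this unrolling, the snippet below lists the ground variables for the nested case with two courses and two students. The labels c1, c2, s1, s2 are placeholder names assumed for illustration.

```python
from itertools import product

courses = ["c1", "c2"]    # hypothetical course labels
students = ["s1", "s2"]   # hypothetical student labels

# Nested plates: the student plate sits inside the course plate,
# so I and G both carry the indices (s, c), while D carries only c.
ground_vars = [("D", (c,)) for c in courses]
ground_vars += [(name, (s, c))
                for name in ("I", "G")
                for s, c in product(students, courses)]

for name, idx in ground_vars:
    print(f"{name}({', '.join(idx)})")
```

Note that the intelligence variable appears once per student and course pair, which is exactly the property discussed next.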

7:02

Now, let's think about the implications of this.

This tells us that there is a course-specific intelligence for every student,

for every student in every course and that may or may not be what we want.

If you're taking radically different courses, and one is an art class and

one is a math class,

then you could say that there is an art intelligence, representing skill, if you

will, in art,

and there is a math skill, or math intelligence,

so you might actually want to have two different kinds of intelligence and

not assume that they're necessarily the same thing.

Of course, that kind of complicates the model, and

if you have a bunch of courses that are in some ways similar to

each other and take a similar set of skills,

you might not want to have a bunch of independent intelligence variables.

8:12

And so, that gives us an alternative representation, which is what's called overlapping plates:

plates that are not nested, but that overlap with each other.

So in this case, we have the course plate which is this plate over here and

we have the students plate which is this one over here and

the assumption is that difficulty is a property only of the course.

So this is the difficulty.

8:49

And when we unravel this one,

what we end up with is a model that looks like this.

So in this case, we only have a single,

we have a difficulty for the course, we have an intelligence for the student.

And over here, note that I've put the things in the intersection in green.

We have that the grade of the student in the course depends on

the difficulty of the course and on the intelligence of the student.
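The unrolled overlapping-plate model can be sketched as a small function that builds the ground nodes and edges. The function name and the string labels for nodes are assumptions made for this illustration.

```python
from itertools import product

def unroll_overlapping(students, courses):
    """Ground network for overlapping student and course plates.

    D(c) lives only in the course plate, I(s) only in the student plate,
    and G(s, c) lives in the intersection of the two.
    """
    nodes = [f"D({c})" for c in courses]
    nodes += [f"I({s})" for s in students]
    edges = []
    for s, c in product(students, courses):
        grade = f"G({s},{c})"
        nodes.append(grade)
        edges.append((f"D({c})", grade))   # grade depends on course difficulty
        edges.append((f"I({s})", grade))   # and on student intelligence
    return nodes, edges

nodes, edges = unroll_overlapping(["s1", "s2"], ["c1", "c2"])
print(nodes)
print(edges)
```

Compared with the nested version, there is now a single intelligence variable per student, shared by all of that student's grades.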

10:01

So why are these kinds of plate models useful?

So let's look at an example to convince ourselves

that by building these richly structured models, that involve multiple entities,

you can actually get much more interesting conclusions.

So let's look at this example over here.

Imagine that we have this first-quarter freshman who came into our university, and

we'd like to figure out what we can determine about him.

So let's say that in this particular university,

our prior belief is that most students have high intelligence, and so

this is the intelligence distribution: 80% high.

Now, this student, whom we'll call George, took two classes.

He took Geo101 and got an A.

10:44

So probability that he's intelligent goes up.

He took CS101, didn't do so well, got a C.

Now, the probability goes down, but it doesn't go down to a very low number.

And that's because we know from the CPD for grade that we've seen previously that

there may be multiple other reasons why a student might not do well in a class,

for example, it was a really hard class, so everybody did badly, or

he didn't take it seriously.

If these are the only two courses that George took, we're kind of stuck.

But now let's think about this in a more holistic context, that of collective inference,

where we're going to think about a number of students taking a number of classes and

let's imagine that we have a bunch of grades for all of those students.

So what we see here are, the green ones are As,

the yellow ones are Bs and

11:37

the red ones are Cs and

what you see here is a sort of shorthand for the observed grade variables.

I didn't put in all the little dots that represent the grade

variables; I just put in these lines that indicate where they are.

So you can think of this as a network, if you will.

So now let's think about what kind of conclusions we can

reach from this network.

And it seems, even looking at this by eye,

we can see that a bunch of people took CS101.

And they all aced it, except for our friend George. And furthermore,

even if we look at this guy over here,

who got a C in every other class that he took, he still managed to ace CS101.

12:45

And so, we can reach much more informed conclusions in this setting than

we can by reasoning about individuals in isolation.
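Here is a minimal sketch of the George example as exact inference by brute-force enumeration over the ground network. Only the 80% prior on high intelligence comes from the lecture. The difficulty prior, the grade CPD numbers, and the other students (alice, bob, carl) and their grades are assumptions invented for illustration.

```python
from itertools import product

P_HIGH = 0.8   # prior from the lecture: 80% of students have high intelligence
P_HARD = 0.5   # assumed prior that a course is hard
# Assumed grade CPD, keyed by (intelligence is high, course is hard).
GRADE_CPD = {
    (True, False): {"A": 0.90, "B": 0.08, "C": 0.02},
    (True, True): {"A": 0.50, "B": 0.30, "C": 0.20},
    (False, False): {"A": 0.30, "B": 0.40, "C": 0.30},
    (False, True): {"A": 0.05, "B": 0.25, "C": 0.70},
}

def posterior_high(target, grades):
    """P(I(target) = high | grades), enumerating all I(s) and D(c) values.

    grades is a list of (student, course, grade) observations.
    """
    students = sorted({s for s, _, _ in grades})
    courses = sorted({c for _, c, _ in grades})
    weight = {True: 0.0, False: 0.0}
    for i_vals in product([True, False], repeat=len(students)):
        intel = dict(zip(students, i_vals))
        for d_vals in product([True, False], repeat=len(courses)):
            hard = dict(zip(courses, d_vals))
            p = 1.0
            for s in students:
                p *= P_HIGH if intel[s] else 1.0 - P_HIGH
            for c in courses:
                p *= P_HARD if hard[c] else 1.0 - P_HARD
            # Every G(s, c) reuses the same CPD, with parents I(s) and D(c).
            for s, c, g in grades:
                p *= GRADE_CPD[(intel[s], hard[c])][g]
            weight[intel[target]] += p
    return weight[True] / (weight[True] + weight[False])

george = [("george", "Geo101", "A"), ("george", "CS101", "C")]
# Hypothetical classmates who all got an A in CS101.
others = [("alice", "CS101", "A"), ("bob", "CS101", "A"),
          ("carl", "CS101", "A"), ("carl", "Geo101", "C")]

isolated = posterior_high("george", george)
collective = posterior_high("george", george + others)
print(f"isolated={isolated:.3f} collective={collective:.3f}")
```

With these assumed numbers, the other students' grades make CS101 look easier, so George's C becomes more telling and the collective posterior comes out lower than the isolated one. The size of the shift depends entirely on the assumed CPD.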

Now, this is a toy example, but we'll see later on examples of collective

inference where we have multiple interrelated entities.

It could be related pixels in an image, it can be related

14:00

And what we have is that each of these has to be a subset of this.

So what does that mean?

It means, for example, that for the template variable G(s, c),

the G corresponds to the variable A, and s and

c correspond to the indices, in this case, U1 and U2.

And what we have is two template parents.

We have I of s, and D of c.

14:51

an index in the parent that doesn't appear in the child.

So, for example, we cannot have in this model, for

reasons that I'll describe in a minute, the notion of for example, honors for

student s, depending on the grade of the student in multiple courses.
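The index restriction can be sketched as a simple subset check. The helper name and the tuple encoding of template dependencies are assumptions made for this illustration.

```python
def parents_allowed(child_indices, parent_indices):
    """True if every index of the parent also appears among the child's indices."""
    return set(parent_indices) <= set(child_indices)

# Template dependencies as (child, child indices, parent, parent indices).
templates = [
    ("G", ("s", "c"), "I", ("s",)),
    ("G", ("s", "c"), "D", ("c",)),
    ("Honors", ("s",), "G", ("s", "c")),   # c does not appear in Honors(s)
]

results = {(child, parent): parents_allowed(ci, pi)
           for child, ci, parent, pi in templates}
for (child, parent), ok in results.items():
    print(f"{parent} as parent of {child}: {'allowed' if ok else 'not allowed'}")
```

The honors case fails the check because the parent G(s, c) carries the index c, which the child Honors(s) does not have.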

16:23

So specifically if we have this model, if we have this variable A of U1 up to Uk,

then for any instantiation little u1 up to uk, which are concrete

instantiations of the indices, we would have the following model.

We would have the variable A of little u1 up to little uk,

depending on the specific,

16:59

Which is potentially confusing notation,

because the sets are a little bit hard to understand.

But really, just think concretely of the example.

This exactly says that the grade of a particular student in a particular course

depends on the difficulty of that course and on the intelligence of that student.

That's all it says, okay?

So it's just a general way of saying [INAUDIBLE].
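Written out, the ground dependency might look as follows. This is a sketch of the notation, where the parent names B_1, ..., B_m and the bold index subsets are symbols assumed here for illustration: each bold U_i is a subset of the child's indices, and bold u_i is the matching part of the concrete instantiation.

```latex
\begin{align*}
&\text{Template: } A(U_1,\dots,U_k) \text{ with parents } B_1(\mathbf{U}_1),\dots,B_m(\mathbf{U}_m),
  \quad \mathbf{U}_i \subseteq \{U_1,\dots,U_k\} \\
&\text{Ground CPD: } P\bigl(A(u_1,\dots,u_k) \mid B_1(\mathbf{u}_1),\dots,B_m(\mathbf{u}_m)\bigr) \\
&\text{Example: } P\bigl(G(s,c) \mid I(s),\, D(c)\bigr)
\end{align*}
```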

17:48

So, to summarize, plate models are a language

that allows us to define a template for an infinite set of Bayesian networks.

Why infinite?

Because you can have 3 students, 10 students, 1,000 students,

a million students, an unbounded number of students.

So there's an infinite set of Bayesian networks that we can use this language to

encode.

And each of them uses a different combination of domain objects, in our

example, for instance, students and courses.

The parameters and the structure are reused both

within a Bayes net and across the different Bayes nets.

So for example, within our university example, we will use the same parameters.

And if we have a different university with a different set of students and courses,

we would still use the same parameters.

These models, by allowing us to represent an intricate network of dependencies,

allow us to capture very richly correlated structures in a concise way.

Which allows us to do this kind of collective inference,

which is potentially a very powerful source for informed conclusions.

Now I've presented plate models, which are perhaps the earliest and

one of the simplest of these languages,

which allow us to represent template structures.

This is a simple one; for example, it has this restriction on the parents not

having indices that are not instantiated in the child.

And so for example, you can't represent temporal models here, because the index t-1

is not instantiated in the variable X(t).

So you can't have X(t-1) as a parent of X(t).

Not in the plate model. I mean, obviously, we have languages that can do that, but

not this one.

Similarly, you can't have the genotype of the mother and

the genotype of the father affect the genotype of the child, because,

once again, the child doesn't instantiate the mother and the father.

These are separate indices.

And so this is a limited language, but

there are many other languages that expand on it in different ways.

And they each have different tradeoffs in terms of what they express easily and

what they don't.

And there's an entire literature on this that we're not going to go into.

But it has provided a number of very useful languages for representing

these kinds of richly structured models.