0:14

So what, what the idea here is again we're

going to be looking at, at ways to easily incorporate

dependencies and networks and think about different kinds of

sub graphs that might be forming and allow those to

be generated and try and estimate how they're being generated.

And the idea of SUGMs the, the, the terminology is quite simple.

It's from subgraph generation model.

So we think of subgraphs as being generated.

Links, triangles small stars etcetera.

Those are the things that are going to be

generated, and then the network is a byproduct, right?

So what's happening is people are forming

different types of relationships, different kinds of

cliques, and the network bubbles up from that.

Okay?

So in particular, what we're going to think of is

think of whatever type of, of subnetwork you like.

So, maybe it's a link.

Maybe it's a triangle and what happens is these things will be independently

with some probability p, p sub j and then we end up with some number

of those. So, say s sub j, sub-networks.

So our 45 links or ten triangles, these things are

going to be generated and, the difficulty is that they may overlap.

Okay?

So, sometimes people form links. They also form a clique.

So, I end up partnering with somebody but I also form groups.

So if you wanted to look at

say a co-authorship network, I might have written

with one person.

And just one paper but then we also collaborated with somebody

else on a different paper, so we end up forming both the

link and the triangle and if we just look at the

co-authorship network then it's hard to see that directly in the data.

So we observed the resulting network.

And then what we're trying to do is infer these

different probabilities of different types of relationships they came out, okay?

So let's just go through a simple example.

We start with a bunch of nodes.

Let's think of just generating links and triangles.

So there's some probability that different links form.

So in this case, a bunch of

links form here, nine different sorry, triangles form.

So nine different triangles are formed at random.

So out of all the possible triangles, we form nine of them.

Then a bunch of links formed. So the links were dropped down afterwards.

And, what do we end up with?

We end up incidentally generating some triangles, right?

So when we look at this, there was these two edges had

already been there because of a triang, of triangles that were generated.

This one was generated as a link and it ends up generating a triangle and when

we actually see it in the end we

can't tell which ones were generated in which way.

So what we're left with is some links,

some triangles and we're trying to estimate then what is

the probability of these different types of sub-graphs being formed.

Okay.

So, that's the idea So these are sub-graph

generation models, so first of all let's try and

see is there a way to view these

sub-graph generation models as statistical exponential random graph models?

Now you remember when we went through exponential random graph models to begin

with, I showed you how you could take a a model of, of just an [INAUDIBLE]

type where you're generating links at random, and

you can represent that as an exponential function.

3:35

So you can do the same with just triangles and links.

And it's just going to sum up.

So you can go through that same kind of exercise.

And it, it shows you then there's actually theorem this is from the paper with Arun.

So if you have a sub graph generation model with parameters p sub j.

Let s be the true count of how many links were

formed and how many different triangles are formed and so forth.

So let that be the vector of true counts of subgraphs true is in italics here

because we're not going to actually observe the

true number, we're going to have to infer that.

But then

the probability that you would see that

would be proportional to An exponential family.

So, it would fall under the SERGM family.

And, in particular, the betas would actually be the log odds ratios.

And, the K's would be how many different configurations you could form of

having exactly a particular count s sub j. So, the fact

that, you know, how many ways could you get a certain

number of of triangles out of how many different triangles there were.

Okay?

So and then you take that product and that's what the actual K is.

So here, the K's look different than the

n's we had in the exponential random graph model.

But if we just formed these networks, sub

graphs at random and then percolate up a network.

Then that gives rise to a specific form in the expedential

family and it has very particular parameters and very particular weights.

So, what's the difficulty here?

Well, the difficulty is that the true counts of exactly how many

triangles are formed and how many links are formed aren't exactly observable.

But we can get close to estimating them and observing them when things are sparse.

So, if you noticed in

our previous example, we had only one.

So given that there weren't too many links and too

many triangles there, we only ended up with one incidental.

So the incidental was, were pretty rare.

It's not that often that you actually get that the, the links

fall down in a way that accidentally generate triangles and, and so forth.

And so if the things are sparse than we

can actually use the particulars to, to estimate these things.

So what we'll do next

in the next video is talk about try to estimate these kinds of, of models.