0:14

So what, what the idea here is again we're

Â going to be looking at, at ways to easily incorporate

Â dependencies and networks and think about different kinds of

Â sub graphs that might be forming and allow those to

Â be generated and try and estimate how they're being generated.

Â And the idea of SUGMs the, the, the terminology is quite simple.

Â It's from subgraph generation model.

Â So we think of subgraphs as being generated.

Â Links, triangles small stars etcetera.

Â Those are the things that are going to be

Â generated, and then the network is a byproduct, right?

Â So what's happening is people are forming

Â different types of relationships, different kinds of

Â cliques, and the network bubbles up from that.

Â Okay?

Â So in particular, what we're going to think of is

Â think of whatever type of, of subnetwork you like.

Â So, maybe it's a link.

Â Maybe it's a triangle and what happens is these things will be independently

Â with some probability p, p sub j and then we end up with some number

Â of those. So, say s sub j, sub-networks.

Â So our 45 links or ten triangles, these things are

Â going to be generated and, the difficulty is that they may overlap.

Â Okay?

Â So, sometimes people form links. They also form a clique.

Â So, I end up partnering with somebody but I also form groups.

Â So if you wanted to look at

Â say a co-authorship network, I might have written

Â with one person.

Â And just one paper but then we also collaborated with somebody

Â else on a different paper, so we end up forming both the

Â link and the triangle and if we just look at the

Â co-authorship network then it's hard to see that directly in the data.

Â So we observed the resulting network.

Â And then what we're trying to do is infer these

Â different probabilities of different types of relationships they came out, okay?

Â So let's just go through a simple example.

Â We start with a bunch of nodes.

Â Let's think of just generating links and triangles.

Â So there's some probability that different links form.

Â So in this case, a bunch of

Â links form here, nine different sorry, triangles form.

Â So nine different triangles are formed at random.

Â So out of all the possible triangles, we form nine of them.

Â Then a bunch of links formed. So the links were dropped down afterwards.

Â And, what do we end up with?

Â We end up incidentally generating some triangles, right?

Â So when we look at this, there was these two edges had

Â already been there because of a triang, of triangles that were generated.

Â This one was generated as a link and it ends up generating a triangle and when

Â we actually see it in the end we

Â can't tell which ones were generated in which way.

Â So what we're left with is some links,

Â some triangles and we're trying to estimate then what is

Â the probability of these different types of sub-graphs being formed.

Â Okay.

Â So, that's the idea So these are sub-graph

Â generation models, so first of all let's try and

Â see is there a way to view these

Â sub-graph generation models as statistical exponential random graph models?

Â Now you remember when we went through exponential random graph models to begin

Â with, I showed you how you could take a a model of, of just an [INAUDIBLE]

Â type where you're generating links at random, and

Â you can represent that as an exponential function.

Â 3:35

So you can do the same with just triangles and links.

Â And it's just going to sum up.

Â So you can go through that same kind of exercise.

Â And it, it shows you then there's actually theorem this is from the paper with Arun.

Â So if you have a sub graph generation model with parameters p sub j.

Â Let s be the true count of how many links were

Â formed and how many different triangles are formed and so forth.

Â So let that be the vector of true counts of subgraphs true is in italics here

Â because we're not going to actually observe the

Â true number, we're going to have to infer that.

Â But then

Â the probability that you would see that

Â would be proportional to An exponential family.

Â So, it would fall under the SERGM family.

Â And, in particular, the betas would actually be the log odds ratios.

Â And, the K's would be how many different configurations you could form of

Â having exactly a particular count s sub j. So, the fact

Â that, you know, how many ways could you get a certain

Â number of of triangles out of how many different triangles there were.

Â Okay?

Â So and then you take that product and that's what the actual K is.

Â So here, the K's look different than the

Â n's we had in the exponential random graph model.

Â But if we just formed these networks, sub

Â graphs at random and then percolate up a network.

Â Then that gives rise to a specific form in the expedential

Â family and it has very particular parameters and very particular weights.

Â So, what's the difficulty here?

Â Well, the difficulty is that the true counts of exactly how many

Â triangles are formed and how many links are formed aren't exactly observable.

Â But we can get close to estimating them and observing them when things are sparse.

Â So, if you noticed in

Â our previous example, we had only one.

Â So given that there weren't too many links and too

Â many triangles there, we only ended up with one incidental.

Â So the incidental was, were pretty rare.

Â It's not that often that you actually get that the, the links

Â fall down in a way that accidentally generate triangles and, and so forth.

Â And so if the things are sparse than we

Â can actually use the particulars to, to estimate these things.

Â So what we'll do next

Â in the next video is talk about try to estimate these kinds of, of models.

Â