0:00

So we've been talking about exponential random graph models and some of the issues in estimating those. What I want to do now is talk about another class of models called Statistical Exponential Random Graph Models, or SERGMs for short.

So far we've been through sets of models that are link-based, and other models which are still link-by-link based but which can begin to capture different things like clustering and so forth. We brought in attributes with the stochastic block models. And then we said, okay, there's a class of models which allows us to capture richer things, where there are true dependencies, and to estimate those things statistically: Exponential Random Graph Models. But as we saw, there are difficulties in estimating those. So what I want to talk about now is a new class called Statistical Exponential Random Graph Models, which will allow us to keep track of these local features and dependencies and actually do some accurate, fast, and easy estimation.

'Kay. So, Exponential Random Graph Models are not accurately estimable in many cases. There are basically just too many alternative networks to consider. So what's the idea here? What's the way out? The way out is going to be that many networks lead to the same statistics.

So for instance, in the simulations we did in the last video, what we had was, you know, 45 links, 10 triangles, and 20 isolated nodes. And in fact, under an Exponential Random Graph formulation, any network that had exactly those characteristics would be equally likely: they'd all lead to the same value of the function, so they'd all have the same probability. So what we can do is say, okay, even though there are many different networks to search over, what we really need to keep track of is just the different values of the statistics, because all the networks that have the same statistics have the same probability. So we can look at those as equivalent ones and collapse all those equivalent networks. And that makes the summation, and the space that we have to search over, much smaller than what it was before.
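As a small illustration of that collapse (my own toy example, not from the lecture): on 4 nodes there are 2^6 = 64 possible networks, but far fewer distinct (link count, triangle count) pairs, and we can count how many networks share each statistics vector.

```python
from itertools import combinations, product

# Enumerate every network on n = 4 nodes and tally how many networks
# share each statistics vector s = (number of links, number of triangles).
n = 4
possible_links = list(combinations(range(n), 2))  # 6 node pairs

def statistics(edges):
    links = len(edges)
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return links, triangles

N = {}  # N[s] = number of networks whose statistics equal s
for bits in product([0, 1], repeat=len(possible_links)):
    edges = {pair for pair, bit in zip(possible_links, bits) if bit}
    s = statistics(edges)
    N[s] = N.get(s, 0) + 1

print(f"{2 ** len(possible_links)} networks collapse to {len(N)} statistics vectors")
```

Here 64 networks collapse to just 9 distinct statistics vectors, so a summation over statistics ranges over a far smaller space than a summation over networks.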

Okay. And part of this came out of, so, why am I going to talk about this? This is coming out of a paper with Arun Chandrasekhar. Effectively, the way I got interested in this was in projects where I had to be using Exponential Random Graph Models, or something that really accounted for dependencies, and finding that the software just didn't work. And I couldn't find any models that were actually working. So the idea was, okay, we need some models, so let's go develop them. So here I'm going to tell you about a class of models which I've been working with, which I think solves a lot of the problems that we have there in the literature, by collapsing these equivalent networks. So let's just go through the idea here.

So we start with our Exponential Random Graph Model, right? So we've got some vector of statistics.

Â 3:28

So how many different networks would have exactly these statistics? We'll let N of s be that number. Now, that's not necessarily an easy number to calculate, and in some cases it can actually be a complicated number to even estimate, but we'll talk about that in a moment.

So what we can do is take this original Exponential Random Graph Model, and instead of summing over all g prime, we note that any g primes that have the same statistics are going to lead to the same exponential here, right? So if g prime and g double prime have the same statistics, they're going to give us the same number down here. So instead, let's just sum across the s primes, and weight each one by how many different networks would have generated that same s prime. Okay? So we've changed this into a summation over statistics, instead of over the raw networks.
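In symbols, the grouping step just described looks like this (a sketch, writing S(g) for the statistics vector of network g and N(s) for the number of networks whose statistics equal s):

```latex
\Pr_\beta(g) = \frac{e^{\beta \cdot S(g)}}{\sum_{g'} e^{\beta \cdot S(g')}},
\qquad
\sum_{g'} e^{\beta \cdot S(g')} = \sum_{s'} N(s')\, e^{\beta \cdot s'},
```

so the probability of seeing a network whose statistics equal s collapses to

```latex
\Pr_\beta(s) = \frac{N(s)\, e^{\beta \cdot s}}{\sum_{s'} N(s')\, e^{\beta \cdot s'}}.
```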

So now, instead of worrying about how many different networks there are in the world, we just need to think about what the range of statistics is. Well, it's the number of triangles you could possibly have, together with the number of links you could possibly have, which is going to be much smaller.

And the second thing we're going to do is, instead of thinking about the probability of a network, actually think about the probability that we see a certain set of statistics. So ultimately, it's not so much that we're interested in the particular network we saw, as that it had 20 isolates, 10 triangles, and 45 links. That was the really important aspect: that it had certain attributes. Those are the kinds of things we're going to want to test as scientists. You know, does the network have certain characteristics? Not that a very particular network was realized, as opposed to some other one. Whatever characteristics we're really interested in, we can put into those statistics. And so this is what we're ultimately interested in: what's the probability that we see a network that has certain characteristics?

Okay? So now we're going to call this the statistical form, because we've gotten rid of the networks and we're just talking about the properties of the network. And all the estimation is now statistics-based, which can be a much smaller space to work in than the original space. That's the idea behind the Statistical Exponential Random Graph Models.

And of course, any one of these representations of an Exponential Random Graph Model then has a corresponding

Â 5:48

representation in this format. And if we can estimate the parameters here, then that tells us what the parameters would be in the original model. So we can work in this space, do the estimation, and track back to get the Exponential Random Graph parameters.

So the basic idea here, just to emphasize: instead of asking what the probability of a specific network is, we ask what's the probability of seeing a certain density of links, clustering, average path length. Whatever the statistics are that you're interested in, what is the probability of seeing something that has those characteristics?

Okay. Now one other thing is, once we start representing things in this statistical form, we can begin to ask why we have particular weighting functions, what are known as reference distributions. So here the reference distribution is the number of graphs that have a certain set of statistics, but it's not obvious that we might not want something else. In particular, you can put in whatever K function you want: instead of the number of graphs that have certain statistics, put in some other number. You could weight these differently, and that would still be a valid model of how statistics are generated. And in particular, in a few minutes we're going to see a model which will give rise to Ks that look actually very different from the ones that come out of the usual Exponential Random Graph family. So there's a natural way of thinking of other kinds of weights than the ones that are there.
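To write the general reference-distribution idea down explicitly (a sketch; K_n here is any nonnegative weighting over statistics vectors, and N(s) is the count of networks with statistics s):

```latex
\Pr_\beta(s) = \frac{K_n(s)\, e^{\beta \cdot s}}{\sum_{s'} K_n(s')\, e^{\beta \cdot s'}},
```

so choosing K_n(s) = N(s) recovers the SERGM that corresponds to the original Exponential Random Graph Model, while other choices of K_n give other valid models of how the statistics are generated.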

Okay. So I've just been emphasizing the idea: now we've got things in a statistical space, which is going to be easier to work with than working directly in the graph space.

Okay. Now, one thing is you can encode whatever you want in terms of statistics. You can also begin to put in preference-based models, and we'll see some of that, I guess, in the fourth week of the course. So we'll come back to that, and you can begin to do preference-based models and test whether people have certain kinds of preferences, and biases in preferences, in these models as well.

Okay. So what's the real challenge still in estimating these, just to re-emphasize?

Â 7:48

What we're usually faced with is one network, right? We don't see many different networks; usually, we just see one realization. And now we've got these interdependencies. So we've got problems in working with large numbers, because we've basically only got one observation. But we do have many observations of which links are present and which triangles are present. Even though these aren't quite independent, we can still ask the question: are they enough? Is there enough information in one of these things to estimate a model?

Okay? And so it's not completely obvious that, as you estimate these kinds of models with all these interdependencies, whatever estimates of the parameters you get are going to be accurate.

So we can go ahead and do, say, maximum likelihood on this kind of model, an Exponential Random Graph Model now in its statistical form, and estimate these things. So, you know, we saw certain statistics come out. We can ask: what are the betas which actually maximize this function, which make it most likely that we saw these statistics rather than some other ones? So we can just go ahead and do maximum likelihood on this function. And we can ask: when will these betas be accurate estimates of the true parameters? How large does our data set have to be before we're pretty sure that we're getting an accurate look at what nature was doing?
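To make that maximum-likelihood step concrete, here is a minimal sketch (my own toy example, not from the lecture): a SERGM whose only statistic is the link count L on 10 nodes, so there are M = 45 possible links and the reference weight N(L) = C(M, L) counts the networks with exactly L links. In this special case the model reduces to Erdos-Renyi, whose MLE link probability has the known closed form L_obs / M, which lets us check the numerical answer.

```python
from math import comb, exp, log

# Toy SERGM: one statistic, the link count L, on n = 10 nodes (M = 45 node pairs).
# Reference weight N(L) = C(M, L) = number of networks with exactly L links.
M = 45
L_obs = 12  # hypothetical observed statistic

def neg_log_lik(beta):
    # -log P_beta(L_obs) = -[log N(L_obs) + beta*L_obs - log sum_L N(L) e^{beta*L}]
    log_norm = log(sum(comb(M, L) * exp(beta * L) for L in range(M + 1)))
    return -(log(comb(M, L_obs)) + beta * L_obs - log_norm)

# The negative log-likelihood is convex in beta, so a ternary search suffices.
lo, hi = -5.0, 5.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if neg_log_lik(m1) < neg_log_lik(m2):
        hi = m2
    else:
        lo = m1
beta_hat = (lo + hi) / 2

# Sanity check: here the SERGM is Erdos-Renyi, so the implied link
# probability should match the closed-form MLE L_obs / M.
p_hat = 1 / (1 + exp(-beta_hat))
print(round(p_hat, 4), round(L_obs / M, 4))
```

With richer statistics (triangles, isolates) the same recipe applies; the work moves into computing or approximating the reference weights N(s).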

Okay. So there's a set of theorems that we have in a 2012 paper, where we actually show classes of Statistical Exponential Random Graph Models for which maximum likelihood is going to converge to the true parameters as the network grows, and which provide some easy ways of estimating those. Instead of going into that detail, you're free to look at the paper. What I'm going to do is look at a very simple related set of models, a particular case, which will give us some idea of how these things work and allow really simple and fast estimation. So that will be our next topic.
