0:01

Hello again, folks. So, let's talk a little bit about other

kinds of growing random network models. And in particular, we're going to talk

about a hybrid model now that spans between these uniform at random models

and the preferential attachment model. And you know the idea here is, is again

we're looking at random network models and now what we want, want to do is, is

try and match degree distributions that are out there in the real world.

So, we have a variety of, of degree distributions that are out there in the

real world. Some of them look more like uniform at

random, some look more like preferential attachment.

But we want a way of actually fitting a model so that we can say that this one

looks like you know, entirely preferential attachment or 80%

preferential attachment or some percentage.

So, we have some idea of exactly what is going on in, in, in generating different

degree distributions and the, the idea here is that we'll, we'll be able to fit

these models to data. So again, when we looked back at the

Notre Dame worldwide web and so forth and we end up working with preferential

attachment. Preferential attachment gave us a curve

which has a slope of minus 3 and would be linear in log log plot...

at least to our eyeballs, that looks like it might match this data pretty well, but

we'll have to actually get it, you know, tested carefully to see whether or not

that's a good match for what's actually going on in those data.

what's interesting though, is when we look at say, co-authorship data so here,

this is co-authorship. This is data from the economic

co-authorship data sets of Goyal, van der Leij and Moraga Gonzales.

And in particular we see that there are fat tails right?

So we've got higher degrees and more high degrees and more low degrees.

Then uniformally at random but this is hardly linear right.

So if we drew a line through here its not linear.

So we've got something which looks like it is has fat tails but its not well fit

by a power distribution. And so, can we say a little bit about

whats going on in terms of how this network might have been formed?

the this kind of growing process. And again that's a natural kind of model

to look at because co-authors are coming in over time and forming new

relationships as, as they're born and gaining relationships over time as they

mature. Okay, so what we're going to do is look

at a simple hybrid model. And this hybrid model has been discussed

in different papers. I'm going to talk about a version that I

did with Brain Rodgers in 2007. The very simple version and will allow us

to really span between these and then go directly and fit to data.

And the idea is, is the simplest one you could imagine in terms of, of in forming

a hybrid of these. what we actually do in the papers is, a,

a little richer, but I'm just going to simplify this for the sake of, of our

analysis here. what we're going to do is, some fraction

of the new links that are formed in any point in time.

A fraction, a, are going to be done uniformly at random.

And 1 minus a are will be found via searching the neighborhoods of friends,

okay. So what does that mean?

That means you know there's a bunch of nodes that are already out there.

Let's suppose that they've formed some network.

So they're already existing. and a new node is born, and that new

node, first, forms some links uniformly at random.

'Kay, so let's see, let's do this with blue.

So it forms a, a couple of, of links uniformly at random.

4:03

And then what it does is it picks new nodes to link to by looking at the

friends of this. So this node, maybe it, it identifies

this node then, and then forms some additional ones instead of searching

uniformly at random. What it does is it meets the friends of

friends. Okay, so it begins to look at the friends

of the existing ones, and then form links to those nodes.

So it picks some nodes, uniformly at random, and then also searches the

neighborhoods of those nodes. So, you just, for instance go on to the

worldwide web, you start looking through some webpages.

And then you start clicking on links from those pages and then finding those.

Or you meet somebody and then you meet some of their friends.

So this is a very natural process. And what we're going to is we're going to

have one fraction formed uniformly at random and then the rest, the one minus a

fraction formed via meeting friends of friends.

And what I want to convince you of is that, that second part is going to look

like preferential attachment. So, that will give us an explanation for

how preferential attachment might actually be working.

Why would it, why would we have preferential attachment?

Okay, so let's think about meeting friends of friends.

So you're forming am of your nodes uniformly at random.

And then also meets 1 minus a times m of their neighbors and attaches to those

nodes, too. And the distribution what's, what's

important is the distribution of neighbors' nodes is not the same as the

degree distribution, even with independent link formation.

Okay, so a neighbor is more likely to be of high degree.

And let's go through the logic of that, why is that true.

so let's think of a network, suppose that we are in a situation where have of the

nodes had degree k and half the nodes had degree 2 k, okay?

So some of them are twice as connected as others.

Let's just do a simple thought experiment.

So pick a link in that network at random and then pick a node at one end of that.

Okay. Now, there's a 2 3rds chance that the

node you'll find is actually of degree 2k and a 1 3rd chance that the node is of

degree k. So it's actually twice as likely that

you're going to find the node with the higher degree as compared to the node

with the lower degree. Even though they're evenly distributed,

even though there's half as many of each, I mean, there, there, half the population

is of each, they're equally likely in the society.

Through this process, you're actually going to find the higher degree nodes

with more probability. Why?

Because they have more links. So, if you find any node on the end of

one of their connections. If you find one of their neighbors, you

find them. Right?

So, here if you select a link and then you select a node on one end of it is, if

I have twice as many links it's twice as easy to find me in that network as it is

to find somebody who has half as many links.

And so if you're looking at my, at my friends and then trying to find any of my

friends. If I've got twice as many friends, I'm

twice as easy to find in that system as somebody who has half as many friends,

okay? So the idea is that if you're actually

looking through a network and finding people by looking at nodes and then

looking at their friends. You're going to be more likely to find

higher degree nodes, and, then we're going to have a form of preferential

attachment. Nodes that have more friends are going to

be easier to find, their going to get more friends through this process, and it

continues. Okay, so, very simple process, very

intuitive and natural one now, that will explain why we get preferential

attachment. And then as we vary this parameter a, we

can have things look like just forming uniformly at random network, or if we're

doing things mainly through the network search process.

We are going to tend to have most of the links formed in ways that are going to

favor better connected nodes.

[BLANK_AUDIO]

so, what we're going to do, we again the, the, the chance of so, randomly find a

node. randomly pick one of the nodes it's

attached to. And, that's the sort of finding of

friends of friends bit. And chance of finding some node via this,

second part of the procedure is proportional to its degree.

We find it, if you find one of the, the other nodes that attach to it.

So, If it, if it had twice as many people attached to it, it's twice as easy to

find. Okay, so now what does this model look

like then? fraction a forms uniformly at random, 1

minus a via preferential attachment. So we can go back to our mean field

approximation, do exactly what we did before.

We're going to still have the starting condition that when you're born, you form

m links. And now, what we're going to have is a

fraction a of my new links over time are going to look like the uniformly random m

over t. And a fraction 1 minus a, are going to

look like they did in preferential attachment which is proportional to my

degree over 2t. Right so there's an m, over 2t m, I've

divided out the m's So, we've got this simple differential equation.

And, if you solve that, so now we've got another differential equation.

you can just check that this differential equation, if you differentiate this with

respect to time, you'll get this back. And the initial starting conditions.

So, this is the solution to that differential equation.

Okay, I'm not going to go through solving differential equations, this is a

relatively standard differential form, and this is the, the solution for that

differential equation and with that starting condition.

Okay? So now we have a very nice sys, system.

Again, what this does, is tell us what an expected degree is, at any point in time,

for a given node. As a function of how much time has gone

by, and what its birth date was. And therefore, we can then solve this to

figure out which nodes you're going to have degree less than some amount at any

time. That gives us the degree distribution, so

exactly the same procedure we've been doing a couple of times before.

And so here, the nodes that have de, expected degrees less than d at some time

are the ones for which there there equation for what their degree is, is

less than d at that time. And here there's an expression just with

simple, put an x in here to simplify this.

Choose a notation to substitute in x is equal to 2, 1 minus a, and that makes

this equation just a little easier to read.

So, the critical i is the one for which this is true.

So if you solve the i for which they're exactly at degree d the, they solved

this. And therefore remember the f of d are

going to be the, the fraction of nodes that were born after the ones that have

this critical i that has exactly that d. And, so if you substitute in from the,

the i here, what do we get when we substitute that in?

We get f of d is equal to this equation, where x is equal to 2 over 1 minus a.

So this is slightly more complicated expression than we had before.

And it looks partly like, so we've got a d in the denominator raised to some

power. the power here depends on the parameters

of the of, of the a and so forth. And you can check that when you look at

this equation. As a gets near 1, so that all of the

links are being formed uniformly at random, then this is nearly a negative

exponential distribution. Where as, as a gets close to 0, then x

just becomes a 2. this is going to disappear, this is

going to disappear, and we're going to get our 1 minus m over d raised to the to

power the differentiative, the density function, I don't have a minus three.

So this will look exactly like preferntial attachment.

So this spans between those different distributions and as we change a we're

going to end up with a distribution which looks either like this negative

exponential. Or begins to spin out and look more like

a power law distribution, as we let a become close to zero, and almost all the

links before, er, the friends of friends, which gives us preferential attachment.

Okay, let's simulate these things just to check.

so this is after a thousand periods. simulating when m is equal to 2 and a is

equal to half. And you can just check either the log of

the distribution function, versus of the degree, or the log of the frequency.

So this is a, we know that the extremes that the expected degree is a good match

for the actual degree distribution. here, if we put it in the middle it, it

seems to be fitting well. So the simulated degree distributions are

coming out well compared to the actual degree distribution.

So, so when you simulate with a half, you get a good match.

if you simulate with a equals 1, so it's uniformly at random, it's a good match.

actually interestingly, if you the, this is just on simulations when a is 0 it's

not exactly on, but this is one where we know the proof actually shows that

you get a good match so it, it just means that for, for pure preferential

attachment, you're going to need more than a thousand periods before this thing

begins to be a good approximation. Okay so, so we've solved for a degree

distribution and what we can do next, is now we've got a degree distribution, a

hybrid distribution that spans between these different extremes.

We can start taking this to data and begin to back out what that a is in the

data. How many links look like they're being

formed more at random. How many look like they're being formed

through the network search process, being meaning friends of friends.

And then see whether we see differences in different data sets.