0:00
So we're back and now let's take a very close look at Preferential Attachment.
And begin to understand how it generates links and degree distributions.
So, Preferential Attachment so the idea now is newborns again are going to form m
links to existing nodes. born one each period, so time just keeps
track of birth dates. And you could adjust it however you like.
so the important thing now is the formation process is going to be
different. So at sometime t, the number of links
that are going to be in the system are going to tm links, right there.
T nodes born it reached form m links so tm in total.
And the total degree is then 2 times tm, right?
So each link has two different nodes that are going to get to count it as a link.
So we have 2tm as a total number of, of degrees in the system at any point in
time. And the probability of attaching to a
given node is proportional to it's degree compared to the overall degree in the
society. So nodes that have very high degree, are
going to have higher chances. And so in particular, the probability of
attaching to node i, is going to be it's degree divided by 2tm.
Which is very different than the m over t that we had before.
So before, every node that was out there had an equal chance of getting a new
link. Now the chance of getting a new link is
proportional to your degree, compared to the overall degree in the society at that
time period, okay. So this is the proportional attachment
rule. The Preferential Attachment for which
this model is known. Okay, so let's do a mean field
approximation. So again we're going to do a continuous
time of approximation. We're going to be really finding the
degree of the distribution of expected degrees.
And then to check whether this actually matches the real degree distribution.
We could simulate the model, and see whether this is giving us a good,
approximation. There have been theorems now which show
that the mean field approximation of this particular model works well.
when we get to richer models that are different from Preferential attachment.
And the exponential model, then, we're not going to know whether or not the, the
actual degree distribution is well matched by this approximation.
But we can always check and simulate the model and then make sure that the, that
what we're getting out is the, the closed form solution from this kind of system is
not too different than what happens from the simulations.
Okay Distribution of Expected Degrees. So again, when you're born, you form m
links. So that's the initial starting condition.
How does your degree change? Well, you're getting new links at a rate
which is proportional to how your degree compares to 2t over m.
There's m new links being formed per unit of time.
Therefore this differential has this equation, and we can cross out the m's in
the numerator and the denominator. And so what we'll end up with is a simple
differential equation. Right, so we end up with the derivative
with the degree, with respect to time, is just looking like the, the actual degree
over 2t. And we have a starting condition,
differential equation, this one's even easier than the last differential
equation to solve, in fact. So the degree looks like m times the time
over the birth date raised to the one half power.
So square root of t over i. So a simple square root equation that
gives what the degree should look like over time.
Okay, so we've got a very simple equation for the degrees.
And we can compare that. So let's go back to the, situation we
looked at before where people were forming links uniformly at random.
Let's again have everybody form m equals 20 links when they're born.
And let's compare what happens when they're forming the links uniformly at
random compared to preferential attachment.
And what we see is that the Preferential Attachment is in some senses a steeper
curve here. And it ends up with more degree, more
high degree notes. The older ones are, are higher degree
than the, the uniform at random. And then the younger ones are actually
lower degree. It's harder for them to get new links,
because they have lower degree. And the older ones now have an added
advantage, not only of being older and they've gained things.
But they're, also you gain things proportional to how large you already
are. So the older ones are even easier to find
and easier to attach to, under Preferential Attachment.
So they get this extra bump where we see the extra bump in the curve.
So this is going to give us the fat tails.
So the fact that we've got this extra bit here, and this bit lower means we're
going to have more with high degree, more with low degree.
And that's going to generate the, the power law that we're looking for.
5:08
Okay so now, degrees in Preferential Attachment.
Let's figure out the degree distribution. We can do the same kind of calculation we
did before. Which are the nodes with degree less than
35 at some time. Well, we just solve for this curve.
Which are the ones that has the degree exactly 35.
It's going to be all the ones that are younger than that.
So born after that time period, whatever that time was.
And then we can figure what's the fraction that have that?
So we'll be able to figure out our distribution function Fof d, the same
technique we used before. But now with just a slightly different
equation that we're using to solve that same system.
So remember you're degree at time t is m times t over i to the half.
So the nodes with degree less than 35, are going to be the ones for which this
function is less than 35. So 35, has to be bigger than our m here
is 20, t equals 100. So, if you solve that out, and solve that
for i, you get that i has to be bigger than 32.7.
So now, this is 32.7 as opposed to the 42 something that we had before.
So, what's the fraction that have degree less than 35 at this time.
Well, it's going to be 100 minus 32.7 over 100, and then we can solve that out.
[BLANK_AUDIO] Right, so so roughly something like 68% of the nodes are
going to have that property. so when we look at this, just solving it,
generically. Again we've got this equation.
the ones that have degree less than d are going to be the ones where this, equation
is less than d. And therefore these are the i's that are
greater than t times m squared over d squared.
And now then to solve for f this, right, so we've got the, the fraction these are
the i's beyond some level. So going back we're want to find the,
what's the fraction of i's above some degree d.
Well we're just going to take the t minus these i's over t, that'll give us the F,
right. So F is just t minus this compared to t,
which works out to be 1 minus m squared over d squared.
So we have a very simple equation for the degree distribution.
So our degree distribution looks like 1 minus m over d squared.
And if you take the derivative of that and look at what's the density function
for the distribution generated by preferential attachment.
It's 2m squared over d cubed. So, this is proportional to d to the
minus 3, right? So, we've got 2m squared, so this is our
constant, times d to the minus 3. So, now we have a power law, we have
exactly something which looks like a power law.
And indeed when we take logs of each side we get the log of f of d looks like log
of 2m squared minus 3 log of d. So we've got a nice linear equation in,
in log, log plots. Which is exactly the, power law that we
were looking for. And in particular here, one thing that's
sort of interesting, when you look at the coefficient that came out, it's exactly
3. so why 3?
well, that came from the fact that the change in degree, over time had a 2 in
the denominator. And then when we do integration and so
forth we came out with a 3. but basically that's, is, is coming from
the fact that there's a certain number of links present in this society.
And if you want to vary this, you can actually produce different variations on
this coefficient. And the way that you can produce
different variations on this coefficient is just having the number of links being
formed at any point in time. either grow or shrink over time.
So you can have the population, the number of newborn nodes not be a constant
one per period but changing over time. And that'll allow you to control this,
this variable here. So, so a slightly richer variation of
Preferential Attachment can adjust that. There's actually an exercise in the book
I wrote that shows you how to do that. But basically the idea here is that you
can get different variations here by just changing the rate at which the system
grows. But the important thing is that the fact
that the older nodes are gaining, things that are richer, faster and faster time,
meant that, that we end up wth these fat tails and power law.
Okay, so we've, we've gone through a couple of different growing systems.
What we'll do next is try and enrich this a little bit more to span between these
sort of uniform at random and Preferential Attachment, and then that
will allow us to actually take these degree distributions.
And take them to data, and see which ones actually match the data.
So which is, is the actual Preferential Attachment a good match for the, for the
world wide web data? does, which ones match the romance
networks etcetera. So can we find variations on these models
that actually match the observed data.