0:20

Okay, let's say there is a session between one and two,

one and three, one and four, one and five. Then a session between two and three going

through this path, then two and four and two and five, and so on. There are

altogether ten possible sessions among these five nodes.

And there are five links, five bidirectional links.

We can look at the degree distribution here.

The degree distribution for this graph is very simple: it is 3, 2, 2, 2, 1.

Okay, of course this is an extremely small example to illustrate our point.

But it enables us to write out the matrices and the optimization on a single slide.

Alright, so let's call this graph G1.

Okay, let's look at S of G1 and P of G1. The small s of G1 is easy to compute,

because it is the sum of the products d_i d_j over all the pairs (i, j) that are connected.

For example, notice that 1 and 2 are connected, so the first term is d1 times d2.

And there are altogether five terms here:

d1 d2 + d1 d3 + d1 d5 + d2 d4 + d3 d4.

Okay, those are all the terms associated with the left graph.

We'll talk about the right graph in a minute.

So, you can add this up. For example, the first term is three times

two. Okay.

Add up all the terms and you get the number 23.

Now, how do we find the big S of G1? We need to know what kind of graph with the

node degree distribution of 3, 2, 2, 2, 1 would give you the largest small s.

Well, if you look at all these terms, you see that the best way to do that is to

make a simple swap, okay? Instead of three connecting to four and one

connecting to five, one should connect to four, because then we get a term of three

times two instead of three times one.

Okay? So switch the link one-to-five to one-to-four,

and then switch the link three-to-four to three-to-five.

You can verify that this right-hand graph, call it G2,

has the exact same node degree distribution 3, 2, 2, 2, 1.

Again, it's not exactly Pareto, but it suffices to illustrate our point of a

tradeoff between S of G and P of G, okay?

And then you can carry out the computation: small s of G2 is 24.

And therefore big S of G1 is small s divided by the maximum you can get out

of all the graphs with this degree distribution, which is 24.

So it's 23 out of 24. And clearly, big S of G2 is 24 out of 24, which is

one. So one of them is less than one and the other is one.
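As a sanity check, the small s values and the ratio big S of G1 can be computed in a few lines. The edge lists below are read off the two example graphs as described above (an assumption of this sketch, since the figure itself is not reproduced here):

```python
# s(G) = sum of d_i * d_j over all connected pairs (i, j).
edges_G1 = [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4)]
edges_G2 = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 5)]  # after the two swaps

def s(edges):
    # node degrees implied by the edge list
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    return sum(deg[i] * deg[j] for i, j in edges)

print(s(edges_G1))                # 23
print(s(edges_G2))                # 24
print(s(edges_G1) / s(edges_G2))  # S(G1) = 23/24, about 0.958
```

Both graphs have the degree sequence 3, 2, 2, 2, 1, so the ratio is exactly the normalized likelihood described above.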

Therefore this graph G1 is less likely to be formed at random than G2,

just like the Internet is less likely to be randomly formed than a preferential-attachment type of

topology. So now the question is, what about

performance: P of G1 and P of G2?

Let's run this exercise for P of G1. P of G2 is very similar.

4:04

So we just calculated the likelihood already.

Now here's the performance. So first of all, we have to write out this

routing matrix, R, which is five by ten because there are ten possible sessions,

okay? And the variables are basically x1, x2, dot, dot, dot, down to x10.

And there are five links and five nodes here, okay?

So R is five by ten. Let's say the first session, session one,

is the session that goes from node one to

node two. Therefore its column is simply one, one, zero,

zero, zero. Because the first session goes through

nodes one and two, and no other nodes, okay?

And the second session is, let's say, one going to three.

5:46

And this session, the fifth column, is session number five, going from node two to

node three. And the path will traverse

three nodes: nodes one, two, and three.

So the entries in this column are 1, 1, 1, 0, 0.
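Filling out the rest of R for G1 can be sketched as below. The session ordering follows the enumeration above; the shortest-path routes, with the ties for sessions (1, 4) and (4, 5) broken through node 2, are assumptions of this sketch:

```python
# All ten sessions, in the order used in this example.
sessions = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
            (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
# Assumed shortest paths in G1 (lists of traversed nodes).
paths = {
    (1, 2): [1, 2], (1, 3): [1, 3], (1, 4): [1, 2, 4], (1, 5): [1, 5],
    (2, 3): [2, 1, 3], (2, 4): [2, 4], (2, 5): [2, 1, 5],
    (3, 4): [3, 4], (3, 5): [3, 1, 5], (4, 5): [4, 2, 1, 5],
}
# R[n-1][k] = 1 if session k's path traverses node n, else 0.
R = [[1 if n in paths[s] else 0 for s in sessions] for n in range(1, 6)]

first_col = [row[0] for row in R]
fifth_col = [row[4] for row in R]
print(first_col)  # [1, 1, 0, 0, 0]: session (1, 2) touches nodes 1 and 2
print(fifth_col)  # [1, 1, 1, 0, 0]: session (2, 3) routes through node 1
```

The two printed columns match the first and fifth columns described in the lecture.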

So you can fill out the rest. And now, with this routing matrix

constructed, we say this matrix multiplied by our variable vector should be less than

the total bandwidth available per router at each of these five nodes.

Let's say the B_k's are exactly the same as the degrees.

They don't have to be; it's just to make the numbers simpler.

And therefore they are three, two, two, two, one, basically.

So the total bandwidth, the number of packets a node can process, is directly

proportional to its node degree, basically assuming each degree has a

unit capacity of processing. So, subject to this constraint,

7:07

Okay? And there's the definition that each x is really

rho times the demand and the supply. So we can make up some data

for these y values, okay? For example, we can just say the y vector

is five, two, four, four, two, and therefore for each session, and there are ten

of them, you can look at the source and destination, multiply the numbers,

and then you get another constraint. So, constraint one and constraint

two. Subject to these two constraints, maximize the summation of the x_i's, the

objective function for performance. If you solve this very simple linear

optimization problem, you get the following answer.

The answer x star is some vector that actually doesn't matter.

Rho star actually doesn't matter either. It turns out to be 0.033, but that's not

key. The key is that the maximized sum of

these end-to-end session throughputs, that is the performance, with this graph G1 is

3.73 in a certain unit, let's say, gigabits per second.

8:28

Now we can carry out the exact same procedure

for this graph G2, okay?

And then you're going to get a different number.

It turns out that the performance for the G2 graph is 3.03

gigabits per second, which is clearly, significantly smaller than 3.73.
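The two throughput numbers can be reproduced without a full LP solver: if every session rate is x_k = rho * y_i * y_j for one common rho (the reading of the constraint described above), then the binding per-node capacity constraint determines rho directly. The path choices below, including the tie-breaking in G1, are assumptions of this sketch:

```python
# Reproduce P(G1) and P(G2) from the lecture's small example.
y = {1: 5, 2: 2, 3: 4, 4: 4, 5: 2}   # made-up supply/demand values
B = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}   # per-node capacities = node degrees

paths_G1 = {
    (1, 2): [1, 2], (1, 3): [1, 3], (1, 4): [1, 2, 4], (1, 5): [1, 5],
    (2, 3): [2, 1, 3], (2, 4): [2, 4], (2, 5): [2, 1, 5],
    (3, 4): [3, 4], (3, 5): [3, 1, 5], (4, 5): [4, 2, 1, 5],
}
paths_G2 = {
    (1, 2): [1, 2], (1, 3): [1, 3], (1, 4): [1, 4], (1, 5): [1, 3, 5],
    (2, 3): [2, 1, 3], (2, 4): [2, 4], (2, 5): [2, 1, 3, 5],
    (3, 4): [3, 1, 4], (3, 5): [3, 5], (4, 5): [4, 1, 3, 5],
}

def performance(paths):
    demand = {s: y[s[0]] * y[s[1]] for s in paths}   # y_i * y_j per session
    # per-node load if rho were 1: total demand routed through each node
    load = {n: sum(d for s, d in demand.items() if n in paths[s]) for n in B}
    rho = min(B[n] / load[n] for n in B)             # binding node constraint
    return rho * sum(demand.values())                # total throughput, sum of x_k

print(round(performance(paths_G1), 2))  # 3.73
print(round(performance(paths_G2), 2))  # 3.03
```

For G1 the bottleneck is the degree-one node 5, giving rho* = 1/30, which is the 0.033 mentioned above; for G2 the hub's neighborhood binds earlier, which is exactly the bottleneck effect described next.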

And the reason is quite clear: by making the most highly connected node

sort of the center of this very small graph, you're actually creating a

performance bottleneck in supporting throughput.

So now if I plot the locations of G1 and G2 in the P (performance) versus S

(normalized likelihood) graph, I see that they follow the same trend as the Internet

versus preferential attachment, except this is a much, much smaller example.

So we get a very small dynamic range, but nonetheless you see a qualitative

tradeoff between performance and likelihood. So that is our small example.

And we are going to conclude this part of the lecture.

9:49

In the advanced material we'll say a lot more about both preferential attachment

and constrained optimization as generative models of scale-free networks.

And there's a very rich and interesting background.

The history goes all the way back to Zipf,

and then to Pareto and Yule, who basically discovered that for many

phenomena, whether it's the sizes of cities or the frequency of appearance of

words in languages, many distributions share a pattern:

the k-th most popular item, by histogram tally,

often shows up with a frequency that is one over k.

Now, clearly, that is a scale-free phenomenon.

It doesn't matter what k is; the chance that it shows up is one over k.

10:48

So there's no specific characteristic scale, just like the Pareto distribution, and

you can see Pareto in the history of this evolution.

And then, back in the 1950s, there was this debate. All these are observations;

the question is, how do you explain the observations?

And there was a debate between Mandelbrot and Simon.

And that debate is very similar to the debate of about ten years ago, in the early 2000s,

between preferential attachment and constrained optimization.

Except the context, of course, in the 1950s was not Internet router topology.

11:30

And very briefly speaking,

before going into the math material: preferential

attachment says that when new nodes are added to a graph, they are more likely to be

connected to those with a lot of connections already.

So the rich get richer; the more connected get even more connected.

Conceptually, it says that the self-organizing growth mechanism of the graph

leads to a power-law distribution. And mathematically, it just turns out that

sampling proportional to degree leads to a difference equation,

which we'll see in the advanced material, whose solution gives you an equilibrium that

satisfies a power law. In contrast,

constrained optimization generates power-law distributions in a very different way:

the graph topology is really designed with some objective functions

and some constraints in mind. And yet, the resulting topology can also show a

power law. Conceptually, it says that a power law is the

natural outcome of constrained optimization.

With objectives driven by economics, for example, and constraints driven by

technology. Mathematically, it says that either entropy maximization or what's

called iso-elastic utility maximization, under linear constraints.

We're going to see these in the advanced material part of the reading for this

lecture. Either of them, among many others, would

give rise to a power law. So there lie the main differences

between preferential attachment and constrained optimization.
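The rich-get-richer mechanism just described can be sketched as a minimal simulation. The function name, seed, and graph size below are made up for illustration; each new node brings one edge, attached with probability proportional to current degree:

```python
import random

# Minimal preferential-attachment sketch: each new node attaches one edge
# to an existing node chosen with probability proportional to its degree.
def preferential_attachment(n_nodes, seed=0):
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}     # start from a single edge between nodes 0 and 1
    targets = [0, 1]          # each node appears once per unit of its degree
    for new in range(2, n_nodes):
        old = rng.choice(targets)   # degree-proportional sampling
        degree[new] = 1
        degree[old] += 1
        targets.extend([new, old])
    return degree

deg = preferential_attachment(1000)
# n - 1 edges in total, so the degrees sum to 2 * (n - 1)
print(len(deg), sum(deg.values()))  # 1000 1998
```

Tallying the degree histogram of such runs is what exhibits the power-law tail; the advanced material derives it via the difference equation mentioned above.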

So the question is, which one to use? Well, it depends on which one gives you

the predictive power that you need on other properties.

Not the power-law distribution anymore, because they both generate that.

Other properties of the graph, such as robustness to a targeted attack on highly

connected nodes. Would that break the network?

Would that break just the edge? Whichever generative model can also

predict the other properties that you care about should be the one that you

use. And there lies an interesting story on the

pitfalls of generative models. In 2000 there was so much talk about the

Internet having an Achilles' heel. It turns out that if you look at empirical

data, or go talk to AT&T or Cisco or Juniper, you'll realize that that's

just not supported by factual data. It highlights the importance of

validating any self-consistent theory against empirical data from the field

that you are talking about. It also highlights the risk of

overgeneralizing something called network science, which is a fuzzy and mostly

meaningless term unless you provide some concrete meaning to it. By overgeneralizing

it to some universally true property, it actually loses domain-specific

functionality models, and thereby often loses predictive power

and can even lead to confusing, misleading conclusions.

Such as this non-existent Achilles' heel. So, we have seen generative models,

reverse engineering of network topology, small worlds and scale-free networks, and later

we'll also look at reverse engineering of network protocols.

We have also seen

the interplay between topology, say a

graph and the corresponding matrices, versus functionalities,

whether it's navigation or search,

routing or robustness to a specific kind

of attack. And the interplay between these two

dimensions of what we call a network, whether a socio-economic or a technological network,

will continue to show up in the rest of the course.

As we conclude at this point, the midpoint of these twenty

questions about our networked life.