0:00

Welcome to lecture ten of Networks, Friends, Money and Bytes.

In today's lecture, we'll try to formulate and answer this question.

Does the internet have an Achilles' heel? And of course the term Achilles' heel here

comes from the metaphor and we need to properly define what exactly do we mean an

Achilles' heel. Now, there's no shortage of vulnerability

of the Internet. In the second half of this course, we'll

be focusing a lot on the underlying architectural principles behind the

Internet. And we'll see that there are many holes

for security breach, for privacy intrusion, and also for unreliability.

0:50

However, a Achilles' heel here refers to a reported existence of a particular type of

Internet security issue back in 2000. And this specific vulnerability has the

following, Issues.

There is a highly connected router in the middle of the network can be easily

detected and if you destruct this router then the whole network will split into

many parts. Later in chapter thirteen, we'll talk a

lot more about routing and routers on the internet.

But for today, we can just say that in the Internet, there are many nodes and leads

of various kinds amd a lot of these nodes are what's called routers,

They route the packets across the Internet.

What this is saying is that, in the middle of the Internet, there're some very highly

connected routers. And if you destruct them because they sit in the middle of the

internet, where the many connections to the rest of the network.

If you take this one down, the entire network will break apart in many

distinctive pieces. It turns out that, if we define this kind

of specific vulnerability as the Achilles' heel,

2:19

Then Internet does not have an Achilles' heel.

And yet, back in 2000, it was reported that there is an Achiles' heel.

It was on the cover of Nature Magazine, and was a very alerting, report,

Except, it does not fit the reality of the Internet.

So why would there be such a report? And what can we learn from the way to

refute this wrong statement? Well, before we start answering that

question, we have to highlight that topology is only part of the story.

The story here is about a specific kind of robustness,

3:11

Or the network called the Internet. And as we have talked about, in networks, we have

to think about both the topology, Often represented as a graph with various

matrices associated with it, as well as the functionalities, which are captured by

a variety of mathematical models. So there are many functionalities,

For example, network protection. There are many network protocols, that

we'll talk about later in the course that can detect failure and then try to recover

from the failures. And these are functionalities sit on top

of the underlying topology, And therefore it's not just about what the

graph looks like. It's, it's also about the control plane

that sends a control signal to measure, detect, and recover from a variety of

failures, including router failures. On the flip side, often because of the

bugs or the time to converge in the network management plane that caused a lot

of failure issues. So both robustness and protection and

failures actually stem off and from the functionalities.

Robustness from the good things about functionality filler from the bugs and bad

design in the functionalities, rather than associate it with the underlying graphs.

Now having said that, let's assume actually incorrectly that, robustness is

about the graph only, for the rest of today's lecture, and let's say what we can

say there. Now, since we are focusing just on the

graphs today, I have to talk about three different kinds of graphs of the Internet.

One is the graph of webpages connected by hyperlinks, just like what we saw in

lecture two, or three, I should say with Google's PageRank.

Algorithm in ranking webpages based on this connectivity pattern.

Let's graph type number one. Graph type number two is graphs of the so

called autonomous systems, And.

5:35

These are business entities, For example, AT&T, British Telecom, Reliance in India,

and so on. Okay? Sometimes they are smaller entities,

smaller companies, or even campuses, for example, Princeton University.

These are different entities, profit or non-profit organizations, and they each

control a part of the overall network of networks that we call the Internet.

These individual autonomous systems, large and small ones, are connected by physical

as well as business relationships. That's the second type of graph.

The third type of graph, which is the focus today, is the graph of routers

connected by physical, often fiber, links. So if I just give you a picture of a

graph, This could be a graph where the nodes are

webpages and links are hyperlinks. It could be a graph where the nodes are

companies and organizations and the links are business relationships.

It could also be graphs where the nodes are actual physical routers and switches

and the links are physical fiber or copper wireless links.

When we talk about the robustness with respect to attack on hardware of the

Internet, we are talking about this third graph, the graph of routers.

7:21

Discovering the graph topology itself it is a very challenging task.

In fact for the Internet, no one has a precise map of the Internet.

You may think somewhere in the world that there is a place that stores the map of

the Internet. Actually, there is no such map.

If we're talking about the AS, autonomous system relation graph, many links are

missing. These business or physical links are not

reported or recorded. There also whats called Internet exchange

points. For example, thousands of these just in

Europe. As pairing shortcuts, they decide that

instead of going through some bigger autonomous system, they will form a short

cut themselves, say between two smaller and medium-sized autonomous systems and

often these are not reported. So, it's widely reported that it could be

more than half of the links actually missing and therefore we are not even

close to having precise view of the AS graph.

But what about the router level graph with physical links that we are focusing on

today? Often, the topology is estimated or

inferred based on some measurements. These are limited measurements called say

traceroutes. But it is also widely reported that because of the protocol

details, traceroutes often lead to bias the sampling.

In other words, in a huge network, , with many nodes and many links, a lot of links

are not properly recorded by this measurement methodology.

And in fact, near the edge of the network, for example, near your, corporation, near

on a campus, or near your home, High-rise buildings as opposed to in the

backbone of the network. There's often no scalable measurement

platforms. So at the router level, the graph is also

not known for sure. There are many missing links.

But let's assume again incorrectly, that we can measure the degrees accurately and

see what we can say. So let's skip the important fact that

robustness is more than just about a graph,

And skip the fact that these graphs, we do not yet have a good view.

And just go straight to say, suppose we believe that we can measure the degrees

and that's all that matters. The topology is all that matters.

Let's see what we can say. Alright.

So what is the degree distribution we observe for router level graphs?

Well, we can tabulate these degrees into a histogram.

And we can, therefore construct the probability distribution.

Let's say probability that a node's degree x is assuming a certain specific value

small x. And often, people look at the tail distribution, which is the

probability that the node degree x is bigger than or equal to a specific value

small x. It turns out that,

If you look at the either AS or router level, although the router level graph is

as well focusing right now. The tail distribution roughly follows x to

the power of minus alpha with sum constant k in front of it.

This alpha here is a constant parameter that describes that the decay of the tail

distribution and this approximate symbol basically says that for sufficiently large

12:14

As oppose to homeogenious networks, So homogeneous in what and scale-free in

what sense? Well, it turns out that a lot of

distributions, such as Gausian distribution or normal that we are so

familiar with, or exponential distribution, their tail distribution

decays much faster than this x to the power minus alpha.

And therefore, they do define a specific scale,

For example, for Gausian distribution, The scale is determined basically by the

mean and the standard deviation. And, it is not easy to get a node with a

degree [INAUDIBLE] degree that is way off from many standard deviation from the

mean. So that's why we call that homogeneous

networks, Whereas for long tail distribution, the

tail drops much more slowly, And for Gaussian, this would be basically

zero, But for long tail, this is still, still

signifigantly above zero. In this sense, it's called inhomogeneous

cuz there can be some nodes with a very high degree compared to the mean.

It's also called scale free, Because if you zoom into the tail, you see

that it also behaves like a heavy tail. So there's no characteristic scale.

No characteristic scale of, the node degrees.

So here is a cartoon to illustrate, this is Gaussian and this is long tail.

14:07

Okay, there is a scale with respect to Gaussian, there is no scale, the scale can

be arbitrary. It is very difficult to be a way off from

the mean, whereas this can be way off in a heavy tail, in a long tail distribution.

Now, what kind of distribution is long tail?

A specific special case is called a Pareto distribution,

The same Pareto as Pareto optimal that we were talking about in lecture one.

So, Pareto distribution has the following tail,

Which is the probability that our random variable x, say the node degree, is bigger

or equal to a specific small x equals x over k, for some constant k to the power

of minus alpha. That we can take the differentiation of

one minus this tail, which is the cumulative distribution, differentiate

this expression that will lead us to the probability distribution function, PDF,

for Pareto distribution, that says the probability that a Pareto distributed

random variable equals, sorry, equals x is alpha, k to the power alpha, just a

constant, times x to the power one plus alpha.

Okay? As you can see, both the tail distribution and the distribution itself

follows a long tail, X to the power minus something.

Okay? Either minus alpha or minus one plus alpha with some constants in front of it.

Now, in contrast, for a Gaussian, rather than a Pareto distribution, that PDF, is

as you may recall, one over square root two pi, standard deviation of sigma times

exponential minus x minus the mean mu2 squared over two sigma2.

Squared. Hm, okay? So for Pareto distribution, it follows

this. For Gaussian it follows this. We can easily convince ourselves

numerically that, if you plug some x in here, you will see that this is way

smaller than this, as x deviates much away from the mean. Okay?

This is exponential minus x squared, this is only x to the some power, one plus

alpha, negative. And a visualization for the Pareto distribution is a straight line

in the log-log curve.