Welcome to lecture ten of Networks, Friends, Money and Bytes. In today's lecture, we'll try to formulate and answer this question. Does the internet have an Achilles' heel? And of course the term Achilles' heel here comes from the metaphor and we need to properly define what exactly do we mean an Achilles' heel. Now, there's no shortage of vulnerability of the Internet. In the second half of this course, we'll be focusing a lot on the underlying architectural principles behind the Internet. And we'll see that there are many holes for security breach, for privacy intrusion, and also for unreliability. However, a Achilles' heel here refers to a reported existence of a particular type of Internet security issue back in 2000. And this specific vulnerability has the following, Issues. There is a highly connected router in the middle of the network can be easily detected and if you destruct this router then the whole network will split into many parts. Later in chapter thirteen, we'll talk a lot more about routing and routers on the internet. But for today, we can just say that in the Internet, there are many nodes and leads of various kinds amd a lot of these nodes are what's called routers, They route the packets across the Internet. What this is saying is that, in the middle of the Internet, there're some very highly connected routers. And if you destruct them because they sit in the middle of the internet, where the many connections to the rest of the network. If you take this one down, the entire network will break apart in many distinctive pieces. It turns out that, if we define this kind of specific vulnerability as the Achilles' heel, Then Internet does not have an Achilles' heel. And yet, back in 2000, it was reported that there is an Achiles' heel. It was on the cover of Nature Magazine, and was a very alerting, report, Except, it does not fit the reality of the Internet. So why would there be such a report? And what can we learn from the way to refute this wrong statement? Well, before we start answering that question, we have to highlight that topology is only part of the story. The story here is about a specific kind of robustness, Or the network called the Internet. And as we have talked about, in networks, we have to think about both the topology, Often represented as a graph with various matrices associated with it, as well as the functionalities, which are captured by a variety of mathematical models. So there are many functionalities, For example, network protection. There are many network protocols, that we'll talk about later in the course that can detect failure and then try to recover from the failures. And these are functionalities sit on top of the underlying topology, And therefore it's not just about what the graph looks like. It's, it's also about the control plane that sends a control signal to measure, detect, and recover from a variety of failures, including router failures. On the flip side, often because of the bugs or the time to converge in the network management plane that caused a lot of failure issues. So both robustness and protection and failures actually stem off and from the functionalities. Robustness from the good things about functionality filler from the bugs and bad design in the functionalities, rather than associate it with the underlying graphs. Now having said that, let's assume actually incorrectly that, robustness is about the graph only, for the rest of today's lecture, and let's say what we can say there. Now, since we are focusing just on the graphs today, I have to talk about three different kinds of graphs of the Internet. One is the graph of webpages connected by hyperlinks, just like what we saw in lecture two, or three, I should say with Google's PageRank. Algorithm in ranking webpages based on this connectivity pattern. Let's graph type number one. Graph type number two is graphs of the so called autonomous systems, And. These are business entities, For example, AT&T, British Telecom, Reliance in India, and so on. Okay? Sometimes they are smaller entities, smaller companies, or even campuses, for example, Princeton University. These are different entities, profit or non-profit organizations, and they each control a part of the overall network of networks that we call the Internet. These individual autonomous systems, large and small ones, are connected by physical as well as business relationships. That's the second type of graph. The third type of graph, which is the focus today, is the graph of routers connected by physical, often fiber, links. So if I just give you a picture of a graph, This could be a graph where the nodes are webpages and links are hyperlinks. It could be a graph where the nodes are companies and organizations and the links are business relationships. It could also be graphs where the nodes are actual physical routers and switches and the links are physical fiber or copper wireless links. When we talk about the robustness with respect to attack on hardware of the Internet, we are talking about this third graph, the graph of routers. And we are going to talk about the degree distribution of these graphs. So I have to be able count the degrees of the nodes. You may think that's an easy job, but actually, Discovering the graph topology itself it is a very challenging task. In fact for the Internet, no one has a precise map of the Internet. You may think somewhere in the world that there is a place that stores the map of the Internet. Actually, there is no such map. If we're talking about the AS, autonomous system relation graph, many links are missing. These business or physical links are not reported or recorded. There also whats called Internet exchange points. For example, thousands of these just in Europe. As pairing shortcuts, they decide that instead of going through some bigger autonomous system, they will form a short cut themselves, say between two smaller and medium-sized autonomous systems and often these are not reported. So, it's widely reported that it could be more than half of the links actually missing and therefore we are not even close to having precise view of the AS graph. But what about the router level graph with physical links that we are focusing on today? Often, the topology is estimated or inferred based on some measurements. These are limited measurements called say traceroutes. But it is also widely reported that because of the protocol details, traceroutes often lead to bias the sampling. In other words, in a huge network, , with many nodes and many links, a lot of links are not properly recorded by this measurement methodology. And in fact, near the edge of the network, for example, near your, corporation, near on a campus, or near your home, High-rise buildings as opposed to in the backbone of the network. There's often no scalable measurement platforms. So at the router level, the graph is also not known for sure. There are many missing links. But let's assume again incorrectly, that we can measure the degrees accurately and see what we can say. So let's skip the important fact that robustness is more than just about a graph, And skip the fact that these graphs, we do not yet have a good view. And just go straight to say, suppose we believe that we can measure the degrees and that's all that matters. The topology is all that matters. Let's see what we can say. Alright. So what is the degree distribution we observe for router level graphs? Well, we can tabulate these degrees into a histogram. And we can, therefore construct the probability distribution. Let's say probability that a node's degree x is assuming a certain specific value small x. And often, people look at the tail distribution, which is the probability that the node degree x is bigger than or equal to a specific value small x. It turns out that, If you look at the either AS or router level, although the router level graph is as well focusing right now. The tail distribution roughly follows x to the power of minus alpha with sum constant k in front of it. This alpha here is a constant parameter that describes that the decay of the tail distribution and this approximate symbol basically says that for sufficiently large small x. In other words, initially the tail may not look like x to the power minus alpha, but when x small x is large enough the tail eventually will look like this. So what's so special about the tail of the degree distribution going down like x to the power minus alpha. Well, it turns out that these kind of distribution is so-called long tail distributions. And we will see why they are called long tails as opposed to these short tails. And, if in the graph, the node degree distribution follows long tail, then we got a corresponding network so-called scale-free network or inhomogeneous networks. As oppose to homeogenious networks, So homogeneous in what and scale-free in what sense? Well, it turns out that a lot of distributions, such as Gausian distribution or normal that we are so familiar with, or exponential distribution, their tail distribution decays much faster than this x to the power minus alpha. And therefore, they do define a specific scale, For example, for Gausian distribution, The scale is determined basically by the mean and the standard deviation. And, it is not easy to get a node with a degree [INAUDIBLE] degree that is way off from many standard deviation from the mean. So that's why we call that homogeneous networks, Whereas for long tail distribution, the tail drops much more slowly, And for Gaussian, this would be basically zero, But for long tail, this is still, still signifigantly above zero. In this sense, it's called inhomogeneous cuz there can be some nodes with a very high degree compared to the mean. It's also called scale free, Because if you zoom into the tail, you see that it also behaves like a heavy tail. So there's no characteristic scale. No characteristic scale of, the node degrees. So here is a cartoon to illustrate, this is Gaussian and this is long tail. Okay, there is a scale with respect to Gaussian, there is no scale, the scale can be arbitrary. It is very difficult to be a way off from the mean, whereas this can be way off in a heavy tail, in a long tail distribution. Now, what kind of distribution is long tail? A specific special case is called a Pareto distribution, The same Pareto as Pareto optimal that we were talking about in lecture one. So, Pareto distribution has the following tail, Which is the probability that our random variable x, say the node degree, is bigger or equal to a specific small x equals x over k, for some constant k to the power of minus alpha. That we can take the differentiation of one minus this tail, which is the cumulative distribution, differentiate this expression that will lead us to the probability distribution function, PDF, for Pareto distribution, that says the probability that a Pareto distributed random variable equals, sorry, equals x is alpha, k to the power alpha, just a constant, times x to the power one plus alpha. Okay? As you can see, both the tail distribution and the distribution itself follows a long tail, X to the power minus something. Okay? Either minus alpha or minus one plus alpha with some constants in front of it. Now, in contrast, for a Gaussian, rather than a Pareto distribution, that PDF, is as you may recall, one over square root two pi, standard deviation of sigma times exponential minus x minus the mean mu2 squared over two sigma2. Squared. Hm, okay? So for Pareto distribution, it follows this. For Gaussian it follows this. We can easily convince ourselves numerically that, if you plug some x in here, you will see that this is way smaller than this, as x deviates much away from the mean. Okay? This is exponential minus x squared, this is only x to the some power, one plus alpha, negative. And a visualization for the Pareto distribution is a straight line in the log-log curve.