Hello, today we're going to talk about the small world phenomenon, which suggests that the world is small in the sense that, we're all connected by very short paths between each other. But how can we measure these paths, and how short are they really? So in the 1960s, a researcher named Stanley Milgram was wondering about this very question. And he set out to create and experiment that would allow him to measure how long these paths are, and whether or not they really exist. And so the set up of the experiment was the following. He chose 296 people at random as starters, all over the US, and asked them to forward a letter to a target person. This target person was a broker in Boston. And the instructions for these starters was that, they were to send the letter to the target if they knew him on a first name basis. Of course, since these were randomly selected people, most of them didn't know this person on a first name basis. So if they didn't know him, their job was to send a letter to someone else along with instructions, who they knew on a first name basis, was more likely to know the target on a first name basis. And the instructions contained some information about the target like the city where he lived, and his occupation. So then, what they did is, they checked how many letters actually arrived at the destination and how many hops they took to arrive there. And so, these were the results. Out of all 296 starters, 64 of them reached the target. So the median chain length, the number of hops that it took for these letters to get there was 6, which was consistent with this phrase of six degrees of separation. Here in this plot, we can see a histogram of the chain length for the letters that actually arrived, and we can see that they took anywhere from 1 to 10 hops to get there, but the median was 6. And so the key points here are the following. So first of all, a relatively large percentage of these letters actually arrived. If you think about the fact that these were kind of randomly selected, and that people could drop out of this and say I'm not even going to try to forward this letter to anyone. And so despite all of these things, more than 20% of the letters actually reached the target. And the second is that for the ones that reached, these paths were relatively short, right. So median of 6, a single digit, in a network of millions and millions of people, that seems pretty small. The other thing that's interesting, although we're not going to focus much in this part of it, is that people actually are able to find this short paths. So not only do these paths exist, but somehow people are able to find them, which is also got interesting. Now more recently, people have tried to answer this question but without actually running an experiment, instead looking at actual data in measuring how long these paths are. Of course in 1960s, we do not have access to very large data sets of networks but nowadays, we do. So one example of this is, looking at instant message communication among people. So researchers took a network of 240 million active users on Microsoft Instant Messenger. And then they connected them if two users were engaged in a two-way communication over a period of a month. And so these defined a network, a very large network. And the estimated path length in this network, so if you take any two people at random and check what their distance between them in the network, so the shorter the length of the shortest path, the medium is estimated to be 7. Which is very close to what Milgram had found in the 1960s which was 6, and it's also very small. Here is the histogram of the distances of this particular network. We again see that these tend to be small. People have also tried to do this on using Facebook data. And so what they did is, they looked at how the distances change over time, and how they vary if you try to measure them on the full Facebook network versus just a subset of it in a particular region, like in the United States. And what they found is that, if you take the global network, the average path length in 2008 was about 5.28, and three years later in 2011, it was 4.74. So it seems like these path lengths are short and they seem to be getting smaller over time. And as you may expect, if you take a smaller region like United States, then this tend to be even smaller. So what this tells us is that, if we look at these real networks that have millions and millions of people, the average shortest path between pairs of notes tend to be on average or median, tends to be very small. So that's the first property that I wanted to discuss today. The other property that I wanted to talk about is this clustering coefficient property, which we've talked about before. So remember the local clustering coefficient of a node is the fraction of pairs of the nodes friends who are friends with each other. It's roughly talks about, are there lots of triangles are not? And when we take the average local clustering coefficient of all the nodes. And when we look at real networks, like for example Facebook in 2011, we find that the average clustering coefficient tends to be pretty high. So here is a plot that has on the x axis the degree of a node, and then on the y axis we have the average clustering coefficient. And we find that, it decreases as degree increases. So for people that have lots and lots of friends, their clustering coefficient tends to be smaller. But on average, it's still pretty large. So if you look at nodes that have say, 20 to 50 friends, they're clustering coefficient is somewhere in the 30s or so. In the Microsoft Instant Message network, the average clustering coefficient was 0.13. And in the actor network that we talked about before, the average clustering coefficient is even higher at 0.78. And so the thing to note here is that, I say that these clustering coefficients are high because, if you imagine that these graphs were completely random. Meaning, you take all of these nodes and whatever number of edges it has and you kind of just assign them at random, then you would find that the average clustering coefficient would be much, much smaller because there is nothing that's sort of making these triangles appear, right? And so, these clustering coefficients tend to be high. We think that there is something happening in the formation of the network that is sort of favoring triangle formation. So we've noticed two things. One, that social networks tend to have a high clustering coefficient and two, they tend to have small average path length. And so the next question is, can we think of a model that generates networks that have these two properties. Just like we did for the power law distribution, we observe that the networks tended to have these power law distributions. And then we covered the model that gives rise to these types of properties. And that allowed us to think about, well, maybe this model explains how these properties come about in the real data. Can we do that same kind of thing here? Well the first natural thing to do would be to check whether we already have a model of network information, preferential attachment model. Can we check if this particular model has networks with high clustering coefficient and small average path length? So let's create one of a 1,000 nodes and parameter m of 4, that means that each new node is going to attach to 4 existing nodes. And let's see what its average clustering is. Well in this case, it's 0.02 which is pretty small compared to the networks that we had seen before. Now for the average shortest path length, it is 4.16, which is pretty small, it's pretty decent. So it seems that it get's one of the properties but not the other. Let's see what happens if we vary the number of nodes and the number of edges per new node perimeter m. And let's see what happens to the path link in the clustering. On the x axis, I have the number of nodes. And on the y axis, I have the average clustering coefficient. And we have different curves that represent different values of the perimeter m. And then, I have the same thing for the average shortest path. Well we serve as these networks have a small average shortest path, and the reason why is that, if you remember, these power law degree distributions have the property that some nodes have a very, very high degree. And so these high degree nodes act as hubs that connect many pairs of nodes, and make sort of bridges between them. So this is why we see that these average shortest paths tend to be kind of small. However, for the clustering side, we see that as the number of nodes increases, the average clustering coefficient also decreases and it becomes very small. And so even at 2,000 nodes, we see that the average clustering coefficient is very small, at like 0.05 or so. And we have seen networks of millions and millions of nodes that had a much larger clustering coefficient. And so it seems like the preferential attachment model, while it has the small average shortest path property, fails to have these cluster and coefficient, high cluster and coefficient property. And the reason is that, there is no mechanism in the preferential attachment model that would favor triangle formation. So it's natural that it just doesn't have that property. So now let's talk about a new model called the small world model that actually gives network that have these two properties. So first, let's discuss how the model actually works. So you start with the ring of n nodes, so basically you put n nodes in a circle, and then you have each node connected to its k nearest neighbors. So there's some parameter k to fix. And then you have fixed another parameter p between 0 and 1, and what you're going to do is, we're going to take each one of these edges that we created in the first step and with probability p, we're going to rewire it. So let's say we're taking the edge u,v with probability p, we're going to find another node w, completely at random. Or we're going to rewire it so that the edge u,v becomes u,w. You do this for each one of the edges and you're done. So let's walk through an example here. We have 12 nodes, and in the example, k will be 2, so each node is connected to its 2 nearest neighbors and p will be 0.4. I will say that these parameters k = 2 and p = 0.4 are not the typical parameters you would use for this particular model. Typically, you have a k that's much larger, and a p that's much smaller but just for the purposes of the illustration I am showing you here, I'm going to choose these parameters. So what we have to do is, we're going to have to go through every edge and decide whether we're going to rewire it or not. And we're going to rewire it with probability 0.4. And so let's stat with the edge LA and we have to decide whether to rewire or not. And let's say we use some type of random number generator and we decide that we don't rewire it, okay. And then we go to the next one and this one, we won't rewire either. We go to the next one and this one we will rewire. So now we have to pick another node at random, and in this case, we would pick G. And so this edge instead of connecting K and J, is going to connect K and G. Okay, we go to the next edge, GI and this one we're not going to rewire. The next edge, we will rewire, so we pick a node at random and rewire it. Go to the next edge, we don't rewire it. Go to the next edge, we don't rewire it. Go to the next one, and we rewire it. It becomes FJ. Go to the next one no rewire, go to the next one, rewire so it becomes the DH. Go to the next one, we rewire it and go to the next one we don''t rewire it, and we're done. So this is the network that these model produces, or at least one of the possible instances of of these networks that can be produced with these parameters. So now let's think about, what happens to this network as we sort of change this parameter p, which ends up being the kind of most interesting parameter of the network? Well here's an illustration from the original paper that came up with this model. Let's think about first what happens in the extreme cases. So when p is 0, so we're looking at this type of network. What we have is a completely regular lattice. So there is no rewiring, and because every node is connected to k of its neighbors, then there are lots of triangles that get formed locally, right? Because, well it depends on the value of k, but if k is sort of large enough, then you start to form many triangles. And so this network will have pretty high clustering coefficient because it purposely going to form triangles in the beginning. And then nothing gets rewire, nothing gets changed, so it has a pretty high clustering coefficient. However, if you imagine there's been a very large network, you can imagine that the distances between nodes can get very large, right. So if you're in one side of the ring to get to the other side of the ring, you can hop a little bit but there is no long bridge that can take you there. And so, we expect that the distances would be long. Now let's think at the other extreme where we're going to rewire every single one of the edges, and so that would be this network. And so what's happening here is that, we're going to rewire every single edge. And so this network is now completely random. So we've created a bunch of long bridges. And so presumably, distances are pretty small between different nodes. But now, we kind of destroyed the sort of local structure that we had before. And so now we probably don't have many triangles there. And so while the distances are small, now the clustering also got very small. If you're In between, if p is between 0 and 1, then what you have is that some edges get rewire, so you create some long bridges. And so the distances between nodes, the average distance gets reduced. But the local structure depending on p can be maintained. So you can maintain that high clustering as well. So there's a sweet spot for the parameter p, where you can maintain both the short distances as well as the high clustering, and so let's see that next. So what is the average clustering coefficient and shortest path of a small world network? Well it depends on the parameters k and p. So here, I'm showing two plots where on the x axis we have the value of p, and on the y axis, we have the average shortest path and the average clustering. And we have curves for networks that have different parameters, k. And so what we see is that, as p increases from 0 to 0.1, notice here that we don't get anywhere close to p = 1. So, this is staying with very small values of p. What we see happening is that, the average shortest path decrease rapidly right after sort of p is away from 0, it just drops down. Whereas, the average clustering coefficient while it also decreases as p increases, it decreases much slower. So for example, an instance of a network with 1,000 nodes and k = 6 and p = 0.04 has the following values. It has a value of 8.99 average shortest path, and 0.53 average clustering coefficient. So for these types of values of p, we can achieve both of the properties that we wanted. The average shortest path being small, single digit, and the average clustering coefficient being pretty large. In NetworkX, you can use the function watts_strogatz_graph. Again named after the researchers who came up with this model, and with parameters n, k, and p. And it returns a small world network with n nodes that starts with the ring lattice, where each node is connected to its k nearest neighbors, and it has rewiring probability p. Now let's wonder about the degree distribution of the small world network. What does it look like? Well, let's use the same code that we used before to visualize the degree distributions of network. This time using a small world network with 1,000 nodes, k = 6, and p = 0.04. What we is that, it looks like this. So most nodes have degree 6. A few of them have 5 and 7, and I think maybe 1 or various small number of them has degree 4 and 8, and that's about it. And this makes sense because, well, the rewiring probabilities is very small so most of the edges aren't going to be rewired. So most of the nodes are going to stay with their degree of 6 that they had in the beginning when the ring was first formed. And because there's no mechanism that sort of makes some nodes accumulate a very large degree, then none of them do. And so, while this model gets the two properties that we discussed, the clustering coefficient and the shortest paths, it doesn't get the power level degree distribution that we also observe in real networks. And the preferential attachment model gets correctly. There are certain variants of the small world network in NetworkX that you can use. So the first one addresses and issue that small world networks, in general, can become disconnected. So there's nothing that makes them so that they necessarily are in a single component. If you rewire too many of the edges, you could end up with more than one connected component. And this is sometimes something we don't want. Sometimes we want to create a network that is connected, where every node can reach every other node. And so if that's what we need, then we can use the function connected watts_strogatz_graph, which has the same parameters except it has an additional parameter t. And what it does is, it runs the standard watts_strogatz_graph model up to t times until it returns a connected small world network. So it kind of keeps trying different iterations of the model until it gets one that's connected. Unless it runs out of tries, which if it tries more than t times, then it returns some type of error. There's also the newman_watts_strogatz_graph, which is very similar to its small world model. But instead of rewiring the edges, when it's time to rewire an edge, it actually adds a new edge and leaves the other one still in the network, and still does this with probability p. So instead of rewiring, it adds additional edges. In summary, today we looked at real networks and we found that they appear to have a small shortest path. And that they tend to have high clustering coefficient. And we found that this preferential attachment model that we talked about last time produces that works that have small shortest paths, but they also have a very small in clustering coefficient. So then we talked about a new model, the small world a model, that starts with a ring lattice with nodes connected to the k nearest neighbors, so it starts out a very high clustering. But then it rewires some of the edges with probability p. And we found that for some values of p, namely for very small values of p, this small world networks have a small average shortest path. Because new long edges are added to that network and those create bridges for nodes to reach each other. And they still maintain these high clustering coefficient, which matches what we observe in the real networks that we look at. However, the degree distribution of a small world network is not a power law, as we had also seen in real data. And on NetworkX, you can use the function watts_strogatz_graph or some of the other variants that we covered to produce small world networks. And that's all for today, I hope to see you next time.