Now to compute the closeness centrality of nodes 1 and 2 for this graph. We have to first write down the shortest distance from node 1 to all the other nodes. And from node 2 to all the other nodes. And then we can write this down as d from 1 to 2 = 1, d from 1 to 3 = 1, d from 1 to 4 = 1, d from 1 to 5 = 2. To 6 = 2, to 7 = 2, to 8 = 2. And therefore C1 = 7, that's n minus one, / 1+1+1+2+2+2+2. And that turns out to be 0.6364. And you can do this similar calculation for C2, and that turns out to be 0.5. So indeed, node 1 is, again, more central, or more important by closeness metric, than node 2. But again, the difference is not that big. What about betweenness centrality? Well, we can write down the shorthand notation matrix G. Where the ijth entry of this matrix G is simply gij. That is, the number of shortest path between node pair i and j. Then we can write down all the nij for node 1. That is, the number of shortest path that node 1 lies on between nodes i and j. We can then write down n2ij for all the ij's. We can collect all these into three matrices, or whatever data structure you would like to use. And then we can apply the simple definition that for node 1 is, betweenness centrality is simply the summation of all the node heirs of the ratio, How many shortest path is node 1 lying on between node st, versus the total number of shortest path between s and t? Likewise, we can turn this 1 into 2, and turn this 1 to 2, and compute it for B2. It turns out that B1 in this case is 12, whereas B2 is only 2.5. Now this is much bigger than B2. So node 1's importance by betweenness centrality is almost five times as big as node 2's. Which is not too surprising, given that we know node 1 is important in the sense that it glues two parts of the network together. And if you look at The Medici family in the Florence family graph. You see that, while the degree is 6, as opposed to the second largest, 4. Betweenness centrality is about 50, which is about 5 times as much as 10. Which is about the number betweenness centrality measure for the second-highest family. So betweenness centrality, as you can see, is particularly helpful if you want to quantify the notion that nodes who are in-between a lot of node pairs on the shortest path. Then you want to use this metric. Now what about link's importance? So far, it's all about node's importance. Well, first of all, what is a link? It's not as easy to define. For example, on Facebook. I may have 500 friends, but only have communicated with 200 of them. And maybe mutual wall posting, other than happy birthday, only with 20 of them. So do I have a link with 20, 200, or 500 people? A link could also bidirectional or unidirectional. For example, on Twitter, as we mentioned, a link certain to be unidirectional. Some of these links are strong. Some of these links are weak. As we mentioned that if you define links as weak links if the connect nodes that are not otherwise very well connected. Actually, they may be strong in the sense of disseminating information. And we say if a link is between two nodes that does not have many short path among them other than this direct link. Now we say this is a locally important link. And if this link is connecting two important nodes in a local important way, we can call that a global important link. So, in fact, we can also quantify similar metrics of link's importance as node importance. In particular, we can look at link betweenness. So instead of counting how many shortest path of node pair (s, t) does node i reside on. We can look at on link j how many shortest path between node pair (s, t) does a particular link (i, j) sits on top. Then we can take the sum or average of that proportion and call that the betweeness important of link (i, j). Now, equipped with a basic discussion of importance scores, we can move on to two more influence model with network topology as part of the picture. I just want to caution, however, that we have assumed the graph is given and static. In practice, it's often difficult to get an accurate topology. And topology is often time varying, too. So we're skipping those important, practical issues. Now, the third in our influence model, counting last lectures, too, is what we call contagion. Some people call this infection model, but we use the word infection for a different model later in this lecture. A contagion is very similar to the tipping idea that we discussed last time. You may recall that the adoption percentage, p, and influence function, f(p). And we look at the intersection of the influence function f(p)'s curve with the 45 degree line. Which defines equilibria of the adoption percentage in the market. And we defined a stable and unstable equilibrium. Back then, we basically said that the adoption percentage next to discreet time slot is a function of the current one. And this function captures the macroscopic effect of each individual's decision by looking at a global Adoption percentage. Contagion says, instead of global percentage let's look at local percentages. Says that, a note will only flip for example from a knife has user to it, an iPad user. If the known if p of is neighbors flipped. So, rather than looking at a percentage of the entire network. I say, let me just look at my neighbors those that I have a direct one-half connection. Let's assume these are bidirectional links. And there are six of them. If three or more of them have adopted iPad then I will also adopt iPad. This is a local density driven decision making. Now each node may have a different threshold of the percentage of local neighbor it needs to see adopting before she decides to adopt. So this factor p here, May differ from each node to each node. Now, there are many questions we can ask about this. Since it's not about a whole network, global adoption percentage, we have to ask, will the whole network actually flip? Will this trigger a cascade, not information-cascade, but an adoption-decision cascade? Just spread across whole network, starting from certain seeds. Okay, initial early adopters of a given product, service, or music or fashion industry. Then we ask how long will that take? If it ever flips the whole network, does that take If there are N nodes, N rounds, less than N rounds? And if it does not flip the whole network, will part of the network flip? And if so, how big is that? And which part of the network would that be? These are all analysis questions then finally we can say, what about design question? How do you seed to optimize flipping? For example, maximize the degree of flipping, how many nodes in the network will flip? Or minimize the time it takes to flip that many nodes. It turns out that these four questions, they're all important, but only the first one is relatively tractable. The other three questions get quite difficult. In fact this one, design optimization question, gets very difficult. Now before we can talk about the idea of contagion, and answer at least the first question which is raised. The first one to talk about a notion in a graph called a cluster. A cluster is really a sub set of nodes with a certain density, say density P. A class of density p is a set of nodes such that each of these nodes has a least a fraction of p of his neighbors also in the set. So for example, in this graph. Let's see, there are five nodes. Look at this subset of nodes. Three out of five nodes, okay? The lower left triangle. If we view this as a cluster, the question is, what is its density? Well, this node got four neighbors. Two of them are in this cluster, in the subset nodes we're talking about. This got two nodes, so it's two out of four. This one got two nodes, both of them are in the cluster. This got three nodes, two of them are in the cluster. It's the smallest of the three number that matters. So one, two third, or one half. One half is the smallest of the three. And therefore say this forms a cluster of density half. Is at least half of the neighbors for any of these three nodes in the cluster resides also in the cluster. What about this cluster four nodes? There's a difference sub set of nodes and let's wonder, let's calculate what is the density of this cluster? Well now, this node got three out of four links hunting two node also in the cluster, this is two, two out of two. This one is now three out of three. And this one is two out of three. So one, one, two over three, and three over four. The smallest is two over three, so this is a cluster of four nodes with a density of two over three. Now, what's the use of defining density of a cluster? Well, because that answers the first question. The whole network flips under the contagion model if and only if. So, I have to prove both necessary and sufficient. There's no cluster of density 1- p or higher in the set of non-flipped nodes. So, let's parse this sentence again. The answer says that, look at the set of non-flipped nodes. You want them to be flipped or you wonder whether they will or not. Well, if, in the set of non-flipped nodes, there is a cluster. With density 1-p or higher, then you cannot possibly have the entire network flip. The other way around is also true. If the entire network flips, then that means there cannot be a cluster of density 1-p or higher among those who have not yet flipped. So how do we prove this theorem, the answer to the first question here? Okay, without going through the complete vigorous proof, it's easy to see at least one of the directions, the only if direction. Direction basically says the following, if you have a cluster of density 1-P or higher, all in the state of not flipped. Then this cluster with such a density would become a blocking cluster to prevent the flipping to completely happen inside the network. Because they have enough inertia among themselves, this cluster, this subset, no matter what the rest of network looked like. Their density is 1- p or higher, okay? So 1- p or more of them will always point to themselves, which means you only have less than p of their neighbors being outside of them. So even if the rest of network all flips, they will still say, look at each other and say well, we do not have yet a high enough density of flipped nodes as our neighbors. So, we will still not flip. So, at least the only part of this theorem is easy to see. [SOUND] Now then, the questions two and three actually do not have. Easy tractable answers, for a general topology of the graph. The question four is a very interesting one, even though it doesn't have a good answer yet, I just want to highlight some of the issues involved here. In the question four, it talks about seeding a contagion, seeding a flipping. And trying to trigger that. Let's say that you have fixed a budget of $1,000, you want to stage a viral campaign, to sell certain new product by word of mouth online. So you want to seed or buy off very important knows, could be people whose tweets are watched and re-tweeted, by many other people. And I say these know, they command a certain prize to be seeded to help you. So some may say, you have to pay me 10 bucks and the same pay me five bucks. Let's assume these prices are also increasing functions, of the influential power as perceived by or themselves. So that fits the budget and you cannot buy off everyone. So buy off a small number of them, and the question is which ones to buy off? Even without considering the price to pay them, just consider each is one unit of a dollar. It's still a very difficult optimization problem, much more difficult than what we saw early on. It's a graph. Theoretic or combinatorial optimization. Towards the end of the course, we'll say a few more words about that. But right now, we just want to highlight some intuitions. The first intuition that is, there is a trade off. The trade off is between, seeding knows so that they are concentrated enough. So that they can have a chance of flipping at least some cluster. Versus seeding nodes so that they are far apart and therefore, they may be able to flip entire network. You want to spread them out so that they can reap different corners, and yet you don't want them spread out so much. That they lose the chance of flipping, an even small corner of the network. The second one is that, it's not necessarily successfully refineable, what does this means? It's a small example, it's supposed to want to seed a contagion in this graph here. And we've got ten nodes, with the sum links drawn here. Okay, and now we want to say, which one shall we seed. One answer is well, if I can only seed one node, times zero. I shade it dark meaning that it's flipped from, not buying iPad to buying iPad for example, switched. Flipped from one state of the mind, to the other state of the mind. And what is the P. Let's say if every other node, has a flipping threshold of 0.49. So if 49% or higher of my neighbors flipped, then I will flip next as well. So then what would happen, for this node. It's got two neighbors, and one of the neighbors flipped. So at iteration one, it would say, I will flip. So we put one over here. Then for this node two. It says well, at iteration one I see two nodes flipped. I've got three neighbors. So two out of three I flipped. Two over three's bigger than 0.49, and therefore next time slot two, I will flip two. We'll write down number two over there. And you think, this might propagate. Well, actually it doesn't propagate anymore, because this node says I got three neighbors. Only one flipped, is less than 0.49. I got three neighbors. Only one flipped, is less than 0.49. So these two nodes would not flip and of course, as you can obviously see, if they don't flip. None of the other five nodes would flip. So this is the best you can do, as it turns out. If you can only seed one node, seed this node, and you can flip two more as the equilibrium are flipping in the contagium. But suppose you can actually flip, you can seed two nodes, you got a slightly bigger budget. It turns out the optimal way to seed is the following. You seed these two nodes. Even though they are not directly connected, but turns out that that would be the best way. And the next iteration, this node, sandwiched between the two will flip, this sandwiched between the two will flip. Iteration two, this node, influenced by these two neighbors will flip. And then, next time these two neighbors, influenced by these two neighbors will flip. And these two nodes, are still will not flip. One way to view that, is to say three nodes form a cluster with a density, two over three. Which is bigger than one minus their threshold, which is 0.49 which means 0.51. Two over three is bigger than 0.51, that means this cluster will prevent a complete flipping. And this is the best that you can do, as it turns out. The interesting point is that the optimal seeding strategy, when you can seed two nodes is to see in these two nodes. In contrast that, with optimal seeding strategy which is turns out to be seeing this node, if you can only seed one. You see that it's not successfully refineable. You cannot say, if you allow me seed of seed one node, I will seed a particular node. And then when you allow them to seed one more node, I I will just add one more node. In the two seed optimal strategy, neither seed is actually the optimal one you should seed. But to seed one node, therefore you can not keep adding those to the optimal strategy, when you had a smaller budget. And this highlights yet another difficulty, in solving the optimal seeding problem. Now, we have talked about contagion and now, let's move onto the fourth influx model. Not counting the advancement material sections, of these videos. And the fourth one is called infection.