Now to compute the closeness centrality of nodes one or two for this graph, we have to first write down the shortest distance from node one to all the other nodes, from node two to all the other nodes. And then we can write this down as d from one to two is one, d from one to three is three, d from one to three is one, d from one to five is two. To six is two, to seven is two, to eight is two. Okay. And therefore c1 is seven, that's n minus one over one plus one plus one plus two, four times and that turns out to be 0.6364. And you can do the similar calculation for c2, and that turns out to be 0.5. So, indeed node one is again, more central, or more important by closeness metric than node two, but again the difference is not that big. What about between this centrality? Well, we canwrite down the shorthand notation matrix G, where the ij entry of this matrix G is simply gij. That is the number of shortest path between node pairs, node pair i and j. Then we can write down all the n, Ij for node one. That is number of shortest paths that node one lies on between nodes i and j. We than can write down n two, ij for all the ij's. We can collect all these into three matrices, okay, or whatever data structure you would like to use. And then we can apply the simple definition that for node one is between the centrality is simply the summation of all the node pairs, of the ratio. How many shortest path is node one lying on between nodes st versus the total number of shortest path between s and t? Likely we can, likewise, we can turn this one into two, and turn this one to two and compute it for B2. It turns out that b1 in this case is twelve, Whereas b2 is only 2.5.. Now this, is much bigger than b2. So node 1's importance by between this centrality, okay, Is almost five times as big as node 2's, Which is not too surprising given that we know node one is important in the sense that it glues the two parts of the network together. And if you look at the Medici family in the Florence family graph, you see that, while, degree is six as opposed to the second largest four, is between the centrality is about 50, which is about five times as much as ten which is the about the number of between this centrality measure for the second highest family. So, between the centrality, as you can see, is particularly helpful if you want to quantify the notion that nodes who are in between a lot of node pairs on the shortest path. Then you want to use this metric. Now, what about links importance? So far, it's all about nodes importance? Well, first of all what is, what is a link? It's not as easy to define, For example on Facebook. I may have 500 friends but only have communicated with 200 of them. And maybe mutual wall posting, other than happy birthday, only with twenty of them. So, do I have a link with twenty. Okay. 200 or 500 people? A link would also be bi-directional or unidirectional. For example on Twitter, as we mentioned, a link could certainly be, unidirectional. Some of the these links are strong, Some of these links are weak. So we mentioned that if you define links as weak links if they connect nodes that are not otherwise very well connected. And actually, they may be strong in the sense of disseminating information. We say for link, Is between two nodes that does have many short path among them, other than this direction link, and we say this is a locally important link. And if this link is connecting two important nodes in a local important way, we can call that a global important link. So in fact. We can also quantify our similar metrics of links importance as node importance. In, in particular, we can look at link between. So, Instead of counting how many shortest path of node pair st does node i reside on, can look at, on link j. How many shortest path between node pair st does a particular link ij sits on top? Then we can take the sum or average of that proportion and call that the betweenness importance of length ij. Alright. Now, equipped with a basic discussion of importance scores, we can move on to two more influence models with network topology as part of the picture. Just want to caution, however, that we have assumed the graph is given and static. In practice, it's often difficult to get an accurate topology, and topology is often time-varying, too, So we're skipping those important practical issues. Now, The third, now influence model, Counting last lectures too is what we call contagion. But some people call this infection model but we use the word infection for a different model later in this lecture. A contagion is very similar to the tipping idea that we discussed last time. You may recall that the adoption percentage p, and influence function f of p. And, we look at, at the intersection of the influence functions f of p's curve with the 45 degree line, Which defines equilibria of the adoption percentage in the market and we define a stable and unstable equilibrium. Back then, we basically said that the adoption percentage next the discrete time slot is a function of the current one and this function captures the macroscopic effect of each individual's decision by looking at a global adoption percentage. Contagion, says, instead of global percentage, let's look at local percentage. It says that a node, Will only flip, for example from a non-iPad user to an iPad user. If and only if p of, is neighbors flipped. So rather than looking at the percentage of the entire network, I say, let me just look at my neighbors, those that I have a direct one-hub connection. Okay? Let's assume these are bi-directional links and there are six of them. If three or more of them have adopted iPad, then I will also adopt iPad. This is a local density-driven decision making. Now each node may have a different threshold of the percentage of local neighbor it needs to see adopting before she decide to adopt. So this vector P here, okay, May differ from each node to each node. Now, there are many questions we can ask about this. Since it's not about a whole network, global adoption percentage, We have to ask will the whole network actually flip? Okay, will this trigger a cascade, not information cascade but an adoption decision cascade to span across the whole network. Starting from certain seeds. Okay? Initial early adopters of a given product, service, or music, or fashion industry. Then we ask how long would that take? If it ever flips the whole network, this take,. If there are N nodes, N rounds, less than N rounds. And, if it does not flip the whole network, we wonder will part of the network flip? And if so, how big is that? And which part of the network would that be? These are all analysis questions, then finally we can think what about a design question. How do you seed to optimize flipping, for example, maximize the degree of flipping. How many nodes in a network flip? Or, minimize the time it takes to flip that many nodes. It turns out that these four questions, they are all, important, But only the first one is relatively tractable. The other three questions get quite difficult. In fact, this one, design optimization question, gets very difficult. Now, before we can talk about, the idea of, contagion and answer at least the first question which is raised. The first one to talk about a notion in a graph called a cluster. A cluster is really a subset of nodes, with a certain density, Let's say, density p. A clause of density is a set of nodes, such that each of these nodes has at least a fraction of p of its neighbors also in decent. Okay. So, for example, in this graph. Let's see, There are five nodes. Look at this subset of nodes, three out of five nodes. Okay? The lower left triangle. If we view this as a cluster, the question is, what is its density? Okay? Well, this node got four neighbors. Two of them are in this cluster. In this subset of nodes we're talking about. This got two nodes, so it's two out of four. This one got two nodes, both of them are in the cluster. This got three nodes, two of them are in the cluster. What is the smallest of the three number that matters? So, one two-thirds, or one-half? One-half is the smallest of the three and therefore we say, this forms a cluster of density half. There's at least half of the neighbours, where any of these three nodes in the cluster resides also in the cluster. Or what about this cluster, Four nodes. It's a different subset of nodes. And let's wonder let's calculate what is the density of this cluster. Well now, This node got three, three of four links pointing to nodes also in the cluster. This is still two out of two, and this one is now three out of three, and this one is two out of three. So, one, one, two over three, and three over four. The smallest is two over three, so this is a cluster of four nodes with a density of two over three. Now, what's the use of defining density of a cluster? Well, because of that answers, the first question, The whole network flips under the contagion model if and only if, So to prove both necessary and sufficient. There is no cluster of density one minus p or higher in the set of non-flipped nodes. Okay. So let's parse this sentence again. The answer says that, look at the set of non-flipped nodes, Okay, you want them to be flipped or you want them whether they will or not. Well, if in the set of non-flipped nodes, there is a cluster with density one minus p or higher, then you cannot possibly have the entire network flipped. The other way around is also true if the entire network does not, flips, then that means there can not be a cluster of density one minus p or higher among those who have not yet, flipped. So how do we prove this theorem? The answer to the first question here, okay? But without going through the complete vigorous proof, it's easy to see at, least one of the directions, the only if direction. Okay The only direction basically says the following. If you have a cluster of density one minus p or higher all in the state of not flipped, Then this cluster with such a density will become a, blocking cluster to prevent the, flipping to completely happen inside the network. Because they have enough inertia among themselves, This cluster, This subset, no matter what the rest of look, network look like, the density is one minus p or higher, okay? So, one minus p or more of them will always point to themselves, Which means, you only have less than p of their neighbors being outside of them. So even if the rest of network all flips, They will still say, look at each other and say, well, We do not yet have, a high enough, density of flipped nodes as our neighbors so we will still not flip. So at least the only if part of this theorem is easy to see. Now then, the questions two and three actually do not have easy tractable answers for a general topology of the graph. But question four is a very interesting one, even though it doesn't have a good answer yet, I just want to highlight some of the issues involved here. And question four talks about seeding of contagion, seeding or flipping, and trying to trigger that. Let's say that you have fixed a budget of, you know, $1000. You want to stage a viral campaign to sell a certain new product by word of mouth online.. Okay, so you want to seed or buy off very important nodes, could be people whose tweets are watched and retweeted by many other people. And I say, these nodes they command a certain price to be seeded to help you. So some may say you have to pay me ten bucks, some say pay me five bucks. Let's assume these are prizes are also increasing functions of the influential power as perceived by you or themselves. So that fits the budget, you cannot buy off everyone. So buy off a small number of them and the question is, which ones to buy off, even without considering the price to pay them, just consider each as one unit of dollar. Okay? Is to very difficult optimization problem, much more difficult what we saw early on. So graph their addict or combinatorial optimization. Towards the end of the course we'll, say a few more words about that, but right now, we just want to highlight some intuitions. The first intuition that is a tradeoff.. The tradeoff is between seeding nodes so that they are concentrated enough, so that they can have a chance of flipping, at least some cluster, versus seeding nodes so that they are far apart and therefore they may be able to flip the entire network. You want to spread them out so that they can read different corners and yet, we don't want them spread out so much that they lose the chance of flipping an even small corner of the network. The second one is that it's not necessarily successfully refinable. What does this means? Here's a small example. Okay/ Suppose we want to, seed a contagion in this graph here. And we've got ten nodes where the sum links join here. Okay, And now we want to say which ones shall we seed? One answer is well, if I can only seed one node times zero. I shade it, dark, meaning that it has flipped from the not buying iPad to buying iPad, for example, okay? The switch flipped from, the state of, one state of the mind to the other state of the mind. And what is the que? Let's say if every other node has a flipping threshold of 0.49. Okay? So if 49 percent or higher of my neighbors flipped, then I would flip next as well. So then, what would happen? For this node, it's got two neighbors and one of the neighbors have flipped. So, at iteration one it will say I Will flip, so we put one over here. Then for this node two, okay, it says well at iteration one I see two nodes flipped. I got three neighbors, so two out of three are flipped, two over three is bigger than 0.49, and therefore next time slot I will flip two. We'll write down later, a number two over there. And you think this might propagate? Well, actually it doesn't propagate anymore, Because, this node says, I got three neighbors, only one flipped, that's less than 0.49. I got three neighbors, only one flipped, that's less than 0.49. So, these two nodes will not flip. And of course as you can obviously see, if they don't flip, none of the other five nodes will flip. So, this is the best you can do as it turns out. If you can only seed one node, seed this node and you can flip two more, as the equilibrium, flipping in the contagion. But suppose you can actually flip, you can seed two nodes, then a slightly bigger bucket. It turns out the optimum way to seed is the following, you seed, these two nodes, Even though they are not directly connected, But turns out that would be the best way. And the next iteration, this node sandwiched between the two will flip. This sandwiched between the two will flip. [inaudible] two, this node influenced by these two neighbors will flip, And then next time, these two nodes influenced by these two neighbors will flip. And these two nodes still will not flip. One way to view that is to say at least do, as it turns out. Now, the interesting point is that the optimal seeding strategy, when you can cede two nodes. Is to see these two no's, and contrast that with optimal seating strategy, which is, it turns out to be seeding this node. If you can only seed one, You see that is not successfully refinable. We cannot say, if you will allow me seed, seed one node, I will seed a particular node, and then when you allow me to seed one more node, I would just add one more node. Well, in the two seed optimal strategy neither seed is actually the optimal one you should seed when you only have once budget to seed one node. a smaller budget. And this highlights, yet another difficulty in solving the optimal ceiling problem. Now we have talked about contagion and now, let's move on to the fourth N+ model. Not counting the advancement, material sections of these videos. And the fourth one is called infection.