0:07

Hello. Today, we're going to talk about how to find important nodes in a network.

Â Recall these friendship network among 34 people in a karate club.

Â Based on the structure of this network,

Â which would you say are the five most important nodes in this network?

Â Of course, there are many ways of thinking about this question and we're going to start

Â thinking about different ways in which you could

Â imagine answering this particular question.

Â So, one way to answer the question would be to say,

Â well, nodes who have a very high degree,

Â nodes who have lots of friends are important nodes.

Â And if we use that definition then we'll find that

Â the five most important nodes are nodes 34,

Â 1, 33, 3 and 2.

Â These nodes right here.

Â There are other ways in which you can imagine answering this question.

Â Another way would be to say that nodes who are important

Â are nodes who are very close to other nodes and network,

Â nodes who have high proximity to other nodes and network.

Â And if we use that definition,

Â then the five most important nodes in the network would be notes 1,

Â 3, 34, 32 and 9.

Â So, instead of having node 33 we'll have node 9 and then instead of having

Â node 2 we'll have node 32 and all the other ones stay the same.

Â Yet, another way of thinking about importance would be to say that nodes who are

Â important are nodes who tend to connect other nodes into network.

Â And so, we could imagine measuring importance by

Â the fraction of shortest paths that pass through a particular node.

Â And if we do that,

Â if we define in that way,

Â we find that the five most important nodes in the network are nodes 1,34,

Â 33, 3 and 32.

Â So, instead of having node number 9,

Â we'll have node number 33 in

Â the top five and every all the other nodes will stay the same.

Â So, at this point,

Â these three definitions of importance are very informal and the goal of

Â this video and the following videos is going to be to get

Â a more precise definitions of how to measure importance in a network.

Â This general topic is called Network Centrality and these

Â centrality measures are the ones that allows us to

Â find the most important nodes in a particular network.

Â And the cases in which we want to use these techniques is when for example,

Â sometimes we're interested in finding who the influential nodes in a social network are,

Â which are the nodes that are very good at disseminating information to many other nodes,

Â or on the other hand,

Â which are the nodes that are good at preventing

Â sort of bad behaviors or epidemics from spreading on a social network?

Â These can be used in other networks as well,

Â so we can use a centrality measures to find

Â hubs in a transportation network or important pages on

Â the web or more generally these measure

Â allows us to find nodes that prevent the network from breaking up.

Â Nodes that if we were to remove them,

Â they would cause the network from they would cause the network

Â to fracture and break up into different components.

Â Here's a list of centrality measures that are commonly used.

Â We're going to be talking about some of these in this course and in this video,

Â we're going to be focusing on degree centrality and closeness centrality.

Â So, degree centrality makes the assumption that important nodes have many connections.

Â This is the most basic way in which you could define importance or centrality.

Â Simply, that nodes who have lots of neighbors,

Â lots of friends are very important,

Â and it makes sense right?

Â Of course, depending on the type of network you have

Â you would have to use different types of degree.

Â So, if you have an undirected network you can simply use that degree,

Â but if you have a directed network then you have to choose between using in-degree,

Â or out-degree, or a combination of both.

Â So, let's get into exactly how we would define this.

Â So, we'll say that the degree centrality of a node V is going to be the ratio

Â between its degree and the number of nodes in the graph minus one.

Â So, in this case a node would have a centrality of one if it's connected to

Â every single node in the network and

Â a centrality of zero if it's connected to no node in the network.

Â And so, this measure goes between zero and one,

Â with one being the case where you're most connected.

Â Let's see how we can use network X to

Â find the degree centrality of nodes in this network.

Â So, first let me load up the karate club graph and then let me

Â convert the labels of the nodes so that they match what we see here in this figure.

Â And now, we can use the function degree centrality

Â to measure the centrality of all the nodes in the network.

Â And so here, these returns a dictionary

Â of centralities of every node and we can for example,

Â look at the degree centrality of node number 34 which is 0.515 and that's

Â because node 34 has 17 connections and there are 34 nodes in the network,

Â so that is 17 or 33.

Â The Degree Centrality of note 33 is 0.364 and that is 12 over 33.

Â Now, like I said in directed networks,

Â we have the choice of using the in-degree centrality or the out-degree centrality

Â of a node and everything else is defined in the same way.

Â So, the in-degree centrality of a node V is going to

Â be its in-degree divided by the number of nodes in the graph minus one.

Â And we can use the function in-degree centrality network X to find

Â the in-degree centrality of all the nodes in a directed network.

Â And so in this case, A will have in-degree of 0.143 which is two or 14.

Â There are 15 nodes in this network and L will have a in-degree centrality

Â of 0.214 which is three over 14 and that's because node L has an in-degree of three.

Â And the very same way we can define not using the out-degree instead of

Â the in-degree centrality using the function out-degree centrality

Â and the out-degree centrality of A is 0.214,

Â because it has an out-degree of three and the out-degree centrality of node L is 0.071,

Â because L has an out-degree of one.

Â Another way of measuring centrality is what we call closeness centrality.

Â And the assumption here is that nodes that are important are going to

Â be a short distance away from all other nodes in the network.

Â Recall that we measure distance between two nodes by looking at the shortest path,

Â the length of the shortest path between them.

Â And so, the way we're going to define the closeness centrality of

Â node V is going to be by taking the ratio of the number of

Â nodes in the network minus one divided

Â by the sum over all the other nodes in the network,

Â and the distance between node V and those nodes.

Â So, that's the sum and the denominator in the definition of centrality.

Â Now, so let's say that we want to use network X to find

Â the closeness centrality of this node 32.

Â We can use the function closeness centrality which returns the dictionary

Â of the centrality of the closeness centrality of all the nodes.

Â And here, we find node 32 has a closeness centrality of 0.541.

Â So, using the definition of closeness centrality let's see how this 0.541 comes about.

Â So, first of all let me look at the sum of the length of

Â the shortest path from node number 32 to all the other nodes in the network.

Â I'm using here the shortest path length function,

Â which you've seen before,

Â which gives you the length of all the shortest path from

Â the node number 32 to all the other nodes and that sum here is 61.

Â And so, then if we take the number of nodes in the graph minus

Â one divided by 61 that's how we get this 0.541.

Â Of course, we're making

Â the implicit assumption that all the nodes can actually reach all the other nodes,

Â but of course, this is not always the case.

Â In particular, when we have directed graphs,

Â this is often not the case and even for undirected graphs you can have

Â multiple connected components and you

Â can have it so that some nodes cannot reach other nodes.

Â And so, what happens, how do we measure closeness centrality

Â when a node cannot actually reach all the other nodes?

Â Let's take this example, right.

Â So, node L here can only reach node M. So,

Â how do we measure its closeness centrality?

Â We have a couple of options here.

Â And the first option we can simply only consider the nodes

Â that L can actually reach in order to measure its closeness centrality.

Â So, the way this would work is that we define this set RL to

Â be the set of nodes that L can actually reach and we define

Â the closeness centrality of L to be the ratio of the number of nodes that L can

Â reach divided by the sum of

Â the distances from L to all the nodes that L can actually reach.

Â And so, for node L here,

Â this would be simply one,

Â because L can only reach node M so RL here is just

Â the set M is just the node M and L can reach M in just one step.

Â So, the closeness centrality of L here,

Â would be one over one,

Â which is one and this is the highest possible centrality a node can have in a network,

Â which seems a little bit problematic because as node L can only reach

Â one node and we're saying that it has

Â the highest possible centrality than any node can have,

Â this seems an intuitive, right?

Â So, here's where option 2 comes in.

Â In option 2, we again only consider the nodes that L can reach,

Â but then, we normalize by the fraction of nodes that L can reach.

Â So, the way this looks here is that when we compute the closeness centrality of L,

Â we have the same ratio of RL over the sum.

Â But now, we're going to multiply that ratio,

Â the fraction of nodes that L can reach,

Â RL, divided by the number of nodes in the graph minus one.

Â So basically, we're normalizing by the fraction of nodes that L can reach.

Â And so, if L cannot reach many nodes we're going to be

Â multiplying these other fraction by a very small number.

Â And so, in this case if we do that we find that L has a closeness centrality of 0.

Â 071 which is more reasonable than defined to be one.

Â One thing to note here is that in this new definition when we're normalizing,

Â we're not changing the definition of closeness centrality when the graph is connected,

Â where in every node can reach every other node.

Â That's because when that's the case RL for node L would equal M minus

Â one and this formula that you see

Â here would be the exact same formula that we had before.

Â So, we're not changing the definition.

Â This definition can still apply even if the graph is completely connected.

Â Okay. So, how can we use network X to find the closeness centrality?

Â Well, we can use the function closeness centrality.

Â And here, you get the option of normalizing or not normalizing.

Â And so for example, if we choose not to

Â normalize then the closeness centrality of node L would be one,

Â as we saw before,

Â and if we choose to normalize then it's closeness centrality would be 0.071.

Â So, in summary, today we talked about

Â centrality measures and how they aim to find the most important nodes in the network.

Â We said that there are many many different ways of defining centrality

Â and today we talked about two specific ones.

Â A very basic definition of degree centrality which makes

Â the assumption that important nodes are those who have many many connections,

Â and we use this formula to measure the centrality of a node and it's simply

Â the ratio between the degree of the node and the number of nodes in the graph minus one.

Â And depending on whether we have a direct or undirected graph,

Â we can use the degree of the node or the in-degree or the out-degree,

Â and these are the different functions that you can use in network X to apply them.

Â The second centrality measure that we looked at was closeness centrality

Â and the assumption that it makes is that important nodes are close to other nodes.

Â And here is the general formula that we use to compute it.

Â And here, we can choose to normalize or not normalize as we

Â discussed and the function that we use in

Â network X to compute it is the function closeness centrality.

Â This is it for this video and I hope to see you next time. Thanks.

Â