0:08

Hi. Today we're going to be talking about Triadic Closure.

Â Triadic Closure is the tendency for people who share lots of connections,

Â to form a connection themselves to become connected.

Â So, people who share lots of

Â friends have an increased likelihood of becoming connected themselves.

Â There are many mechanisms that give rise to this process.

Â But, today we're going to be focusing on figuring out how to measure in a network.

Â So, let's say you have a network like this, and you're asked,

Â what edges are likely to come to the network next?

Â What edges are likely to arrive?

Â Well, Triadic Closure would say that those edges that closed

Â triangles are good candidates for edges that may show up next.

Â So, here, all the red edges form closed triangles,

Â and so, these are good candidates for edges that may show up.

Â However, we don't always have time stamps,

Â or we don't always know the ordering in which the edges come into the network.

Â Sometimes, we're just given a static network with no time stamps, and sometimes,

Â we want to know whether Triadic Closure is present in this network,

Â whether it has lots of triangles or not.

Â And so, what we're going to be talking about in this video is,

Â how can we measure the prevalence of Triadic Closure in a network?

Â Another way of referring to Triadic Closure is Clustering.

Â So, lots of the definitions that we're going to be

Â covering today are referred to as Clustering.

Â So, we'll start with a local version of measuring Clustering.

Â We're going to be measuring Clustering from the point of view of a single node.

Â And, this is called a Local Clustering Coefficient.

Â And, the way it's defined is the fraction

Â of pairs of the nodes friends that are friends with each other.

Â The best way to show you how Local Clustering

Â Coefficient works is by showing you an example.

Â So, let's say, you wanted to compute the Clustering Coefficient of node C. What you

Â would need to do is to take the ratio of the number of

Â pairs of C's friends who are friends with each other,

Â and the total number of pairs of C's friends.

Â Okay. So, C has four friends in this network.

Â That means that C has a degree of four.

Â That's what we refer to as degree.

Â It's the number of connections that a node has.

Â And, we refer to it as DC as well.

Â So, DC here, which the degree of C is four.

Â Now, how many pairs of C's friends are there?

Â Well, there are four friends of C,

Â and you can easily see that,

Â if you have four pairs of four people,

Â then there are six total possible pairs of people.

Â And so, the total number of pairs of C's friends is six.

Â Now, this is easy to see because there is only four friends of C,

Â but sometimes, there are many more and

Â it might be harder to see how many possible pairs of friends you have.

Â So, what you can do is you can just use this formula here which tells you how many.

Â It's dc times dc-1 over two.

Â In this case, that number is six,

Â which is 12 or 2.

Â Okay. So, that will be our denominator. What about the numerator?

Â The number of pairs of friends of C who are friends with each other.

Â Well, there are only two pairs of friends of C that are friends with each other.

Â AB and EF. So, that number is two.

Â So then, the Local Clustering Coefficient of node C is two over six or one-third.

Â That means that one-third of all the possible pairs of friends of C

Â who could be friends, are actually friends.

Â Okay. Let's do another example.

Â Compute the Local Clustering Coefficient of node F. Again,

Â we need to compute the ratio of

Â the number of pairs of F's friends who are friends with each other,

Â and the total number of pairs of F's friends.

Â So, we'll do the same thing here.

Â F has a degree of three.

Â So, the number of pairs of F's friends is three times two over two which is three.

Â And then, there's only one pair of friends of F who are actually friends with each other.

Â That's C and E. And so,

Â the Local Clustering Coefficient of F is also one-third.

Â All right.

Â And one last example.

Â Compute the Local Clustering Coefficient of node J. All right.

Â So, node J has only one friend which is node I,

Â which means that J actually has zero pairs of friends.

Â And, because that's what we're supposed to put in the denominator,

Â we're in trouble because we cannot divide by zero.

Â And so, what we're going to do for cases like this,

Â where the definition doesn't work for nodes that have less than two friends,

Â is we're going to assume that nodes that have

Â less than two friends have a Local Clustering Coefficient of zero.

Â And this is consistent with what network X does. All right.

Â So, toggling network X,

Â how do you compute the Local Clustering Coefficient using network X?

Â Well, let's say we load up this graph in the way that we already know how to do it,

Â and we compute the Clustering.

Â We use the function Clustering to compute

Â the Local Clustering Coefficient of node F. This case, it's 0.33.

Â So, we have same.

Â For node A, it is 0.66,

Â and for node J,

Â as we had seen, it is zero.

Â Okay. So, this allows you to compute

Â the Local Clustering Coefficient of each node in the graph.

Â But, what we were interested in is trying to figure out

Â whether Triadic Closure is prevalent in the whole network.

Â And so, how do we go from having a local measure of

Â Local Clustering Coefficient for each node to

Â a global measure of Clustering Coefficient for the whole network?

Â We're going to talk about two different approaches.

Â The first one, which is pretty simple, straightforward,

Â is to simply take the average Local Clustering Coefficient or all the nodes in the graph.

Â And, you can do this in network X by using

Â the function average Clustering of the graph G. And,

Â in this case, that is 0.29.

Â That's approach number one.

Â A second approach, it's the following.

Â We're going to try to measure the percentage of

Â open triads in the network that are triangles.

Â Okay. So, what our open triads and what are triangles?

Â Triangles are simply three nodes that are connected by three edges.

Â And this is called a triangle because it looks like a triangle.

Â Now, open triads are three nodes that are connected by only two edges.

Â The thing to notice here is that

Â a triangle actually contains three different open triads, right?

Â So, if we consider this triangle here,

Â you will notice that it contains three different open triads.

Â The first open triad considers the three nodes and all the edges,

Â these two edges but not this one.

Â That is the first open triad.

Â But, you can also consider the three nodes and these two edges in this one.

Â Or, we could consider the three nodes and these two edges but not this one.

Â Right. So, inside each triangle,

Â there are three different open triads.

Â So, if you go out in the network and count how many triangles it has,

Â and then it counts how many possible open triads it has,

Â for each time that you see a triangle,

Â you're going to count three different open triads.

Â And so, what we're going to do for

Â the second approach for measuring Clustering Coefficient,

Â which is actually called the Transitivity,

Â is simply that it's going to take the number of closed triads,

Â which are the triangles,

Â multiplied times three divided by the number of open triads.

Â And that is the percentage of open triads that are actually triangles or close triads.

Â You can use network X to get

Â the Transitivity of the network by using the function Transitivity.

Â And, in this case, this network has a Transitivity of 0.41.

Â Okay. So, we have two different ways of measuring the global Clustering Coefficient.

Â Are these two ways the same?

Â Which one is better? Are there are differences between the two?

Â Well, it turns out that there are differences between the two.

Â They both try to measure the tendency for the edges to form triangles,

Â but it turns out the Transitivity weights

Â the nodes with a larger number of connections higher.

Â It weights the nodes with a larger degree higher.

Â The best way to see that is by looking at examples.

Â So, here is this graph that kind of looks like a wheel.

Â If you look at this graph closely,

Â you'll find that most nodes actually have a pretty high Local Clustering Coefficient.

Â So, all the nodes that are on the outside of

Â the wheel have a Local Clustering Coefficient of one because,

Â each one of these nodes,

Â you see that it has two connections.

Â So, he has one pair friends,

Â and that pair friend is connected.

Â So, this node here has a Local Clustering Coefficient of

Â one and the same is true for all the nodes on the outside of the wheel.

Â So, most nodes have a high Local Clustering Coefficient.

Â However, if you consider the node inside the wheel,

Â the central node there,

Â that one has a pretty high degree but it has a very low Clustering Coefficient.

Â That is because it has many,

Â many connections in many pairs of connections and only a

Â few of those are actually connected to each other.

Â But, most of them are not connected.

Â For example, these two nodes are not connected.

Â These two nodes are not connected.

Â These two nodes are not connected and so on.

Â Even though all of them are friends with that central node.

Â So, in this graph,

Â the average Clustering Coefficient is pretty high,

Â it's 0.93 because most nodes have

Â a very high Local Clustering Coefficient, except for one.

Â However, the Transitivity of this network is 0.23.

Â And that's because Transitivity weights the nodes with high degree higher.

Â And so, in this network,

Â there's one node with a very high degree compared to the others that has

Â a very small Local Clustering Coefficient compared to the others,

Â and Transitivity penalizes that.

Â So, you get a much lower Transitivity.

Â Okay. Can you see an example that goes the other way around?

Â Well, here is this network. In this network,

Â most nodes have a very low Local Clustering Coefficient.

Â So, each one of these outer nodes here has a Local Clustering Coefficient

Â of zero because they either have

Â only one friend or they have two friends but those two are not connected.

Â And, there are 15 nodes like that.

Â The nodes inside here,

Â there are only five nodes like that and they have high degree,

Â and then they have high Local Clustering Coefficient.

Â So, when you look at the average Clustering Coefficient and Transitivity,

Â you find that the average Local Clustering Coefficient

Â of this network is pretty low because most nodes,

Â the 15 outer nodes,

Â have very low Local Clustering Coefficient.

Â However, the Transitivity is high because the nodes with

Â high degree happen to have high Local Clustering Coefficient.

Â So, these two graphs showed you the differences between the Local Clustering Coefficient,

Â the average Local Clustering Coefficient, and Transitivity.

Â One weights the nodes with a large degree higher.

Â In summary, we've learned that Clustering Coefficient measures

Â the degree to which nodes in a network tend to cluster or form triangles.

Â And there are several ways in which you can measure this.

Â So, the first way,

Â we look at the Local Clustering Coefficient and this is measured on a node by node basis.

Â And in two other ways, we found the global Clustering Coefficient which measures

Â Clustering Coefficient on a global scale for the whole network.

Â For the Local Clustering Coefficient,

Â this one is defined as simply the fraction of

Â pairs of nodes friends who are friends with each other.

Â And just to remind you, in this case,

Â the Local Clustering Coefficient of node C was one-third because one-third

Â of the pairs of friends of C are actually friends with each other.

Â For global Clustering Coefficient,

Â the first way in which we could measure these was by simply

Â taking the average of the Local Clustering Coefficient over all the nodes.

Â And you could use the function average Clustering in network X to do it.

Â And then, the second way was this thing called Transitivity,

Â which was the ratio of the number of

Â triangles and the number of open triads in a network.

Â And this one puts a larger weight on high degree nodes compared to

Â the average Local Clustering Coefficient and you can

Â measure it in network X using the function Transitivity.

Â This is all for today. Thank you for watching.

Â And I hope to see you next time.

Â