Hi. Today we're going to be talking about Triadic Closure.

Triadic Closure is the tendency for people who share lots of connections,

to form a connection themselves to become connected.

So, people who share lots of

friends have an increased likelihood of becoming connected themselves.

There are many mechanisms that give rise to this process.

But, today we're going to be focusing on figuring out how to measure in a network.

So, let's say you have a network like this, and you're asked,

what edges are likely to come to the network next?

What edges are likely to arrive?

Well, Triadic Closure would say that those edges that closed

triangles are good candidates for edges that may show up next.

So, here, all the red edges form closed triangles,

and so, these are good candidates for edges that may show up.

However, we don't always have time stamps,

or we don't always know the ordering in which the edges come into the network.

Sometimes, we're just given a static network with no time stamps, and sometimes,

we want to know whether Triadic Closure is present in this network,

whether it has lots of triangles or not.

And so, what we're going to be talking about in this video is,

how can we measure the prevalence of Triadic Closure in a network?

Another way of referring to Triadic Closure is Clustering.

So, lots of the definitions that we're going to be

covering today are referred to as Clustering.

So, we'll start with a local version of measuring Clustering.

We're going to be measuring Clustering from the point of view of a single node.

And, this is called a Local Clustering Coefficient.

And, the way it's defined is the fraction

of pairs of the nodes friends that are friends with each other.

The best way to show you how Local Clustering

Coefficient works is by showing you an example.

So, let's say, you wanted to compute the Clustering Coefficient of node C. What you

would need to do is to take the ratio of the number of

pairs of C's friends who are friends with each other,

and the total number of pairs of C's friends.

Okay. So, C has four friends in this network.

That means that C has a degree of four.

That's what we refer to as degree.

It's the number of connections that a node has.

And, we refer to it as DC as well.

So, DC here, which the degree of C is four.

Now, how many pairs of C's friends are there?

Well, there are four friends of C,

and you can easily see that,

if you have four pairs of four people,

then there are six total possible pairs of people.

And so, the total number of pairs of C's friends is six.

Now, this is easy to see because there is only four friends of C,

but sometimes, there are many more and

it might be harder to see how many possible pairs of friends you have.

So, what you can do is you can just use this formula here which tells you how many.

It's dc times dc-1 over two.

In this case, that number is six,

which is 12 or 2.

Okay. So, that will be our denominator. What about the numerator?

The number of pairs of friends of C who are friends with each other.

Well, there are only two pairs of friends of C that are friends with each other.

AB and EF. So, that number is two.

So then, the Local Clustering Coefficient of node C is two over six or one-third.

That means that one-third of all the possible pairs of friends of C

who could be friends, are actually friends.

Okay. Let's do another example.

Compute the Local Clustering Coefficient of node F. Again,

we need to compute the ratio of

the number of pairs of F's friends who are friends with each other,

and the total number of pairs of F's friends.

So, we'll do the same thing here.

F has a degree of three.

So, the number of pairs of F's friends is three times two over two which is three.

And then, there's only one pair of friends of F who are actually friends with each other.

That's C and E. And so,

the Local Clustering Coefficient of F is also one-third.

All right.

And one last example.

Compute the Local Clustering Coefficient of node J. All right.

So, node J has only one friend which is node I,

which means that J actually has zero pairs of friends.

And, because that's what we're supposed to put in the denominator,

we're in trouble because we cannot divide by zero.

And so, what we're going to do for cases like this,

where the definition doesn't work for nodes that have less than two friends,

is we're going to assume that nodes that have

less than two friends have a Local Clustering Coefficient of zero.

And this is consistent with what network X does. All right.

So, toggling network X,

how do you compute the Local Clustering Coefficient using network X?

Well, let's say we load up this graph in the way that we already know how to do it,

and we compute the Clustering.

We use the function Clustering to compute

the Local Clustering Coefficient of node F. This case, it's 0.33.

So, we have same.

For node A, it is 0.66,

and for node J,

as we had seen, it is zero.

Okay. So, this allows you to compute

the Local Clustering Coefficient of each node in the graph.

But, what we were interested in is trying to figure out

whether Triadic Closure is prevalent in the whole network.

And so, how do we go from having a local measure of

Local Clustering Coefficient for each node to

a global measure of Clustering Coefficient for the whole network?

We're going to talk about two different approaches.

The first one, which is pretty simple, straightforward,

is to simply take the average Local Clustering Coefficient or all the nodes in the graph.

And, you can do this in network X by using

the function average Clustering of the graph G. And,

in this case, that is 0.29.

That's approach number one.

A second approach, it's the following.

We're going to try to measure the percentage of

open triads in the network that are triangles.

Okay. So, what our open triads and what are triangles?

Triangles are simply three nodes that are connected by three edges.

And this is called a triangle because it looks like a triangle.

Now, open triads are three nodes that are connected by only two edges.

The thing to notice here is that

a triangle actually contains three different open triads, right?

So, if we consider this triangle here,

you will notice that it contains three different open triads.

The first open triad considers the three nodes and all the edges,

these two edges but not this one.

That is the first open triad.

But, you can also consider the three nodes and these two edges in this one.

Or, we could consider the three nodes and these two edges but not this one.

Right. So, inside each triangle,

there are three different open triads.

So, if you go out in the network and count how many triangles it has,

and then it counts how many possible open triads it has,

for each time that you see a triangle,

you're going to count three different open triads.

And so, what we're going to do for

the second approach for measuring Clustering Coefficient,

which is actually called the Transitivity,

is simply that it's going to take the number of closed triads,

which are the triangles,

multiplied times three divided by the number of open triads.

And that is the percentage of open triads that are actually triangles or close triads.

You can use network X to get

the Transitivity of the network by using the function Transitivity.

And, in this case, this network has a Transitivity of 0.41.

Okay. So, we have two different ways of measuring the global Clustering Coefficient.

Are these two ways the same?

Which one is better? Are there are differences between the two?

Well, it turns out that there are differences between the two.

They both try to measure the tendency for the edges to form triangles,

but it turns out the Transitivity weights

the nodes with a larger number of connections higher.

It weights the nodes with a larger degree higher.

The best way to see that is by looking at examples.

So, here is this graph that kind of looks like a wheel.

If you look at this graph closely,

you'll find that most nodes actually have a pretty high Local Clustering Coefficient.

So, all the nodes that are on the outside of

the wheel have a Local Clustering Coefficient of one because,

each one of these nodes,

you see that it has two connections.

So, he has one pair friends,

and that pair friend is connected.

So, this node here has a Local Clustering Coefficient of

one and the same is true for all the nodes on the outside of the wheel.

So, most nodes have a high Local Clustering Coefficient.

However, if you consider the node inside the wheel,

the central node there,

that one has a pretty high degree but it has a very low Clustering Coefficient.

That is because it has many,

many connections in many pairs of connections and only a

few of those are actually connected to each other.

But, most of them are not connected.

For example, these two nodes are not connected.

These two nodes are not connected.

These two nodes are not connected and so on.

Even though all of them are friends with that central node.

So, in this graph,

the average Clustering Coefficient is pretty high,

it's 0.93 because most nodes have

a very high Local Clustering Coefficient, except for one.

However, the Transitivity of this network is 0.23.

And that's because Transitivity weights the nodes with high degree higher.

And so, in this network,

there's one node with a very high degree compared to the others that has

a very small Local Clustering Coefficient compared to the others,

and Transitivity penalizes that.

So, you get a much lower Transitivity.

Okay. Can you see an example that goes the other way around?

Well, here is this network. In this network,

most nodes have a very low Local Clustering Coefficient.

So, each one of these outer nodes here has a Local Clustering Coefficient

of zero because they either have

only one friend or they have two friends but those two are not connected.

And, there are 15 nodes like that.

The nodes inside here,

there are only five nodes like that and they have high degree,

and then they have high Local Clustering Coefficient.

So, when you look at the average Clustering Coefficient and Transitivity,

you find that the average Local Clustering Coefficient

of this network is pretty low because most nodes,

the 15 outer nodes,

have very low Local Clustering Coefficient.

However, the Transitivity is high because the nodes with

high degree happen to have high Local Clustering Coefficient.

So, these two graphs showed you the differences between the Local Clustering Coefficient,

the average Local Clustering Coefficient, and Transitivity.

One weights the nodes with a large degree higher.

In summary, we've learned that Clustering Coefficient measures

the degree to which nodes in a network tend to cluster or form triangles.

And there are several ways in which you can measure this.

So, the first way,

we look at the Local Clustering Coefficient and this is measured on a node by node basis.

And in two other ways, we found the global Clustering Coefficient which measures

Clustering Coefficient on a global scale for the whole network.

For the Local Clustering Coefficient,

this one is defined as simply the fraction of

pairs of nodes friends who are friends with each other.

And just to remind you, in this case,

the Local Clustering Coefficient of node C was one-third because one-third

of the pairs of friends of C are actually friends with each other.

For global Clustering Coefficient,

the first way in which we could measure these was by simply

taking the average of the Local Clustering Coefficient over all the nodes.

And you could use the function average Clustering in network X to do it.

And then, the second way was this thing called Transitivity,

which was the ratio of the number of

triangles and the number of open triads in a network.

And this one puts a larger weight on high degree nodes compared to

the average Local Clustering Coefficient and you can

measure it in network X using the function Transitivity.

This is all for today. Thank you for watching.

And I hope to see you next time.