
Hi folks. So now we're going to talk about another property which is important in capturing networks, and in particular one which looks at a local property of networks: what's going on when we zoom in on given nodes and begin to understand the relationship between different ties in the network. This is known as clustering. When we begin to ask how dense a network is at a local level, we could ask the question: what fraction of the people who I'm friends with are friends with each other? So clustering looks at, if we have a given node i and we look at two of i's friends, j and k, what's the chance that those two are related to each other? That is, what's the frequency of links among the friends of i?

So if we want to look at a given node i, and ask what the clustering is for that node i in a given network, then we can say, okay, let's look at i's neighborhood and look at all the pairs of friends that i has, two different j's and k's in that neighborhood, and keep track of how many of those possible pairs are actually connected to each other, compared to the overall number of them. That gives us the fraction of your friends who are friends with each other. And then for average clustering, we can just take that number and average it across all the different nodes in the network. Okay?
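As a concrete sketch of this definition (the small example network and node names here are purely illustrative, not from the lecture), the clustering of a single node i and the node-by-node average could be computed like this:

```python
from itertools import combinations

def node_clustering(adj, i):
    """Fraction of pairs of i's friends that are linked to each other."""
    pairs = list(combinations(adj[i], 2))  # all pairs j, k among i's friends
    if not pairs:
        return 0.0  # convention: a node with fewer than 2 friends gets 0
    linked = sum(1 for j, k in pairs if k in adj[j])
    return linked / len(pairs)

def average_clustering(adj):
    """Average the per-node clustering across all nodes in the network."""
    return sum(node_clustering(adj, i) for i in adj) / len(adj)

# Toy network: a triangle {a, b, c} plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(node_clustering(adj, "a"))  # only the pair b-c (out of 3 pairs) is linked
print(average_clustering(adj))
```

Here node a's clustering is 1/3, nodes b and c are fully clustered, and the pendant node d contributes 0 to the average.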

So, that's a particular measure of clustering, and there are different ways to measure clustering. What we did was just take the average: first calculate it for a given node i, and then average across all the different nodes. What that does is weight the clustering node by node. Another way to do this would be instead to look at overall clustering: look at all possible nodes and pairs of friends that they have, and ask, overall in the whole network, every time we've got a particular situation which looks like this, two friends of a common node, what's the chance that those two are connected to each other? So instead of first doing this node by node and then averaging, this is done overall: out of all the possible connected triples in the network, we're asking the frequency with which the triangle is completed. So this is overall clustering. And these numbers can be different.
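A sketch of the overall version (again on a small illustrative network, not the lecture's figure): pool the counts of completed triples across the whole network first, and divide once at the end, rather than averaging per-node fractions:

```python
from itertools import combinations

def overall_clustering(adj):
    """Completed triples / connected triples, pooled over the whole network."""
    completed = 0
    total = 0
    for i in adj:
        for j, k in combinations(adj[i], 2):  # each pair of i's friends is a triple
            total += 1
            if k in adj[j]:
                completed += 1
    return completed / total if total else 0.0

# Toy network: a triangle {a, b, c} plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(overall_clustering(adj))
```

On this little network the pooled number is 3/5 = 0.6, while averaging node by node gives 7/12, about 0.58, so the two measures already disagree slightly even here.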

So, which way you measure it, whether you're weighting it by node or doing it over all possible triangles in the network, can possibly give you different answers. Just as an example, let's suppose we had a situation which looked like this, where we have a given node here at the center, node 1, and this node has groups of friends in threes that are all friends with each other and with node 1, but aren't friends across these different groups of three. So we keep adding these different groups of three, and what do we find? In terms of average clustering, this is going to tend to one. For instance, out of node nine's friends, every pair of friends that nine has know each other, and that's true for ten as well, and eight. As we look at most of these nodes, they're actually clustered at 100%: all of their pairs of friends are friends with each other. But when we look at node 1, very few of 1's friends are going to actually be friends with each other. And interestingly enough, if you keep adding more and more groups like this, a lot of the possible triples in the network are going to be ones which go through node 1 and aren't completed, and so the overall clustering can be much, much smaller than the average clustering in a network like this. So what you're measuring, whether you're doing it node by node or overall, by looking at possible triangles and then asking whether they are completed, can give you different answers. They measure different things, and it's important to keep that straight.
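A sketch of this example makes the gap concrete. The construction below is my reading of the picture: a center node 0 joined to m groups of three, where each group is a triangle among themselves and every member is also linked to the center:

```python
from itertools import combinations

def clustering_numbers(adj):
    """Return (average clustering, overall clustering) for a network."""
    per_node, completed, total = {}, 0, 0
    for i in adj:
        pairs = list(combinations(adj[i], 2))
        closed = sum(1 for j, k in pairs if k in adj[j])
        per_node[i] = closed / len(pairs) if pairs else 0.0
        completed += closed
        total += len(pairs)
    avg = sum(per_node.values()) / len(per_node)
    overall = completed / total if total else 0.0
    return avg, overall

def star_of_triangles(m):
    """Center node 0 linked to m groups of three; each group is a triangle
    among themselves, and every group member is also linked to the center."""
    adj = {0: set()}
    for g in range(m):
        group = [1 + 3 * g, 2 + 3 * g, 3 + 3 * g]
        for v in group:
            adj[v] = {0} | {u for u in group if u != v}
            adj[0].add(v)
    return adj

avg, overall = clustering_numbers(star_of_triangles(20))
print(avg)      # close to 1: every node other than the center is fully clustered
print(overall)  # much smaller: most connected triples run through the center
```

With 20 groups, the average comes out above 0.98 while the overall number is around 0.12; adding more groups pushes the average toward one and the overall number toward zero, exactly the divergence described above.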

Now, one thing that's going to be important in this setting is comparing this to what happens in a network formed uniformly at random. If we ask what the clustering number is in a uniformly random network, it's just simply going to be p. Any time we look at a connected pair of links like this and ask what's the probability of the closing link being present, that probability ignores all the rest of the information: the link was formed with some probability p. So the clustering is going to be p; regardless of whether we look at average or overall clustering, we're always going to get an answer of p for that number. And if we're looking at very, very large networks, where people have a relatively small number of friends compared to the overall network, then p is going to be going to 0, and so clustering in a Poisson random network, or an Erdős-Rényi random network, this G(n,p) kind of network, is going to go to 0 as n grows, if p is actually getting small, which will often be the case in a lot of the settings we're going to be interested in. So what that tells us is that networks formed uniformly at random are going to tend to have very low clustering.
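A quick simulation sketch (the values of n, p, and the seed here are arbitrary illustrative choices) shows the clustering of a G(n,p) draw landing near p:

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Draw a G(n, p) network: each link is present independently with prob p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def overall_clustering(adj):
    """Completed triples / connected triples, pooled over the whole network."""
    completed = total = 0
    for i in adj:
        for j, k in combinations(adj[i], 2):
            total += 1
            completed += k in adj[j]
    return completed / total if total else 0.0

p = 0.05
c = overall_clustering(gnp(400, p, random.Random(7)))
print(c)  # should land near p = 0.05
```

With 400 nodes this is already close to p, and for a fixed expected degree, growing n shrinks p, and the clustering with it.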

And then we can look at what we actually see in data. When we look at data across a variety of different kinds of data sets, we tend to see numbers which are much higher than would have occurred at random. So in a study of prison relationships by MacRae in 1960, clustering is about 0.31. It would be about 0.013 if you did the following calculation: keep the same expected degree, but use a G(n,p) model instead. Then basically about 1.3% of the links are present, and so your clustering should be about 1.3% if the network were uniformly random, and yet it's 31% in the data. That tells us the network looks dramatically different than what would have happened if you'd placed these links down uniformly at random. Clustering is 15% in math co-authorships; here the p is extremely tiny, since these are large graphs with a lot of mathematicians never having collaborated together. It's 0.09 in biology co-authorships, so again much higher numbers than you would have seen at random. For the worldwide web, if you look at it without paying attention to direction, you're going to get about 11%, again compared to a much smaller number at random. And if you look back at our data from the Florentine marriages, where in this case I've included the business dealings as well (this is Padgett and Ansell's data from the 1430s), you get a clustering of about 0.46, while at random it would be about 0.29. So that's another situation where we've got substantially higher clustering than at random.

So this is another property of networks. This has been a more local property, looking at how the links relate to each other, not just how they're distributed over the network, and so forth. We've now taken a look at a variety of different measures, and we're going to begin to look at putting nodes in context and other kinds of things: additional definitions that will help us going forward in managing to keep track of networks, and talk about their properties and characteristics in a meaningful way.
