0:00

So how do we quantify the importance of nodes?

One obvious choice is to use degree. For example, for this node with two edges

coming in and three edges going out, we say the total degree is five.

The in-degree is two, the out-degree is three. Dunbar's number is used by sociologists as

an estimate of the largest number of friends an average person can maintain

in a social network, usually quoted at around 150.
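As a quick sketch, degree counts can be tallied directly from a directed edge list. The edge list here is a made-up toy example, with node "a" mirroring the slide's node: two edges coming in, three going out.

```python
from collections import Counter

# Hypothetical directed edge list (source, destination); node "a"
# has two edges coming in and three going out, as on the slide.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "a"), ("f", "a")]

out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)

# In-degree 2, out-degree 3, total degree 5 for node "a".
print(in_deg["a"], out_deg["a"], in_deg["a"] + out_deg["a"])
```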

But we also know that degree can be a misleading number for quantifying node

importance. For example, we may want to say that a more

important node pointing to you makes you more important.

And that is exactly what we did in lecture three with Google's ranking of webpages.

Over there, we constructed the matrix G, which is the connectivity matrix H

plus rank-one matrices, and we viewed that matrix as a random walk model on the graph.

Then we looked at the dominant eigenvector:

the eigenvector associated with the largest eigenvalue of this

matrix G, which in that case is one. And we ranked web pages based on the entries

of this vector. That's what we did with PageRank back

then. Then, in general, there is the idea of eigenvector

centrality. Centrality here is just a fancier name for

the importance of nodes. The eigenvector idea says that we take

the dominant eigenvector (the eigenvector, note, not just the

eigenvalue) of some meaningful matrix, such as the adjacency matrix A that we just

defined. So, why would you want to do that?

One possible motivation comes from the following linear dynamical system

interpretation. As in the homework problem, you see that

we can often model many useful quantities in the dynamics of some functionality on a

graph by looking at a vector x, with one entry per node, evolving over

discrete time indexed by t: we multiply an initialization vector x(0)

on the left by this adjacency matrix A, applied t times,

so x(t) = A^t x(0). Now how do we understand this expression?

We know that we can decompose, or represent, any given vector as a weighted

sum of the eigenvectors of this matrix A. Call those eigenvectors v_i and the

corresponding eigenvalues, which are scalars, lambda_i. Then we can represent any

initialization vector as the linear combination, with weights c_i, of these

eigenvectors: x(0) = sum over i of c_i v_i. Now substitute this expression back into

this evolution, and we see the following expression:

A, multiplied on the left t times, of the summation of the c_i v_i vectors.

Again, the c_i are the weights that provide the linear combination

among the v_i to give you this vector. Because this is a linear operation, we

can move the matrix multiplication inside the summation.

Now, if you multiply an eigenvector

of this matrix by A on the left, you just get lambda_i v_i, and you

have done that t times. Therefore each term is just c_i times lambda_i

raised to the power of t, times the vector v_i. In other words, we have expressed the time-t

vector x(t) as a linear combination of the eigenvectors of this matrix A,

where the weights are the c_i times lambda_i raised to the power

of t. So, as time goes on.

3:56

What dominates this summation is really the term associated with lambda_1.

So this becomes basically c_1 times lambda_1, the largest eigenvalue, raised to

the power of t, times v_1. In other words, x(t) for large t is

approximated by this v_1, the dominant eigenvector, the one associated with

the largest eigenvalue of matrix A.
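This convergence argument can be checked numerically. Here is a minimal sketch, assuming a small made-up symmetric adjacency matrix (a 4-node undirected graph, not any example from the lecture): repeatedly applying A to a starting vector lines it up with the dominant eigenvector.

```python
import numpy as np

# Small symmetric adjacency matrix, made up for illustration
# (an undirected, connected 4-node graph).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Evolve x(t) = A^t x(0), normalizing at each step so the entries
# don't blow up as lambda_1^t grows.
x = np.ones(4)
for _ in range(50):
    x = A @ x
    x /= np.linalg.norm(x)

# Compare against the dominant eigenvector from a direct
# eigendecomposition (eigh returns eigenvalues in ascending order).
w, V = np.linalg.eigh(A)
v1 = V[:, -1]
v1 = v1 * np.sign(v1[0])  # fix the arbitrary sign of the eigenvector
print(np.allclose(x, v1, atol=1e-6))
```

The normalization inside the loop changes only the scale of x(t), not its direction, which is all that matters for ranking nodes.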

So we can take the dominant eigenvector of matrix A as the second

notion of centrality. The third notion of centrality is what's

called closeness. And this takes a distance point of view.

Suppose we've got two nodes, i and j, and there might be many paths connecting

them. Let's say the distance between nodes i and

j is the length of the shortest path, denoted d_ij. Later, in lectures nine and thirteen, for

social and technological networks respectively, we will look at how exactly we can compute,

hopefully distributedly, these d_ij's. Searching for the shortest paths is not an

easy job, but for today let's assume that somebody

has already done that. So we can tabulate these d_ij's for all of

the node pairs (i, j). Then, once we've got this list of all the

d_ij's, we can do a few things. One is to pick the largest one,

and we call that the diameter of this graph:

the maximum d_ij across all node pairs.
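As a sketch, once all the pairwise distances d_ij have been computed, here by breadth-first search on a hypothetical 5-node unweighted graph (made up for illustration, not the lecture's figure), the diameter is just their maximum:

```python
from collections import deque

# Hypothetical 5-node undirected graph as an adjacency list.
graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3, 5], 5: [4]}

def bfs_distances(g, src):
    """Shortest hop counts from src to every node, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Tabulate d_ij for every pair, then take the maximum.
d = {i: bfs_distances(graph, i) for i in graph}
diameter = max(d[i][j] for i in graph for j in graph)
print(diameter)
```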

6:25

The second thing we can do says the following: pick a node i here, and fix i.

Then look at all the other nodes j down there,

and there are N minus one of them.

I look at the distances from me, node i,

to all the others, and take the average. If this number is small, then I am

more important, because I have a very short distance

to all the other nodes. If this number is big, I am less

important. So, just to make the direction work out,

let's take the reciprocal of this.

That gives (N - 1) over the summation of d_ij across all the other nodes j, for a given fixed node i.

This is what people call closeness centrality, denoted C_i

for this given node i. Now, if you have a high C_i, it means

that you have very short shortest distances to all the other

nodes, on average. Now, we didn't have to use the average.

That is, the simple mean. We can use other kinds of means,

for example the harmonic mean, and so on. Now, this is not the end of it.
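As a sketch on a hypothetical graph (the same kind of made-up adjacency list as above), both the plain closeness C_i = (N - 1) / sum_j d_ij and the harmonic-mean variant can be computed from BFS distances:

```python
from collections import deque

# Hypothetical 5-node undirected graph (made up for illustration).
graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3, 5], 5: [4]}

def bfs_distances(g, src):
    """Shortest hop counts from src to every node, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

n = len(graph)
closeness, harmonic = {}, {}
for i in graph:
    d = bfs_distances(graph, i)
    # C_i = (N - 1) / sum_j d_ij: the reciprocal of the average distance.
    closeness[i] = (n - 1) / sum(d[j] for j in graph if j != i)
    # Harmonic-mean variant: average the reciprocals of the distances.
    harmonic[i] = sum(1 / d[j] for j in graph if j != i) / (n - 1)

print(closeness)
```

On this particular graph, the central nodes 2, 3, and 4 tie at 0.8, while the peripheral node 5 scores lowest at 0.5.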

There is yet another centrality measure that's often used.

In fact, there are quite a few others as well, including another one in the homework

problem. But today we'll stop with this fourth

notion of centrality, quantified by the notion of betweenness.

What does that mean? This goes back to Granovetter's 1973

experiment, which says that if a node sits along the shortest paths of many other node

pairs, then it has some strategic importance in the mix of the action.

If any other node pairs want to reach

each other along an efficient path, then they have to pass through you.

8:34

Okay. You are in between many node pairs; you are

on many shortest paths. And this is the idea of betweenness.

How do we quantify this?

Let's first denote by g_st the total number of shortest

paths between two nodes s and t. Then let's denote by n_st^i the

number of those shortest paths that node i sits on, for the pair (s, t).

So clearly n_st^i is less than or equal to g_st.

Now, this requires you to count, for each pair (s, t), the number of shortest paths.

There may be only one, and node i might not be on it, in which case you have one for g_st and zero

for n_st^i. So now we want to say: well, let me look.

For each node pair (s, t), there may be many

shortest paths, so you have to normalize by g_st.

How many of them is node i sitting on?

9:51

Now, this expression would then have to be summed up across s and t.

If you sum across both the s and t indices, then we're actually double-counting all

the node pairs, which is fine if you want to stick to that

convention. But most books stick to the following

convention: you just order all the nodes,

and then you sum across all s, and across the nodes t that are less

than s in the ordering of the nodes. This avoids double-counting node

pairs; it's just a factor-of-two difference in the

end. Now we call this summation.

10:34

Okay. The sum of the fractions of the shortest

paths that node i sits on, across all the other node pairs,

is the betweenness centrality of node i: B_i, for each given node i.
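A brute-force sketch of this definition on a hypothetical 5-node graph (the graph and helper names are made up for illustration): count shortest paths with breadth-first search, then sum n_st^i / g_st over unordered pairs s < t, as in the convention above.

```python
from collections import deque
from itertools import combinations

# Hypothetical 5-node undirected, connected graph.
graph = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3, 5], 5: [4]}

def shortest_path_counts(g, s):
    """Return (dist, sigma): hop distance from s, and the number of
    distinct shortest paths from s, for every node (BFS counting)."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

info = {s: shortest_path_counts(graph, s) for s in graph}
B = {i: 0.0 for i in graph}
# Sum over unordered node pairs s < t to avoid double counting.
for s, t in combinations(graph, 2):
    dist_s, sigma_s = info[s]
    dist_t, sigma_t = info[t]
    g_st = sigma_s[t]  # g_st: total number of shortest s-t paths
    for i in graph:
        if i in (s, t):
            continue
        # Node i lies on a shortest s-t path iff the distances add up.
        if dist_s[i] + dist_t[i] == dist_s[t]:
            B[i] += sigma_s[i] * sigma_t[i] / g_st  # n_st^i / g_st
print(B)
```

Here node 4 sits between node 5 and everyone else, so it collects the highest betweenness, while the peripheral nodes 1 and 5 get zero.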

So now we've got the four notions. Degree, which we know is hardly a

useful quantity in many instances. We have the eigenvector centrality,

using the dominant eigenvector of the adjacency matrix.

We have the closeness centrality, C_i, for each node i, and the betweenness

centrality, B_i, for each node i. So we've got four different metrics.

Which one to use? Again, it depends on what you want to do.

But first, let's illustrate these with a simple example.

This is a network with just eight nodes.

We could compute the four different centrality measures for each of the eight

nodes, but that would take too long.

Let's focus on just two nodes, nodes one and two here.

If you just look at the graph, intuitively you say: well, node one has a degree of

three, and node two has a degree of three.

But they shouldn't be equally important. In particular, the topology here

has a left side and a right side.

Visually, I can see that there is an important link between the two,

and node one is part of that link, whereas node two is not.

So node one should be more important than node two.

So how do we quantify that notion? Clearly not by degree.

Okay. Well, let's then look at eigenvector

centrality. That means we have to compute matrix A.

12:22

What is matrix A?

This is a graph that doesn't have directional links, so it is a

symmetric matrix. I'll just write down the first two rows

here. The first row says node one

connects to two, three, and four; that's it.

So it's zero, one, one, one, and then zero, zero, zero, zero.

The second row: node two connects to nodes one, four, and five, so there are ones in

entries one, four, and five, and zeros everywhere else. You can fill out the rest, or you can take

a look at the textbook. And now all we need to do is to compute.