0:09

Hello.

Â We've been thinking about how to measure the importance or

Â the centrality of a node in a network.

Â And today we're going to talk about another way of doing this.

Â It's called PageRank and it was developed by the Google founders when they were

Â thinking about how to measure the importance of webpages

Â using the hyperlink network structure of the web.

Â And the basic idea,

Â is that PageRank will assign a score of importance to every single node.

Â And the assumption that it makes, is that important nodes are those that have many

Â in-links from important pages or important other nodes.

Â 0:40

Now, Page Rank can be used on any type of network, for example, the web or social

Â networks, but it really works better for networks that have directed edges.

Â So, for example, a social network where the edges indicate who emailed whom.

Â And so, if we think about the definition, what I said,

Â that important pages are those that have many in-links from more important pages,

Â then there's a little bit of a circular definition, right?

Â Because if you imagine trying to measure the PageRank of a particular node,

Â let's say the first node that you're looking at,

Â then you would want to look at the PageRank of the nodes that point to it.

Â And you don't have those yet.

Â So the ways it's defined sounds a little bit circular.

Â And in this video,

Â we're going to walk through the process of how you can actually compute it.

Â 1:22

So the setup is that we're going to have a network with n nodes, and

Â then we're going to compute PageRank on a step by step fashion.

Â And so, what's going to happen, is that the sum of the PageRank or

Â all the nodes is always going to be constant, it's always going to be 1.

Â And so at first, we're going to start with every node having a PageRank value of 1/n.

Â And then what we're going to do is we're going to have every node give all

Â of its PageRank to all the nodes that it points to, and

Â then were going to do this over and over again.

Â 1:51

So, what we're going to do is we're going to

Â perform these Basic PageRank Update Rule k times.

Â And the Basic PageRank Update Rule does what I just said.

Â So every node will give an equal share of its current PageRank to all the other

Â nodes that it links to.

Â And then, the new value of PageRank for every node is going to be simply the sum

Â of all the PageRank that are received from all the nodes that point to it.

Â 2:13

So let's look at an example.

Â Let's take, for example, this network here that you see.

Â And now take some time to think about which one should be the most

Â important node.

Â And then we're going to compute PageRank, and

Â then you can see if your guess matches with what PageRank would say.

Â 2:40

So first, like I said, we're going to assign every node a PageRank of 1/n.

Â So in this case, every node will have a PageRank value of 1/5.

Â And now we're going to start applying the Update Rule that I described.

Â So we're going to have to keep track of the old values of PageRank, and

Â what the new value of PageRank of each node is.

Â 3:11

So if we look at D, D has three edges,

Â that points to three different nodes, C, A, and E.

Â So A is going to receive 1/3 of the current page rank that D has.

Â D currently has 1/ 5 PageRank, and so

Â A is going to get 1/3 of that 1/5 PageRank that D has.

Â Now A is also going to get PageRank from node E, and because E only points to A,

Â then it's going to give all of its PageRank to node A.

Â And so A is going to get 1/5 PageRank from node E.

Â And so, in total, A is going to get 4/15 PageRank from those two nodes.

Â And so the new value PageRank of node A is 4/15.

Â Now, let's think of node B.

Â 3:56

And A currently has 1/5 PageRank.

Â Now, it's important here to note that we've updated the PageRank of node A to be

Â 4/15, but when we do this computation, we're always using the old values,

Â not the updated values, so B is going to get a PageRank of 1/5 from node A.

Â And then B is also getting PageRank from C, and so because C is only pointing to

Â node B then it's also going to give all of its PageRank to B which is 1/5 and

Â so in total, B is getting 2/5 PageRank from both nodes.

Â 4:28

Now let's look at node C.

Â C is getting PageRank from D which, again, is pointing to three nodes, and

Â is giving 1/3 of its PageRank from node C.

Â So C is getting 1/3 times 1/5 from D.

Â And it's also getting PageRank from node B.

Â And because Bs pointing to two nodes, B is going to give 1/2 of

Â its PageRank to node C for a PageRank of 1/2 times 1/5.

Â And so, in total, C is getting 5/30 or 1/6 PageRank.

Â 4:57

Now, node D is getting PageRank from B, which again is given half of

Â its PageRank to D and half to C so D is getting 1/2 times 1/5.

Â And that's the only node D is getting PageRank from.

Â So it's getting a total of 1/10 PageRank.

Â And then E is getting PageRank from D only.

Â Which is 1/3 times 1/5, and so in total is getting 1/15.

Â So we've now computed the new PageRank values of all the nodes, as so

Â that's our first step, K equals 1.

Â And so now those new values are going to become our old values, and we're going to

Â do this process again, now using the values we got after the first step.

Â We're going to do the exact same thing again to get the second step of PageRank,

Â k equals 2.

Â So let's start with node A.

Â A is getting PageRank from D, 1/3 of its PageRank.

Â And D has 1/10 Page Rank, so it's given a 1/3 times 1/10.

Â And A is also getting PageRank from E,

Â E is giving all its PageRank to A, so it's getting 1/15 PageRank.

Â And so A, in total, is getting 1/10 and that's its new PageRank value.

Â Now if we look at node B, B is getting PageRank from node C.

Â C points to only B, so you want to give all of its PageRank to B.

Â And it has 1/6, so B is getting 1/6.

Â B is also getting PageRank from A, and A is going to give all of its PageRank to B

Â as well, so it's going to give all of its current 4/15 PageRank to B.

Â And, in total, B is going to be getting 13/30.

Â 6:34

Now let's go look at node C.

Â C is getting PageRank from D,

Â which again is giving 1/3 of its PageRank to each of the three nodes it points to.

Â So C is getting 1/3 times times 1/10.

Â And C is also getting PageRank from node B.

Â Which is given half of its PageRank to C, so

Â 1/2 times its current PageRank which is 2/5 for a total of 7/30.

Â Node D is getting PageRank from B, 1/2 times 2/5.

Â Finally E is going to be getting PageRank from D only,

Â which is 1/3 times 1/10 for a total of 1/30.

Â And so that's our new values of PageRank for all those nodes after two steps.

Â Now if we look at those values, we find that node B has the highest PageRank so

Â far after two steps of 0.43, followed by node C, then node D, node A, and E.

Â And so, up to this point,

Â this is suggesting that B is the most important node in this network.

Â 7:41

But what happens if we continue for another step?

Â Well here's the values that you would get if you go to k equals 3.

Â So you notice that the values change a little bit, but so

Â far you still have the same order.

Â And B is still the most important node, followed by C, followed by D,

Â followed by A, and followed by E.

Â As you may wonder what happens if I just sort of continue doing this for

Â k equals 4, 5, 6 and so on?

Â How do I know when to stop, and

Â how do I know when the actual values that I should consider are, right?

Â So what if we continue this process?

Â Well, for this particular network, if you continue doing this over and over and

Â do it for many, many, many steps, it turns out that eventually these values will

Â start to change very little, so they're converging to a unique value.

Â And that unique value, in this case, is the values that I'm showing.

Â So B has a PageRank of 0.38, C has a page rank of 0.25,

Â and D 0.19, and A .12 and E .06.

Â And actually, it turns out that for

Â most networks, these PageRank values will actually converge and

Â that's the value that we think of as the PageRank of the nodes.

Â So, the PageRank of the node is the value that you get after you do this process

Â many, many, many times.

Â 8:53

So in summary, the steps of Basic PageRank are the following.

Â First, you start with all the nodes with a PageRank value of 1/n.

Â And then you perform these Basic PageRank Update Rule k times.

Â And this Update Rule said says that every node is going to give an equal share of

Â its current PageRank to all the other nodes that it links to.

Â And so the new PageRank value of every node is going to be simply the sum of

Â all the PageRank that it gets from all the nodes that point to it.

Â 9:20

And for most networks, these PageRank values will actually converge as

Â k gets larger, as k goes to infinity.

Â We're going to find that these values converge to a unique value.

Â And so that's how you compute a basic PageRank.

Â And that's all for this video, and I will see you next time.

Â