0:00

So the way that Google chose to explain the page rank concept was in terms of

what they call The Random Surfer. And The Random Surfer as it says surfs

the web at random and goes to random pages.

But there's a specific way that he does it in order to quantify the importance

score of each of these pages, which we'll look at.

But just conceptually right now, just to see what we mean.

and this graph on the right hand side, we have eight nodes.

And so suppose we're starting off at page A, right, and when we're on page A, we're

facing the decision. We have to click on one of the hyperlinks

on page A, right. So we can either go to B, we can go to E,

or we can go to D. So we choose one randomly.

We have a 1 3rd chance of each, because we're choosing randomly, and there's

three of them. And that brings us right down to page D.

And so, then, now we're on page D and then, again, we have to choose randomly

again. So we choose, well, we say there's a

third of a chance of each of these 1, 2, 3 and so we go over to page eight.

And then, now we're on page E. And then now, once were on e, then we say

maybe the person is bored, right? So maybe the person decided that he was

bored, and he didn't like what he was reading, and so he goes and enters a

random URL into his browser, www.something.com.

And so, now this action is just a random Not surfing, because he's not really

surfing and clicking on links, but is just a random entry of a URL.

So he's going to just end up somewhere else in the connectivity.

So he ends up on F. And so, this one actually that he just

made now did not follow the connectivity of the webgraph, but it's still an action

that we take into account within the random surfer.

And so -- and then, when he's on F again, he chooses one of the two hyperlinks, and

he decides to go to G and so on. So just to summarize, with the Random

Surfer, we define two possible actions. The first is to pick a hyperlink randomly

from the current page, right, so this is how we want to go next.

You can pick a hyperlink randomly from the current page, and that's this whole 1

3rd, 1 3rd, 1 3rd, business we were talking about, were you just choose one

where you go next, and the second is to enter a URL into the browser randomly.

And those are two actions. This one follows the connectivity, and

this one doesn't follow. And so, you can see already how This kind

of takes into account the importance of the lengths here because the lengths

which are coming from nodes, which themselves are more important are going

to get visited more often and going to be used more often.

And so the question that pagerank asks is what is the chance that a page is going

to be selected at any time? So, if we keep surfing over and over and

over and over and over again, how many fractions of the total number of searches

is each page going to get? Which ones are going to be visited more

frequently and we say those are the more important.

So what is the chance a page is selected at any time?

And that's the idea behind PageRank. So again now, the ones that are going to

get visited the more often are the ones that have links that are more likely to

be chosen. And the ones that have links that are

more likely to be chosen are in terms going to be from nodes that are visited

more often. So you can see sort of the logic that

keeps circling here. That important is dependent upon the

importance of the links, which is dependent upon the importance of the

nodes, which is then dependent upon the importantance of the links, and so on and

it gets kind of complicated. So we keep going back.

And, but, we'll see how we can quantify that pretty simply, just to see what the

importance scores mean and how the represented.

So, now let's talk a little more about the importance score.

Let's ignore the idea and the random surfing of getting bored for now, okay.

And we'll return back to our initial example with four nodes which we just ran

an in-degree version of importance on before.

And so Each node, we're going to say, has it's own importance score.

Okay. So, they have their importance score and

we can make, use lower letters for that. So, for big W we'll say the importance

score is w, or little w. For X, we'll say the importance score is

little x. For Y, we'll say it's little y.

And for Z, we'll say it's little z. And so now we have to figure out what are

the chances that each of these links are going to be visited.

So it's very simple, is that if we're on node W, okay, we only have two choices,

okay? If we ignore the idea of getting bored

and surfing randomly, okay for R and W we're going to follow the connectivity of

the web graph. I'm going to make that assumption.

So, we have to choose one of the two links.

We either choose this link or we choose this link.

Each of them is going to clearly have one half of a chance because there's two.

Right? So, it's like flipping a coin.

Just choose one of the two. [NOISE] Similarly for if we're already on

Node y for instance, y only has one outgoing link.

So, it's only one choice that you have to go to z next.

So, this is actually a chance of one. But now as we said we want to figure out

how important the links are and that's going to depend upon how important the

node is itself. Okay.

So, we in a sense need each of these nodes to be spreading their importance

score to this link. So, it should make sense then that we

should multiply this by w and say well this link is one half w.

This other link right here is also one half w.

And in that way then we can look at this much easier and say, okay when, when

we're at W. W.

starts with an importance score of W. Then this link is half of W, one-half W.

This other link is another half of W, because there's two of them.

So W gets divided by 2, and we say it spreads its score to those two links.

And similarly, for Y there's only one link right here, so all of Y's importance

gets put off on this one link. Okay.

And so the reason is first we're splitting half and half on each of the

links, and second is that we have to wait by the importance in the first place,

because that's how important the node is itself.

So, the more important node W is, then you see the higher the importance of each

of these links will be. So if W is an very important node then

that will mean a link from W will be Very important, you see, that's why there's

this w factor in here, which take care of that, of weighing the importance of the

nodes on the links themselves. So as we said, each node has its own

importance score and this spreads an equal amount of importance to each

outgoing link. And so we can also say that that's

related to the out-degree of a node. So we talked before about the in-degree,

which was the number of incoming links, and we used that as a measure of

importance. But now we're actually using the

out-degree as well and we're saying that basically we're dividing in each length

Gets a fraction of the out degree, right, because if the out degree is three then

each length gets a third, if the out degree is four then each length gets a

fourth of the important square of the node and so on, so it's just a way of

looking at it. This out degree of W right here is two,

so each of these gets one half. The out degree of y is one, so each of

these gets one... The out-degree of x is 2.

So each of these would get 1/2 x. 1/2 x.

We can right those equations. Now the out-degree of z is 3.

Cause z gets divided three times. So each of these is 1/3.

We have 1/3 z here, 1/3 z here, and 1/3 z here.

Now notice that if you add What's outgoing, you will get this importance

score in the beginning, and that's, again, the idea of spreading.

This is one-thirdsy, two-thirdsy, onesy, if we add all those up.

One half plus one half is one. One half plus one half is one.

One is just equal to one. 'Kay so that's what that whole

conservation principle, that we're only spreading the amount of importance.

Each node is only spreading the amount of importance that it has to begin with.