
But we're not done yet. There's still another problem.

For example, in this simple graph

with four web pages represented on four nodes and some links,

we can write down the H matrix.

It is, again, four by four,

and you should be getting better and better at this already.

The H matrix simply looks like this.

And you can easily see that, depending on what

my initial π vector is, I will end up with different importance score vectors.

In other words, there is more than one consistent importance score vector.

Indeed, if I start with, say, an initialization of (1, 0, 0, 0),

then I end up with the importance score vector being (1/2, 1/2, 0, 0).

However, if I start with an initialization of (0, 0.3, 0.7,

0), then the importance score vector I end up with is (0.15,

0.15, 0.35, 0.35) for these four web pages.
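The non-uniqueness can be checked numerically. The lecture's exact four-node graph isn't reproduced here, so this sketch uses a hypothetical graph with two disconnected pairs of pages, which exhibits the same problem: two different probability vectors are both fixed points of the iteration.

```python
# Hypothetical 4-page graph (not the lecture's exact example): two
# disconnected pairs, 1<->2 and 3<->4. Each row of H spreads a page's
# score evenly over its outgoing links.
H = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]

def left_multiply(pi, M):
    """Compute pi^T M, i.e. one step of the iteration."""
    n = len(M)
    return [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]

# Two different probability vectors, BOTH satisfying pi^T = pi^T H:
pi_a = [0.5, 0.5, 0.0, 0.0]
pi_b = [0.25, 0.25, 0.25, 0.25]
print(left_multiply(pi_a, H))   # -> [0.5, 0.5, 0.0, 0.0]
print(left_multiply(pi_b, H))   # -> [0.25, 0.25, 0.25, 0.25]
```

Since both vectors are unchanged by multiplication with H, the "importance score" is not uniquely defined for this graph.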

So what we really want is this: if you give me a matrix and I keep running

the iteration πᵀ[k+1] = πᵀ[k] Ĥ, whatever the matrix (here Ĥ) may be,

starting from any initialization vector

π[0], the iteration should converge, meaning

that after a sufficient number of rounds I

can get arbitrarily close to a particular fixed vector π*.

But this example shows that that may not always be the case.

So here is a possible solution and

this solution illustrates actually a very important theme,

randomization, that we'll be encountering quite a few times in

very different contexts in different networks of this course.

This is our very first encounter with that randomization idea.

It says that if you're on a particular web page,

a node, in the graph here,

you may hop to

any of the nodes hyperlinked from this node, with equal probability.

For example, if there are two of them, one half and one half.

But you follow this whole behavior only θ percent of the time.

The other (1 − θ) percent of the time, where θ is just some number between 0 and 100

in percentage points,

you will instead pick a web page at random.

So following the embedded outgoing links

is only part of the model of the navigation behavior by users on the web.

It represents, say, θ percent of the time.

And the other (1 − θ) percent,

you will be doing a random pick.

It turns out that the θ you

pick determines quite a few things, including the rate of convergence.

A smaller θ gives you faster convergence, in general,

but a smaller θ also means that you are pretty much

ignoring the connectivity pattern of the hyperlinked graph.

So that's not too surprising.

It turns out that we tend to pick 85%, or 0.85, as

the right trade-off between relevance of

the connectivity pattern and the speed of convergence.

So how do we model this random picking of web pages?

What we're saying is that here is a big matrix in which

every single row is (1/N, 1/N, ..., 1/N).

In other words, it is a matrix of all ones scaled so that every entry is 1/N,

which we can write as simply

1/N times the vector of ones, outer product with a vector of ones.

Again, the outer product may sound a little weird to you,

but you can write down a simple example,

say, a two-dimensional example.

Okay? (1, 1) as a column vector, outer product with

(1, 1) as a row vector: two by one times one by two,

and you get a two-by-two matrix of all ones.

So the column vector of ones, outer product with the row

vector of ones, normalized by 1/N, is exactly what we want.
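The two-dimensional example from above, written out in a few lines of code:

```python
# Outer product of the all-ones column vector with the all-ones row
# vector gives a matrix of all ones; scaling by 1/N gives the
# random-jump matrix with 1/N in every entry.
ones = [1, 1]  # 2-dimensional example, as in the lecture
outer = [[a * b for b in ones] for a in ones]
print(outer)   # -> [[1, 1], [1, 1]]

N = 2
jump = [[v / N for v in row] for row in outer]
print(jump)    # -> [[0.5, 0.5], [0.5, 0.5]]
```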

So the third matrix is simply (1 − θ) times

the normalization 1/N

times this matrix of all ones:

vector 1, outer product with vector 1.

And then don't forget that the original θ percent of the time you will follow H,

or better still, the modified Ĥ.

This plus this is what we call the Google matrix, denoted G.

Now, don't get confused with the G_ij back in Lecture 1 on cellular networks.

That can also give you a matrix G. There are interesting, striking parallels

between the two, but they do not stem from the coincidence of the symbols used.

So this matrix G,

the Google Matrix, is the sum of two matrices.

We know this matrix is nice because it is the sum of H, which is

sparse, and a rank-one matrix for the dangling nodes.

And this obviously is a rank one matrix as well.

It's a very simple matrix.

So G is the sum of

a large, sparse matrix plus two large, rank-one, simply structured

matrices. So now we're ready to describe

the very famous PageRank algorithm behind

Google and one of the reasons for Google's success on the technical side.

It says that we are given this matrix G, which is H

plus 1/N times the vector w, outer product with the vector of ones,

the whole thing θ percent of the time, plus, the other

(1 − θ) percent of the time,

this randomization term: 1/N times the outer-product

matrix of all ones.
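Assembling G from its three pieces can be sketched in a few lines. The graph here is hypothetical (the lecture's is not reproduced), with one dangling page to show the role of the w vector:

```python
# Sketch of G = theta * (H + (1/N) w 1^T) + (1 - theta) * (1/N) 1 1^T,
# where w flags dangling pages (pages with no outgoing links).
N, theta = 4, 0.85
links = {0: [1, 2], 1: [2], 2: [0], 3: []}  # page 3 is dangling

H = [[0.0] * N for _ in range(N)]
w = [0.0] * N
for i in range(N):
    if links[i]:
        for j in links[i]:
            H[i][j] = 1.0 / len(links[i])  # spread evenly over outlinks
    else:
        w[i] = 1.0  # dangling: (1/N) w 1^T spreads over all pages

G = [[theta * (H[i][j] + w[i] / N) + (1 - theta) / N
      for j in range(N)] for i in range(N)]

# Every row of G sums to 1: it is a proper stochastic matrix.
print([round(sum(row), 10) for row in G])  # -> [1.0, 1.0, 1.0, 1.0]
```

The dangling-node correction and the randomization term together are exactly what make every row sum to one, which is what the convergence guarantee below relies on.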

Okay? You define G as such, and then you iterate:

π transpose at time k+1 is simply the last iteration's π transpose

at iteration k, multiplied on the right not by the H matrix anymore,

since we've made two modifications,

but by the matrix G,

and you let k keep going.

And you are now actually guaranteed to converge, for any initialization π vector, to

the equilibrium π* transpose satisfying π*ᵀ = π*ᵀ G.

In other words, the importance score is actually

the dominant left eigenvector of the matrix G, not just H.

We're skipping the proofs in linear algebra,

but the end result says that with these two modifications we can

guarantee convergence to a uniquely defined left eigenvector, with eigenvalue one,

of this matrix G. And this can be computed either

directly, if G is small enough, or iteratively through this procedure,

with iterations indexed by k.
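The guarantee can be checked numerically: starting the power iteration from two very different initializations, both runs land on the same π*. The small G below is built from a hypothetical link structure with rows summing to one.

```python
# Power iteration pi[k+1]^T = pi[k]^T G from two different starting
# vectors; both converge to the same equilibrium pi*.
N, theta = 4, 0.85
links = {0: [1, 2], 1: [2], 2: [0], 3: [0]}  # hypothetical, no dangling

G = [[(1 - theta) / N + (theta / len(links[i]) if j in links[i] else 0.0)
      for j in range(N)] for i in range(N)]

def power_iterate(pi, steps=200):
    """Run the iteration for a fixed number of steps."""
    for _ in range(steps):
        pi = [sum(pi[i] * G[i][j] for i in range(N)) for j in range(N)]
    return pi

a = power_iterate([1.0, 0.0, 0.0, 0.0])
b = power_iterate([0.0, 0.0, 0.3, 0.7])
print(max(abs(x - y) for x, y in zip(a, b)))  # ~0: same equilibrium
```

This is exactly the property the original H matrix lacked: with G, the starting vector no longer matters.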

Now, there are still two more remaining steps, and then you'll be done.

One is you want to normalize.

Remember, so far we haven't normalized this Pi vector yet.

Okay? If you really want to know the relative scale you want to normalize them.

So the new πᵢ is really the original πᵢ divided by the sum of all of π's entries.

But the most important thing is actually ranking them.

Remember, we started with ranking not computing scores.

Computing scores is a means to the end of ranking, and for ranking

you can look at either the normalized or the unnormalized version of π.

And then whichever node has the highest πᵢ is ranked the highest,

then the second, then the third, and so on.

And the top 10 will be going on to the first page of the search results.
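These last two steps, normalization and ranking, are a one-liner each. The score values below are made up for illustration:

```python
# Hypothetical unnormalized converged scores for four pages.
pi = [0.6, 0.2, 0.9, 0.3]

# Step 1: normalize, so new pi_i = pi_i / sum_j pi_j.
total = sum(pi)
pi_norm = [x / total for x in pi]

# Step 2: rank pages from most to least important.
ranking = sorted(range(len(pi)), key=lambda i: pi[i], reverse=True)
print(ranking)  # -> [2, 0, 3, 1]: page 2 is ranked highest
```

Note that the ranking is the same whether it is computed from `pi` or `pi_norm`, which is why normalization is optional if ranking is all you need.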

And you can see that if ranking is what you care about,

you actually don't see the relative scale exactly.

For example, this node might be way more important than node 2, which is only

slightly more important than node 3, by this computation of the importance scores

π. But you wouldn't know, okay?

Google doesn't visualize that scale and perhaps

that's a good idea because that scaling may not be that robust and,

hopefully the ranking itself is more robust.

So in summary, we have constructed three matrices, from H to

Ĥ to the whole Google matrix G. And

this iteration, plus maybe a normalization, and finally

our ranking, is, roughly speaking, how Google displays the ranks of the web pages.