And the estimated model.

And then we're trying to understand where, exactly, the error occurs.

So when we did the second part of the exercise, we look at the user-item graph.

And then for every user, we run a personalized random walk with restart.

So we start with the user in that user-item graph.

And then with equal probability we traverse each of its outgoing edges that connects to an item, and then from each of those items, with a probability that depends on how many users have rated the item, we visit one of those users, and from those users we go back to one of the items that they have rated.

So this is very similar to PageRank, the algorithm that Google uses to rank pages, but here we run it on the user-item graph.

And this is a personalized random walk with restart, and the restart works as follows: with a certain probability, from every node we jump back to the beginning, to the same user, and then follow the same kind of traversal again.

So at the end of the day, what we compute from that exercise is the steady-state probability of visiting every node in that graph, which contains both the user nodes and the item nodes.

And those steady-state probabilities are a very crude approximation.

You can think of it as some sort of a graph distance

from the user to everything else in the graph.

And this graph distance measure takes into account both the multiplicity of the paths and the length of the paths in the graph.
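As a minimal sketch of the walk described above (the toy rating graph, node names, restart probability, and iteration count are all illustrative assumptions, not values from the study), the personalized random walk with restart can be computed by power iteration on a bipartite user-item transition matrix:

```python
import numpy as np

# Hypothetical toy data: which items each user has rated.
ratings = {
    "u0": ["i0", "i1"],
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
}

users = sorted(ratings)
items = sorted({i for rated in ratings.values() for i in rated})
nodes = users + items
idx = {name: k for k, name in enumerate(nodes)}
n = len(nodes)

# Column-stochastic transition matrix of the bipartite graph:
# from a user, step uniformly to one of the items they rated;
# from an item, step uniformly to one of the users who rated it.
P = np.zeros((n, n))
for u, rated in ratings.items():
    for i in rated:
        P[idx[i], idx[u]] = 1.0 / len(rated)          # user -> item
raters = {i: [u for u, r in ratings.items() if i in r] for i in items}
for i, us in raters.items():
    for u in us:
        P[idx[u], idx[i]] = 1.0 / len(us)             # item -> user

def personalized_rwr(start, restart=0.15, iters=200):
    """Steady-state visit probabilities of a walk restarting at `start`."""
    r = np.zeros(n)
    r[idx[start]] = 1.0                               # restart distribution
    pi = r.copy()
    for _ in range(iters):
        pi = (1 - restart) * (P @ pi) + restart * r   # power iteration
    return {node: pi[idx[node]] for node in nodes}

scores = personalized_rwr("u0")
```

In this toy graph, items directly rated by `u0` end up with higher steady-state probability (i.e., smaller "distance") than items only reachable through a long user-item chain.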

>> Before you go on, I want to make sure our learners

catch two very important things that you've said already.

One was that your method is based on this idea of creating these

simulated synthesized preference models.

By going out there and creating the dimensions of preference, you have a set of consistent theoretical users whose tastes we know, and therefore we know how they should rate each individual item.

But you're overlaying that with the fact that you're actually sampling from what real people actually looked at, which is biased by popularity and other clustering effects that make it non-random over the whole set of movies.

>> That is correct.

>> And the second thing that I want to make sure everybody understands clearly

is that you're creating this distance measure, this "how far am I from any given movie," using the MovieLens example, or from any given piece of music in the Yahoo Music example, by using this random walk as a way of saying, on average, how many hops it takes if I'm hopping from me, to the items that I've rated or consumed, to the people who've consumed those items, to the items that they've consumed, and so forth.

Technically, you're including this restart, which is a mechanism to make sure we're covering this whole space and not just traversing in one direction.

And by using that metric, we get the sense that there are some items that are pretty close to me, because there are a lot of different items that I've consumed that connect to the same users who have also consumed those items.

And there's other items that might be quite far from me.

Because it's quite a long chain by the time you get from me.

Through seven other people, that it takes for me to get to that item.

Okay, so we have a measure of distance.

We have a set of synthetic user profiles.

Take us through the results.

>> Okay, so then what we did, given that we have this measure of distance to the items, is we split the items into ten different buckets.

So the first bucket contains the one-tenth of the items that are closest to that user based on the personalized random-walk distance.

The second bucket has the items that are the next 10% closest to the user, and so forth, all the way up to the tenth bucket, which has the items that are the furthest away, okay.

And then for each of those buckets, we compute the error against the ground-truth data that we know from that low-rank model, for the rating that the user provided on those items.
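The bucketing and per-bucket error computation could be sketched like this, using entirely synthetic stand-in arrays for the distances, the ground-truth ratings, and the estimated model's predictions (RMSE is one reasonable choice of error; the source does not specify the metric):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one user: a random-walk distance per item,
# a ground-truth rating (from the synthetic low-rank model), and a
# prediction from the estimated model. All values are made up.
n_items = 1000
distance = rng.random(n_items)               # larger = farther from the user
truth = rng.uniform(1, 5, n_items)           # ground-truth ratings
pred = truth + rng.normal(0, 0.5, n_items)   # estimated model's predictions

# Sort items by distance and split into ten equal buckets:
# bucket 0 holds the closest 10% of items, bucket 9 the farthest 10%.
order = np.argsort(distance)
buckets = np.array_split(order, 10)

# Per-bucket RMSE between ground truth and prediction.
rmse = [float(np.sqrt(np.mean((truth[b] - pred[b]) ** 2))) for b in buckets]
```

Plotting `rmse` against the bucket index would then show how the estimated model's error varies with the random-walk distance from the user.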