0:02

Today it's my distinct pleasure to introduce George Karypis. George is an expert in data mining algorithms and tools for data mining and recommendation, including a large number of libraries that he has developed and released in areas ranging from clustering and analysis to, specifically, recommendation: the SLIM top-N recommender library and the SUGGEST top-N recommender library. He is a Distinguished McKnight Professor of Computer Science and Engineering at the University of Minnesota, and in 2013 was the program co-chair for the ACM Recommender Systems conference. But most important, he is an expert in understanding the subtleties of how recommendation algorithms may or may not match what they are supposed to do. So first and foremost, welcome to the class, George.

>> Thank you, Gerald.

>> You were telling me about some of the challenges of using matrix factorization techniques for the specific challenge of generating top-N recommendations. I think that might be a great place to start here.

>> Sounds good. So these methods have been very popular for, I would say, the last almost ten years plus. I'm getting too old. And the reason they work very, very well is that they are actually very good at predicting a rating. In recent years there has been some work that has shown under what conditions you can actually recover the model using matrix completion techniques.

Â 1:48

And the basic principle of matrix completion is that you assume the data is coming from a low-rank model, right? And then, if you observe a relatively small number of entries from the matrix, and they are randomly distributed, then you can show that with high probability you can actually recover the model. Okay. Now, the problem with top-N recommendation, though, is that the training data is not randomly distributed within the user-item matrix; it is very skewed. So as a result, when you actually try to estimate the low-rank model, that estimation often fails.

>> So you're talking about this low-rank model, the idea that my tastes are really based on some small number of dimensions. You're saying that the fact that the things I've rated are not random, but rather the things I've chosen to consume, means that the model we're learning is not an accurate model of my tastes overall, and can't necessarily apply equally well across all items as we go forward.

>> Exactly. So the model that gets estimated cannot generalize very well to the things that are far away from the vicinity of the things that you have rated.

>> Wonderful. Well, I understand that you've done some work that's shown

Â 3:12

the practical effect of this on real datasets.

>> Yes, so we did an experiment, a multi-step experiment, to try to illustrate that. So let me walk you through that process. We took some of the interesting datasets out there: we used MovieLens, Netflix, and Yahoo! Music, and for each one of those datasets we synthetically generated a known low-rank model. What I mean by that is that we created a 10-factor model for each one of those datasets, in which the entries of the user factors and the item factors were synthetically generated. Okay, so as a result, we had a low-rank model

Â 4:02

that can reconstruct the entire matrix, okay? Then what we did is we reconstructed the entire matrix. So, let's say in the MovieLens case, users by movies. And then we selected from that matrix the non-zero entries that correspond to the actual non-zero entries of the original dataset. Using those sampled entries, sampled according to the non-zero structure of the dataset, we fed them to our matrix completion algorithm to try to recover the low-rank model. For the algorithm, we used SVD, and it recovered a low-rank model. And then we used the estimated low-rank model to complete the entire matrix. And then we measured how different the ground-truth matrix, the one generated from the known low-rank model, is from the matrix that was estimated as a result of applying matrix completion to those observed entries. So that gives the error between the ground truth

Â 5:17

and the estimated model.
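The multi-step experiment just described can be sketched at toy scale. Everything below is illustrative, not the authors' code: the sizes and rank are shrunk so it runs instantly, the observation mask is uniform rather than taken from a real dataset's non-zero structure, and a simple hard-impute SVD loop stands in for whatever matrix-completion solver they used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth low-rank model with synthetically generated factors
# (the talk uses a 10-factor model on MovieLens-scale data).
n_users, n_items, rank = 60, 50, 5
U = rng.normal(size=(n_users, rank))      # user factors
V = rng.normal(size=(n_items, rank))      # item factors
R_true = U @ V.T                          # fully reconstructed matrix

# Observe a sparse subset of entries; a real run would sample according
# to the actual non-zero structure of the ratings dataset instead.
mask = rng.random((n_users, n_items)) < 0.4

# Recover a low-rank model from the observed entries via hard-impute:
# alternately truncate to rank k with an SVD and re-impose the observed
# values (a stand-in for the talk's matrix-completion step).
filled = np.where(mask, R_true, 0.0)
for _ in range(100):
    Uk, s, Vk = np.linalg.svd(filled, full_matrices=False)
    low_rank = (Uk[:, :rank] * s[:rank]) @ Vk[:rank, :]
    filled = np.where(mask, R_true, low_rank)

# Error between ground truth and the completed matrix on the entries
# we never observed, which is where the estimation can go wrong.
rmse_unobserved = float(np.sqrt(np.mean((low_rank[~mask] - R_true[~mask]) ** 2)))
```

With enough randomly placed observations the recovered model tracks the ground truth well; the point of the talk is that skewed, non-random observations break this.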

And then we tried to understand where, exactly, the error occurs.

So for the second part of the exercise, we said, well, let's look at the user-item digraph. And then for every user, we do a personalized random walk with restart. So we start with the user in that user-item digraph. Then, with equal probability, we traverse each of the user's outgoing edges that connect to an item; then from each of those items, with equal probability over the users who have rated that item, we visit one of those users; and then from those users we go back to one of the items that those users have rated. So this is very similar to the algorithm that Google uses to rank pages, but we do it on the user-item graph. And this is a personalized random walk with restart; the restart means that, with a certain probability, from every node we go back to the beginning, to the same user, and follow the same kind of traversal again. So at the end of the day, what we compute from that exercise is the steady-state probability of visiting every node in the graph, and the graph contains both the user nodes and the item nodes. Those steady-state probabilities are a very crude approximation; you can think of them as some sort of graph distance from the user to everything else in the graph. And that graph distance measure takes into account both the multiplicity of the paths and the length of the paths in the graph.
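A minimal version of that walk can be written as power iteration over one combined node set (users first, then items). The function name, the restart probability, and the tiny graph below are invented for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, start, restart_prob=0.15, iters=200):
    """Steady-state visit probabilities of a personalized random walk.

    adj   : square 0/1 adjacency over users and items together, so each
            rating (u, i) appears as an edge in both directions.
    start : index of the user node the walk restarts from.
    """
    transition = adj / adj.sum(axis=1, keepdims=True)  # uniform over neighbors
    restart = np.zeros(adj.shape[0])
    restart[start] = 1.0
    pi = restart.copy()
    for _ in range(iters):
        pi = restart_prob * restart + (1 - restart_prob) * (pi @ transition)
    return pi

# Nodes 0-2 are users, nodes 3-5 are items; user 0 rated items 3 and 4,
# user 1 rated items 3 and 5, user 2 rated item 5.
edges = [(0, 3), (0, 4), (1, 3), (1, 5), (2, 5)]
A = np.zeros((6, 6))
for u, i in edges:
    A[u, i] = A[i, u] = 1.0

pi = random_walk_with_restart(A, start=0)
```

Items the starting user rated directly (3 and 4) end up with higher steady-state probability than item 5, which is only reachable through a longer chain of other users, which is exactly the "graph distance" intuition.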

>> Before you go on, I want to make sure our learners catch two very important things that you've said already. One was that your method is based on this idea of creating these simulated, synthesized preference models. By going out there and creating the dimensions of preference, you have a set of consistent theoretical users where we know their tastes, and therefore we know how they should rate each individual item. But you're overlaying that with the fact that you're actually sampling from what real people actually looked at, which is biased by popularity and other clustering effects that make it non-random over the whole set of movies.

>> That is correct.

>> And the second thing that I want to make sure everybody understands clearly is that you're creating this distance measure, this "how far am I from any given movie" (using the MovieLens example) or "any given piece of music" (in the Yahoo! Music example), by using this random walk: a way of saying, on average, how many hops it is if I'm hopping from me, to the items that I've rated or consumed, to the people who've consumed those items, to the items that they've consumed, and so forth. Technically, you're including this restart, which is a mechanism to make sure that we're covering this whole space and not just traversing in one direction. And by using that metric, we get the sense that there are some items that are pretty close to me, because there are a lot of different items that I've consumed that connect to the same users who have also consumed those items. And there are other items that might be quite far from me, because it's quite a long chain, through seven other people perhaps, by the time you get from me to that item. Okay, so we have a measure of distance, and we have a set of synthetic user profiles. Take us through the results.

>> Okay, so then what we did, given that we have this measure of distance over the items, is we divided the items into ten different buckets. The first bucket contains the one-tenth of the items that are closest to that user based on the personalized-random-walk distance. The second bucket has the items that are the next 10% closest to the user, and so forth, all the way up to the tenth bucket, which has the items that are the furthest away, okay? And then for each of those buckets, what we're computing is the error between the ground-truth rating, which we know from that low-rank model, for the rating that the user would provide on those items,

Â 9:45

and the estimated rating. So we just compute the difference and compute the error, right?
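That bucketing step can be sketched as follows. The helper name and the synthetic arrays are illustrative only; "proximity" plays the role of the personalized-random-walk score (larger means closer).

```python
import numpy as np

def rmse_by_proximity_bucket(proximity, squared_error, n_buckets=10):
    """Split items into buckets from closest to furthest and report
    the root-mean-squared error inside each bucket."""
    order = np.argsort(-proximity)               # closest items first
    buckets = np.array_split(order, n_buckets)
    return [float(np.sqrt(squared_error[b].mean())) for b in buckets]

# Synthetic illustration: error that grows as items get further away.
proximity = np.linspace(1.0, 0.0, 100)           # item 0 is closest
squared_error = np.linspace(0.0, 1.2, 100) ** 2  # error rises with distance

per_bucket = rmse_by_proximity_bucket(proximity, squared_error)
```

With made-up data like this the per-bucket RMSE rises monotonically from the first bucket to the tenth, which is the shape of the MovieLens result discussed next.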

So if we look at the MovieLens curve over there, which is the blue one in the plot, you'll see that for the first six buckets the error is very close to zero. In other words, the model estimated from the observed non-zero entries of the MovieLens dataset actually does a very good job of generating ratings that are very accurate; they're very close to the ground-truth data. But then look at what happens in the last four buckets: as the items become progressively further and further away from the user, the error keeps on increasing, right? All the way up to the tenth bucket, in which the root mean squared error, which is what is plotted over there, is close to 1.2. And those are the items that are the furthest away from the user, right? And those are the items for which the low-rank model that the matrix factorization estimates gives very high error compared to the ground-truth data.

Â 10:52

And similar behavior happens for the other two datasets, both Netflix and Yahoo!. For Yahoo! it is not quite as pronounced, but for Netflix it is. Actually, on Netflix the model doesn't do a very good job even at reconstructing the very first few buckets, and after that it gets even worse. And that has to do with the Netflix significantly [INAUDIBLE] that we used.

>> So, to pull this together, what you're saying here is that we're going to see significantly increasing error as we look at the items that are more and more distant from the person's already-consumed items. And that means, if we're looking at the top-N list, perhaps the thing we're most worried about is that the items at the top may not be at the top because they're really the things the person would like the most, but rather because they're far away and just have a large error, and that error stood out more than anything that was accurately modeled. Is there something we can do about this? Because we like matrix factorization algorithms, but we obviously don't want to get into a situation where the top-N list we're seeing is a bunch of things that look highly novel, because they're not what I usually consume, but turn out not to be things I would like either.

>> Yes, we can. And again, the solution is something that people came up with quite a few years ago, empirically. And really, the simple thing to do is just to combine some of those methods, right? So, one of the things that we did: once we looked at the previous chart, in which we saw that the things that are far, far away in the personalized random walk tend to have high error, then what we really want to do is penalize those, right? The simplest way to penalize something like that is to have some sort of hybrid ranking function, in which something will be ranked high only if it has a high rating and, at the same time, is close with respect to that personalized random walk. And a very straightforward way to do that is to just multiply those two scores.
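A minimal sketch of that hybrid score; the function name and the arrays below are made-up numbers for illustration, not results from the talk.

```python
import numpy as np

def hybrid_score(predicted_rating, walk_proximity):
    """Rank high only if the predicted rating is high AND the item is
    close to the user under the personalized random walk: multiply."""
    return predicted_rating * walk_proximity

# Item 0 has the highest predicted rating but is far from the user,
# so its (likely unreliable) prediction gets demoted.
predicted = np.array([4.8, 4.5, 3.0])
proximity = np.array([0.01, 0.20, 0.15])  # e.g. steady-state probabilities

scores = hybrid_score(predicted, proximity)
best = int(np.argmax(scores))             # item 1 wins, not item 0
```

Multiplying acts as a soft AND: a far-away item needs an enormous predicted rating to outrank a nearby one, which is exactly the penalization being described.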

We did some experiments with that idea on the MovieLens dataset, and the chart that I'm showing over here is a little bit busy, but let's focus on the blue one, which is the first bar, and the red one, which is the red bar, right? So the blue bar over there shows the recall at n, for different values of n: 1, 5, 10, and so forth. On the MovieLens dataset, matrix factorization by itself has a recall at 1 of about 20%.

Â 13:43

For the personalized PageRank, if I rank the items based on it and recommend the top item, its recall is about 22 to 23%. So, a little bit better than matrix factorization. Now, if I take those two scores, one being the predicted rating from the model and the other the personalized PageRank probability, actually multiply them together, and rank the items based on that, I get the fourth bar in the chart, which is the cyan one with circles. You can see that for the recall at 1, the performance gets close to about 30%, 30-plus. So it's a significant improvement over both matrix factorization and personalized PageRank.

And the same thing happens for different values of n. Now, the [INAUDIBLE] part over here is that we can replay our exercise not only with personalized PageRank; we can combine different methods. For instance, we can combine the [INAUDIBLE] method with the result I get from actually doing a [INAUDIBLE], so this is called [INAUDIBLE], and I get a combined method. And I can combine many of those methods.

So the takeaway message from here is that matrix factorization works well as long as we ask it to estimate the rating for a particular item in a vicinity that is somewhat nearby the set of items that the user has rated or has liked in the past. The further away we go from there, the more the performance degrades. So if we combine its ability to give us accurate ratings for those nearby items with a way of identifying what is nearby [INAUDIBLE], right? Then we can get a win-win situation.

>> So it does seem like there's a recurring message here as well: there's no free lunch. In theory, what we'd like to do is recommend items that are far away from anything you know about, but that are wonderfully delightful, so you'll enjoy them greatly. And we've seen already that the item-item recommenders don't tend to do very much recommendation of things that are very different. I think what you're telling us here is that, while the SVD recommenders sometimes do give us far-away items on the top-N lists, they're doing that at the cost of giving us items that they don't really have a good basis for telling us we'll like. And as we're designing our recommenders, we have to be very conscious of the tradeoff between recommending something that is different and recommending something we actually have evidence the person is likely to enjoy.

>> Exactly. I mean, the further out we go, you know, you're making your recommendations on sparser and sparser data. So it's an issue of what you can actually estimate at that point by relying entirely on the rating matrix, on what the user has bought in the past or has clicked in the past. So the only way to reliably go far, far away, to not even use the term "far, far away", is if you somehow bring in some additional piece of information. And this is where content-based recommender systems, or any other non-rating data, usually help.
