Hi there, time to discuss more advanced topics in the field of recommender systems. In the upcoming videos, we will learn about neighborhood models, matrix dimensionality reduction, SVD-like algorithms, different types of user feedback, and iterative procedures to optimize a recommendation model. In short, I'm going to teach you how to build collaborative filtering recommender systems. In neighborhood models, we look for similar users or items to make a prediction for an unknown user-item preference. Regardless of the type of neighborhood model, user-user or item-item, there are three main components: normalization, a similarity measure, and neighborhood selection. It is easy to explain why we need to normalize users' ratings. Consider two users: one is optimistic by nature, as I am, and the other is pessimistic. Even if they have the same tastes, they provide feedback on different scales. An optimistic user gives a rating of 2 to the movies he or she doesn't really like, while a pessimistic user, a critic, gives a rating of 1. On the other hand, an optimistic person gives a rating of 5 to all the movies he or she likes, while a critic gives a 5 only to outstanding movies, and good movies get a score of 4. So in reality, the question of an item's likability is really a question of deviation from the mean. That is why it is common to subtract each user's mean rating during the preprocessing step. Moreover, it is also common to normalize user scores to have a standard deviation of 1. Then all our users' ratings will be on the same scale. To be mathematically precise, this is called the z-score. Transforming the scale back is straightforward: you multiply the score by the standard deviation and then shift by the user's mean rating. The next component is the similarity measure. All the metrics that you saw in the previous lesson, when we worked on content-based recommender systems, can be applied here as well.
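The z-score normalization and its inverse described above can be sketched as follows; the two rating vectors are made-up examples of an optimist and a critic who share the same tastes but rate on shifted scales:

```python
import numpy as np

# Hypothetical ratings for the same five movies: the critic's scale is
# simply shifted one point below the optimist's.
optimist = np.array([2.0, 5.0, 5.0, 3.0, 5.0])
critic   = np.array([1.0, 4.0, 4.0, 2.0, 4.0])

def z_score(ratings):
    """Mean-center a user's ratings and scale them to unit standard deviation."""
    mu = ratings.mean()
    sigma = ratings.std()
    return (ratings - mu) / sigma, mu, sigma

def z_score_inverse(z, mu, sigma):
    """Transform normalized scores back to the user's original scale."""
    return z * sigma + mu

z_opt, mu_o, sd_o = z_score(optimist)
z_cri, mu_c, sd_c = z_score(critic)
# After normalization, the two users' profiles coincide exactly,
# revealing that their underlying tastes are identical.
```

This is a minimal sketch for a single user's rating vector; in practice you would apply it per user over a sparse ratings matrix.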
A common choice of similarity measure in neighborhood models is a correlation metric. As usual, life is not as easy as it sounds: there are different ways to calculate correlation or correlation-like metrics, and which one is most appropriate for your service you will only find out through experiments. When we calculate cosine similarity, we use the following formula. If we use mean-normalized ratings, then the actual formula looks a bit more complicated. For comparison, take a look at the formula for Pearson correlation and try to spot the difference. I think you ought to try it. In Pearson correlation, you only use the items that were rated by both users; you don't try to extrapolate your knowledge to the unknown ratings, whereas cosine similarity effectively expects them to be close to the mean value. Another example is Spearman's rank correlation. It doesn't take the actual rating values into consideration at all: it sorts all the ratings per user, assigns them ranks starting from one, and then uses the Pearson correlation formula. Of course, it is more expensive from a computational point of view, as you need one extra step to sort the values for each user. Is it worth it? That depends on your service and data. There are even more variations of correlation which take support into consideration, that is, the number of items rated by one user or by both of them: Bayesian damping (you have seen what damping means in non-personalized recommender systems) and other forms of shrinkage. The intellectually curious already know where to go. And we are moving on. The last question we need to answer to build a neighborhood-based collaborative filtering recommender system is how to choose friends. Neighbors, I mean, how to choose neighbors. You might say the more data, the better, but that's not the case here: there will be more noise than information. How does that noise help to build a recommendation list for you? On the one hand, you can keep the top-N most similar neighbors for each user.
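The three similarity measures discussed above can be sketched on sparse ratings like this; the dictionary-based storage format and all names are illustrative assumptions, not a library API:

```python
import numpy as np

def pearson_sim(ratings_u, ratings_v):
    """Pearson correlation over the items rated by BOTH users only.
    ratings_* are hypothetical sparse profiles: dicts item_id -> rating."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    u = np.array([ratings_u[i] for i in common])
    v = np.array([ratings_v[i] for i in common])
    u_dev = u - u.mean()  # deviations from each user's mean,
    v_dev = v - v.mean()  # computed on the co-rated items only
    denom = np.linalg.norm(u_dev) * np.linalg.norm(v_dev)
    return float(u_dev @ v_dev / denom) if denom else 0.0

def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity: unknown ratings implicitly contribute zero."""
    items = set(ratings_u) | set(ratings_v)
    u = np.array([ratings_u.get(i, 0.0) for i in items])
    v = np.array([ratings_v.get(i, 0.0) for i in items])
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def spearman_sim(ratings_u, ratings_v):
    """Spearman: replace each user's ratings by ranks starting from one,
    then apply the Pearson formula (ties are ignored in this sketch)."""
    def to_ranks(r):
        ordered = sorted(r, key=r.get)
        return {item: rank for rank, item in enumerate(ordered, start=1)}
    return pearson_sim(to_ranks(ratings_u), to_ranks(ratings_v))

alice = {"a": 5.0, "b": 3.0, "c": 4.0}
bob = {"b": 4.0, "c": 5.0, "d": 2.0}
# Pearson over the co-rated items {b, c} is a perfect 1.0, while cosine
# is pulled down by the items only one of the two users has rated.
```

Notice how the same pair of users can look very similar under one metric and only moderately similar under another, which is why the choice has to be validated experimentally.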
You can even do it in advance, during the preprocessing step; therefore, during a prediction or recommendation request, we will be able to find the answer quickly. On the other hand, you can filter neighbors by a similarity threshold. For example, you can take into account only the neighbors with a correlation bigger than 0.75. Even better, you can mix this approach with the previous one: during the step where you select neighbors, you discard everything below the similarity threshold and then keep the top-N most similar neighbors. There is another interesting question: do we need to take into account the users we have a negative correlation with? For example, you like comedies and your neighbor dislikes them, your neighbor likes thrillers but you don't, and so on. In the literature, there are many examples where people exclude users with negative correlation from a neighborhood to improve prediction quality, so this technique can also be considered part of the strategy. And strategy number four, choosing a random subset of neighbors, is not a joke: if you have a dense user-item matrix, then the choice of a user neighborhood is not so important. The rule to get a prediction in the user-user collaborative filtering model is the following: it is the weighted average of rating values from the user's neighborhood, where the weights are the similarities between users. It is important to mention that this neighborhood is built from the users who rated the item i. Therefore, for each item this neighborhood can be different. Consequently, if you use top-N filtering to build a neighborhood during the preprocessing step, then the item's neighborhood will be a subset of the precalculated neighbors, so we need to make sure to choose a big enough N. On the other side, there is item-item collaborative filtering. It is a common choice for high-load services. The reason is that an item's neighborhood usually changes much more slowly than a user's neighborhood.
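The user-user prediction rule above, including the mean-centering from the normalization step, can be sketched like this; the tiny dataset and every name here are hypothetical:

```python
def predict_rating(user, item, ratings, sim, means, k=20):
    """Predict r(user, item) as the similarity-weighted average of
    mean-centered ratings from the k most similar users who rated `item`.
    `ratings` maps user -> {item: rating}, `sim(u, v)` returns a
    (precomputed) similarity, `means[u]` is u's mean rating."""
    # Neighborhood: only users who actually rated this item qualify.
    raters = [v for v in ratings if v != user and item in ratings[v]]
    neighbors = sorted(raters, key=lambda v: sim(user, v), reverse=True)[:k]
    num = sum(sim(user, v) * (ratings[v][item] - means[v]) for v in neighbors)
    den = sum(abs(sim(user, v)) for v in neighbors)
    if den == 0:
        return means[user]  # no usable neighbors: fall back to the user's mean
    return means[user] + num / den

ratings = {
    "u1": {"m1": 4.0, "m2": 2.0},
    "u2": {"m1": 5.0, "m2": 3.0, "m3": 5.0},
    "u3": {"m1": 2.0, "m3": 1.0},
}
means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
pair_sims = {frozenset(("u1", "u2")): 0.9, frozenset(("u1", "u3")): 0.4}

def sim(a, b):
    return pair_sims.get(frozenset((a, b)), 0.0)

prediction = predict_rating("u1", "m3", ratings, sim, means)
```

Note that the neighborhood is recomputed per item: only the raters of "m3" contribute, exactly as described above.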
You calculate item similarities once and use them for ages. Another benefit of item-item collaborative filtering is that it is much easier to explain recommendations to the end user. The benefit of a user-user recommender system is that it provides more serendipitous recommendations in comparison to the item-item algorithm. The cornerstone problem of recommender systems is the cold-start problem. When you have only one rating for a user, and especially if it is a negative score, standalone neighborhood collaborative filtering models will not help you find good recommendations. But a variety of decomposition algorithms which build a latent space are here to help. My colleague, Evgeny Frolov, showed how to benefit even from a single negative user feedback with the help of tensor decomposition at the ACM RecSys conference in 2016. I'm glad Evgeny accepted the offer to work with me on these videos and assignments, so you will see him shortly. Of course, we are not going to show you how to work with tensors on Spark, but simple matrix factorization approaches will be covered during the lessons. Nevertheless, while all this machine learning is helpful, it might not be of great interest to you as a big data engineer. So let us take a look at the complexity of the models; then you will be able to help the data scientists in your team make a better choice of recommender system algorithm from the efficiency point of view. Let me define n as the number of users and m as the number of items. When you use user-user collaborative filtering, you can assume n squared space for all pairwise similarities. Correspondingly, item-item CF will consume m squared space. During the prediction phase, both of these models consume the same amount of resources: a user-item prediction is a weighted average over k neighbors, which can be either users or items. The preprocessing step to calculate all pairwise similarities will be n squared multiplied by p in user-user CF.
Here p is the maximum number of ratings provided by one user; the corresponding cost in item-item CF is m squared multiplied by q. As you can see, while the system is in operation, you will use the same amount of computational resources either way, but the space consumed and the complexity of the preprocessing step can change dramatically when you switch from one algorithm to the other. If you ask how the complexity of these algorithms changes in practice, consider the following diagram. In one column, you see the average number of neighbors with at least one rated item or rating user in common; in the other, how many ratings are used on average to calculate a pairwise similarity. Based on these numbers, you can make a deliberate choice. Summing up: after watching this video, you now know how to build user-user and item-item recommender systems, and you can list the three components needed to make them work. You can also reason about the choice of a recommender system, taking into consideration the accuracy-efficiency trade-off.
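To make the resource estimates above concrete, here is a minimal back-of-the-envelope sketch; the function, its parameters, and the example numbers are all made up for illustration:

```python
def cf_footprint(n_users, m_items, p, q, bytes_per_sim=4):
    """Rough estimates for neighborhood CF, following the video:
    user-user CF stores ~n^2 similarities and needs ~n^2 * p operations
    to precompute them (p = max ratings per user); item-item CF stores
    ~m^2 similarities and needs ~m^2 * q operations (q = max ratings
    per item). Assumes a 32-bit float per stored similarity."""
    return {
        "user_user_space_bytes": n_users ** 2 * bytes_per_sim,
        "item_item_space_bytes": m_items ** 2 * bytes_per_sim,
        "user_user_precompute_ops": n_users ** 2 * p,
        "item_item_precompute_ops": m_items ** 2 * q,
    }

# A hypothetical service with a million users but only ten thousand items:
est = cf_footprint(n_users=1_000_000, m_items=10_000, p=200, q=5_000)
# The user-user similarity matrix would need ~4 TB, the item-item one
# only ~400 MB -- one reason item-item CF suits high-load services.
```

Plugging in your own n, m, p, and q is a quick way to sanity-check which variant your infrastructure can actually afford before any model quality comparison.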