[MUSIC] So to this point though, we've assumed that we know all these Lu user topic vectors. So we can stack them up into this L matrix. And we know all the movie topic factors. So we put them altogether in that R matrix. And then, we multiply the two together to get this big prediction of readings that every user would give to every movie. But important thing here is the fact that we don't actually have this information. We don't have these features about the users, or about the movies. So instead, what we're gonna do is we're gonna flip this problem on its head. And we're gonna try and estimate these matrices, this L and R matrix, which is equivalent to estimating these topic vectors, or feature vectors, for every user and every movie based on our observed ratings. So these matrices, or these collections of topic vectors, are the parameters of our model. So going back to the regression module that we had, we talked about models and their associated parameters in thinking about estimating those parameters from data. So this is a very similar notion. So we have data. What are our data? In this case, our data are our observed ratings. So those are the black squares. And our parameters are these user and movie topic factors. Okay, so what we're gonna do is we're gonna estimate each of these from our observe reading. So only using these black cells, we're gonna try and estimate these L and R vectors and resulting matrices. So how are we gonna do this? Well, let's think of a metric for fit just like we talked about in the regression module. If you remember, going back to that module we talked about something called residual sum of squares. There we were talking about a house. It had some set of features. And then, [COUGH] we had weights on those features, those weights were our parameters. And we were predicting somehow sales price. And then, we compared with the actual sales price, and we looked at the square of that difference, and we summed over every house in our data set. Well, in this case, the parameters of our model are L and R. And our prediction, our ratings hat is gonna be <Lu, Rv>, this notation of doing this element wise product in summing. And that's our predicted rating. And then, our observed rating is Rating(u,v), and what we're gonna do is look at that difference, the difference between our observed rating and what we're predicting with our parameters Lu and Rv, and we're gonna square them. And we're gonna say our residual sum of squares of our parameters L and R is equal to this, and then we're gonna sum over all movies that have ratings. So we're gonna include all u, v pairs where [BLANK_ AUDIO] rating, let me use u' to v' to distinguish it from the specific u and v example I gave here. So we're gonna add in all u' v' pairs. Where rating u' and rating v' are available. And where are these available? They are the black squares. Just remember this picture here. Okay, so what I'm doing is I'm taking a given L matrix and R matrix, I'm looking at my predictions. So I'm looking at, and I'm gonna evaluate how well I did on all these black squares. So I'm looking at how well the L and U that I'm using fit my observed ratings. And then, when I want to go to estimate L and R, just like when I wanted to estimate the weights on the regression coefficients in that housing value prediction problem, I search over, in this case, all the user topic vectors and all the movie topic vectors, and find the combination of this huge space of parameters that best fits my observed ratings. And so, the reason this is called a Matrix Factorization model, so that's really key. This is called a matrix factorization model, because I'm taking this matrix, and approximating it with this factorization here. But the key thing is the output of this, is a set of estimated parameters here. And, unfortunately, I've just written over this very, very, very key point. So I apologize for that. Let's pause for a second, so that you get the writing. And now, I'll just say in words this next animation is saying that there are a lot of very efficient algorithms for doing this factorization. And we're gonna talk about them in great extent in the recommender systems, or matrix factorization course later on. But then, okay, so very efficient algorithms for computing these estimates of L and R. How do I form my predictions? How do I fill in all these white squares, which was my goal to start with. Well, I just use my estimated L hat u. And R hat v. And I form my prediction just as we described when we assumed that we actually knew these vectors. Okay, well, matrix factorization is a really, really powerful tool. And it's been proven useful on lots of different applications. But there's one limitation, and that's a problem that we talked about a little bit earlier of the cold-start problem where this model still can't handle this problem of what if we get a new user or a new movie. So that's the case where we have no ratings either for a specific user or a specific movie. So that might be a new movie that arrives or a new user arrives. So this is a really important problem. And one that, for example, Netflix faces all the time. How do we make predictions for these users or movies? [MUSIC]