[MUSIC] So now let's talk about how we can use these metrics of precision and recall to compare the different algorithms we might think about using. To do this, we can draw something called the precision-recall curve. Let's start by talking about what these curves represent. For a given recommender system, I'm gonna ask you to vary the threshold on how many items that recommender system is allowed to recommend to me. So I'm gonna rank, for example, all the baby products on Amazon, and I'm gonna allow you to recommend just one, or two, or three, and so on. That's the threshold you're varying, and it's going to trace out this curve.
What would this curve look like for an optimal recommender, where we only recommend products that I like? Well, what's the precision when I recommend just one product? We know that's a product I like, so my world is just that one product, and I liked it, so my precision is one. What's my recall, though? If I have, let's say, ten items that I like and I've only uncovered one, it's one-tenth. Likewise, as I increase the number of items I show, my precision always stays at one, because I'm only recommending products I like, but my recall increases because I'm covering more and more of the items that I like. So eventually we hit this (1, 1) spot. So the optimal precision-recall curve is this line here, and let me just annotate that this is our optimal recommender.
Okay, now let's look at what the curve might look like for a more realistic recommender. The first product we recommend might not be a product I like, or it might be, so the curve is gonna start somewhere on the precision axis.
And then eventually, at some point when I vary the threshold enough, hopefully I will recommend some product I like. So both precision and recall are gonna go up. Then what tends to happen is we add a product that I don't like. What happens at that point? Well, my recall stays exactly the same, because I haven't recovered any more of the items that I'm interested in. But my precision drops, because now I'm looking at a larger world, a larger set of green squares. So my precision goes straight down while my recall stays the same, and you tend to get these very jaggedy-looking curves, where you get these drops in precision and then these increases in precision and recall. I'm gonna draw it, and it won't be completely accurate here, but it looks kind of like that. Typically, something like this would be a precision-recall curve. Okay, so this is an example of a more realistic system, another recommender system compared to our optimal.
Okay, so now that we know how to draw these precision-recall curves, we can talk about comparing our different algorithms. How do we know which one is best? Well, we know that we'd like precision and recall both to be as large as possible, and the best they can be is that curve for the optimal recommender. But when we look at our other curves, these jaggedy-looking things, one doesn't have to strictly dominate another. They might do different things at different points. Let me actually erase that and do it in a different color so it's a little bit clearer. It's not that one is always gonna be better than the other. Maybe we have another one going here. So in this case, how do I think about comparing these different algorithms and choosing which one is best?
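As a rough illustration of how these curves get traced out, here is a minimal Python sketch. It assumes we already have a ranked list of items from a recommender and the set of items the user actually likes. The function name `precision_recall_points` and the toy item names are made up for this example and are not from the lecture.

```python
def precision_recall_points(ranked_items, liked_items):
    """Return one (recall, precision) pair per threshold k = 1..len(ranked_items).

    ranked_items: items in the order the recommender would show them.
    liked_items:  the items the user actually likes (assumed non-empty).
    """
    liked = set(liked_items)
    if not liked:
        raise ValueError("recall is undefined when there are no liked items")

    points = []
    hits = 0  # liked items recovered so far
    for k, item in enumerate(ranked_items, start=1):
        if item in liked:
            hits += 1
        precision = hits / k        # fraction of shown items that are liked
        recall = hits / len(liked)  # fraction of liked items that are shown
        points.append((recall, precision))
    return points


# Hypothetical toy data: the user likes three items.
liked = {"a", "b", "c"}

# An optimal recommender ranks only liked items first.
optimal_ranking = ["a", "b", "c"]
# A more realistic recommender mixes in items the user does not like.
realistic_ranking = ["a", "x", "b", "y", "z", "c"]

optimal_points = precision_recall_points(optimal_ranking, liked)
realistic_points = precision_recall_points(realistic_ranking, liked)

for recall, precision in realistic_points:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

For the optimal ranking, precision stays at one while recall climbs to one. For the realistic ranking, every disliked item leaves recall unchanged and pulls precision down, which is where the jagged shape comes from.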
Well, like I said, we want precision and recall to be as large as possible, but one thing we can measure to compare these is which one is, in general, doing better than the other. What's a way to think about that? Well, we can think about the area under the curve. So we can look at, for example, all the area under this blue curve, and we can compare that to the area under the green curve and ask which area is larger. That is one proxy for which recommender system is doing a better job. So that's this point here: a metric we can use is something called area under the curve, which measures exactly what I drew below.
But you might not care about how the recommender system is doing across all possible thresholds. Instead, you might be in a situation where, let's say, you have a website and you know, based on the real estate of that page, how many items you can display. So maybe you can display ten different items to recommend to the user. Or you know what the attention span of your users is in general, and you wanna limit how many products you recommend to 20 products or something like that. In those cases, where you know specifically how many products you're gonna be recommending, you care about what your precision is at that number of recommended products, because you want that precision to be as large as possible under the constraint of recommending that number of products. And so these are two examples of metrics you might use to compare different algorithms using this notion of precision and recall. [MUSIC]
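The two comparison metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the area is approximated with a step-wise sum (precision times the increase in recall each time a liked item is recovered, often called average precision), which is one of several reasonable ways to measure the area and not necessarily the one used in the lecture. The function names and toy rankings are hypothetical.

```python
def precision_at_k(ranked_items, liked_items, k):
    """Precision when exactly the top k ranked items are recommended."""
    if k < 1:
        raise ValueError("k must be at least 1")
    liked = set(liked_items)
    hits = sum(1 for item in ranked_items[:k] if item in liked)
    return hits / k


def area_under_pr_curve(ranked_items, liked_items):
    """Step-wise area under the precision-recall curve (an assumed choice).

    Each time a liked item is recovered, recall grows by 1 / len(liked),
    and we add the precision at that point times that increase.
    """
    liked = set(liked_items)
    if not liked:
        raise ValueError("recall is undefined when there are no liked items")
    area = 0.0
    hits = 0
    for k, item in enumerate(ranked_items, start=1):
        if item in liked:
            hits += 1
            area += (hits / k) * (1 / len(liked))
    return area


# Hypothetical toy data: two recommenders ranking the same six items.
liked = {"a", "b", "c"}
ranking_a = ["a", "x", "b", "y", "z", "c"]
ranking_b = ["x", "a", "b", "c", "y", "z"]

area_a = area_under_pr_curve(ranking_a, liked)
area_b = area_under_pr_curve(ranking_b, liked)
print(f"area A = {area_a:.3f}, area B = {area_b:.3f}")

# If the page only has room for four recommendations, compare precision at 4.
print(precision_at_k(ranking_a, liked, 4), precision_at_k(ranking_b, liked, 4))
```

In this toy data, recommender A has the larger area, while recommender B has the higher precision when exactly four items are shown, so the two metrics can prefer different systems depending on what you care about.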