[SOUND] So I showed you how we rewrite the query likelihood retrieval function into a form that looks like the formula on this slide, after we make the assumption that the smoothed language model is based on the collection language model. Now if you look at this rewriting, it actually gives us two benefits. The first benefit is that it helps us better understand this ranking function. In particular, we're going to show that from this formula we can see that smoothing with the collection language model gives us something like TF-IDF weighting and length normalization. The second benefit is that it also allows us to compute the query likelihood more efficiently. In particular, we see that the main part of the formula is a sum over the matched query terms. So this is much better than if we take a sum over all the words. After we smooth the document language model, we essentially have nonzero probabilities for all the words. So this new form of the formula is much easier to score or to compute. It's also interesting to note that the last term here is actually independent of the document. Since our goal is to rank the documents for the same query, we can ignore this term for ranking, because it's going to be the same for all the documents. Ignoring it wouldn't affect the order of the documents. Inside the sum, we also see that each matched query term contributes a weight. And this weight is actually very interesting because it looks like TF-IDF weighting. First, we can already see it has the frequency of the word in the query, just like in the vector space model. When we take a dot product, we see the word frequency in the query show up in such a sum. And so naturally this part would correspond to the vector element from the document vector. And here indeed we can see it actually encodes a weight that has a similar effect to the TF-IDF weight. I'll let you examine it. Can you see it? Can you see which part is capturing TF?
And which part is capturing IDF weighting? So if you want, you can pause the video to think more about it. So have you noticed that this P sub seen is related to the term frequency, in the sense that if a word occurs very frequently in the document, then the estimated probability here will tend to be larger? So this means this term is really doing something like TF weighting. Now have you also noticed that this term in the denominator is actually achieving the effect of IDF? Why? Because this is the popularity of the term in the collection. But it's in the denominator, so if the probability in the collection is larger, then the weight is actually smaller. And this means a popular term actually has a smaller weight, and this is precisely what IDF weighting is doing. Only that we now have a different form of TF and IDF. Remember, IDF has a logarithm of document frequency. But here we have something different. But intuitively it achieves a similar effect. Interestingly, we also have something related to length normalization. Again, can you see which factor is related to the document length in this formula? What I just said is that this term, this collection probability, is related to IDF weighting. But it turns out that this term here is actually related to document length normalization. In particular, alpha sub d might be related to document length. It encodes how much probability mass we want to give to unseen words, that is, how much smoothing we want to do. Intuitively, if a document is long, then we need to do less smoothing, because we can assume that the data is large enough. We have probably observed all the words that the author could have written. But if the document is short, then alpha sub d could be expected to be large. We need to do more smoothing. It's likely there are words that have not been written yet by the author. So this term appears to penalize long documents, in that alpha sub d would tend to be larger for a short document than for a long document.
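The slide itself is not reproduced in this transcript. As a reference, here is a reconstruction of the rewritten formula that is consistent with the description above. The notation is an assumption: c(w,q) and c(w,d) are the counts of word w in the query and in the document, n is the query length, and p(w|C) is the collection language model.

```latex
\log p(q \mid d) =
  \sum_{\substack{w \in q \\ c(w,d) > 0}}
    c(w,q)\,\log\frac{p_{\text{seen}}(w \mid d)}{\alpha_d\, p(w \mid C)}
  \;+\; n \log \alpha_d
  \;+\; \sum_{i=1}^{n} \log p(w_i \mid C)
```

The first sum runs only over matched query terms, with P sub seen in the numerator playing the TF role and the collection probability in the denominator playing the IDF role. The second term carries alpha sub d, and the last term does not depend on the document.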
But note that alpha sub d also occurs here, and so this may not necessarily be penalizing long documents. The effect is not so clear yet. But as we will see later, when we consider some specific smoothing methods, it turns out that they do penalize long documents, just like the TF-IDF weighting and document length normalization formula in the vector space model. So that's a very interesting observation, because it means we don't even have to think about the specific way of doing smoothing. We just need to assume that if we smooth with this collection language model, then we would have a formula that looks like TF-IDF weighting and document length normalization. What's also interesting is that we have a very fixed form of the ranking function. And see, we have not heuristically put a logarithm here. In fact, you can think about why we would have a logarithm here. If you look at the assumptions that we have made, it will be clear that it's because we have used the logarithm of the query likelihood for scoring. And we turned the product into a sum of logarithms of probabilities, and that's why we have this logarithm. Note that if we only want to heuristically implement TF weighting and IDF weighting, we don't necessarily have to have a logarithm here. Imagine if we drop this logarithm, we would still have TF and IDF weighting. But what's nice with probabilistic modeling is that we are automatically given the logarithm function here. And that's basically a fixed form of the formula that we did not really have to heuristically design, and in this case if you try to drop the logarithm, the model probably won't work as well as if you keep the logarithm. So a nice property of probabilistic modeling is that by following some assumptions and the probability rules, we get a formula automatically. And the formula has a particular form, like in this case. And if we heuristically design the formula, we may not necessarily end up having such a specific formula.
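To make the rewriting concrete, here is a minimal Python sketch that scores a query both ways: directly, as a sum over every query word, and in the rewritten form, as a sum over matched terms plus a length-related term plus a document-independent term. The lecture has not yet chosen a smoothing method at this point, so the sketch assumes Dirichlet prior smoothing purely for illustration. The function names, the toy collection, and the value of `mu` are all hypothetical, and every query word is assumed to occur somewhere in the collection.

```python
import math
from collections import Counter

def collection_model(docs):
    """p(w|C): relative frequency of each word over the whole collection."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def alpha_d(doc, mu):
    """Mass reserved for unseen words (assumed Dirichlet prior form)."""
    return mu / (len(doc) + mu)

def p_seen(w, counts, doc_len, p_c, mu):
    """Smoothed probability of a word that occurs in the document (assumed form)."""
    return (counts[w] + mu * p_c[w]) / (doc_len + mu)

def score_direct(query, doc, p_c, mu):
    """log p(q|d) as a sum over every query word, matched or not."""
    counts, a = Counter(doc), alpha_d(doc, mu)
    total = 0.0
    for w in query:
        if counts[w] > 0:
            total += math.log(p_seen(w, counts, len(doc), p_c, mu))
        else:
            total += math.log(a * p_c[w])  # unseen: proportional to p(w|C)
    return total

def score_rewritten(query, doc, p_c, mu):
    """Return the three parts of the rewritten formula as a tuple."""
    counts, a = Counter(doc), alpha_d(doc, mu)
    matched = sum(
        c_q * math.log(p_seen(w, counts, len(doc), p_c, mu) / (a * p_c[w]))
        for w, c_q in Counter(query).items()
        if counts[w] > 0  # only matched query terms contribute
    )
    length_term = len(query) * math.log(a)              # n * log(alpha_d)
    query_only = sum(math.log(p_c[w]) for w in query)   # same for every document
    return matched, length_term, query_only

# Hypothetical toy collection and query, for illustration only.
docs = [
    "text retrieval with language models".split(),
    "language models for text text retrieval and smoothing of language models in retrieval".split(),
    "cooking with fresh garden herbs".split(),
]
query = "text retrieval smoothing".split()
p_c = collection_model(docs)
mu = 10.0

def rank(score):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)

full_ranking = rank(lambda d: score_direct(query, d, p_c, mu))
fast_ranking = rank(lambda d: sum(score_rewritten(query, d, p_c, mu)[:2]))
```

In this sketch, `fast_ranking` drops the document-independent term and still orders the documents the same way as `full_ranking`, which is the point made above about ignoring the last term. Under the assumed smoothing, `alpha_d` also shrinks as the document gets longer, matching the intuition that long documents need less smoothing.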
So to summarize, we talked about the need for smoothing the document language model. Otherwise it would give zero probability for unseen words in the document, and that's not good for scoring a query with such an unseen word. It's also necessary, in general, to improve the accuracy of estimating the model that represents the topic of this document. The general idea of smoothing in retrieval is to use the collection language model to give us some clue about which unseen words should have a higher probability. That is, the probability of an unseen word is assumed to be proportional to its probability in the collection. With this assumption, we've shown that we can derive a general ranking formula for query likelihood that has the effect of TF-IDF weighting and document length normalization. We also see that, through some rewriting, the scoring of such a ranking function is primarily based on a sum of weights on matched query terms, just like in the vector space model. But the actual ranking function is given to us automatically by the probability rules and the assumptions that we have made, unlike in the vector space model, where we have to heuristically think about the form of the function. However, we still need to address the question of how exactly we should smooth the document language model: how exactly we should use the reference language model based on the collection to adjust the probability of the maximum likelihood estimate. And this is the topic of the next lecture. [MUSIC]