So it's a very, very long and

sparse vector that counts how many times each word appears in this document.

Okay, so we talked about this representation of our documents in terms

of just these raw word counts.

This bag of words model.
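
To make the bag of words idea concrete, here is a minimal sketch of how such a count vector might be built. The vocabulary, the example sentence, and the helper name `count_vector` are all made up for illustration and are not from the lecture; a real vocabulary would be far longer, which is why the vector ends up so sparse.

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return raw word counts for `text`, one entry per vocabulary word.

    Hypothetical helper: splits on whitespace and lowercases, with no
    handling of punctuation or stemming.
    """
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that never appear in the text.
    return [counts[word] for word in vocabulary]

# Assumed toy vocabulary and document, for illustration only.
vocabulary = ["messi", "soccer", "goal", "pele", "brazil"]
doc = "Messi scored a goal and then Messi scored another goal"

vec = count_vector(doc, vocabulary)
print(vec)
```

Word order is thrown away here: only the counts survive, which is what makes this a "bag" of words.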

And now we want to talk about how we're gonna measure the similarity between

different documents because we're gonna use that in order to find

documents that are related to one another and so on, like we talked about before.

Carlos is reading an article, so what’s another article he might be interested in?

Okay, so imagine that this is the count vector that we have for

this article on soccer, with this famous Argentinian player, Messi.

And then there's another article here that I'm showing in blue and

the associated word counts.

And this article is about another famous soccer player, Pele.

Is that right?

>> Pele. >> Pele.

[LAUGH] So when we think about measuring similarity,

what we can do is simply look at an element-wise product of these two vectors.

So for every element in the vector, we're gonna

multiply the two elements appearing in these two different count vectors,

and then add up those products over all the elements in the vector.

So here I've done this math where we have 1 times 3,

all the other products are 0,

except that at the fifth entry in the vector we have 5 times 2.

And if we do this multiplication over the whole vector,

the sum of these terms is 13.

So that measures the similarity between these two articles on soccer.
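
The multiply-and-add computation just described can be sketched in a few lines of Python. The two count vectors below are assumptions chosen only so that the overlapping entries match the numbers mentioned above (1 times 3 in the first entry and 5 times 2 in the fifth); the remaining entries, the vector length, and the function name `similarity` are hypothetical.

```python
def similarity(a, b):
    """Multiply matching entries of two count vectors and sum the products."""
    if len(a) != len(b):
        raise ValueError("count vectors must share the same vocabulary")
    return sum(x * y for x, y in zip(a, b))

# Assumed count vectors for the two soccer articles (illustrative values).
messi_counts = [1, 0, 0, 0, 5, 3, 0, 0, 1, 0, 0, 0, 0]
pele_counts  = [3, 0, 0, 0, 2, 0, 0, 1, 0, 1, 0, 0, 0]

score = similarity(messi_counts, pele_counts)
print(score)  # 1*3 + 5*2 = 13
```

Only words that appear in both articles contribute to the sum, since any product with a zero count is zero.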