0:03

Hi, in this part of the capstone project you'll add a new kind of recommendation

that will give you better recommendations than simple averages do.

These recommendations will use information about your own ratings, or,

for any rater, that rater's ratings.

0:18

Average ratings may be flawed.

Let's look at these ratings for four movies.

Mission Impossible, the Martian, Pitch Perfect 2, and

Star Wars: The Force Awakens.

Using average ratings for the movies as is shown here treats each rater equally.

But for recommendations for me, Morgan's evaluations might be better than Jessie's

because I might be closer to Morgan in terms of movies I like.

For you, Sam's ratings might be better for

creating recommendations. This is the idea behind a different kind of average or

recommendation called collaborative filtering.

The idea is to create recommendations specific to a user or

rater rather than the same recommendations for all users.

To do this, you'll weight raters differently,

valuing those raters who are more like you in calculating averages.

To create collaborative recommendations you need to make a few modifications to

the averaging method you've already written. You'll need to find raters

more like you and use their ratings. The "you" here will be

a parameter to the method used to create recommendations by using weighted averages.

For example, Chris and I may both dislike the movie Lucy.

And both like the movie Rain Man.

So Chris is close to me in some way because we have similar taste in movies.

On the other hand, Sam might like Lucy when I didn't.

Sam didn't like Rain Man but I did.

So Sam's rating should carry less weight than Chris's ratings because Sam and

I don't see things in quite the same way that Chris and I do.

So I'll value Chris's ratings more than Sam's in creating a new

weighted average for getting recommendations.

If Chris likes the movie A Beautiful Mind and

I haven't seen it, then it might be time to think about watching it.

This is the general idea behind the new program you'll be writing.

2:23

There are two conceptual changes to the code you've already written that makes

recommendations based on averaging all user ratings.

In the table below are ratings from three raters: Chris, Sam, and Morgan.

2:37

The first change is to use only ratings from raters close to me,

or to the person for whom recommendations are being made.

The number of close raters is a parameter.

So you might use N equal to ten to use ten close raters.

The second change is to weight ratings by a measure of how close a rater is to me,

or to the person getting recommendations.

Let's look at this idea in more detail.

3:02

Which of these movies has the highest average,

and so is the most recommended movie for me?

The Fly has an average rating of seven from the two raters Chris and

Morgan who rated it.

Spider-Man has an average rating of six.

The Butterfly Effect has an average rating of seven and

Beetlejuice has an average rating of 7.5.

Given these averages, I should watch Beetlejuice.

It has the highest average rating.

3:42

Let's look more closely at calculating the weighted averages.

We'll use the closeness weight for each rater in creating averages for

the movie ratings.

As you can see in the table, Chris's weight is 20, Sam's is ten, and Morgan's is five.

We'll show how to calculate these weights next.

For now we'll use the weights in creating average recommendations.

We'll multiply each rating by this closeness weight

when calculating averages.

Not every movie will get a rating from each rater.

We'll use the weighted averages to get a recommendation specific to me, or

to any rater whose closest raters are used in calculating averages.

4:20

In calculating an average for The Fly, we multiply eight by 20,

since Chris's weight is 20 and Chris's rating is eight, to get 160.

Sam didn't rate the movie, so there is no value from Sam.

For Morgan, we take Morgan's weight of five times Morgan's rating of six

to get 30.

The sum is 190, and dividing by the two raters who rated the movie gives a weighted average of 95.

That's different from the unweighted average of seven.
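
Here is a minimal Java sketch of that arithmetic for The Fly. The class and method names are made up for illustration and are not part of the capstone code. It assumes, as the numbers above suggest, that the weighted sum is divided by the number of raters who rated the movie.

```java
class WeightedAverageSketch {
    // weights[i] and ratings[i] belong to the same rater. Only raters who
    // actually rated the movie are passed in, so Sam is left out for The Fly.
    static double weightedAverage(double[] weights, double[] ratings) {
        if (weights.length != ratings.length) {
            throw new IllegalArgumentException("weights and ratings must align");
        }
        if (ratings.length == 0) {
            return 0.0; // no raters, so no average
        }
        double sum = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            sum += weights[i] * ratings[i];
        }
        return sum / ratings.length;
    }

    public static void main(String[] args) {
        double[] weights = {20, 5}; // Chris, Morgan
        double[] ratings = {8, 6};  // their ratings of The Fly
        // (20 * 8 + 5 * 6) / 2 = 190 / 2
        System.out.println(weightedAverage(weights, ratings)); // prints 95.0
    }
}
```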

4:41

Spider-Man's weighted average is 66.67 after multiplying each rating

by the rater's corresponding weight.

The Butterfly Effect has a weighted average of 83.3.

And Beetlejuice has a weighted average of 60.

Given these weighted averages, it looks like we should watch The Fly.

Note that the best movie using an unweighted average is Beetlejuice,

and this is the lowest-rated movie using weighted averages.

5:08

To calculate this weighted average, we need to calculate a weight:

how close a rater is to me, or to some particular rater.

We'll represent each rater by a vector of ratings to discuss how to calculate

closeness.

The vector is conceptually just a list of ratings, one for each movie.

5:27

For example, here are seven ratings by a rater named Sam.

To help with this explanation,

we're including movies that are not rated by Sam.

These are represented by zeros.

6:11

Sam rates this movie a five, I rate it a six.

The product is 30.

The next movie we both rate has a seven by Sam and a four by me,

so the product is 28.

For this movie Sam gives an eight, I give a four, and the product is 32.

The last movie we both rate wasn't liked by Sam, who gave it a one.

I gave it a six.

The product is six.

So the similarity weight between Sam and me is the sum of these values,

30 + 28 + 32 + 6, which is 96.

The weighted similarity for Chris and me is calculated the same way.

We have three movies we both rated.

We calculate the sum of 12+42+54, which is 108.

So I'm closer to Sam than I am to Chris since the weight is a measure

of closeness.

This calculation is actually a dot product,

a measure of mathematical closeness in a vector space.

It's good to know there's a mathematical foundation for

how we're calculating weighted averages.

In this case,

we simply calculate the sum of the products of the ratings for each movie two raters rate in common.
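
As a small Java sketch, the dot product for Sam and me on the four movies we both rated looks like this. The class and method names are hypothetical, and the arrays hold only the movies rated in common, in the order they were just walked through.

```java
class RawDotProductSketch {
    // Sum of the pairwise products of two equally long rating vectors.
    static int dotProduct(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] sam = {5, 7, 8, 1};
        int[] me = {6, 4, 4, 6};
        // 30 + 28 + 32 + 6
        System.out.println(dotProduct(sam, me)); // prints 96
    }
}
```

With uncentered ratings, a movie stored as zero for either rater would add nothing to the sum, which is why only the movies rated in common matter here.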

7:18

In our actual calculations, we'll need to adjust the ratings to account for the scale of

one to ten, where a rating of one means really, really don't like a movie, and

a ten means really, really like a movie.

We want the ratings that are on a scale of 1-10 to work when we determine closeness

by calculating dot products.

We want raters who are close to rate movies similarly,

both liking or both disliking a movie, for

example, since this closeness is a measure of similarity.

If we simply multiply ratings, how do two raters who rate a movie with a one and

a two compare to those who give ratings of eight and nine?

If we multiply, we'd compare two to 72, a huge difference, but

both pairs of raters have very similar taste.

8:03

Giving a one and a two is the same as giving an eight and

a nine in terms of similarity.

Both raters really dislike a movie with a one and a two, and

both raters really like a movie with an eight and nine.

These pairs of ratings should contribute equally to a measure of similarity

but they don’t.

We’ll center the ratings by subtracting the middle rating of five from each one.

So rather than using one and two, we’ll use (1-5) and (2-5), or -4 and -3.

For the ratings of 8 and 9 we'll use (8-5) and (9-5), or 3 and 4.

We get a product of 12 for both pairs of centered ratings, and

thus we get that the ratings are equally similar.
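
A short Java sketch makes the comparison concrete. The names here are made up for illustration; the only thing taken from the lecture is the middle rating of five.

```java
class CenteringSketch {
    static final int MIDDLE = 5; // the middle rating subtracted from each rating

    // Product of two ratings exactly as given on the 1-10 scale.
    static int rawProduct(int a, int b) {
        return a * b;
    }

    // Product of the same two ratings after centering each one.
    static int centeredProduct(int a, int b) {
        return (a - MIDDLE) * (b - MIDDLE);
    }

    public static void main(String[] args) {
        System.out.println(rawProduct(1, 2));      // prints 2
        System.out.println(rawProduct(8, 9));      // prints 72
        System.out.println(centeredProduct(1, 2)); // prints 12
        System.out.println(centeredProduct(8, 9)); // prints 12
    }
}
```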

In this example, we'll show centered ratings by subtracting five from each one.

Ratings that were originally zero are shown with an asterisk.

We won't use those in calculating a similarity score.

For example, here are seven ratings from Sam, shown centered alongside their original

non-centered ratings.

You see that zeroes in the original are represented by

asterisks in the centered ratings.

Remember that in the programs you write, ratings are stored only for

those movies actually rated by Sam.

9:15

My seven element vector shows I rated six movies.

Let's look at how these vectors are used to calculate a similarity weight.

We'll walk through the calculation for seeing how close I am to Sam.

We'll multiply the ratings for each movie Sam and I both rate.

Sam rates this movie a zero, I rate it a one, so the product is zero.

The next movie we both rate has a two by Sam and a minus one by me, so

the product is minus two.

For this movie Sam gives a three, I give a minus one, so the product is minus three.

9:46

The last movie we both rate wasn't liked by Sam, who gave it a minus four. I gave

it a one, so the product is minus four.

So the similarity weight between Sam and me is the sum of these values,

zero plus minus two plus minus three plus minus four, which is minus nine.

The weighted similarity for Chris and me is calculated the same way.

We have three movies we both rated.

We calculate the sum of minus three plus two plus four, which is three, so

I'm closer to Chris than I am to Sam, since the weight is a measure of closeness.

Remember that in the original non-centered ratings I was closer to Sam.

So this makes a difference.

You can see in the ratings that Sam and I don't really agree.

When I like a movie, Sam doesn't,

and vice versa, since none of the products are positive.
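
The same walk-through can be sketched in Java with seven-element vectors, where zero marks a movie that wasn't rated. The ratings Sam and I share are the ones from the lecture. Their positions in the vectors, and the ratings for movies only one of us rated, are made up for illustration, as are the class and method names.

```java
class CenteredSimilaritySketch {
    static final int MIDDLE = 5; // middle rating used for centering

    // Centered dot product that skips any movie either rater left unrated (0).
    static int similarity(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0 || b[i] == 0) {
                continue; // an asterisk in the centered view: not used
            }
            sum += (a[i] - MIDDLE) * (b[i] - MIDDLE);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] sam = {5, 0, 7, 8, 0, 1, 0}; // four movies rated
        int[] me = {6, 3, 4, 4, 9, 6, 0};  // six movies rated
        // 0 + (-2) + (-3) + (-4)
        System.out.println(similarity(sam, me)); // prints -9
    }
}
```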

10:45

Let's look at the Java code for calculating these weighted similarities.

To find the raters near to me, or near to any rater, we'll call the method getSimilarities,

that you'll complete for this capstone.

The parameter ID is the ID of the rater for whom similarity ratings will be calculated.

A class, RaterDatabase, will supply access to each rater, given a rater's ID.

This class is similar to the MovieDatabase class you've already used.

11:35

Before returning the ArrayList, the code will sort the list so that the

first rating is that of the rater with the highest weight, the one closest to me.

We can do that by calling Collections.sort and

passing the comparator that's part of the java.util.Collections class.

This comparator reverses the order of the Rating compareTo function so

that the list will store highest values first.
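
Here is a rough, self-contained sketch of that idea, not the capstone solution. The nested Rating class and the map of raters are simplified stand-ins for the course's Rating and RaterDatabase classes, and "Pat" with those ratings is a made-up rater. Collections.sort and Collections.reverseOrder are the real java.util calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SimilaritySketch {
    // Stand-in for the course's Rating class: an ID plus a value,
    // with compareTo ordering by value from low to high.
    static class Rating implements Comparable<Rating> {
        final String item;
        final double value;
        Rating(String item, double value) { this.item = item; this.value = value; }
        public int compareTo(Rating other) { return Double.compare(value, other.value); }
    }

    // Centered dot product over the movies both raters rated.
    static double dotProduct(Map<String, Double> me, Map<String, Double> other) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : me.entrySet()) {
            Double theirs = other.get(e.getKey());
            if (theirs != null) {
                sum += (e.getValue() - 5) * (theirs - 5);
            }
        }
        return sum;
    }

    // One weight per other rater, sorted so the closest rater comes first.
    static ArrayList<Rating> getSimilarities(String id, Map<String, Map<String, Double>> raters) {
        ArrayList<Rating> list = new ArrayList<>();
        Map<String, Double> me = raters.get(id);
        if (me == null) {
            return list; // unknown rater ID
        }
        for (Map.Entry<String, Map<String, Double>> e : raters.entrySet()) {
            if (!e.getKey().equals(id)) {
                list.add(new Rating(e.getKey(), dotProduct(me, e.getValue())));
            }
        }
        Collections.sort(list, Collections.reverseOrder());
        return list;
    }

    // Sample data: my ratings and Sam's follow the lecture; Pat is hypothetical.
    static Map<String, Map<String, Double>> sampleRaters() {
        Map<String, Map<String, Double>> raters = new HashMap<>();
        raters.put("me", ratings(6, 4, 4, 6));
        raters.put("Sam", ratings(5, 7, 8, 1));
        raters.put("Pat", ratings(8, 2));
        return raters;
    }

    // Assigns the values to hypothetical movie IDs m1, m2, ...
    static Map<String, Double> ratings(double... values) {
        Map<String, Double> map = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            map.put("m" + (i + 1), values[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        for (Rating r : getSimilarities("me", sampleRaters())) {
            System.out.println(r.item + " " + r.value); // Pat 6.0, then Sam -9.0
        }
    }
}
```

This sketch keeps every other rater in the list, including those with negative weights. Whether to drop such raters is a design choice the lecture doesn't settle at this point.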

12:01

Once you've got the weights for each rater,

you will be able to calculate a weighted average to get recommendations.

This method is similar to getAverages, but

this one calculates averages for one particular rater whose ID is a parameter.

First the weights for all raters are calculated by calling the method

getSimilarities we've just discussed.

As with the getAverages method, this code loops over raters.

In the getAverages method the loop was over all raters and we checked to see

if each rater had rated the movie whose average was being calculated.

12:33

Here we loop over just those raters who are close to me,

those for whom weights are stored in the ArrayList named list.

We use only the first numRaters entries in list,

where numRaters is a parameter.

The idea is to use just the top ten or 20 or 100 raters who are closest to me.

12:53

You'll need to be careful in ensuring there is no bad indexing in

getting ratings from list.

After accumulating a weighted sum, the weighted average will be

added to the ArrayList being returned,

just like the code in the unweighted getAverages method.

You will return the list of movie ratings.

You may want to sort it first.
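
To make the loop concrete, here is a minimal sketch of the weighted average for one movie using only the closest raters. The Weight class, the method names, and the parameter name are all made up for illustration and stand in for the course's own classes. The weights and The Fly ratings come from the table discussed earlier. Math.min guards against indexing past the end of the list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimilarRatingsSketch {
    // Hypothetical pairing of a rater ID with that rater's closeness weight.
    static class Weight {
        final String raterId;
        final double value;
        Weight(String raterId, double value) { this.raterId = raterId; this.value = value; }
    }

    // list is sorted closest first; ratingsForMovie maps rater ID to rating.
    static double weightedAverage(List<Weight> list, int topRaters,
                                  Map<String, Double> ratingsForMovie) {
        int limit = Math.min(topRaters, list.size()); // no bad indexing
        double sum = 0.0;
        int count = 0;
        for (int k = 0; k < limit; k++) {
            Weight w = list.get(k);
            Double rating = ratingsForMovie.get(w.raterId);
            if (rating != null) { // this close rater rated the movie
                sum += w.value * rating;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    // The Fly with the table's weights: Chris 20, Sam 10, Morgan 5.
    static double theFlyExample(int topRaters) {
        List<Weight> list = new ArrayList<>();
        list.add(new Weight("Chris", 20));
        list.add(new Weight("Sam", 10));
        list.add(new Weight("Morgan", 5));
        Map<String, Double> theFly = new HashMap<>();
        theFly.put("Chris", 8.0);
        theFly.put("Morgan", 6.0);
        return weightedAverage(list, topRaters, theFly);
    }

    public static void main(String[] args) {
        System.out.println(theFlyExample(3));   // prints 95.0
        System.out.println(theFlyExample(1));   // prints 160.0 (Chris only)
        System.out.println(theFlyExample(100)); // prints 95.0 (list has only 3 entries)
    }
}
```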

13:39

These ratings might help me calibrate the results.

Since I liked these movies, seeing them in the list of the top 15 makes sense to me,

even though I don't need a recommendation to see them, since I've already seen them.

Should we print more than the top 15?

Should we print all the recommendations?

Should we print the weighted average?