The matrices that minimize the sum of squared error will also minimize the root mean squared error, and the

sum of squared error is a much easier formula to work with.

So we're going to minimize the sum of the squared error.

Now if you recall our prediction rule.

The score of item i for a particular user,

which I'm going to abbreviate here as r tilde ui,

to say the prediction of the rating,

is our baseline value plus the dot product of the user and

item feature vectors for that user and that item.

So we compute the error by subtracting the prediction from

the rating, which is r sub ui minus b sub ui minus this dot product.
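As a minimal sketch of that prediction rule and error, here it is with hypothetical example values (the names b_ui, P_u, and Q_i follow the lecture's notation; the numbers are made up for illustration):

```python
import numpy as np

b_ui = 3.5                          # baseline prediction for user u, item i
P_u = np.array([0.2, -0.1, 0.4])    # user u's feature vector (a row of P)
Q_i = np.array([0.5, 0.3, -0.2])    # item i's feature vector (a row of Q)
r_ui = 4.0                          # the observed rating

# r~_ui = b_ui + P_u . Q_i  (baseline plus dot product of feature vectors)
pred = b_ui + P_u.dot(Q_i)

# epsilon_ui = r_ui - r~_ui = r_ui - b_ui - P_u . Q_i
eps = r_ui - pred
```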

But then for the updates, gradient descent is based on the following rule.

It's common to call the parameters of one of these models theta.

So theta is P and Q together, the two matrices.

The rule is that theta at step n equals

theta at step n minus 1 plus

the gradient of our error

with respect to theta.

And the gradient is just this big matrix of partial derivatives.
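To sketch how that update rule drives error down, here it is applied to a one-parameter squared error, where theta plays the role of the prediction. The learning rate gamma is an assumed hyperparameter the lecture doesn't name; note that stepping against the gradient of the squared error produces a "plus epsilon" step, since differentiating epsilon squared flips the sign:

```python
r = 4.0          # target "rating"
theta = 0.0      # parameter, playing the role of the prediction
gamma = 0.1      # hypothetical learning rate

for _ in range(100):
    eps = r - theta               # epsilon = rating - prediction
    grad = -2.0 * eps             # d(eps^2)/d(theta)
    theta = theta - gamma * grad  # step against the gradient: theta += 2*gamma*eps

# theta converges toward r
```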

So we're training individual user and

item feature values, one at a time.

So we've got a particular rating, r sub ui.

It has a particular user and a particular item.

We're also training one feature at a time.

So we train the first feature.

Then we train the second feature.

So suppose we're training feature f.

So we're trying to update P sub uf, and Q sub if.

The user and item feature values for that particular feature.

So we really only have two values we care about at any step.

We're using something called stochastic gradient descent in FunkSVD,

which means we're updating for every rating.

Rather than going over all the ratings and computing a big update matrix,

we're just immediately updating with every rating.

So at any given point, any given step through the algorithm,

we care about these two values.

We're trying to update these two values.
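A sketch of one such stochastic step, in the lecture's notation: for a single rating r_ui and a single feature f, only P[u, f] and Q[i, f] change. The learning rate `lrate` is an assumed hyperparameter, and the update direction comes from the derivative worked out at the end of this section:

```python
import numpy as np

def sgd_step(P, Q, b_ui, r_ui, u, i, f, lrate=0.05):
    pred = b_ui + P[u].dot(Q[i])    # r~_ui = b_ui + P_u . Q_i
    eps = r_ui - pred               # epsilon_ui
    p_old, q_old = P[u, f], Q[i, f]
    P[u, f] += lrate * eps * q_old  # nudge the user feature value
    Q[i, f] += lrate * eps * p_old  # nudge the item feature value
    return eps

# Tiny demo: one user, one item, one feature; repeated steps shrink the error.
P = np.array([[0.1]])
Q = np.array([[0.1]])
for _ in range(1000):
    eps = sgd_step(P, Q, b_ui=0.0, r_ui=1.0, u=0, i=0, f=0)
# |eps| is now far smaller than the initial error of 0.99
```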

So how do we do that?

We want to take the derivative.

The derivative with respect

to P sub uf of the squared error,

or of epsilon sub ui squared.

So if you've taken calculus and

remember your derivative rules for

dealing with powers, that's going to

be equal to 2 times epsilon sub ui,

times the derivative with respect

to P sub uf of epsilon sub ui.
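Carrying that chain rule one step further (a sketch in the lecture's notation, using the definition of epsilon sub ui from above):

```latex
\frac{\partial}{\partial P_{uf}} \epsilon_{ui}^2
  = 2\,\epsilon_{ui}\,\frac{\partial \epsilon_{ui}}{\partial P_{uf}},
\qquad
\epsilon_{ui} = r_{ui} - b_{ui} - \sum_{f'} P_{uf'}\,Q_{if'}
\;\Longrightarrow\;
\frac{\partial \epsilon_{ui}}{\partial P_{uf}} = -\,Q_{if}.
```

So the full derivative is minus 2 times epsilon sub ui times Q sub if, which is what gets plugged into the update for P sub uf.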