And the loss function computes how good or bad the output is with respect to the training data.

The optimization algorithm takes that loss or utility function and computes a new set of parameters to try.

And we iteratively generate output using the model and parameters,

measure how bad it is, improve the parameters,

until we are satisfied with the parameter values that we have chosen.

As we saw in FunkSVD, this is often through some kind of convergence criterion, a threshold on how much the parameters are changing, or

just a fixed iteration count: we're going to train 40 times, 100 times, whatever.

But these pieces work together to let us train the algorithm.
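As a concrete sketch of that loop, here is gradient descent minimizing squared error for a single value; the ratings array, learning rate, and thresholds are made-up illustrations, not from the lecture:

```python
import numpy as np

ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])  # hypothetical training ratings

def loss(b):
    # measure how bad the current parameter is: total squared error
    return np.sum((ratings - b) ** 2)

def grad(b):
    # derivative of the squared-error loss with respect to b
    return -2.0 * np.sum(ratings - b)

b = 0.0                      # initial parameter guess
prev = loss(b)
for it in range(100):        # cap on the number of training iterations
    b -= 0.05 * grad(b)      # optimizer proposes improved parameters
    cur = loss(b)
    if abs(prev - cur) < 1e-8:  # convergence criterion: loss stopped changing
        break
    prev = cur
```

The loop stops either when the loss stops changing (the convergence threshold) or after the fixed iteration cap, just as described above.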

For a simple example, we can look at estimating ratings with a single value.

So our scoring function is going to predict the rating with a single global value.

So our parameter is the single value b.

For our error function, we want to minimize the squared error.

So G is going to be the sum over the users and items of the squared error, which is the rating minus this baseline value, squared.

Now with this problem, our optimization algorithm is just a lookup: statistics tells us that b equals mu, the mean rating.

Now we have the best value.

There's not a lot of sophistication to the training.

But the basic principle is there.

We want to find the value that minimizes the error.
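We can verify numerically that the mean is the value minimizing the squared error; the ratings array here is a made-up illustration:

```python
import numpy as np

ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0])  # hypothetical ratings

def sq_err(b):
    # total squared error of predicting every rating with the single value b
    return np.sum((ratings - b) ** 2)

b_star = ratings.mean()  # mu, the closed-form minimizer

# any other candidate value yields a strictly larger squared error
for other in [b_star - 0.5, b_star + 0.5, 1.0, 5.0]:
    assert sq_err(other) > sq_err(b_star)
```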

Now this approach can be applied to many different kinds of models.

We can do a bias model, which takes a global bias, the mean rating, and

then per-user and per-item biases.

And effectively what this gives us is a personalized mean.

How much better or worse are you going to like this item than average,

adjusted by how much better or worse you like items, on average.

And in this case, the parameters are going to be b,

b sub u for every user, and b sub i for every item.
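A minimal sketch of fitting those biases with simple averages, on a hypothetical tiny ratings table (the data and variable names are illustrative assumptions, not from the lecture):

```python
import numpy as np

# hypothetical ratings: (user, item, rating) triples
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0), (1, 1, 2.0), (2, 0, 4.0)]

b = np.mean([r for _, _, r in ratings])  # global bias: the mean rating mu

# per-user bias: how far the user's ratings sit from the global mean
users = {u for u, _, _ in ratings}
b_u = {u: np.mean([r - b for uu, _, r in ratings if uu == u]) for u in users}

# per-item bias: residual deviation after removing global and user biases
items = {i for _, i, _ in ratings}
b_i = {i: np.mean([r - b - b_u[u] for u, ii, r in ratings if ii == i])
       for i in items}

def predict(u, i):
    # personalized mean: global mean adjusted by user and item biases
    return b + b_u.get(u, 0.0) + b_i.get(i, 0.0)
```

Unseen users or items simply fall back to a zero bias, so the prediction degrades gracefully toward the global mean.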

So if we have,