It turns out that's almost a good idea, except you need something else.

We often encounter what's called overfitting. This is in the textbook in the advanced material. But basically, it says that you can add so many parameters, all these bu's and bi's, and you can fine-tune them so well based on the ground truth in the training data that the model loses flexibility and predictive power for the actual prediction. So, we want to somehow limit the impact of the parameters. We don't want to rely too much on hindsight perfection; we want to leave some room for foresight, because it's ultimately foresight that matters in predicting or recommending.

One way to contain the weight of the parameters is to actually minimize their size, say, the l2 norm of these bu's formed as a vector and these bi's formed as a vector. That's called Regularized Least Squares. If we use a quadratic norm like the l2 norm, then it is still a least squares problem, just a bigger least squares problem.
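As a rough sketch of what this regularized least squares looks like in code (the ratings, dimensions, and lambda below are made up for illustration):

```python
import numpy as np

# Made-up toy data: (user, movie) -> rating; sizes are illustrative.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 4}
n_users, n_movies = 3, 3
r_bar = np.mean(list(ratings.values()))  # the "lazy average"
lam = 0.5  # weight on the l2 norm of the b's (a tuning choice)

# One row per observed rating; each row picks out one bu and one bi.
A = np.zeros((len(ratings), n_users + n_movies))
y = np.zeros(len(ratings))
for row, ((u, i), r) in enumerate(ratings.items()):
    A[row, u] = 1.0             # coefficient of bu
    A[row, n_users + i] = 1.0   # coefficient of bi
    y[row] = r - r_bar

# Regularized least squares: appending sqrt(lam) * identity rows adds
# lam * ||b||^2 to the squared error -- still a least squares problem,
# just a bigger one.
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n_users + n_movies)])
y_aug = np.concatenate([y, np.zeros(n_users + n_movies)])
b, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
b_u, b_i = b[:n_users], b[n_users:]
```

The augmented solve is exactly equivalent to the normal equations (AᵀA + λI) b = Aᵀy of the regularized problem.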

But it does try to contain too much reliance on overly refined parameters based on hindsight. So, after the baseline predictor, what we have is the following. We've got the actual rui, and we've got the baseline predictor r hat ui, which is r bar, the lazy average, plus the optimized bu and the optimized bi, based on solving the least squares. And we call the difference, rui minus r hat ui, a shifted r, denoted r tilde ui, just a shorthand notation. In other words, in this r tilde ui we have removed the impact of the average, the impact of the per-user bias, and the impact of the per-movie bias. And we're going to take this as the starting point into our neighborhood predictor, to look at similarities across movies and across users. But before we do that, just a very quick

detour for five minutes on the notion of convex optimization.
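Before the detour, the shifted rating just defined can be sketched concretely; all the numbers below are illustrative stand-ins for the optimized values:

```python
# Illustrative values only: a lazy average and per-user / per-movie
# biases as they might come out of the regularized least squares.
r_bar = 3.2
b_u = {0: 0.4, 1: -0.1}   # optimized per-user biases (made up)
b_i = {0: 0.7, 1: -0.5}   # optimized per-movie biases (made up)

def r_tilde(r_ui, u, i):
    r_hat = r_bar + b_u[u] + b_i[i]   # baseline predictor r hat ui
    return r_ui - r_hat               # average and both biases removed

print(round(r_tilde(5, 0, 0), 3))  # 5 - (3.2 + 0.4 + 0.7) = 0.7
```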

Now, we've seen linear programming, which is a special case of convex optimization where we minimize linear objective functions over linear constraints: linear equations and inequalities. So, pictorially, that means you use linear inequalities to cut out the space of the variables into something called a polytope, with sharp corners. And then you slide a bar, which is a straight line because it's a linear function, representing your objective function, across this constraint set. So, imagine an airplane flying above this constraint set, okay? And we want to identify the minimal value of this objective function over this kind of constraint set.
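As a minimal sketch of such a linear program, assuming scipy is available (the objective and constraints below are made up for illustration):

```python
from scipy.optimize import linprog

# Minimal sketch: minimize a linear objective over linear inequalities.
# minimize  x + 2y   subject to  x + y >= 1,  x >= 0,  y >= 0.
# linprog expects "<=" constraints, so rewrite x + y >= 1 as -x - y <= -1.
res = linprog(c=[1, 2], A_ub=[[-1, -1]], b_ub=[-1],
              bounds=[(0, None), (0, None)])
print(res.x)  # the optimum lands at the sharp corner (1, 0)
```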

It turns out, what's really important is that you want your objective function to be convex, like a convex quadratic function such as x squared, over a constraint set that is also convex, like an oval or a circle. It's convexity that determines the watershed between easy and hard optimization problems. So, what is convex optimization?

It refers to minimizing a convex objective function (so we've got to define what a convex function is) subject to a convex constraint set. And, to be computationally useful, this convex constraint set must be one that can be easily represented by some upper-bound inequalities on convex functions. So, we have to define what's a convex set and what's a convex function. We will encounter other forms of convex

optimization in network economics, in internet pricing, in congestion control,

in routing, and so many other topics. This is our first brief, quick

introduction to this subject. And, we'll see in this and future lectures that convex optimization is easy both in theory and in practice. But we just want to take three more minutes to introduce the basic notation before we bring them back, just in time for future applications in later lectures. So, what's a convex set? What's a convex function? One could actually devote a whole course to nothing but convex analysis, and there are such courses in math departments.

But for today's purposes, we just need to specify them with some pictures. A convex set is basically a set where you can pick any two elements in the set and draw a line segment in between, and the entire line segment also lies in the set. Then, we call this a convex set.

You might say, well, this sounds like all sets are convex. Well, no. In some sets, for example like this set, you can pick two points in the set, but part of their line segment, this part, does not lie inside the set. And this kidney-shaped thing is

a non-convex set. So, generally speaking, we say a set, call it C, is convex if, whenever we pick any two points a and b in C, the entire line segment between a and b, which can be represented as theta times a plus one minus theta times b for theta between zero and one (for example, when theta is one half, this is the midpoint between a and b), also lies in C. And it must hold for all such theta; that's the key phrase, not for some theta but for all theta. This is the mathematical representation of the line segment between points a and b.
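That "for all theta" condition can be spot-checked numerically by sampling thetas along the segment; a minimal sketch with a disc (convex) and a ring (non-convex), with illustrative points and radii:

```python
import numpy as np

# A set is convex when, for every theta in [0, 1], the point
# theta*a + (1-theta)*b stays inside. Sampling thetas gives a
# numerical spot-check (not a proof).
def segment_inside(a, b, inside, n=1001):
    return all(inside(t * a + (1 - t) * b) for t in np.linspace(0.0, 1.0, n))

disc = lambda p: np.linalg.norm(p) <= 1.0           # convex
ring = lambda p: 0.5 <= np.linalg.norm(p) <= 1.0    # non-convex

a, b = np.array([0.9, 0.0]), np.array([-0.9, 0.0])
print(segment_inside(a, b, disc))   # True: the disc is convex
print(segment_inside(a, b, ring))   # False: the midpoint leaves the ring
```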

And, by the way, a and b are vectors in general, in n-dimensional space. So, pictorially, it just says that a convex set is a set where the whole line segment

lies inside. It turns out the most important geometric property of a convex set is that any two convex sets that don't touch each other can be separated by a straight line, so that one set lies on one side and the other lies on the other side. You might say, will this be true for non-convex sets too? Not really.

For example, suppose one set is convex and the other set is non-convex. They don't touch each other, and yet you cannot separate them with a straight line. Of course, you can separate them one way or another, because they don't touch. But you can't separate them with a straight line.
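The separation property for two disjoint convex sets can be illustrated numerically; a minimal sketch with two discs (the centers and radii are made up), separated by the perpendicular bisector of the segment between their centers:

```python
import numpy as np

# Sketch of the separating-line property for two discs that don't touch.
rng = np.random.default_rng(0)

def sample_disc(center, radius, n=500):
    pts = rng.normal(size=(n, 2))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # random directions
    return center + radius * np.sqrt(rng.random((n, 1))) * pts

c1, c2 = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
A, B = sample_disc(c1, 1.0), sample_disc(c2, 1.0)

w = c2 - c1                   # normal vector of the separating line
b = w @ (c1 + c2) / 2         # the line passes through the midpoint
print(np.all(A @ w < b), np.all(B @ w > b))  # True True: one set per side
```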

It turns out that this is the most useful property of convex sets, but we don't have time to go through its implications in this lecture.

Now, having specified what a convex set is, we define what a convex function is. A convex function, intuitively, is a function that curves upward, okay?
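That "curves upward" intuition has a precise form, the chord inequality: f evaluated on the segment between x and y lies on or below the segment between f(x) and f(y). A quick numerical spot-check for f(x) = x squared:

```python
import random

# Sketch of "curves upward": for a convex f, the chord lies above the
# graph, i.e. f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for all t in [0, 1].
# Spot-check this inequality for f(x) = x**2 at random points.
def f(x):
    return x * x

random.seed(0)
ok = True
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    t = random.random()
    ok = ok and f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
print(ok)  # the chord stays on or above the function at every sample
```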