
>> In this module, we are going to talk about nonlinear optimization.

We will review unconstrained optimization, and then briefly talk about constrained

optimization and in particular, Lagrangian relaxation and two of its applications:

one on a utility maximization problem and another on a mean-variance

portfolio selection problem. Let's start with unconstrained nonlinear

optimization. What does unconstrained mean?

It means that I'm allowed to choose any vector that I like to try to minimize a

function. To keep everything general, we will assume

that I've got a function f of x, and what is x?

x is a vector in Rn. It's a vector with n different components.

It's an unconstrained problem, so I can minimize my function f of x over the

entire space. It turns out that for non-linear

optimization, we have to distinguish between two kinds of minima.

Ordinarily, we just want to find a vector x at which the function takes its minimum possible value.

So, here's a picture. I've got a one-dimensional function.

Here's my x, here's my function f of x. Now, when I'm trying to minimize this

function f of x, all I need is this point. This is my point x star, the minimum.

It's the global minimum. It's the best point on the

entire real line. But as we go through this lecture,

we will see that, when it comes to nonlinear programming, we try to find

these points by looking at derivatives: the first derivative, which will

give me the gradient, or the second derivative, which will give me the Hessian

matrix. Derivatives, unfortunately, can look only

in the neighborhood of a point. They can't go look far away, because

derivatives are defined by taking limits of points coming arbitrarily close.

So, in order to be able to work with derivatives, we have to have another

notion, which is that of a local minimum. So, this point here is a local minimum.

Why is it a local minimum? Because if I just go in a neighborhood of

this point, if I look for points that live in this little interval around here, then

for this little local neighborhood, this point is actually optimum.

It is true that if I leave the neighborhood, I get other points that are better; x star

over here is a global optimum point. And therefore, that's better than this

local optimum point or at least as good as the local optimum point.

But within this interval, I can't get any better point.

So, for nonlinear optimization, we have two different notions.

Global optimum point, that's where we want to get to.

Local optimum points, which are only locally optimal; that's what we can get because we

use derivatives. So now, we are going to describe what are

the conditions that we need in order to get a local minimum.

Remember, local means only in the neighborhood.

It means that these criteria are given by taking derivatives.

A point can be a local minimum only if the gradient at that point is 0. But

what's a gradient? It's a vector. Remember, this function

is a function on Rn, meaning that it takes a vector x, which has n components.

Therefore, the gradient is simply the vector of partial derivatives.

I take the function, I take the partial derivative with respect to the first

component, the partial derivative with respect to the second component, and so on up to the partial

derivative with respect to the nth component, and stack them up as a vector.

When I say that gradient is equal to 0, I mean that every component is equal to 0.

Here's a simple example. Let's take f of x equal to 2 x1 squared

plus 3 x2. So, the partial of f with respect to the

first component, is going to be 4x1. The partial of f with respect to the

second component is going to be equal to 3, because the x2 goes away.

So, the gradient is going to be 4x1 and 3. So, when is the first component equal to 0?

That means that x1 must be equal to 0. So, at any local minimum, it must be the

case that x1 is equal to 0. (Notice that the second component is the constant 3, so this particular function actually has no stationary point; the example just illustrates the computation.) But a zero gradient alone is not sufficient.
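To make the gradient computation concrete, here is a small Python sketch (an independent check, not part of the lecture) that approximates the gradient of f(x1, x2) = 2 x1 squared plus 3 x2 by finite differences; the function names are just for illustration.

```python
# A numerical check (not from the lecture) of the gradient of
# f(x1, x2) = 2*x1**2 + 3*x2, which analytically is (4*x1, 3) everywhere.

def f(x1, x2):
    return 2 * x1 ** 2 + 3 * x2

def numerical_gradient(g, x, h=1e-6):
    """Approximate the gradient of g at x with central differences."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((g(*xp) - g(*xm)) / (2 * h))
    return grad

print(numerical_gradient(f, [1.5, -2.0]))  # close to [6.0, 3.0]
```

Note that the second component comes out near 3 no matter where you evaluate it, which is why the gradient never fully vanishes for this function.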

We have to take a matrix of second derivatives, and that matrix must be

positive semi-definite at any local minimum, and positive definite if you

want to be sure that the point is a local minimum.

How do I construct this matrix? I take the partial derivatives and stack

them up as a matrix. Remember, for any matrix,

this is going to be the (1,1) component.

So, I take the partial derivative with respect to x1 and take another partial

derivative with respect to x1. This entry here is the (1,2) entry, so I take the

partial derivative with respect to x1 and then a partial derivative with respect

to x2. Just to recall: if a function

is twice continuously differentiable, meaning you can take two derivatives, then it doesn't

matter the order in which you take the derivative.

Whether you first take it with respect to x1 and then you take it with respect to x2

or vice versa. And we'll see an example in a moment.

You stack them up in a matrix, and at a local minimum that matrix must have all non-negative

eigenvalues. We won't get into the detail of

eigenvalues, because we don't really need them explicitly.

They can be easily computed in MATLAB: you can just use the command eig, and it

will tell you the eigenvalues of the matrix.

If all the eigenvalues are non-negative, then the necessary condition for a local

minimum holds. If it turns out that all the eigenvalues

are strictly positive, then it's definitely a local minimum.
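The lecture mentions MATLAB's eig; as a sketch of the same check in Python with NumPy (an assumption of this note, not the course's tooling), one can classify a symmetric Hessian by its eigenvalues:

```python
import numpy as np

def classify_hessian(H, tol=1e-10):
    """Classify a symmetric matrix by its eigenvalues
    (NumPy's eigvalsh plays the role of MATLAB's eig here)."""
    eigs = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(eigs > tol):
        return "positive definite"       # sufficient: a local minimum
    if np.all(eigs >= -tol):
        return "positive semi-definite"  # only the necessary condition holds
    return "indefinite"                  # not a local minimum

print(classify_hessian([[2.0, 0.0], [0.0, 5.0]]))  # positive definite
print(classify_hessian([[2.0, 3.0], [3.0, 0.0]]))  # indefinite
```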

Now, a class of functions that is going to be useful is the convex functions; we

will remind you of them whenever we use them in the course.

But the picture that I want you to keep in mind is this: a function is convex if

it looks like this. I take any two points on the function and I draw the

straight line joining those two points.

The straight line lies above the function. So, this is a convex function.

Here's a function that is not a convex function because if I take these two

points and join them, the line does not lie above the function.

So, convex functions are particularly nice: if I'm trying to minimize a convex

function, I don't even have to check the Hessian.

I just get the gradient, set it equal to 0 and I'm done.
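The chord picture can be turned into a simple numerical test: sample points on the segment between x and y and check that the chord stays above the function. A sketch (the grid size and tolerance are arbitrary choices, not from the lecture):

```python
def chord_above(f, x, y, samples=101, tol=1e-12):
    """Check f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on a grid of a in [0, 1]."""
    for k in range(samples):
        a = k / (samples - 1)
        z = a * x + (1 - a) * y
        if f(z) > a * f(x) + (1 - a) * f(y) + tol:
            return False
    return True

print(chord_above(lambda t: t ** 2, -2.0, 3.0))  # True: t^2 is convex
print(chord_above(lambda t: t ** 3, -2.0, 3.0))  # False: t^3 bends the wrong way for t < 0
```

Of course this only samples one chord, so it can reject non-convex functions but never prove convexity; the definition requires the inequality for all pairs of points.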

We'll walk you through two examples of what is going on here.

So, the first example is unconstrained nonlinear optimization.

Here's the problem that I want to solve. I want to minimize, over x in R2 (two

components), this function: x1 squared plus 3 x1 x2 plus x2 cubed.

I first take the derivative and set it equal to 0.

I get 2x1 plus 3x2 as the partial derivative with respect to x1.

I get 3x1 plus 3x2 squared as the partial derivative with respect to x2.

I set it equal to zero, and I solve it. The first equation tells me how x1 and x2

are related. I plug that into the second equation.

That gives me a quadratic, which has two roots.

It turns out that the two roots are x equal to (0, 0) or x equal to (minus 9 over 4,

3 over 2). So, the first component is minus 9 over 4.

The second component is 3 over 2. If I take the partial derivative, the

second partial derivative, I take this one and take its partial derivative with

respect to x1, I get 2. I take the same and I take the partial

derivative with respect to x2, I get 3. For this one, if I take the partial

derivative with respect to x1, I get 3 but I should have known that already because

it doesn't matter the order in which I take derivatives.

So, the off-diagonal terms are always the same.

If I take this one and take the second derivative with respect to x2, I get 3

times 2 x2, that is, 6x2. If I plug in x equal to (0, 0), which means x2 is

equal to 0, I get this matrix. That's not positive semi-definite, so it's not

a local minimum. It actually has one positive eigenvalue

and one negative eigenvalue. If I put x equal to minus 9 over 4, 3 over

2, it turns out that this matrix is positive semi-definite, meaning it has

non-negative eigenvalues and, in fact, it's positive definite, meaning that it

has all positive eigenvalues, and therefore, I know that this is a local

minimum. That's all that we need to know as far as

this course goes about unconstrained nonlinear optimization.
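As an independent check (not the lecture's code), the two critical points of f(x1, x2) = x1 squared plus 3 x1 x2 plus x2 cubed and their Hessians can be verified numerically:

```python
import numpy as np

# f(x1, x2) = x1**2 + 3*x1*x2 + x2**3
def grad(x1, x2):
    return np.array([2 * x1 + 3 * x2, 3 * x1 + 3 * x2 ** 2])

def hessian(x1, x2):
    return np.array([[2.0, 3.0], [3.0, 6.0 * x2]])

for point in [(0.0, 0.0), (-9 / 4, 3 / 2)]:
    eigs = np.linalg.eigvalsh(hessian(*point))
    print(point, np.allclose(grad(*point), 0.0), eigs)
# (0, 0): the gradient is 0 but one eigenvalue is negative -> not a minimum.
# (-9/4, 3/2): the gradient is 0 and both eigenvalues are positive -> local minimum.
```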

Set the gradient equal to 0, and figure out what happens with the Hessian,

or second-order matrix. Next, we want to take this idea and apply it

to constrained problems. So, here's a constrained problem.

A typical problem like this shows up in utility maximization.

I have two different things that I could do.

I could invest in one particular business or in another particular

business and I have a total wealth of 12. So, x1 plus x2 equals 12.

If I put money into the first business, I get 2 times log 1 plus x1 as my return.

If I put it in the other business, I get 4 times log 1 plus x2 as the second return.

Now, logs have diminishing returns, so eventually, even

though I put money into the second one, which gives me the better incremental

return, I'll end up getting less and less, and so at some point, the first

project will become more competitive. Now, the constraint tells me the total

amount of money that I have is just 12. If there were no constraints, I

could solve this problem easily. But this constraint makes my problem

harder. It's a convex problem, because log is a

concave function and I'm trying to maximize a concave function subject to

linear constraints. So, in theory,

this is easy. So, in order to get rid of the

constraints, what we do is we multiply it by a variable and add it to the objective.

So, I've got 2 log 1 plus x1, 4 log 1 plus x2.

This is just the objective. Here is my multiplier, and it has a

name: it is called the Lagrange multiplier, because Lagrange was

the first mathematician who came up with this idea.

I take the Lagrange multiplier, v, multiply it by the constraint x1 plus x2

minus 12 (what happened to the 12? I moved it to the other side, so that

becomes my constraint), and subtract it from the objective.

Now, I throw away the constraints. Remember, we had done something very

similar to this in the module on linear optimization.

We dualized the constraints, multiplied them by a quantity that is a multiplier

which had to have some sort of particular sign.

In this particular case, I threw away the sign restriction as well:

v can be anything. Then I threw away the constraints, relaxed the problem, and just

optimized the objective, in order to get the dual linear problem.

I'm doing the same thing here. I've got a nonlinear objective, I'm

subtracting a multiplier times the constraint and then, I'm going to throw

away the constraints and pretend that this is an unconstrained problem.

I know that this problem is convex, so in order to get the optimum

point, all I have to do is set the gradient equal to 0 and solve.

So, I take the gradient of this Lagrangian function.

From here, I get 2 over 1 plus x1. From here, I get minus v.

That's the partial derivative with respect to x1.

Partial derivative with respect to x2, I get 4 divided by 1 plus x2 minus v, from

this one, and that is equal to 0. If I solve this, I get x1 equal to 2

over v minus 1, and x2 equal to 4 over v minus 1.

How do I get this v? I know that the optimal solution, x1 plus

x2, must equal 12, so I plug it into the equations, and I end up getting an

equation that says, 6 over v must equal 14, or v must equal 3 over 7.
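The arithmetic of this solve can be checked with exact fractions; a small sketch (an independent check, not part of the lecture):

```python
from fractions import Fraction

# First-order conditions: 2/(1+x1) = v and 4/(1+x2) = v,
# so x1 = 2/v - 1 and x2 = 4/v - 1.
# Budget: x1 + x2 = 6/v - 2 = 12, hence 6/v = 14 and v = 6/14 = 3/7.
v = Fraction(6, 14)
x1 = Fraction(2) / v - 1
x2 = Fraction(4) / v - 1

print(v, x1, x2)       # prints: 3/7 11/3 25/3
assert x1 + x2 == 12   # the budget constraint holds
```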

Once I know the value of v, I plug it back into the expression for x, I get x is

equal to 11 over 3 and 25 over 3. So, intuitively, it makes sense: you're

going to invest more in the second opportunity, because it has a higher

return. But as you scale up your investment, you have diminishing

returns, so at some point, it becomes profitable to go to the first one,

and the correct balance is in the ratio 11 to 25. And look how easily we were able to solve

this problem by introducing a Lagrange multiplier, taking the gradients, and then

solving it to find what x1 and x2 are. The last piece of this module will show us

how to apply this to a particular problem, which comes up over and over

again in financial engineering, which is portfolio selection.

Now, what's going on here, I've got a bunch of assets, so my x belongs to R to

the n. There are n different assets, so x is a

vector with n components. What does this constraint say?

If I take the vector of all ones and multiply it by x, I get 1.

This multiplication just gives me the sum over j from 1 through n of xj.

So, if I add up all the components of my portfolio, I have 1 dollar to invest.

And what do I want to do? I want to choose a portfolio that

maximizes my return minus a risk aversion parameter lambda times the variance.

This is the return, or mean return, on my portfolio; x transpose Vx gives me the

variance of my portfolio. And I want to choose the portfolio x which

sums to 1 and which maximizes the risk-adjusted return.

Again, the constraint makes this problem harder, so I'm going to take the

constraint, multiply it by a Lagrange multiplier, and put it into the objective.

Here's my objective: mu transpose x, minus lambda times x transpose Vx, minus v times

(1 transpose x minus 1). Again, it's a convex problem with no

constraints, I'm going to take the derivative and set it equal to 0.

If I take the gradient with respect to x, I get mu from here.

Just to make this thing a little bit clearer, let me get rid of this.

This mu is coming from the gradient of that quantity.

Minus 2 lambda Vx is coming from the gradient of the second quantity.

v times 1 is coming from the gradient of this quantity.

I set it equal to 0. Now, I solve for x.

What is x? x is 1 over 2 lambda, times V inverse, times (mu minus

v times 1). What do I do?

I take this minus 2 lambda Vx term to the other side, so the minus becomes a plus.

Then I divide by 2 lambda, take the inverse of V, and multiply it

onto the other side, and I get x. So now again, like in the simple example

that we saw before, I now have x in terms of v.

How do I get the v? Plug it into the constraints.

1 transpose x must equal 1. I plug the equation in here, and I get 1 transpose

V inverse times (mu minus v times 1) must equal 2 lambda; v is just a scalar.

I rearrange terms, and I get v equal to (1 transpose V inverse mu minus 2 lambda)

divided by (1 transpose V inverse 1). I know the value of v now, so I can

plug that value of v in here and get the expression for x.
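Here is a sketch of that closed form in Python with NumPy, using made-up numbers (the three-asset mean vector, covariance matrix, and lambda below are hypothetical, not from the course):

```python
import numpy as np

mu = np.array([0.08, 0.10, 0.12])        # hypothetical mean returns
V = np.array([[0.10, 0.02, 0.01],        # hypothetical covariance matrix
              [0.02, 0.08, 0.03],
              [0.01, 0.03, 0.12]])
lam = 2.0                                # risk-aversion parameter lambda
ones = np.ones(len(mu))

Vinv = np.linalg.inv(V)
# v = (1' V^{-1} mu - 2*lambda) / (1' V^{-1} 1)
v = (ones @ Vinv @ mu - 2 * lam) / (ones @ Vinv @ ones)
# x = (1 / (2*lambda)) * V^{-1} (mu - v * 1)
x = Vinv @ (mu - v * ones) / (2 * lam)

print(x, x.sum())  # the weights sum to 1 by construction
```

Note that plugging v back in forces 1 transpose x to equal 1 exactly, which is a handy sanity check on any implementation.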

And again, this complicated portfolio selection problem has been solved by just

taking Lagrange multipliers. We will use this in the course to show

how, by using this technique, we can generate efficient frontiers, we can

characterize the shape of these frontiers, and we can say that any point

of this efficient frontier can be generated by a small set of mutual funds

and that, ultimately, will lead to the capital asset pricing model, which is a

very important methodology for pricing assets in the market.