1:37

So he thinks, well, he's got a lot of magic powers,

one good trick deserves another.

So he will convert himself into the Bull Demon King,

which is Princess Iron Fan's husband.

So there's only one problem with this: the Bull Demon King is always riding

his beast, a very distinctive beast.

And he can't pretend to be two things at once.

So in other words, he's going to have to get this beast from somewhere.

Now, it happens that at this moment,

the Bull Demon King has gone to Moyun Cave for some activities.

And what he typically does when he's there is he goes inside and drinks,

and he leaves his beast unattended outside.

So it's a perfect time for Sun Wukong to steal the beast and

then he can pretend to be the Bull Demon King.

6:22

All right, so we're talking about penalty methods.

So implicit constraints aren't going to work.

Penalty methods, remember,

are the only general approach that we can use to handle constraints in local search.

So we're treating constraints as penalties.

And of course, we saw before that the problem with this is that if the penalty is too

high, then we get stuck in local minima and we can't move.

And if it's too low then we just never satisfy the constraints.

We just spend our time finding beautiful local minima,

but they don't satisfy the constraints.

So we want to know how we can set these penalties appropriately.

And the way we're going to do this is with what's called Lagrange multipliers.

And the idea is we don't set them, but we're going to learn them as we search.

As we search,

we're going to learn which constraints are the most difficult to satisfy, and

their penalties are going to be set high, and other ones are going to be left alone.

7:10

So in Lagrange multipliers,

we're solving our constrained optimization problem here.

And what we're going to do is create a combined objective function.

So this is this Lagrangian function L here, which takes both x and

a new class of variables, the lambdas.

And really it's just defined as this.

We take the original objective function and

then add lambda 1 times p1(x), lambda 2 times p2(x), and so on up to lambda n times pn(x),

where each pi is a penalty function.

So each of these pi(x)s is a penalty function for the constraint ci(x).

And this lambda i is what we call the Lagrange multiplier.

And it'll always be non-negative in our case.

And this is basically saying,

how important is it to satisfy this constraint?

And as we find the constraints hard to satisfy,

we'll increase that Lagrange multiplier.

So we'll take more account of those constraints.
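
To make this concrete, here's a minimal Python sketch of the combined objective; the function names and the example f and p below are purely illustrative, not from the course material.

```python
# Minimal sketch: build the combined objective
#   L(x) = f(x) + lambda_1*p_1(x) + ... + lambda_n*p_n(x).

def lagrangian(f, penalties, lambdas):
    """Return L with the multipliers fixed at the given values."""
    def L(x):
        return f(x) + sum(lam * p(x) for lam, p in zip(lambdas, penalties))
    return L

# Hypothetical example: f(x) = x^2 with the constraint x >= 3,
# represented by the penalty p(x) = max(3 - x, 0).
f = lambda x: x * x
p = lambda x: max(3 - x, 0)
L = lagrangian(f, [p], [5.0])
# L(0) = 0 + 5 * 3 = 15, while L(3) = 9 + 0 = 9.
```

Raising the multiplier from 5 makes violating x >= 3 ever more expensive without touching f itself, which is exactly the knob the search will learn to turn.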

7:58

All right, so the penalty function is how we represent constraints.

And we need the penalty function to basically have, if the constraint holds,

then the penalty function gives 0.

And if it's violated, then it gives a positive number, right.

There are examples where we can have penalties with negative numbers, but

that's only for very specific kinds of Lagrange optimization.

So in our case, the penalty is always positive for non-solutions.

Let's look at some choices of penalty functions.

So we've talked about this already.

The obvious thing to do is just say, well, if the constraint holds,

then we get a penalty of 0, and otherwise, we get 1.

So the problem with this is that we get a very rugged landscape.

And we don't really take into account by how much we're violating the constraint.

We don't know if we're close to satisfying the constraint or far away.

So we're better off measuring some degree of violation.

So of course, the penalty is 0 if the constraint holds.

But then the penalty should be some kind of measure of the distance of the current

place I am to the nearest solution of the constraint.

And that means as we move around, if we move closer to a solution,

that's rewarded, and so we're more likely to be able to find a solution.

This gives us a more fine-grained view of what we're doing in the search.
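
As a sketch, here's what the two styles of penalty look like in Python for a constraint of the form e(x) <= 0; the names and the example expression are hypothetical.

```python
# Sketch of the two penalty styles for a constraint e(x) <= 0.

def penalty_01(e, x):
    """0/1 penalty: 0 if the constraint holds, 1 if it's violated."""
    return 0 if e(x) <= 0 else 1

def penalty_distance(e, x):
    """Degree-of-violation penalty: how far e(x) sits above 0."""
    return max(e(x), 0)

# e(x) = x - 10 encodes the constraint x <= 10.
e = lambda x: x - 10
# penalty_01 cannot tell x = 11 from x = 100, but the distance
# version can, which gives the search a slope to follow.
```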

9:44

For not equals, we don't have a very expressive way of thinking about it.

So if e takes a value of 0, so it violates the constraint,

then we get a penalty of 1.

Otherwise, we get 0.

And then we can combine these penalty functions.

So if I have a disjunctive expression, c1 or c2(x), then basically,

I take the two penalty functions and take their minimum value.

Because, of course, if I have to satisfy this disjunction,

if I'm close to satisfying one of them,

then that's the distance I'd use to satisfy the whole disjunction.

So the minimum is the right way of combining them.

For conjunction,

I'll form the penalty function by adding up the two penalty functions.

Because you can think about it as: if I need to satisfy both of them,

I'm going to have to move at least this far to satisfy the first one and

at least this far to satisfy the second one.

And so I'm that far away, the sum of them, from a solution.
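
The combination rules above can be sketched in Python like this; the helper names and the example constraints are just for illustration.

```python
# Combination rules: 0/1 for "not equals", min for disjunction,
# sum for conjunction.

def p_neq(e):
    """Penalty for e != 0: 1 when e evaluates to 0, else 0."""
    return lambda x: 1 if e(x) == 0 else 0

def p_or(p1, p2):
    """c1 or c2: being close to either disjunct is enough."""
    return lambda x: min(p1(x), p2(x))

def p_and(p1, p2):
    """c1 and c2: we must close the distance to both conjuncts."""
    return lambda x: p1(x) + p2(x)

# Illustrative constraints: x <= 4 (penalty max(x-4, 0)) and
# x >= 10 (penalty max(10-x, 0)).
p_le4 = lambda x: max(x - 4, 0)
p_ge10 = lambda x: max(10 - x, 0)
p_disj = p_or(p_le4, p_ge10)
# p_disj(7) == min(3, 3) == 3, and p_disj(2) == 0.
```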

10:45

All right, what about global constraints?

Because, of course, we're used to using global constraints in this course for

everything.

How do we work out penalties for them?

So we can do this by decomposition.

So we can think about alldifferent,

which is really equivalent to this conjunction of inequalities.

And we can just work out the penalties for each of these inequalities.

And that gives us the penalty for alldifferent, so that would be fine.

We worked out the rule for conjunction.

So we would basically add up the penalties for each of these inequalities separately.

Or we can build a direct function.

So in this case, a direct one for

alldifferent sums over all possible values that the variables can take.

For each value, you sum up the number of times that value is taken.

And basically, if a value is taken more than one time, you get a penalty for

each extra use, right?

So we're adding up the number of times a value of v is used more than once.
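
A direct alldifferent penalty along these lines might look like this in Python; this is a sketch, not the course's reference implementation.

```python
from collections import Counter

def penalty_alldifferent(xs):
    """Direct alldifferent penalty: for each value v, count how
    many times v is used beyond the first occurrence."""
    return sum(c - 1 for c in Counter(xs).values() if c > 1)

# penalty_alldifferent([1, 2, 3]) == 0 -- all different.
# penalty_alldifferent([1, 1, 2, 2, 2]) == 3 -- one extra 1, two extra 2s.
```

Note this agrees with the decomposition view in spirit, but it is cheaper to evaluate than summing a separate penalty for every pairwise inequality.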

11:50

All right, what about for cumulative, right?

We can define a special penalty function for that as well.

And we can think about, well,

how close are we to satisfying a cumulative constraint?

Well, we're not satisfying it when we overuse a resource.

So basically, we can just add up the overuses of resources.

And so if we had this timetable in front of us, we'd say, well, actually,

we've only got this amount of resources.

And obviously, we're overusing by 1 here in the time between 4 and 5, and

overusing by 1 here between the time of 5 and 6.

So the total overuse is 2, and that would be our penalty.

So I can build a penalty function which does that by basically creating

a timetable and counting the number of overuses.
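
A sketch of such a penalty in Python, assuming (as an illustration) that tasks are given by start times, durations, and resource usages over a discrete time horizon:

```python
def penalty_cumulative(starts, durations, usages, capacity, horizon):
    """Total overuse of the resource: at each time step t, add up
    how much the tasks running at t exceed the capacity."""
    total = 0
    for t in range(horizon):
        used = sum(u for s, d, u in zip(starts, durations, usages)
                   if s <= t < s + d)
        total += max(used - capacity, 0)
    return total

# Two unit-usage tasks on a capacity-1 resource, overlapping for
# two time steps, give a total overuse (penalty) of 2.
```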

12:32

So if we think about the constraints in the problem we're trying to solve,

then for this absolute value of x not equal to absolute value of y constraint, basically,

we take 1 minus the absolute difference between |x| and |y|, and take the max of that and

0, which basically gives us, if this constraint holds, then we get 0.

If they're equal, then we get 1.

And then we can apply the rules for inequalities to get these two, right,

the rules for an expression less than or equal to 0.

With a little bit of fiddling around, we get this max expression.

And you can see that that agrees with what we saw at the beginning,

where we were just talking about the degree of violation.

So if the absolute value of y added to the absolute value of x is less than 7,

we get 0 because the constraint is satisfied.

If it equals 7, we get 1.

And if it's greater than 7, well, we get 2 in this case because the only place it can

be greater than 7 is when it's 8.

And similarly,

this x times y absolute value greater than 1 gives us this penalty function.

And there are only three possible zones for

the particular values that we can take in our problem.

15:47

But the search space we can see is disconnected.

So when I'm in this search space, by moving either diagonally or

orthogonally, I can't actually reach down to this search space.

All right, so that's one of the reasons we're going to use this penalty method, so

that we can move around the search space freely.

So now here, we've color coded all of the spaces in the search space with

the value of F and which constraints are satisfied.

So the green spots, like this one here, satisfy all the constraints.

So they're actually places we'd like to end up in.

The yellow spots violate this absolute value of X is not equal to

the absolute value of Y constraint.

The red spots violate this, don't be in the corners constraint.

So these ones here; and the blue spots violate this x times y

absolute value greater than 1 constraint, so we can see them in the middle.

And then we've color coded spots, which violate two

constraints by the colors of each of the two constraints they violate.

None of the examples here violate all three.

So for example, this spot here violates this constraint here and

this constraint here.

And the function we're trying to optimize is just the original function to

begin with, because we've set all of the Lagrange multipliers to 0.

So if you think about it, this Lagrangian function L is exactly just the original

function to begin with.

And so we've put the objective value in all of these slots here and

we've marked all of the local minima in this plot at the moment.

Of course, we don't know this.

We're going to discover this in the actual search.

But if we had a global view of everything that was going on,

then we can actually see all the local minima at the moment with this current

objective function that we're trying to satisfy.

17:37

So let's assume that alpha is 1 and we'll start over here in position 2, 2.

So remember what we're doing.

We're changing one or two variables' values by 1, so

we look at our neighbors and we choose the best downhill move.

We just did that.

This is all fine.

This is just normal search.

Now I'm at this point here, I just look at our neighbors.

Again, I choose the best downhill move.

I get to here.

Now look at all the neighbors, choose the best downhill move.

That gets me here.

Now, I look at all my neighbors and none of them are better.

So we've reached a local minimum.

So what do we do now?

Now we look and we say, okay, we've reached a local minimum.

Let's look at which constraints are satisfied.

So the first constraint is not satisfied.

There's a penalty of 1.

The second constraint is satisfied, and

the third constraint is not satisfied, which is another penalty of 1.

So, the rule for updating the Lagrange multiplier says,

we're going to add 1 to each of these 2 Lagrange multipliers.

And so that changes this Lagrangian function that we're trying to optimize, to

add copies of those penalty functions.

And so that's going to change the landscape.

So actually the landscape that we're trying to search over has changed,

things have moved around.

Once we do that, now we come back to this point here.

We reevaluate our neighbors and

we find that we're not at a local minimum anymore.

We've basically pushed the landscape up where we are.

And now, we've got another local minimum here, and we look at its neighbors.

We go to there.

We look at its neighbors.

And now, we find a constrained local minimum.

So notice it's better than all its neighbors, but

it also doesn't violate any constraints.

So this is a constrained local minimum.

It doesn't matter what we do now, changing Lagrange multipliers will not change

the fact that this is a local minimum.

19:26

So what is actually happening in this DLM search?

We're performing a steepest descent search.

But rather than on the original objective F, we're using this combined

objective L, which combines the constraint violations with the original objective.

And what happens is, when we reach a local minimum,

then some constraints might be violated.

And really, that's evidence that these constraints are hard to solve, at least at

the moment, because we're not solving them.

And so they should be more important.

We should be giving them higher priority to satisfy them and

the way we do this is we update the Lagrange multiplier for

each of these constraints, and we update it by an amount which is proportional to

the amount of their violation, multiplied by this multiplier alpha.

So what happens?

How does this stop?

So we stop when we haven't changed the point that we're at and

we haven't changed the Lagrange multipliers.

So obviously, if we don't change the point we're at,

then we must be at a local minimum.

In fact, any time we come around that loop, d* is a local minimum.

Because we've just run local search.

And the other thing is if lambdas didn't change,

then it means that all the penalty functions must be 0.

If they're not all 0, then we would have updated the lambdas.

And so basically, that means that this whole procedure will terminate when we've

found a constrained local minimum, which is also called a saddle point.

So we've definitely found a local minimum for

f at which all the constraints are satisfied.
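
Putting the whole loop together, a minimal Python sketch of this DLM procedure might look as follows; the neighbourhood function, the iteration cap, and the little 1-D example at the bottom are all assumptions for illustration, not the course's implementation.

```python
def dlm(f, penalties, neighbors, x0, alpha=1.0, max_rounds=1000):
    """Discrete Lagrange Multiplier search (sketch).

    Steepest descent on L(x) = f(x) + sum_i lambda_i * p_i(x),
    bumping the multiplier of each violated constraint by
    alpha * p_i(x) whenever we reach a local minimum of L."""
    lambdas = [0.0] * len(penalties)
    x = x0

    def L(y):
        return f(y) + sum(lam * p(y) for lam, p in zip(lambdas, penalties))

    for _ in range(max_rounds):
        # Steepest descent: keep taking the best strictly improving move.
        while True:
            best = min(neighbors(x), key=L)
            if L(best) < L(x):
                x = best
            else:
                break
        # x is now a local minimum of L; check the constraints.
        violations = [p(x) for p in penalties]
        if all(v == 0 for v in violations):
            return x  # constrained local minimum (a saddle point)
        # Otherwise, make the violated constraints more important.
        lambdas = [lam + alpha * v for lam, v in zip(lambdas, violations)]
    return x

# Hypothetical 1-D example: minimise f(x) = x over 0..10
# subject to x >= 2, encoded as the penalty max(2 - x, 0).
f = lambda x: x
p_ge2 = lambda x: max(2 - x, 0)
nbrs = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 10]
```

Starting at x = 5, plain descent on f slides to x = 0; the multiplier for x >= 2 then grows until the descent is pushed back up to the constrained local minimum x = 2.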

20:53

So let's have a look at DLM search once more in action.

So we're using the same multiplier.

We're just going to start at a new point, (-4, -4).

So if we start off here, we look at our neighbors.

We move to the nearest neighbor.

We look here and we're at a local minimum.

None of our neighbors is better and the first constraint is violated.

So we're going to update the Lagrange multiplier here and

that's going to change the Lagrange function we're optimizing.

Now, that also changes the landscape.

Basically, anything to do with this first constraint has now been penalized more, and

that may change where some of the local minima are.

Then we again look at our neighbors, hoping that,

because we've been pushed up, we're now no longer at a local minimum.

As it turns out, we're still at the local minimum, and it's still the same constraint

that's violated, which is unsurprising since we haven't moved.

So we're going to update our Lagrange multiplier again.

We've got a violation of 1, so it gets updated by another 1.

And now, our Lagrangian function includes 2 times that penalty function.

So we're basically saying this constraint is hard to solve,

we should make it more important.

And again, the whole space changes, at least the spaces that have some yellow in them,

because their penalty was changed.

Now, we look at our neighbors again, and we're still at a local minimum.

So we update the Lagrange multipliers again.

Well, this is implicitly what's happening.

We update all of the points which have a yellow spot on them; their penalty value is

going to change, because basically, we've changed how we're evaluating each of these points.

Of course, in the actual search, we're not going to see this.

We don't care about this.

It's just that we're going to use this function to evaluate points from now on.

We again evaluate our neighbors, and we're still at the local minimum.

We go through the cycle again.

We update our multipliers.

We're still at a local minimum.

We go through again.

You can see that we're just not getting out of this local minimum,

which was very deep.

We've updated our multipliers, and we're still at a local minimum after

updating seven times now.

So we've now had seven violations while we've been sitting here.

And finally, we're not at a local minimum anymore.

So now we can move to somewhere else, thankfully.

We go to there and we find we're at a local minimum.

But at least, we've got a different constraint being violated this time.

So we're violating the second constraint.

And so again, we update the Lagrange multiplier for the second constraint.

And obviously, that becomes part of this Lagrange function that we're optimizing

and that's going to change the whole landscape a little bit.

It might change some of the local minima.

Now we're looking at our neighbors, and we have a neighbor which is better than us.

Unfortunately, it's back to where we were before, and we keep going.

Of course, we look at our neighbors again and, damn it, we're back at a local minimum.

So we're back where we were before, back at a local minimum, and

we're still violating this first constraint.

23:58

Right, so let's update the Lagrange multipliers again.

So that goes up to eight, and

that's going to change the values of positions in these yellow violation places.

And we'll evaluate.

And okay, now we've got a better neighbor, so we're not at a local minimum anymore,

and we're going to go down to that neighbor that we were at before.

And when we evaluate it, it's a local minimum.

Right, so we're going to go through this process again.

The second constraint is violated.

We update its Lagrange multiplier, so that's updating our Lagrangian function.

And that's also going to change all of the values in the red squares.

And then we look at our local neighborhood, and we find this new position is better.

So we'll move there, and it's a local minimum, again!

All right, we'll update the Lagrange multiplier once more.

And that'll update everything, and we'll look at our neighbors.

And finally we find a better place which is not, [LAUGH] not one of

these two positions we've been bouncing between.

So we've finally moved out of that.

And we find that this also has a better local neighbor, and

now we're at a local minimum.

And this time the difference is that all the constraints are satisfied.

And in fact, we found the global minimum.

So eventually, you can see it took us a long time to get out of this area.

But eventually we got out because the Lagrange multipliers eventually made these

areas not attractive anymore,

even though they looked attractive in the original objective function.

So let's have a thought about what we've done.

So you can see that different initial points can matter strongly in this

Discrete Lagrange Multiplier method, right?

So it's just like other local search methods: where we start changes where

we end up.

And we can only find constrained local minima;

we've got no guarantee that we find the global minimum.

So we need to think about combining this with other methods like restart, so that

if we find a constrained local minimum, then we can go and look for another one.

And we haven't talked about cycling and plateaus.

So if we're on a plateau which violates a constraint,

we're going to keep making those positions less and less attractive.

And so eventually, we'll move out of them.

But if we're in a plateau which has no constraint violations,

then those plateaus are local minima, and

we just have all the same problems with plateaus that we've seen before.

But we can combine this with tabu lists as one way of handling plateaus.

So using Lagrange multipliers involves lots of considerations.

How fast should we update the Lagrange multipliers?

So that's this value of alpha.

How often should we update Lagrange multipliers?

So we have shown an example where we basically only update them at local

minima.

But you can also update them more frequently,

you can update them every step in the local search.

So that's a possibility.

And another question is, how do you split constraints?

So if you split constraints to get more multipliers,

then you have this disadvantage: you've got to keep track of more things.

You've got more constraints that you've got to evaluate, and

more multipliers to keep track of.

But the advantage is you have a less rugged search space.

So if I have a single multiplier for alldifferent,

then if I keep on violating alldifferent, it seems like it's very important,

and its multiplier would get bigger and bigger.

But it might be that I was violating alldifferent because of some inequalities

early on in the search, and later on, I'm violating different inequalities.

And so by having a multiplier for each inequality,

I've got a finer-grained tracking of which constraints are difficult to solve.

Another thing is, should we reduce multipliers, ever?

Typically we end up doing this, at least for arithmetic stability.

And another reason is maybe constraints which were really important and

hard to solve early on in the search have now become easy, and

we don't really have to worry about them.

And actually, they just make it harder to find the constrained local

minimum that we're looking for.

Right, in summary,

this Discrete Lagrange Multiplier method is a principled way to adjust penalties.

So this is really important because we've already seen that this

penalty-based approach is really the only general way that a local search method

can handle constraints, arbitrary constraints.

And so this DLM search will find a constrained local minimum,

given an infinite amount of time.

But it'll run around, and eventually it will do that.

It's still not a complete answer, right?

We need to know how fast to adjust the multipliers.

We need to think about how many multipliers we need, or

how do we think about multipliers for our constraints?

When should we adjust the multipliers?

And when should we possibly make them smaller?

And we can combine them with other methods to increase exploration and avoid cycling.