We previously developed the intuition behind the dual decomposition algorithm, which takes an optimization objective, decomposes it into a set of slaves, and uses communication between the slaves to try to get them to agree with each other about the overall MAP assignment. So now, let's take this intuition and convert it into an actual algorithm.

What we have here is that each slave gets its own objective function. Here is the objective function for the i-th slave, the slave corresponding to the variable x_i.

And here is the objective function that the slave corresponding to the larger set F is trying to optimize. Each of these initially has, as a component, the local objective function that is assigned to it. But in addition, we introduce for each of them a set of terms that modify its objective function so as to try to get it to agree with the other slaves. Specifically, we try to get the i slave to agree with the factors F in which i appears, and conversely, we try to get the F slave to agree with the slaves of the variables in its scope.
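In symbols, the modified slave objectives typically take the following form in standard formulations of dual decomposition (the exact notation here is an assumption and may differ slightly from the slides):

```latex
\hat{\theta}_i^{\lambda}(x_i) = \theta_i(x_i) + \sum_{F : i \in F} \lambda_{F i}(x_i)
\qquad
\hat{\theta}_F^{\lambda}(\boldsymbol{x}_F) = \theta_F(\boldsymbol{x}_F) - \sum_{i \in F} \lambda_{F i}(x_i)
```

Note the opposite signs on the lambda terms: the variable slave adds them, the factor slave subtracts them, which is what lets a single adjustment push the two slaves toward each other.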

So, let's see how that gets done. We initialize all of these incentive terms, all of these lambdas, to zero. Initially, each slave makes its decision based purely on its own local piece of the objective. But then, as the algorithm evolves, these incentives start to change. So, how does that happen?

At each iteration t, each slave optimizes its current objective. The i slave optimizes theta_i hat lambda, where lambda is the current set of penalties, and that gives it an assignment that we call x_i star. The F slave does the exact same thing: it optimizes its own local objective, theta hat F, using the same set of lambdas, and that gives it x star of F, which is this guy over here.
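As a concrete sketch, each slave's optimization at an iteration can be written as a brute-force argmax over its own scope. The dict-based data structures here (scores keyed by values or by joint assignments) are illustrative assumptions, not the lecture's notation:

```python
import itertools

def optimize_variable_slave(i, theta_i, lambdas, factors_containing_i):
    """Variable slave: argmax over the values of x_i of its modified objective,
    theta_i(v) plus the lambda incentives from every factor F containing i."""
    def score(v):
        return theta_i[v] + sum(lambdas[(F, i)][v] for F in factors_containing_i)
    return max(theta_i, key=score)

def optimize_factor_slave(F, scope, theta_F, lambdas, domains):
    """Factor slave: argmax over joint assignments to the variables in F's
    scope, with the lambda incentives subtracted (note the opposite sign)."""
    best, best_score = None, float("-inf")
    for assignment in itertools.product(*(domains[i] for i in scope)):
        s = theta_F[assignment] - sum(
            lambdas[(F, i)][v] for i, v in zip(scope, assignment))
        if s > best_score:
            best, best_score = dict(zip(scope, assignment)), s
    return best
```

With all lambdas at zero, as at initialization, each slave just optimizes its local piece of the objective; nonzero lambdas tilt the argmax.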

Now, let's see what happens if there is a disagreement between a large slave F and a variable i in its scope. We would like to incentivize them to agree with each other. And so, for all F and all variables i in F, if they disagree, that is, if the value for x_i in the assignment chosen by the factor F is different from the value chosen by the i slave, then we need to change the incentive function. We do that by changing the lambdas.

Specifically, when we look at the penalty for the pair F, i, which is over here, we decrease by an amount alpha t, which is greater than zero, the preference for x_i star, the value chosen by the i slave. And if we look at that, we see that this is going to disincentivize the i slave from picking the value that it picked.

Conversely, we increase the preference for the value chosen by the F slave, which is this value over here, by the same amount, alpha t. Now, that changes the preference of the i slave, but it also changes the preference of the F slave by the exact same amount, but in the opposite direction.
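A minimal sketch of this disagreement-driven update, assuming the sign convention described in the text (the i slave adds lambda_{F,i} to its objective while the F slave subtracts it) and illustrative dict-based data structures:

```python
def update_lambdas(lambdas, x_star, x_star_F, alpha_t):
    """For every factor slave F and variable i in its scope that disagree,
    shift the penalty lambda_{F,i} by alpha_t. Because the i slave adds
    lambda_{F,i} and the F slave subtracts it, one update moves the two
    slaves' preferences in opposite directions, toward agreement."""
    for (F, i), penalty in lambdas.items():
        v_i = x_star[i]        # value of x_i chosen by the i slave
        v_F = x_star_F[F][i]   # value of x_i in the assignment chosen by F
        if v_i != v_F:
            penalty[v_i] -= alpha_t  # disincentivize the i slave's pick
            penalty[v_F] += alpha_t  # raise the i slave's score for F's pick
```

If the two slaves already agree on x_i, the corresponding lambda is left untouched.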

This is because here we have a negative sign, whereas here we have a positive sign. So it moves the preferences of the i slave and the F slave in opposite directions, so that they come to agree with each other.

So, what happens as we continue doing these iterations over time? Well, one can show that under fairly weak conditions on these parameters alpha t, the lambdas are guaranteed to converge. What are these conditions?

The alphas need to be big enough that they make a difference: if we sum up all of the alpha t's over all t, we end up with a diverging series that sums to infinity. On the other hand, they need to be small enough that we eventually achieve convergence, and that is guaranteed by the second condition, on the summation over here. In fact, one can show that not only do we get convergence, we get convergence to a unique global optimum regardless of the initialization. So, the lambdas always converge.
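For example, the schedule alpha_t = 1/t satisfies both requirements: its sum diverges, while, in the usual statement of the second condition (an assumption here, since the slide is not reproduced in the text), the sum of its squares stays finite:

```python
def alpha(t):
    """Step size alpha_t = 1/t: the partial sums diverge (harmonic series),
    while the squares sum to a finite limit (pi^2 / 6)."""
    return 1.0 / t

sum_alpha = sum(alpha(t) for t in range(1, 10001))          # keeps growing with t
sum_alpha_sq = sum(alpha(t) ** 2 for t in range(1, 10001))  # stays below pi^2 / 6
```

A constant step size, by contrast, fails the second condition, which is why it can keep oscillating instead of converging.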

Now, the question is: what are the implications of the convergence of the lambdas? What happens at convergence? At convergence, you can show that there is no more change, and so at that point each of our slaves has its own locally optimal solution over its own variables, the ones that are in its scope.

So, is that good enough? Unfortunately, not always. Even though we have achieved convergence, the solutions of different slaves may or may not agree on the variables that are shared between them. However, if they do agree, if we can get all of the slaves to agree with each other, we can show that the shared solution is a guaranteed MAP assignment. So, if the slaves agree, we have effectively solved our initial optimization problem, which is that of finding a MAP assignment. If that doesn't happen, that is, if the slaves have not agreed on their shared variables, then we need to solve what is called a decoding problem, where we need to somehow put together these different solutions that the different slaves have converged to, and come up with a single joint assignment.

So, how do we do that? How do we solve this so-called decoding problem? It turns out that people have constructed a range of different solutions for this, none of which have particularly strong performance guarantees, but which often work quite well in practice. For example, if we use a decomposition of the original problem into spanning trees, then each of the spanning trees covers all of the variables, so we can take the MAP solution of any individual tree and say that that is an assignment to all of the variables. So, that's one approach.

A second approach, one that is not winner-takes-all, is to have each of our slaves vote on the x_i's that are in its scope, and then for each x_i, do a majority vote. So, that's another approach. A different approach uses some kind of weighted averaging of the sequence of messages sent regarding each x_i over the course of the iterations of the algorithm.
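A sketch of the majority-vote decoder, using a hypothetical representation in which each slave reports a dict over the variables in its scope:

```python
from collections import Counter

def decode_by_majority(slave_assignments):
    """slave_assignments: a list of dicts, one per slave, each mapping the
    variables in that slave's scope to its chosen values. Every slave casts
    one vote per variable in its scope; each x_i takes the most-voted value
    (ties broken by Counter's first-seen ordering)."""
    votes = {}
    for assignment in slave_assignments:
        for var, val in assignment.items():
            votes.setdefault(var, Counter())[val] += 1
    return {var: counter.most_common(1)[0][0] for var, counter in votes.items()}
```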

Each of these works reasonably well, but it turns out that the right solution is simply to use multiple solutions and then pick the best one. The critical observation here is that the scoring function theta for any assignment is something we can easily evaluate. So, why not do all of these, use multiple trees, generate a whole collection of x's, evaluate theta of x for each of those solutions, and then pick the best one? It turns out that when you do that, you generally get solutions that are better, in terms of their overall score, than those of any fixed strategy.
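A sketch of this pick-the-best strategy, assuming the original objective decomposes as a sum of factor tables (the representation here is illustrative):

```python
def pick_best(candidates, factors):
    """Score each candidate full assignment under the original objective,
    theta(x) = sum over factors of theta_F(x_F), and keep the best one.
    `factors` maps each factor's scope (a tuple of variables) to its table."""
    def theta(x):
        return sum(table[tuple(x[v] for v in scope)]
                   for scope, table in factors.items())
    return max(candidates, key=theta)
```

Since evaluating theta(x) is cheap, the candidates can come from any mix of decoders: per-tree MAP solutions, majority votes, weighted averages, and so on.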

Now, the other nice aspect of this analysis comes back to a point that we made earlier in this module: for any lambda, the function that we defined as L of lambda is an upper bound on the value of the true MAP assignment. What does that imply? It means that if you give me any x that you think is a reasonable candidate, so this is a candidate for the MAP assignment.
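In the standard formulation (an assumption here, since L of lambda was defined earlier in the module and is not reproduced in this text), L(lambda) is just the sum of the slaves' individual maxima under their modified objectives, so it is easy to compute and compare against theta(x) for any candidate x:

```python
def dual_bound(var_objs, factor_objs):
    """L(lambda): the sum of each slave's best achievable score under its
    modified objective (each entry maps local assignments to scores that
    already include the lambda terms). Letting every slave optimize
    independently can only overshoot any single consistent assignment, so
    this upper-bounds theta of the true MAP assignment. For a candidate x,
    L(lambda) - theta(x) is an optimality gap: a gap of zero certifies
    that x is a MAP assignment."""
    return (sum(max(obj.values()) for obj in var_objs)
            + sum(max(obj.values()) for obj in factor_objs))
```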