I'll prove this performance guarantee for you on the next slide.

But let me first make a couple of comments. So first of all, the analysis

of this algorithm stated in the theorem is tight.

The 50% cannot be improved without making additional assumptions about the input.

Indeed, here's an example which is, itself, even a bipartite graph.

So it's a tractable special case of maximum cut.

But even on bipartite graphs, the local search heuristic might return to you a cut which has merely 50% of the edges of a globally optimal maximum cut.

The example is drawn here in green. There are just 4 vertices and 4 edges.

As I said, it is a bipartite graph, so the best cut is to put u and w in one group and v and x in the other group. That cuts all 4 of the edges.

On the other hand, one possible output of the local search algorithm is the cut where u and v are in one of the groups, A, and w and x are in the other group, B.

So this cut has only 2 crossing edges, only 50% of the maximum possible, yet it is locally optimal.

If you take any of the 4 vertices and switch it to the other group, you get a cut with 3 vertices on one side and the remaining vertex by itself; since every vertex in this graph has degree 2, all of those cuts are also only going to have 2 crossing edges.
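To make the example concrete, here's a small Python sketch (the vertex names follow the example, but the `crossing` helper and the code itself are mine, not from the lecture). It checks both claims: the bipartition cuts all 4 edges, while the cut with u and v in one group has only 2 crossing edges and no single-vertex swap improves it.

```python
# The green example: a 4-cycle on vertices u, v, w, x (a bipartite graph).
edges = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u")]

def crossing(cut, edges):
    """Number of edges whose endpoints land in different groups of the cut."""
    group = {v: (v in cut) for e in edges for v in e}
    return sum(group[a] != group[b] for a, b in edges)

optimal = {"u", "w"}   # the bipartition: cuts all 4 edges
local   = {"u", "v"}   # locally optimal cut: only 2 crossing edges

print(crossing(optimal, edges))   # 4
print(crossing(local, edges))     # 2

# No single-vertex swap improves the local cut: every vertex has degree 2,
# so any 3-vs-1 cut also has exactly 2 crossing edges.
for v in "uvwx":
    swapped = local ^ {v}         # move v to the other group
    assert crossing(swapped, edges) <= crossing(local, edges)
```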

The second cautionary tale that I need to tell you is that you maybe shouldn't be super impressed with the performance guarantee of 50% for the maximum cut problem.

Indeed, even if I just chose a cut uniformly at random, by which I mean for each of the n vertices I flip, independently, a

fair coin. If the coin comes up heads, I put that

vertex in A. If the coin comes up tails, I put that

vertex in B. The expected number of edges that cross a random cut chosen in this way is already 50% of the edges of the graph.

Just for kicks, let me sketch a quick proof of this fact.

Some of you may remember from part 1, I introduced a decomposition principle for

analyzing the expected value of complicated random variables.

You have a complicated random variable that you care about; what do you do? You express it as the sum of indicator random variables, that is, random variables that only take on the values 0 and 1.

When you apply linearity of expectation, that reduces computing the complicated expectation to a sum of simple expectations.

So it turns out that decomposition principle works beautifully to prove this fact. The complicated random variable that we care about is the number of edges that cross a random cut.

The constituents, the indicator 0-1 random variables, are just whether or not a given edge crosses a random cut. That is, for an edge e of this input graph G, I'm going to define the random variable X sub e to be 1 if its 2 endpoints wind up in different groups, and 0 if its 2 endpoints wind up in the same group.

So what's the expected value of one of these indicator random variables X sub e? Well, as with any indicator random variable, that's just the probability that X sub e = 1, the probability that this edge e is cut by a random cut.

What's the chance of that? Well, let's say the endpoints of edge e are u and v.

There are 4 possibilities: u and v could both be in A; u and v could both be in B; u could be in A and v in B; or u could be in B and v in A. Each of those 4 outcomes is equally likely, with probability 1/4.

In the first 2 cases, this edge is not cut by the random cut; the 2 endpoints are on the same side. In the other 2 cases, it is cut; they're on different sides.

So there's a 1/2 probability that this edge is cut in a random cut, and therefore the expected value of X sub e is 1/2.

And now we're good to go just by applying linearity of expectation.

So, precisely, what do we care about? We care about the expected number of edges crossing a random cut.

Well, the number of edges crossing a cut is, just by definition, the sum of the X sub e's over all of the edges e; X sub e is just whether or not the given edge crosses the cut.

So the expected value of the random variable we care about, the number of crossing edges, is, by linearity of expectation, just the sum of the expected values of these indicator random variables, the X sub e's.

Each of those has value one half, and we're summing up one for each of the edges, so the total of the sum is just the number of edges divided by 2, as claimed.
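As a sanity check on this calculation, here's a quick sketch (the graph and helper names are mine, chosen just for illustration) that enumerates every one of the 2^n cuts of a small graph and confirms that the average number of crossing edges is exactly half the number of edges.

```python
from itertools import product

# A small example graph; any graph gives the same conclusion.
vertices = ["u", "v", "w", "x"]
edges = [("u", "v"), ("v", "w"), ("w", "x"), ("x", "u"), ("u", "w")]

# Enumerate all 2^n coin-flip outcomes (True = group A, False = group B).
total = 0
for flips in product([True, False], repeat=len(vertices)):
    group = dict(zip(vertices, flips))
    total += sum(group[a] != group[b] for a, b in edges)

average = total / 2 ** len(vertices)
print(average, len(edges) / 2)   # both are 2.5: exactly half the 5 edges
```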

Thanks for indulging my little digression.

Again, the point of which is that just taking a random cut already gives a performance guarantee of 50% for the maximum cut problem.

But, in the defense of local search, which also only gets a 50% performance

guarantee, it took a while, and in fact you have to work pretty hard, to get a performance guarantee better than 50% with a polynomial-time algorithm for the max cut problem.

The most famous such algorithm is by Goemans and Williamson. That took until 1994, and it requires a tool called semidefinite programming, something that's even more powerful than linear programming.

Let's now prove that local search is guaranteed to output a cut whose number of crossing edges is at least half the total number of edges in the graph.

So pick your favorite locally optimal cut (A,B). And here by locally optimal, I just mean

something that the algorithm might return. That is, a cut for which it is impossible to swap a single vertex from one side to the other and improve the value of the cut.

By virtue of being locally optimal, it must be the case that, for this cut (A,B) and for every single vertex v, the number of edges incident to this vertex that are crossing the cut is at least as large as the number of edges incident to this vertex that do not cross the cut; in the previous notation, C sub v is at least as big as D sub v.

If not, swapping v would give us a better cut.

So we have n of these inequalities, one for each vertex v, so we can legitimately sum up those inequalities, combining them into one.

Now, let's focus first on the right-hand side of this summed-up inequality: the sum, over all of the vertices in the graph, of the number of edges incident to the vertex that are crossing the cut.

Now here's the main point, what is this sum on the right hand side counting? It's

counting each edge that crosses the cut AB exactly twice.

Consider an edge, say (u,w), which is crossing the cut (A,B). It gets counted twice on the right-hand side, once when v=u and once when v=w.

We can apply exactly the same reasoning to the left hand side.

What is this sum counting? It's counting each non-crossing edge exactly twice.

Consider an edge, say (u,x), whose endpoints are both on the same side. That's going to be counted once in this sum when v=u and then again when v=x. Now we want to compare the number of

crossing edges of this locally optimal cut to the total number of edges.

So on the left hand side we're missing the crossing edges, so to complete that

into all of the edges let's just add double the number of crossing edges to

both sides of this inequality. On the left hand side, we get double of

all of the edges and now on the right hand side we have 4 times the number of

crossing edges. Dividing both sides of this inequality by

4, we see we've proved the theorem. The number of edges that cross A,B is

indeed a full 50% or more of the total number of edges in the graph.
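Putting the algorithm and the guarantee together, here's a minimal Python sketch of local search for max cut (the function and variable names are my own, not from the lecture). It repeatedly swaps any vertex v with C sub v < D sub v, and the final cut is checked against the 50% bound we just proved.

```python
def local_search_max_cut(vertices, edges):
    """Start from an arbitrary cut and swap vertices until locally optimal."""
    in_a = {v: True for v in vertices}   # initial cut: everything in group A

    def crossing_minus_internal(v):
        # C_v - D_v for vertex v: crossing incident edges minus non-crossing.
        inc = [e for e in edges if v in e]
        c = sum(in_a[a] != in_a[b] for a, b in inc)
        return c - (len(inc) - c)

    improved = True
    while improved:                      # each swap strictly increases the
        improved = False                 # number of crossing edges, so this
        for v in vertices:               # halts within |E| <= n(n-1)/2 swaps
            if crossing_minus_internal(v) < 0:
                in_a[v] = not in_a[v]    # swapping v gains crossing edges
                improved = True
    return in_a

vertices = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
cut = local_search_max_cut(vertices, edges)
cut_size = sum(cut[a] != cut[b] for a, b in edges)
assert 2 * cut_size >= len(edges)   # the 50% guarantee from the theorem
```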

It's interesting to discuss briefly how the conclusions we've reached for local search for max cut extend to a natural weighted version of maximum cut.

So which facts about the unweighted special case extend to the weighted case, and which facts do not? Well, it still makes perfect sense to take a local

search approach to the weighted version of the maximum cut problem.

Now for each vertex in a given cut, you just look at the total weights of the

incident edges that are crossing the cut and the total weights of the incident

edges that are not crossing the cut. And whenever the weight of the edges not crossing the cut is strictly bigger, that's an opportunity to improve the current cut, by moving the vertex to the opposite side.
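In code, the only change from the unweighted version is comparing total weights of incident edges rather than counting them. A sketch, with my own hypothetical names and a made-up small instance:

```python
def local_search_weighted_max_cut(vertices, edges):
    """edges is a list of (u, v, weight) triples; swap while weight improves."""
    in_a = {v: True for v in vertices}
    improved = True
    while improved:
        improved = False
        for v in vertices:
            # Total weight of incident crossing vs. non-crossing edges.
            cross = sum(w for a, b, w in edges
                        if v in (a, b) and in_a[a] != in_a[b])
            internal = sum(w for a, b, w in edges
                           if v in (a, b) and in_a[a] == in_a[b])
            if internal > cross:          # moving v improves the cut weight
                in_a[v] = not in_a[v]
                improved = True
    return in_a

edges = [(0, 1, 5.0), (1, 2, 1.0), (2, 0, 2.5), (0, 3, 4.0)]
cut = local_search_weighted_max_cut([0, 1, 2, 3], edges)
cut_weight = sum(w for a, b, w in edges if cut[a] != cut[b])
total = sum(w for _, _, w in edges)
assert 2 * cut_weight >= total    # the 50% guarantee carries over
```

Note that on this tiny instance the loop converges quickly, but as discussed below, with general weights there is no polynomial bound on the number of iterations.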

One cool thing is that the performance guarantee of 50% that we just established for the output of local search carries over to the weighted case. The proof remains essentially the same, and I'll leave it for you to check that in the privacy of your own home.

Now, it's also true that in the weighted case a random cut still gets, in expectation, 50% of the total weight of the graph. So, perhaps, this performance guarantee is nothing really to write home about.

What breaks down is the running time analysis.

Remember how this went for the unweighted case.

We argued that there can be at most n choose 2 edges crossing any cut, and

since every iteration of local search increases the number of crossing edges,

it has to halt within a quadratic number of iterations.

And that means it's easy to implement the algorithm in polynomial time.

One way to think about what's special about the unweighted version of the maximum cut problem is that, even though, as we've seen, the graph has an exponential number of different cuts, all of those exponentially many cuts take on only a polynomial number of different objective function values: the number of crossing edges is somewhere between 0 and n choose 2. By contrast, once the edges can just have

any old weights, now you can have an exponential number of cuts with an exponential number of distinct objective function values, of distinct values for their total weight.

That means, just because you're strictly improving the total weight crossing the cut in each iteration, it does not imply

that you're going to converge in a polynomial number of iterations.

And in fact, it's a highly non-trivial exercise to prove that there are

instances of weighted maximum cut, in which local search takes an exponential

number of iterations before converging.