0:00

This video is a segue between bad news and good news.

The bad news, which we have now discussed, is NP-completeness.

The fact that there are computationally intractable problems out there in the

world. In fact, they are fairly ubiquitous and

you're likely to encounter them in your own projects.

The good news is that NP-completeness is hardly a death sentence.

Indeed, our algorithmic toolbox is now rich enough to provide many different

strategies for coping with NP-complete problems.

So suppose you have identified a computational problem on which the

success of your new startup company rests.

Maybe you've spent the last several weeks throwing the kitchen sink at it.

All the algorithm design paradigms you know, all the data structures, all the

primitives, nothing works. Finally, you decide to try to prove the problem is NP-complete, and you succeed. Now you have an explanation for why your

weeks of effort have come to naught, but that doesn't change the fact that this is

the problem that governs the success of your project.

What should you do? Well, the good news is, NP-completeness is certainly not a

death sentence. There are people solving, or at least

approximately solving, NP complete problems all the time.

However, knowing that your problem is NP-complete does tell you where to set your

expectations. You should not expect some general

purpose, super-fast algorithm, like we have for other computational problems,

like say, sorting, or single source shortest paths.

Unless you are dealing with unusually small or well-structured inputs, you're

going to have to work pretty hard to solve this problem, and also, possibly,

make some compromises. The rest of this course is devoted to

strategies for solving or approximately solving NP-complete problems.

In the rest of the video, I'll give you an orientation for what those strategies

are, and what you can expect to come. So as usual, I'm going to focus here on general-purpose strategies that cut across multiple application domains.

As usual, these general principles should just be a starting point.

You should take them, and run with them, augmenting them with whatever domain

expertise you have, in the specific problem that you need to solve.

The first strategy is to focus on computationally tractable special cases of an NP-complete problem. Relatedly, you want to think

about what's special about your domain or about the data sets that you're working

with, and try to understand if there's special structure which can be exploited

in your algorithm. Let me point out, we've already done this

in a couple of cases in this course. The first example we saw concerns the weighted independent set problem. We started with this problem on path graphs, but the computational problem makes perfect sense in general graphs.

The general problem is I give you as input, an undirected graph, every vertex

has a weight, and I want the maximum weight subset of vertices that is an

independent set. And remember, in an independent set, you

are forbidden to take any 2 vertices that are neighbors.

So in an independent set, none of the pairs of vertices that you've picked are

joined by an edge. In general graphs, the weighted independent set problem is NP-complete, so we certainly don't expect it to have a

polynomial time algorithm. But, in the special case where the graph

is a path, as we saw, there's a linear-time dynamic programming algorithm that exactly solves the weighted independent set problem.
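As a hedged illustration of that linear-time dynamic program (the function name and input convention here are my own, not from the lecture):

```python
def max_weight_independent_set(weights):
    # weights[i] is the weight of the i-th vertex along the path
    # a[i] = max total weight of an independent set of the first i vertices
    a = [0] * (len(weights) + 1)
    if weights:
        a[1] = weights[0]
    for i in range(2, len(weights) + 1):
        # either exclude vertex i, or include it and exclude vertex i-1
        a[i] = max(a[i - 1], a[i - 2] + weights[i - 1])
    return a[len(weights)]
```

For the path with weights 1, 4, 5, 4, this returns 8 (take the second and fourth vertices).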

3:19

So path graphs form a special case of the weighted independent set problem that's computationally tractable, solvable in polynomial, even linear,

time. In fact, the frontier of tractability can

be pushed well beyond path graphs. On the homework, I asked you to think

through the case of graphs that are trees, and notice that you could still do

dynamic programming efficiently to compute maximum-weight independent sets in trees.

You can even get computationally efficient algorithms for a broader class

of graphs, known as bounded tree width graphs.

So the definition of that is a little outside the scope of this course, but you

can go even beyond trees. So the second example follows from my

dynamic programming algorithm for the Knapsack problem. We discussed its running time and explained why it's exponential time.

The running time of our dynamic programming Knapsack algorithm is N, the

number of items times capital W, the Knapsack capacity.

And because it only takes log W bits to specify the capacity capital W, we don't

call that a polynomial time algorithm. But, imagine you only have to solve a

knapsack instance where the capacity is not too big, maybe even say that capacity

capital W is O(n), and you definitely will see knapsack instances in practice which possess that property. Well then, our dynamic programming

algorithm just runs in time O(n^2), and that's a bona fide polynomial-time

algorithm for this special case of a small knapsack capacity.
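As a hedged sketch of that O(nW) dynamic program (this is a standard one-dimensional form of the table; the names are my own):

```python
def knapsack_max_value(values, sizes, capacity):
    # A[x] = best achievable total value with knapsack capacity x,
    # over the items processed so far
    A = [0] * (capacity + 1)
    for v, s in zip(values, sizes):
        # iterate capacities downward so each item is used at most once
        for x in range(capacity, s - 1, -1):
            A[x] = max(A[x], A[x - s] + v)
    return A[capacity]
```

With n items and capacity W = O(n), the total work is O(n · W) = O(n^2), matching the special case discussed above.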

So next, let me mention a couple of examples we're going to see in the

forthcoming videos. The first one is going to concern the

2-SAT problem. 2-SAT is a type of constraint

satisfaction problem. You remember, in a constraint

satisfaction problem, you have a bunch of variables, each one gets assigned a

value. So the simplest case is the Boolean case, where each variable can be

0 or 1, false or true, and then you have a bunch of clauses, which specify the

permitted joint values of a collection of variables.

The 2 in 2-SAT refers to the fact that each constraint involves the joint values

of only a pair of variables. So, a canonical constraint in a two set

instance, is going to for two variables, specify three joint assignments that are

allowed and one that's forbidden. So for example may be it will say offer

variables x3 and x7, it's okay to set them both to true, its okay to set them

both to false, its okay to set 3 to true and 7 to false, but it's forbidden to set

3 to false and 7 to true. So that's a canonical constraint in a

2-SAT instance. 3-SAT is the same thing, except the constraints involve the joint values of a triple of variables, and each constraint forbids one out of the eight possibilities. Now, the 3-SAT problem is the canonical NP-complete problem. It was singled out by Cook and

Levin as being sufficiently expressive to encode all problems in NP.

But if each constraint has size only two, then, as we'll see, the problem becomes polynomial-time solvable. There are a couple of ways of proving that.
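One of those ways, which we'll see, is randomized local search. As a hedged sketch (the clause encoding and the trial budget here are my own illustrative choices; a literal +i means x_i is true, -i means x_i is false):

```python
import random

def two_sat_local_search(n, clauses, trials=100):
    """Try to satisfy 2-SAT clauses over variables 1..n.
    Each clause is a pair of literals; +i means x_i true, -i means x_i false.
    Returns a satisfying assignment, or None if none was found."""
    def sat(lit, a):
        return a[abs(lit)] == (lit > 0)
    for _ in range(trials):
        # start each trial from a uniformly random assignment
        a = {v: random.random() < 0.5 for v in range(1, n + 1)}
        for _ in range(2 * n * n):  # random-walk budget from the analysis
            unsat = [c for c in clauses if not (sat(c[0], a) or sat(c[1], a))]
            if not unsat:
                return a
            # flip a random variable of a random unsatisfied clause
            lit = random.choice(random.choice(unsat))
            a[abs(lit)] = not a[abs(lit)]
    return None  # very likely unsatisfiable
```

On a satisfiable instance, each trial succeeds with constant probability, so with many independent trials the failure probability becomes negligible.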

We're going to talk about a local search algorithm that checks whether or not

there is indeed an assignment to the variables that simultaneously satisfies

all of the given constraints. So the final example, we'll discuss in

more detail later, but just very briefly, we're going to discuss the vertex cover

problem. This is a graph problem,

and a vertex cover is just the complement of an independent set.

So while an independent set cannot take two vertices from the same edge, in the

vertex cover problem, you have to take at least one vertex from every single edge.

And then what you want is you want the vertex cover that minimizes the sum of

the vertex weights. Yet again, this is an NP-complete problem

in general, but we're going to focus on the special case where the optimal

solution is small. That is, we're going to focus on graphs,

where there's a small number of vertices, such that every single edge has at least

one endpoint in that small set, and we'll see that for that special case, using a smart

kind of exhaustive search we'll actually be able to solve the problem in

polynomial time. So let me reiterate that these tractable

special cases are meant primarily to be building blocks, upon which you can draw

in a possibly more sophisticated approach to your NP-complete problem.

So just to make this a little more concrete, let me just kind of dream up

one scenario to let you know what I am thinking about.

Imagine, for example, that you have a project where unfortunately it's not

really 2-SAT that you're confronting, it's actually a 3-SAT instance. So you're

feeling kind of bummed, 3-SAT is NP-complete, and maybe you have 1,000 variables, and certainly you can't do brute force search over the 2 to the 1,000 possible ways of assigning values to your 1,000 variables.

But, maybe the good news is, because you have domain expertise, because you

understand this problem instance, you know that, yeah,

there's 1,000 variables, but there's really 20 that are crucial.

You have a feeling, that all of the action, basically, is boiling down to how

these 20 core variables get assigned. Well now, maybe you can mix together some

brute force search with some of these tractable special cases.

For example, you can enumerate over all 2 to the 20 ways of assigning values to

this core set of 20 variables. 2 to the 20 is roughly a million, that's

not so bad. And now, what you're going to do is, for

each of these million scenarios, you check whether there's a way to extend

that tentative assignment of 20 values to the 20 variables, to the other 980

variables, so that all of the constraints get satisfied.

The original problem is solvable if and only if there exists a way of assigning

values to these 20 variables that successfully extends to the other 980.

Now, because these are the crucial variables and it's where all the action

is, maybe as soon as you assign all of them 0's and 1's, the residual SAT

instance is tractable. For example, maybe it just becomes a

simple 2-SAT instance, and then you can solve it in polynomial time.

So this gives you a hybrid approach.

Brute force search at the top level, tractable special cases for each guess of

assignment of the 20 variables, and you're off to the races.
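To make the shape of that hybrid concrete, here's a hedged, generic sketch (the residual solver passed in is a stand-in for whatever polynomial-time special case applies, such as 2-SAT; all names are my own):

```python
from itertools import product

def hybrid_solve(core_vars, extend):
    """Enumerate all 2^k assignments to the k core variables; for each one,
    ask `extend` (a polynomial-time residual solver) to complete it to a
    full satisfying assignment. `extend` returns that assignment or None."""
    for bits in product([False, True], repeat=len(core_vars)):
        partial = dict(zip(core_vars, bits))
        full = extend(partial)
        if full is not None:
            return full
    return None  # no assignment to the core variables extends
```

With 20 core variables this is about a million calls to the residual solver, each of which runs in polynomial time.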

And I hope it's clear, I mean this as just one possible way that you might

combine the various building blocks we're developing into a more elaborate approach

to tackling an NP-complete problem. And that's generally what it takes: a fairly elaborate approach, because, after all, these problems are NP-complete; you've gotta respect that. So with that digression complete, let me

mention what are the other two strategies we're going to be exploring in the

lectures to come. So the second strategy, which is

certainly one very common in practice, is to resort to heuristics.

That is, to algorithms which are not guaranteed to be correct.

We haven't really seen examples of heuristics in the course thus far. Those of you who are alumni of Part 1, perhaps we could classify Karger's randomized minimum cut algorithm as a heuristic, because it did have a small probability of failing to find the minimum cut.

Rather, I'm going to focus on some examples in the upcoming lectures.

I'm going to use the Knapsack problem as a case study,

and what we'll see, is that our toolbox, which contains various algorithm design

paradigms, it's useful not just for designing correct algorithms, but it's

also useful for designing heuristics. So in particular, we'll get a pretty good

algorithm for the Knapsack problem, using the greedy algorithm design paradigm,

and we'll get an excellent heuristic for Knapsack, using the dynamic programming algorithm design paradigm. The final strategy is for situations

where you are unwilling to relax correctness.

You're unwilling to consider heuristics. Now, of course, for an NP-complete problem, if you're always going to be correct, you don't expect to run in polynomial time, but there are still opportunities to have algorithms

that, while exponential time in the worst case, are smarter than naive brute force

search. So we have in fact already seen one

example that can be interpreted as an implementation of this strategy, that's

for the knapsack problem. So, in the knapsack problem, naive brute

force search, would just run over all possible subsets of the items.

It would check if a subset of items fit in the knapsack.

If it does, it remembers the value, and then it just returns the feasible

solution with maximum value.

That has time proportional to 2 to the n, where n is the number of items.

Our dynamic programming algorithm, has running time n times W.

Now, of course, this is no better than 2 to the n if the knapsack capacity is huge, say itself on the order of 2 to the n. But, as we argued, if W is smaller, this

algorithm is going to be faster. And also, as you learned on the third

programming assignment, sometimes even though W is big, dynamic programming's

going to beat the crap out of brute force search.

So I'll show you a couple of other examples.

We'll talk about the travelling salesman problem, where naive brute force search

would roughly take n factorial time, where n is the number of vertices.

We'll give an alternative dynamic programming base solution which runs in

time only 2 to the n, which is much better than n factorial.
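This is the Bellman-Held-Karp dynamic program; as a hedged sketch (names mine, and counting the per-subproblem work the running time is O(n^2 · 2^n)):

```python
from itertools import combinations

def tsp_held_karp(dist):
    # dist[i][j] = cost of travelling from city i to city j; tour starts at 0
    n = len(dist)
    # C[(S, j)] = cheapest path from 0 through exactly the cities in bitmask S,
    # ending at city j (city 0 is implicit and never in S)
    C = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = sum(1 << j for j in subset)
            for j in subset:
                prev = S ^ (1 << j)  # same subset with j removed
                C[(S, j)] = min(C[(prev, k)] + dist[k][j]
                                for k in subset if k != j)
    everything = (1 << n) - 2  # all cities except 0
    return min(C[(everything, j)] + dist[j][0] for j in range(1, n))
```

The exponential factor here is the 2^n choices of subset, a huge improvement over the n! orderings examined by naive brute force.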

The third example, which we'll cover in a forthcoming video, we already alluded to

briefly on the last slide. It's for the vertex cover problem.

So this is where you're given a graph, every vertex has a weight, you want the

minimum weight subset of vertices that includes at least one endpoint from every

edge. We're going to consider the version of

the problem where you want to check whether or not it's possible to have a

vertex cover that uses only k vertices, whether or not there exist k vertices

that include at least one endpoint from each edge.

The naive brute-force search will run in time n to the k, which gets absurd even

when k is quite small, but we're going to show that there's a

smarter algorithm which, while still exponential in k, runs in time only 2

to the k times the size of the graph.
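As a hedged preview of that smarter search (a standard branching idea, sketched here for the unweighted decision version with names of my own): for any edge (u, v), every vertex cover must contain u or v, so try both choices and recurse with budget k - 1.

```python
def has_vertex_cover(edges, k):
    """Decide whether the graph given as an edge list has a vertex cover
    of size at most k. Makes at most 2^k branches, each doing linear work."""
    if not edges:
        return True   # nothing left to cover
    if k == 0:
        return False  # edges remain but no budget left
    u, v = edges[0]
    # branch 1: put u in the cover; branch 2: put v in the cover
    return (has_vertex_cover([e for e in edges if u not in e], k - 1) or
            has_vertex_cover([e for e in edges if v not in e], k - 1))
```

For example, a triangle has no vertex cover of size 1 but does have one of size 2, and this search finds that in a handful of branches.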