
In this video and the next, we're going to revisit the famous traveling salesman

problem. When we first talked about TSP, it was

bad news. The context was NP completeness.

We discussed Edmonds' conjecture: it's widely believed that there's no

polynomial-time algorithm for solving the TSP problem.

But let's talk about some good news. The fact that you can do better than

naive brute-force search. In fact, it's going to be another neat

application of the dynamic programming algorithm design

paradigm. So let me remind you briefly about the

traveling salesman problem. The input, very simple, just a complete

undirected graph, and each of the n-choose-2 edges has a non-negative cost.

The responsibility of the algorithm is to figure out the minimum cost way of

visiting each vertex exactly once, that is you're supposed to output a tour, a

permutation on the vertices, that minimizes the sum of the

n corresponding edges. For example, in this four-vertex pink

network, the minimum-cost tour has overall cost 13.

You could of course solve the problem using brute-force search; the running time

of brute-force search would be n factorial.
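(A minimal sketch of my own, not from the lecture: brute-force search just tries every ordering of the non-start vertices. The cost matrix here is a made-up instance, not the lecture's pink network.)

```python
from itertools import permutations

def tsp_brute_force(cost):
    """Minimum cost of a tour visiting every vertex exactly once.

    cost is an n x n symmetric matrix of non-negative edge costs;
    we fix vertex 0 as the start and try all (n-1)! orderings.
    """
    n = len(cost)
    best = float("inf")
    for perm in permutations(range(1, n)):
        order = (0,) + perm + (0,)  # tour: start at 0, end at 0
        best = min(best, sum(cost[a][b] for a, b in zip(order, order[1:])))
    return best

# Made-up 4-vertex instance (not the lecture's example).
cost = [[0, 2, 4, 5],
        [2, 0, 6, 3],
        [4, 6, 0, 7],
        [5, 3, 7, 0]]
print(tsp_brute_force(cost))  # 16, via the tour 0-1-3-2-0
```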

This would allow you to solve problems

with say, 12, 13, maybe 14 vertices. In this video, and the next, we'll

develop a dynamic programming algorithm that solves the TSP problem.

Now, of course, TSP is NP-complete. We have to expect that.

We're not expecting a polynomial-time algorithm.

But, this dynamic programming algorithm will run quite a bit faster than

brute-force search. The running time will be big O of n^2

times 2^n. 2^n is of course exponential but it's

quite a bit better than n factorial. n factorial is, ballpark, n^n.

Okay, if you look at Stirling's approximation, you'll see it's really (n

over a constant) raised to the n, but still, that's much, much bigger than a constant 2

raised to the n. To make it more concrete, you could run

this dynamic programming algorithm for values of n, probably pushing 30 or so.
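(To make the gap concrete, here's a quick back-of-the-envelope comparison of my own:)

```python
from math import factorial

def dp_ops(n):
    # Rough operation count for the O(n^2 * 2^n) dynamic program.
    return n * n * 2 ** n

for n in (14, 20, 30):
    print(n, factorial(n), dp_ops(n))
# Already at n = 14, n! is about 87 billion while n^2 * 2^n is about 3 million;
# at n = 30, n! has 33 digits while n^2 * 2^n is still under a trillion.
```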

So I realize these are still pretty absurdly small problem sizes, compared to

the size of the arrays that we can sort,

compared to the size of the graphs on which we compute strongly connected

components, or shortest paths. But that's how it goes with NP-complete

problems. You have to work pretty hard, even to

solve modest-sized problems. So at least this dynamic programming

algorithm proves the point, that even for NP-complete problems,

there are opportunities to improve over brute-force search and the programmer

toolbox I've already equipped you with is sufficient to make some of those

improvements.

For the traveling salesman problem, in particular, it's been a benchmark problem

for the development of tools and optimization, and people have worked very

hard, to solve very large instances of the traveling salesman problem.

Even with tens of thousands of cities, even sometimes over 100,000 cities.

But, I mean, this often represents years of work.

As I said earlier, there are some really nice popular accounts of the

Traveling Salesman Problem in books out there.

Check them out if you want to know more about the state of the art for solving

large TSP instances. So say we're going to pursue a dynamic

programming approach to the TSP, so that means we should be looking for optimal

substructure: a way in which an optimal solution must

necessarily be composed of an optimal solution to a smaller sub-problem

extended in some easy way to the original problem.

So what could that look like for TSP? So of all of the programs that we've tackled

using dynamic programming, I think the one which seems the most like TSP is single

source shortest paths. So think of a tour, not as a cycle

exactly but as a path from some vertex, let's say vertex number 1 looping all the

way back to itself, with, of course, the constraint that it should visit each

other vertex exactly once en route.

We want to minimize the overall cost of this path from 1 back to itself, so that

sounds a lot like wanting to minimize the length of a path from some source to some

destination, and I'm sure you'll recall that our dynamic programming algorithm

for the single-source shortest path problem was Bellman-Ford.

So what, then, did the sub-problems look like in the Bellman-Ford algorithm?

Well, the cool idea there was to measure sub-problem size using an edge budget, so

a canonical sub-problem in Bellman-Ford was

to compute the length of a shortest path from the given source vertex to some

destination vertex v that has i edges or less.

So, by analogy, we could think here about subproblems, where we want the shortest

path from the starting vertex, number 1, to some other vertex j,

that uses at most i edges.

So precisely, let's define sub-problems as follows.

Given a choice of i, this represents the number of edges that you're allowed to

use. As for the destination j, let's assume that

vertices are just named from 1 to n, so I'll call the generic destination j,

an integer from 1 to n. Let's denote by capital L sub ij the

length of a shortest path from the starting vertex 1 to this destination j,

using at most i edges. So suppose we try to set up a dynamic

programming algorithm using these sub-problems.

What do you think? Can we use these to get a polynomial-time algorithm for the TSP

problem? All right so the correct answer is C.

So this proposed collection of sub-problems does not yield a polynomial-time

algorithm; that is, D is not the correct answer. That would be surprising, right? TSP

is an NP-complete problem. So, if this gave a polynomial-time

algorithm, we could all report directly to the Clay Institute for a million-dollar

cash prize. Now there's a lot of students in this class, but at least we'd each get

20 bucks or so out of the deal. So that's not the case; this is not going

to be a polynomial-time algorithm. What's the problem? Well, the problem is

also not A, it's not that there are too many subproblems.

We only have O(n) choices for i, and O(n) choices for j.

So there's only a quadratic number of sub-problems.

Just like in Bellman-Ford, that's totally reasonable.

We also can correctly compute the value of larger sub-problems from smaller ones;

it's really exactly the same as what we were doing in Bellman-Ford.

The problem is that our semantics are incorrect.

By solving these sub-problems, we don't actually get an optimal solution to the

traveling salesman instance that we started with.
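(A quick sketch of my own, with vertices renamed 0 through n-1 and a made-up cost matrix: the proposed sub-problems are easy to fill in, Bellman-Ford style, but the biggest one doesn't hold the tour cost.)

```python
def min_cost_paths_with_budget(cost):
    """L[i][j] = length of a shortest path from vertex 0 to vertex j
    using at most i edges (vertices may repeat, as in Bellman-Ford)."""
    n = len(cost)
    INF = float("inf")
    L = [[INF] * n for _ in range(n + 1)]
    L[0][0] = 0  # with 0 edges, you can only be at the start
    for i in range(1, n + 1):
        for j in range(n):
            # Case 1: the budget-i path uses at most i-1 edges anyway.
            # Case 2: it arrives at j via a last hop from some k != j.
            L[i][j] = min([L[i - 1][j]] +
                          [L[i - 1][k] + cost[k][j] for k in range(n) if k != j])
    return L

cost = [[0, 2, 4, 5],
        [2, 0, 6, 3],
        [4, 6, 0, 7],
        [5, 3, 7, 0]]
L = min_cost_paths_with_budget(cost)
print(L[4][0])  # 0 -- the empty path from the start to itself, not a tour!
```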

So what are we hoping is going to be the case? Well, in our dynamic programming

algorithms thus far, after we solved all of the sub-problems, the answer to the

original question was just lying there waiting for us in the biggest

sub-problem. So here, the biggest sub-problem would

correspond to i equaling n, where you're allowed to use up to n edges in your

path. The issue with these subproblems is they

only specify an upper bound i on the number of edges you're allowed to

use. They do not enforce that you have

to use your full budget of i edges. So that means, when we look at these

biggest sub-problems, yeah, the shortest paths could use as

many as n edges if they wanted. But in general, they won't.

They'll use far fewer than n edges. They'll skip tons of the vertices,

and that's not a traveling salesman tour. A traveling salesman tour has to

visit every vertex once, and that is not enforced

by these sub-problems. But that problem doesn't seem so hard to

fix. Let's just insist, in each sub-problem, that

instead of using at most i edges, the shortest path

has to use exactly i edges.

So while this set of sub-problems is not quite as misguided as the previous

one, the answer remains the same. The answer is still C.

Of course, we still don't get a polynomial-time algorithm.

We're not claiming to prove P = NP. The number of sub-problems hasn't

changed. It's still quadratic, and if you think

about it, you can still efficiently solve bigger sub-problems,

with a bigger edge budget, given the solutions to all of the smaller

sub-problems. So what's the issue? The issue is that

our semantics are still incorrect. Just because you solve all of these

sub-problems doesn't mean you can extract from them the cost of a minimum-cost

traveling salesman tour. So the problem, briefly, is that we don't

So the problem briefly is that we don't enforce the constrain that you can't

visit a vertex more than once. What were we

hoping would be true when we look at the biggest sub-problem?

So this is when i is equal to n, looking at paths that have exactly n edges,

and when we take j, the destination, to be 1, the same as the origin.

We were hoping that that shortest path with n edges from 1 back to itself would

be a tour and therefore the minimum cost travelling salesman tour.

But that need not be the case. Just because this path has n edges and it

starts at 1 and it ends at 1 doesn't mean it's a tour.

It might for example visit vertices 7 and 23 twice, and as a result it never visits

vertices 14 or 29 at all. For this reason, the number computed by

the largest sub-problem, when i = n and j = 1,

can be much, much smaller than the true answer for the minimum-cost

traveling salesman tour. A good algorithm designer is nothing if

not tenacious. So, let's just recognize the flaw in this

proposal, that we're not enforcing the fact that you can't have repeat visits to

a vertex, and let's just change the subproblems again to explicitly disallow

repeated visits to a vertex. So precisely let's index the sub-problems

in exactly the same way as before. There's going to be one for each budget i

and one for each destination j. For a given i and a given j, the value of

the sub-problem is now defined, as the length, of a shortest path, beginning at

1, ending at j, using exactly i edges, with the

additional constraint that no repeated vertices are allowed.

The one exception being that if j = 1, then of course you're allowed to have 1

at the beginning and 1 at the end. But for the internal vertices of the

shortest path, we do not allow repeats. So what do you think? Does this refined

collection of sub-problems allow us to get a polynomial-time algorithm for the

traveling salesman problem? So today's quiz is more subtle than

the past couple, and the correct answer has switched from C to B.

It's still the case that there is a quadratic number of sub-problems; it's

still the case that we don't expect a polynomial-time algorithm; but now, with

this different definition of sub-problems, they do capture the TSP.

Specifically, look at the biggest subproblem.

Take i equal to n, take j equal to 1. The responsibility of that subproblem is

to compute the shortest path from 1 to 1 that has exactly n edges, and no vertices

repeated internally. That is exactly the original problem that

we started with. That is exactly TSP.
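(As a sanity check, a toy sketch of my own, not the lecture's code: we can compute these refined sub-problems by exhaustive search and confirm that the biggest one is exactly the TSP optimum. Same made-up cost matrix as above, vertices renamed 0 through n-1.)

```python
from itertools import permutations

def refined_subproblem(cost, i, j):
    """Shortest path from vertex 0 to vertex j using exactly i edges,
    with no repeated vertices (0 may recur only as the endpoint j = 0).

    Computed here by brute force, purely to check the semantics.
    """
    n = len(cost)
    best = float("inf")
    # The i-1 internal vertices are distinct, drawn from {1..n-1} minus {j}.
    pool = [v for v in range(1, n) if v != j]
    for mids in permutations(pool, i - 1):
        order = (0,) + mids + (j,)
        best = min(best, sum(cost[a][b] for a, b in zip(order, order[1:])))
    return best

cost = [[0, 2, 4, 5],
        [2, 0, 6, 3],
        [4, 6, 0, 7],
        [5, 3, 7, 0]]
# The biggest sub-problem (i = n, j = the start vertex) is the TSP optimum.
print(refined_subproblem(cost, 4, 0))  # 16
```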

The issue is that you cannot efficiently solve larger sub-problems, problems with

a larger edge budget, given the solutions to the smaller sub-problems.

The smaller sub-problems are just not very useful for solving the bigger

sub-problems. And the reason is a little bit subtle.

Now the hope would be that like in all our previous dynamic programming

algorithms, you can just formulate a recurrence which tells you how to fill

in, how to solve your sub-problems given the solutions to smaller ones.

And there's even a natural guess, for these sub-problems,

of what you hope the recurrence might be.

Recurrences generally follow from a thought experiment about what optimal

solutions have to look like. So you want to focus on a particular

subproblem, like a given destination j and a given budget i on the number of edges,

you'd say, well let's think about an optimal solution.

So what is that? That's a path that starts at one.

It ends at j. It has exactly i edges.

It has no repeated vertices. And amongst all such paths, it has the

minimum length. And it's natural to proceed by analogy

with Bellman - Ford, and say, well, wouldn't it be cool if a little birdie

told me what the penultimate vertex k was on the shortest path from 1 to j.

So if I knew that the second to last vertex on the shortest path was k.

Well then, surely, the length of this path would be the length of an optimal

path from 1 to k, using exactly i-1 edges and no repeated

vertices, with the final hop (k, j) concatenated at

the end. Now, of course, you don't know what the

second to last vertex k is. But no big deal.

As usual in dynamic programming, we're just going to try all of the possibilities.

So we can encode this brute-force search as a minimum over the sensible choices for

k. Clearly, k should not be equal to j.

It should be some other vertex that precedes j, and ignoring some base cases,

let's also preclude k from being 1. That's the starting vertex.
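(Written out, with c(k, j) denoting the cost of edge (k, j), that natural guess for the recurrence would be:)

```
L(i, j) = min over k != 1, j of [ L(i-1, k) + c(k, j) ]
```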

And of course, pictorially, what we have in mind is taking some

shortest path from 1 to k, so we'd look that up, it

would be previously computed in some smaller sub-problem, and then concatenating

onto it a final hop that goes from k to j.

Well, hey, that sounds pretty good, right? It kind of sounds like

maybe this is the key ingredient of a polynomial-time algorithm for TSP.

So what's the problem? Well, the problem

is that we've defined our sub problems to disallow repeated visits to a vertex.

So when we compute a sub problem, we better make sure we're respecting that

constraint. That no repeated visits are allowed.

And if we have in mind concatenating a shortest path from 1 to k with no

repeats with this final hop from k to j.

Well this might result in a repeated visit to j.

In particular, for all we know, the shortest path from 1 to k, that uses

exactly i-1 edges, and has no repeated vertices, that path might well go through

j. In which case, concatenating this final

hop (k, j) results in an illegal second visit to j.