The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 Stanford University 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

713 个评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 3

Huffman codes; introduction to dynamic programming.

- Tim RoughgardenProfessor

Computer Science

So we now have an algorithm, a very elegant one, that solves the weighted

independent set problem in path graphs. Moreover the algorithm runs in linear O

of n time, clearly the best possible. But before we do a victory lap I want to

point out that there's something this algorithm doesn't do that you might want

it to do, mainly hand you the actual optimal solution, not merely the value of

that optimal solution. So the point of this video is to show you

how we can reconstruct an optimal solution given the table-filling

algorithm from the previous video. So we just write that algorithm back

down. It's so short, it won't take me too long.

Now when this algorithm completes, what do we have in our hands?

We have an array, and this array is filled with numbers.

In particular, in the last entry in the array is a number like 184 so that's

great. That tells us that the maximum weight

that any independence set possesses is 184.

But, in many applications, we're going to want to know not just that information

but actually which vertices constitute that independence set with total weight

of 184. So, perhaps the first hack that comes to

mind to address this issue is to augment the array, so that each entry stores not

just the value of an optimal solution to the sub-problem produced by the graph G

sub I, but also an actual set of vertices achieving that value.

And I will leave it for you to if you want, rework the previous pseudo codes so

that when you fill in a new entry, you fill in not just the value of an optimal

solution given solution to the previous sub-problems but infact also the solution

itself. This hack however is not generally how

things are done in dynamic programming. It unnecessarily wastes both time and

space and a much smarter approach is to reconstruct from the filled in table an

optimal solution as needed. So if you think about it, it's kind of

cool that this is even possible, that our one line algorithm doesn't cover its

tracks, that it leaves enough clues for us as detectives to go back and examine

and reconstruct what the optimal solution is.

The following key point articulates exactly why this is indeed possible.

So the starting point of this observation is the correctness of our algorithm, our

one line algorithm. Which of course we established in the

previous video. And by correctness I mean what's

guaranteed that this algorithm will populate each entry of your array

correctly. The number in the ith entry is indeed the

maximum weight of independent set in the graph G I.

So remember our thought experiment about what the optimal solution could possibly

look like. We concluded it could only be one of two

things, and we really wound up wishing we had a little birdie that could tell us

whether or not the rightmost vertex v.sub.n was in the optimal solution or

not. If we knew which case we were in we could

just recursively compute the remainder of the solution from a graph that has either

one or two fewer vertices. So here's the point,

this filled in table, that's our little birdy.

Here's what I mean. But what, what's, what's the reasoning

that our algorithm goes through to fill in this last entry of this array, and

don't forget, we've already proven that our algorithm's correct, we did that in

the last video? Well, it does a comparison between the

two possible candidates vying for the optimal one.

On the one hand it commutes the case one solution that looks up the optimal value

of a solution for the graph with one fewer vertex and it compares that to the

case two solution including V N the last vertex and adding that to an optimal

solution with two fewer vertices. And in this max operator in the line of

code it's explicitly comparing which is better, case one.

The solution which excludes B sub N, or case two, the solution which includes B

sub N. So, whichever one of those was the

winner, whichever one of those cases was used to fill in that last entry.

That exactly tells us whether or not B sub N is in the optimal solution.

If we use the first case, that means B sub N is not in the optimal solution, it

gets excluded. If the second case was the winner, then

we know B sub N is in the optimal solution, because that was the winner.

If we have a tie, then there's an optimal solution either way.

There's one that includes BN and there's one that excludes BN,

So those are the tracks in the mud left for us by the forward direction of the

for-loop. We can just go back and look at which

case was used to fill in each entry of the array.

Again, for the ones that used case one, that corresponds to excluding the current

vertex; for those that used case two to fill in the entry, that corresponds to

including that vertex, in the solution. So the reconnect structure algorithm will

take as input the filled in array that was generated by our one line algorithm

on the previous slide. And what it's going to do, it's going to

trace through this array from right to left.

And at each step of the main loop, it's going to say, it's going to look at the

current entry. And it's just going to compute explicitly

which of the two candidates were used to fill in this array entry.

If you want, you can also cache the results of these comparisions on the

forward pass. That's an optimization that will be

useful later for harder problems. But for now if you want, you can just

think about redoing the comparision thing.

Hey, you know, their were two possible. More ways that we could have filled in

this entry, let's just check which of the two were used.

So if in fact the preceding array entry is at least as large as the one from two

back plus the weight of the current vertex, that corresponds to case one

winning, to the sub-solution that excludes the current vertex being better

than the one that includes it. So in that case we just skip the current

vertex V sub I and we decrease the array index by one in our scan.

If the other case wins, that is, if we fill in the current entry, if an optimal

solution to the current graph G sub I comprises the current vertex of E sub I

plus the optimal solution to the graph with two fewer vertices.

In this case we know we'd better include V sub I.

That's part of an optimal solution to the current sub-problem.

Moreover, that's the case where we need to look up the optimal solution with two

fewer vertices. So, we include the current vertex and we

decrease the array and index by two. So formally we have a correctness claim

which is that the final output S return by the reconstructing algorithm is, as

desired, a maximum weight independent set of the original graph g.

We've already talked about all the ingredients necessary for a formal proof,

for those of you who are interested. Of course, it precedes by induction and,

in the inductive step, you use the same case analysis we've been using over and

over again. The optimal solution at any given point,

it has only two possible candidates. The algorithm explicitly figures out

which of the two it is and that is what. Triggers whether or not to include or

exclude the current vertex. The running time, it's even easier.

We have a wild loop that runs in most an iterations.

We do constant work in each of the iterations.

So just like the forward pass, this backwards pass is really just a single

scan through the away. It's going to be lightning fast linear

time.