The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 斯坦福大学 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

405 评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 3

Huffman codes; introduction to dynamic programming.

- Tim RoughgardenProfessor

Computer Science

So the time has arrived to begin our study of dynamic programming.

So this is a general algorithm design paradigm.

As I mentioned at the beginning of the course, it has a number of justly famous

applications. However, I'm not going to tell you just

yet what it is that makes an algorithm dynamic programming. Rather, our plan is,

over the next few videos, to develop from scratch an algorithm for a non trivial,

concrete computational problem. The problem is finding the maximum weight

independent set and a path graph. This concrete problem is going to force

us to develop a number of new ideas. And, once we've finished solving the

problem, at that point, we'll zoom out, and I'll point out what are the

characteris-, characteristics of our solution which make it a dynamic

programming algorithm. Then, armed with both a sort of formula

for developing the dynamic programming algorithms, as well as a concrete

instantiation, we'll move onto a number of further, and in general harder

applications of the paradigm. Indeed, even more than usual, the dynamic

programming paradigm takes practice to perfect.

In my experience, students find it counterintuitive at first, and they often

struggle to apply the paradigm to problems that they haven's seen before.

But here's the good news, dynamic programming is relatively formulaic,

certainly more so than our recent study of greedy algorithms, and it's something

that you can get a hang of. So with sufficient practice, and that's

exactly what I'll be giving you in the next couple of weeks, you should find

yourself with a powerful and quite widely applicable new tool in your programmer

tool box. So let me introduce you to the concave,

concrete problem we're going to study over the next few videos.

It's a graph problem but a very simple graph problem.

In fact, we're going to restrict our attention merely to path graphs.

That's graphs that consist solely of a path on some number n of vertices.

The only other part of the input is a single non-negative number per vertex.

We're going to call these weights. For example here's a path graph on four

vertices, and let's give the vertices the weights one, four, five, and four.

The responsibility of the algorithm is going to be to output an independent set.

What that means is a subset of the graph's vertices, so that no two vertices

are adjacent. So in the context of a simple path graph,

it just means you gotta return some of the vertices and always avoiding

consecutive pairs of vertices. So when you have a path of four vertices,

examples of independent sets include the empty set,

any single vertex, vertices one and three,

vertices two and four, and vertices one and four.

You could not, for example, return vertices two and three.

Because those were adjacent. That is forbidden.

Now, to make this interesting, we're going to want just, not any old

independent set, but the one whose sum of vertex weights

is as large as possible. That's the max weight independence set

problem. So what I'm going to do next is use this

concrete problem as an opportunity to review the various algorithm design

paradigms that we've seen so far. Along the way we'll see that none of them

actually work very well for this problem. And that's going to motivate us to devise

a new approach, and that approach is going to turn out to

be dynamic programming. .

So there's always our standard punching bag, brute force search.

This would entail iterating through all of the independent sets and remembering

the one with maximum total weights. Of course it's correct, no question about

that, but as usual this would require exponential time.

Even in just a path graph, the number of independent sets is exponential in the

number of vertices, n. .

So what other algorithm design paradigms do we know?

Well we just finished a big segment on greedy algorithms, we could certainly

think about that. You know, pretty much most problems it's

easy to propose greedy algorithms, and this one's no exception .

I think the most natural greedy algorithm you might try to use to compute a

maximally independent set would be well. What's the myopic decision?

Well, you want to get as much weight overall.

So, in each step, you want to just pick the vertex with the highest weight that

you haven't already chosen. Now, of course you have to worry about

feasibility. Remember, we're not allowed to output

adjacent or consecutive vertices. So, if any vertex is ruled out by

adjacency, we ignore it. And amongst those that preserve

feasibility, we include the highest weight one in our set so far.

Well, let me redraw the four node path graph we had in the last slide and let me

ask you. What would this greedy algorithm compute

on the four node path, and how does that compare to the optimal solution, the

independent set with the maximum total weight?

So the correct answer is the second one. Let's see why.

So let's start with the optimal solution, the maximum independence set.

Remember independence sets are forbidden from choosing adjacent or consecutive

vertices. so in this case the only sensible

solutions to consider are the first and third vertex, the second and fourth or

the first and fourth. Of these, the best is the second and

fourth for a total of eight. So what about the greedy algorithm?

Well, you know, we just had this period of time where we got really spoiled with

the success of greedy algorithms. especially the minimum span entry

problem. But let me remind you, you know, greedy

algorithms, you know, they're often good heuristics.

They're often not guaranteed to be correct.

And so I'm happy to have this opportunity to quickly remind you again, of that

drawback of greedy algorithms. They're quite frequently not correct.

So this is another such case, so what will the greedy algorithm do?

Well, it begins by picking the max weight vertex over all.

So that would be this vertex with weight five.

That unfortunately blocks the algorithm from picking either of the two vertices

that has weight four. The only remaining option that preserves

feasibility is to pick. The vertex of weight one.

So that gives us an indepenedent set of weight six.

So this greedy algorithm is not correct. You could of course try to devise other

types of algorithms, but I don't know of any greedy approach that will actually

solve this problem optimally. So that's a bummer, but we still haven't

exhausted our algorithm design paradigms. remember, you know, we learned this quite

powerful divide and conquer approach early on in part one of this class.

And you know, it seems like it could work here.

We had all these successful applications where the input was an array.

We broke it into two halves. We were cursed on both sides and combined

the results. And here, you know, path graphs don't

look so different than an array of numbers.

So the obvious approach for divide and conquer is to break the path into two

paths, each of half the length of the original,

recursively compute a maximum weighted independent set of each, and then somehow

combine the results. With the issues with the divide and

conquer approach are already apparent, just in our simple four vertex example.

So if we recurse on the left half, that is the first two vertices, and we compute

a max weight independence set, that's just going to be the second vertex by

itself. And if we independently recurse on the

right hand side, on the vertices three and four, the maximum weight independence

set on right half is going to be the vertex of weight five.

And now when we, the recursion's complete and we get our sub-solutions back, we

have the second vertex and we have third vertex,

but the problem is the union of those two solutions conflicts.

Right? We cannot simultaneously output the

second and third vertices. Those are consecutive,

those are adjacent, and that's not allowed.

Moreover, you know, in a four note graph. It's sort of easy to see how to repair

this conflict. But in a big graph with say thousands of

nodes, if you have a conflict right where the two sub-problems meet, it is not at

all obvious how you would quickly fix that and get a feasible and optimal

solution to the original problem. Now in some sense the divide and conquer

paradigm is more powerful or better suited for this problem than the greedy

approach, in that I do know of divide and conquer algorithms that could solve this

problem optimally that run in quadratic time.

But doing better than that in a divide and conquer matter seems quite

challenging. And the dynamic programming based

algorithm we'll develop will solve the problem in linear time.

That's coming up next.