The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 斯坦福大学 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

403 评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 4

Advanced dynamic programming: the knapsack problem, sequence alignment, and optimal binary search trees.

- Tim RoughgardenProfessor

Computer Science

So now that we have our magic formula, our recurrence, all that's left to do is

to systematically solve the subproblems. As usual, it is crucial that we solve the

subproblems in the right order, from smallest to largest.

How should we measure the size of a subproblem in the optimal binary search

tree problem? The natural way to do it is the number of items in the subproblem.

So if you're starting at I and you're going till J, the number of items in that

subproblem is j - i + 1 that, and that's going to be our measure of subproblem

size. So let's bust out our [INAUDIBLE] array.

The number of dimensions of this array is going to be two and that's because we

have two different degrees of freedom for annexing subproblems,

one for the start of the contiguous interval, one for the end.

So the outer for loop is going to control the subproblem size.

It's going to ensure that we solve all smaller subproblems before proceeding to

larger subproblems. Specifically, we'll be using an index s S

and in the iteration of this outer for loop, whatever the current value of S is,

we're only going to consider subproblems of size s + 1.

So you should think of s as representing the difference between the larger index j

and the earlier index i. The inner for loop controls the first

item in the contiguous interval that we're looking at,

so that's just i. And now, all we have to do is rewrite the

reccurrence in terms of the array entries and with this change variable where S

corresponds to j - i. That is for a given subproblem, starting

with the item i and ending with the item i + s.

We just by by brute force pick the best route, so the route here is going to be

similar to i and i + s. Regardless of the choice of the route, we

pick up the constant, the sum of the Pks where here, K, is ranging from the first

item i to the last item i + s. And then we also look at the previously

computed optimal solution values for the two relevant subproblems,

one starting in i, ending in r - 1. The other starting in r + 1 and ending at

i + s. So a couple of quick comments about the

two array lookups on the right-hand side of this formula.

so first of all, if you choose i to be, the root to be the first item i, then the

first lookup doesn't make sense. If you choose it to be the last item, the

second array lookup doesn't make sense. In that case, it's understood we're just

going to interpret these lookups as zero. Of course, in an actual implementation,

you'd have to include that code but, I'll let you take care of that on your own.

So the second comment is just our usual sanity check and again, you should always

do this when you write out a dynamic programming algorithm.

When you write down your formula to populate the array entries,

make sure that on the right-hand side, whenever you do an array lookup,

that is indeed already computed and available for constant time lookup.

So in this case, whatever our choice of the root is, the two relevant subproblems

are going to involve strictly fewer items than what we started with.

And therefore, the two subproblem lookups on the right-hand side will indeed have

been computed in some previous iteration of the outer for loop.

Remember, the outer for loop is ensuring we solve some problems from smallest

number of items up to largest number of items.

And of course, after the two for loops complete, what we really care about is

the answer in A of one comma n, that is the optimal binary search tree

value for all of the items that's the eventual output.

Some students like to think about these double for loops pictorially. So let's

imagine A, the 2-D array is laid out as a grid.

So imagine the x-axis correspond to the index i,

that is the first item in the set of items we're looking at and the y-axis

corresponding to j, the last item in the current set. And let me single out the

diagonal of this grid, so these are subproblems corresponding to

i = j, that is subproblems with a single

element. Now, we only ever solve problems where j

is at least as large as i, so that means we're really only filling in the upper

left or northwestern part of this table. So we never bother to fill in the

southeastern, the bottom right part of this table.

We just sort of think of it all as zero. Now, in the first outer iteration.

So, when s0. = 0, that's when our dynamic programming

algorithm solves, in turn, each of the n single item problems.

So, in the first iteration of this double for loop, it's going to solve the

subproblem A11. In the next iteration of the inner for

loop, it's going to proceed the A22, then A33, and so on.

In each of those, both of the array lookups are going to just correspond to

zero and we're just going to fill in this diagonal with the base cases,

where Aii is just the probability of item i.

Then, as the dynamic programming algorithm proceeds, we're going to be

filling in the upper left portion of this table diagonal by diagonal.

Each time we increment s, the index in the outer for loop, we're going to march

up to the next northwesternmost diagonal, and then as we step through the possible

values of i, we're going to fill in that diagonal one at a time moving from

southwest to northeast. When we're filling in the value of a

subproblem on one of these diagonals, all we need to do is lookup the value for two

subproblems on lower diagonals. Lower diagonals correspond to subproblems with

strictly fewer items. So that's it.

That's the dynamic programming algorithm that computes the value of an optimal

binary search tree given a set of items with probabilities.

I'm not going to say anything about correctness.

It's the, the same story as we've seen in the past.

All the heavylifting is improving the optimal substructure lemma,

that gave us the correctness of our occurrence given that our magic formula

is correct and we're just applying it systematically, correctness of the

dynamic programming algorithm follows in a straightforward way, just by induction.

Let me, however, make some comments about the running time.

So, let's just follow the usual procedure.

Let's just look at how many subproblems got to get solved and then how much work

has to get done to solve each of those subproblems.

So as far as the number of subproblems, it's all possible choices of i and j,

where i is at most j, or in other words, it's essentially half of that n by n

grid. So this is roughly n squared over two,

let's just call it theta of n squared, so a quadratic number of subproblems.

Now, for each of the subproblems, we have to evaluate this recurrence, we have to

evaulate the formula, which conceptually is a breadth-first search through the

number of candidates that we've identified. And a disctinction between

this dynamic programming algorithm and all of the other ones we've seen

recently, sequence allignment, knapsack, computing independent set of the line

graphs, is that it's actually kind of a lot of options for what the optimal

solution can be. That is, our breadth-first search, for the first time,

is not over a merely constant number of possibilities.

We have to try every possible route, each of the items in our given subproblem

is a candidate route and we try them all. So, given a start item of i and an end

item of j, there's j minus i plus one total items and we have to do constant

work for each one of those choices. So there will be subproblems, some

subproblems that we can evaluate quickly and only say constant time if i and j are

very close to each other, but for a constant fraction of the

subproblems we have to deal with, this is going to be linear time,

theta of n time. So over all, that gives us a cubic

running time, theta of n cubed. Alright, so I would say this running time

is sort of okay, not great. So it is polynomial time, that's good.

That's certainly way, way, way faster than enumerating all of the exponentially

many possible binary search trees, so it blows away breath-first search.

But it's not something I would call blazingly fast or for free primitive or

anything like that. So you're going to be able be to solve

problem sizes with n in the 100's, but probably not n in the 1000's.

So that will cover some applications where you'd want to use this optimal

binary search tree algorithm, but not all of them. So it's good for some things,

but it's not a universal solution. On the other hand, here's a fun fact.

And the fun fact is you can actually speed up this dynamic programming

algorithm significantly. You can keep with the exact same 2-D

array with the exact same semantics. Again, each index is going to correspond

to the subproblem with the optimal binary search tree between items i and j

inclusive. But, you can actually fill up this entire

table all n squared entries using only a total of n squared time.

That is on average, constant work per subproblem.

So this fun fact, it's very clever. It's really more intricate than what we

discussed in this video here, but it, it's not impossible to read.

So if you're interested, I encourage you to go back to the original papers or

search the web for some other resources on this optimized speed up version of

this dynamic programming algorithm. I mean, at a very high level,

sort of from 30,000 feet, the goal is to avoid doing this

breadth-first search over all possible routes in every single subproblem.

And it turns out there's structure, nice structure in this optimal binary search

tree problem that allows you to piggyback on the work done in smaller subproblems.

So, in smaller subproblems, you already searched over a bunch candidate roots and

it turns out using the results of those previous breadth-first search is, you can

make inferences about which subset of the current set of roots might conceivably be

the ones that determine the recurrence. And so, that lets you avoid searching

over all of the possible candidates for the roots,

instead focusing just on a very small set.

In fact, the average, on average, constant number of possible roots over

all of the subproblems. And needless to say, this speeding up of

the running time from cubic to quadratic really significantly increases the

problem sizes that you can now apply this algorithm to.

So now, instead of being stuck in the hundreds, you'd certainly be able to

solve problem sizes in the 1000's, possibly even in the 10,000's using this

quadratic time algorithm. Very cool.