The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 Stanford University 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

513 个评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 4

Advanced dynamic programming: the knapsack problem, sequence alignment, and optimal binary search trees.

- Tim RoughgardenProfessor

Computer Science

So in these next two videos we'll give our second application of the dynamic

programming paradigm.

It's to a quite well known problem, it's called the knapsack problem.

And we'll show how following the exact same recipe that we used for

computing independent sets in path graphs

leads to the very well known dynamic programming solution to this problem.

So let's jump right into the definition of a knapsack problem.

So the input is given by n items.

So each item comes with a value,

V sub i, the more the better for us, and also a size, W sub i.

We're going to assume that both of these are non-negative, for

the sizes we're going to make additional assumption that their integral.

In addition to these two n numbers,

we are given one more which is called the capacity, capital W,

again we'll assume this is non-negative and an integer.

The role of these integrality assumptions will become clear in due time.

The knapsack problem,

the responsibility of an algorithm is to select a subset of the items.

What do we want?

We want as much value as possible, so

we want to maximize the sum of the values of the items that we select.

So what prevents us from picking everything?

Well, the sum of the sizes of the items that we pick have to total to at

most the capacity capital W.

Now I could tell you some cheesy story about a burglar breaking into a house

with a knapsack with a certain size capital W and

wants to make away with sort of the best loot possible.

But that would really be doing a disservice to this problem which is

actually quite fundamental.

This problem comes up quite a bit,

especially as a subroutine in some larger task.

Really, just whenever you have sort of a budget of a resource that you can use, and

you want to use it in the smartest way possible,

that's basically the knapsack problem.

So you can imagine how it would come up in a lot of contexts.

So let's now execute a recipe for

developing a dynamic programming algorithm.

Remember, the key to any dynamic programming solution is to figure out what

is the right set of subproblems.

And we're going to arrive at the sub problems for

the knapsack problem just as we did for max weight independent sets by doing

a thought experiment about the optimal solution, and

understanding what it has to look like in terms of solutions to smaller subproblems.

The bottom line of this thought experiment,

the deliverable will be a recurrence.

It will be a formula which tells us how the optimal value of

one subproblem depends on the value of smaller subproblems.

So to begin the thought experiment, fix an instance of knapsack and

let capital S denote an optimal solution, a max value solution.

We began our previous thought experiment with a content free statement that

the final vertex of a path is either in the optimal solution or it's not.

Now what is the analog of the right most vertex in this knapsack problem?

Well, unlike a path graph,

there's no intrinsic sequentiality to the items that we're given.

They're just an unordered set.

But it's actually useful to think of the items as sort literally ordered one,

two, three, all the way up to n, and

then the analog of the right most vertex is just the final item.

So the content free statement we're going to work with here is either the last item

belongs to the optimal solution capital S, or it doesn't.

We'll again start with the easy case when it doesn't.

What we argued in the path graph problem was that the max weight independent set in

the analogous case one has to be optimal if we just delete that rightmost edge from

the graph.

So here, the analogous claim is that this set S should still be optimal if we delete

the final item n from the knapsack instance.

The argument is the exact same near trivial contradiction.

If there was a different solution, s star,

amongst the first n minus 1 items with a weight even bigger than that of s,

we could regard this equally well as a superior knapsack feasible solution back

with all n of the items, but that contradicts the purported optimality of s.

So let's go through the slightly trickier case two together using a quiz.

So suppose the optimal knapsack solution does in

fact make use of this final item n.

Now we want to talk about this being somehow composed of an optimal

solution to a smaller subproblem.

So if we're going to delete the last item, then we can't talk about s,

because s has the last item, so

we need to remove the last item from s before we talk about its optimality.

That's analogous to back in the independent set problem,

we removed the right most vertex from the optimal solution before talking about its

optimality in a smaller subproblem.

So the question then is if we take capital S, the optimal solution,

we remove item n, in what sense is the residual solution optimal?

Put differently, for what kind of knapsack instance, if any,

is that an optimal solution for?

All right, so the correct answer is C.

So back in the independent set problem what we said is if we remove the right

most vertex, then what's left is optimal for the residual independent

set problem we get by plucking off the right most two vertices.

Here when we remove the nth item from the optimal solution S,

the claim is what we get is optimal for the knapsack problem involving the first

n-1 items and a residual knapsack capacity of W-w sub n.

So the original knapsack capacity with space reserved,

or deleted, for the nth item.

So before I give you a quick proof,

let me just briefly explain why a couple of the other answers are not correct.

So first of all, answer B, I hope you could rule out quickly.

It just doesn't type check.

So capital W, that's the knapsack capacity, so that's in units of size.

Little v sub n, that's the item's value.

So that's in dollars, so

it doesn't really make sense to talk the difference between those two.

They're just apples and oranges.

Part D, if you were worried about feasibility at any point.

So, if you take capital S and you remove the nth item, what have you done?

You have taken a set of items whose size was at most capital W by feasibility of s,

and you've removed an item with size wn from it.

So, what remains has total size at most capital W minus little w sub n.

So, S minus n wouldn't be feasible for

this reduced to this residual knapsack capacity W- w sub n.

A much more subtle point is part A, that's a very natural one to guess.

That turns out to not be correct.

So it turns out there might be smarter ways of using the first n-1 items,

than S minus item n, if you had a full knapsack capacity of W to work with.

So that's a subtler point.

And it's a good exercise for

you to actually convince yourself that A is wrong.

That there's no reason that when you take out item n from S and

you still keep using the original knapsack capacity that this has to be optimal.

That's not going to be true.

All right, so why is capital C correct?

Well, this is going to have the same spirit of case two of our weighted

independent set thought experiment.

So let me give you the proof.

The proof is going to be the usual contradiction analogous to case 2 of our

argument in the weighted independent set problem.

So suppose there was something better than S with n removed with the residual

capacity, W- wn.

Call that supposedly better solution S*, so what can we do to get a contradiction?

Well, let's just take S* which involves only the first n-1 items.

Let's add item n to it since S* has total size and most W-w sub n and

item n has size W sub n, the result has total size of W so

it's a feasible solution to take S* and extend it by item number n.

And if S* had more value than S with n removed,

then S* with n included has more value than S.

So for example, if S had total value 1,100,

100 of which was coming from the nth item, then S with n removed had value 1000.

If S* was better, it had a value 1,050.

Well then we just put n back in and it has value 1150 which contradicts the purported

optimal value of S* which had total value merely 1100.

So notice what's going on here.

So in taking away W sub n for the knapsack capacity before we look at the residual

problem we're in effect reserving a buffer for item n if we ever need it.

That's how we know we're feasible when we stick n back into the solution S*.

That's analogous to deleting the penultimate vertex of the path,

again as a buffer to ensure feasibility when we include the nth vertex back in

independent set problem.

So what was the point of this whole thought experiment which we've now

completed?

Well, again the point was to say the optimal solution,

whatever it is, it has to have only one of two forms.

We've narrowed the list of candidates down to two possibilities.

Either you just inherit the optimal solution with one less item in the same

knapsack capacity, or you look at the optimal solution with one less item and

less knapsack capacity by W sub n, and you extend that by item n.

But those are the only two possibilities.

So again, if we only knew which of these two cases were true.

If we only knew whether or not item n was in the optimum solution or

not, in some sense we could recursively compute the rest of the solution.

So just as that was enough to get us going with a dynamic programming algorithm for

weighted independent sets, so it goes with knapsack,

as I'll show you on the next video.