The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 Stanford University 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

666 个评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 3

Huffman codes; introduction to dynamic programming.

- Tim RoughgardenProfessor

Computer Science

To make sure that Huffman's greedy algorithm is clear,

let's go through a slightly larger, more complicated example.

So let's work with a six character alphabet.

Let's call the letters A, B, C, D, E, F, and let's assume

that we're given the weights 3, 2, 6 and 8, 2, 6 for these six characters.

Remember, this problem is well-defined even if the weights don't add up to 1.

If you prefer working with actual probabilities,

feel free to divide these six numbers by 27.

In the first step of Huffman's greedy algorithm,

we find the letters that have the smallest weights, the smallest frequencies.

So in this example, that would be the letters B and E,

both of them have weight 2.

Now what we do is we merge these two letters into a single meta letter,

in effect committing right now to having B and E be siblings in the final tree.

After this merger, our alphabet is down to five symbols, the symbols B and

E being replaced by the merged symbol BE.

And the weight of BE is the sum of the weights of B and E, namely, 4.

We can imagine our tree slowly taking shape through these iterations.

So after step 1, we know that B and E are going to be siblings and we know that just

A, C, D, and F are going to be leaves, that's all we know thus far.

In the next iteration, we again look for

the two symbols that have the smallest weight.

And here the smallest weight symbol is A.

It has weight 3.

And the runner-up is the merged symbol B sub E.

Its combined weight is 4, and that's second overall for these five symbols.

So in this step, we merge A with B E.

Now our alphabet is down to four symbols, the merged symbol, A, B, E,

which has cumulative weight 7, and then the original symbols, C, D, and F,

which have their original weights, 6, 8, and 6.

As far as our tree, we've now committed to the symbol A appearing as an uncle

of the siblings B and E.

And again, C, D,

and F, we just know they're leaves somewhere in the final tree.

In step three,

we're going to again pick the two symbols that have the smallest weights.

In this case, the two symbols with the smallest weight are C and

F, each of weight 6.

In our new alphabet, we still have our symbol ABE, it still has weight 7.

We still have the symbol D, it still has weight 8.

But now we have a new merged symbol CF, and its new weight is 12.

As far as our tree, in addition to the information we already knew,

we're now committing to having C and F be siblings in the final tree.

In step four, we merge the two symbols with the smallest weight.

So that would be ABE with its weight of seven and D with its weight of eight.

So this leaves us with only two symbols, ABDE and CF.

And now we know what both of these subtrees of the root of the final tree

are going to look like.

Now that we're down to two symbols,

the only thing we can do is fuse these two symbols into one,

fuse these two subtrees into a single one by uniting them under a common root.

That gives us the following final output of Huffman's algorithm.

What prefix-free code does this tree correspond to?

Well, as usual, let's label all of the left branches with zero and

all of the right branches with ones.

And now, as usual, the encoding of a character is just the symbols of zeros and

ones that you see when you traverse a path from the root to that leaf.

So for example, A will be encoded with 000,

B with 0010, C with 10, D with 01,

E with 0011, and F with 11.