The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).


Course from Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

400 ratings


From this lesson

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim Roughgarden, Professor

Computer Science

I need to begin by setting up some notation and definitions. The first one is that of rank blocks. So if we're given a value of n, the number of objects in the data structure, we define the rank blocks as follows. 0 gets its own block, as does 1. The next block is {2, 3, 4}. The next block starts at 5, and its last entry is 2 raised to the 4, the last rank of the previous block, also known as 16. The next block starts at 17 and goes up to 2 raised to the last rank of the previous block, so 2 raised to the 16, also known as 65536. And so on, until we have enough rank blocks to capture n. Actually, if you think about it, because ranks can only be log n, we only have to go up to log n, but that doesn't affect anything by more than a constant, so let's just go ahead all the way up to n.

So, how many rank blocks are there? Well, focus just on the largest member of each of the rank blocks. The largest member of a given rank block is 2 raised to the largest member of the previous rank block. So, for the t-th rank block, roughly what you have is that its largest entry is 2 to the 2 to the 2, and so on, raised t times. How many times do you need to do that before you reach n? By definition, log*(n) times. So, that's how many rank blocks there are, and that's where log*(n) enters the analysis.
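To make these definitions concrete, here is a small sketch (the helper names are hypothetical, not from the lecture, and floor(log2) stands in for the true logarithm):

```python
def rank_blocks(n):
    """Rank-block boundaries as (lo, hi) pairs: {0}, {1}, {2..4}, {5..16},
    {17..65536}, ...; each new block runs from one past the previous
    block's top rank up to 2 raised to that top rank, until n is covered."""
    blocks = [(0, 0), (1, 1)]
    lo, hi = 2, 4
    while True:
        blocks.append((lo, hi))
        if hi >= n:
            break
        lo, hi = hi + 1, 2 ** hi
    return blocks

def log_star(n):
    """Iterated logarithm: how many times floor(log2) must be applied to n
    before the value drops to 1."""
    count = 0
    while n > 1:
        n = n.bit_length() - 1  # floor(log2(n)) for positive integers
        count += 1
    return count
```

For example, rank_blocks(1000) yields five blocks, the last being (17, 65536), and log_star(65536) is 4, matching the tower-of-2s picture.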

So, I realize this must be a very inscrutable definition, these weird rank blocks, so let me tell you how you should think about them. They're meant to encode an intuition that we had on the previous slide, where we said we should be happy if the rank of an object's parent is a lot bigger than the rank of the object itself. Why? Well, when we traverse that object's parent pointer, we make a ton of progress through the rank space, and you can only make a ton of progress through the rank space a limited number of times. So that means a small number of parent traversals, and that means fast finds.

So these rank blocks are just a very clever way of defining sufficient progress, of saying when we're going to be happy with the gap between the rank of an object and the rank of its parent. Specifically, we're happy whenever these two ranks lie in different rank blocks; we're not happy if they lie in the same rank block. So, for example, if you're at an object that has rank 8 and its parent has rank 18, then we're happy, because 18 is in the rank block after the one that contains 8. On the other hand, if you go from rank 8 to rank 15, then we're not happy, okay? We're going to call that not a big gap between the rank of the object and the rank of the object's parent.

So let's build on that idea to make another definition. Consider a given snapshot in time.

That is, we've done some sequence of finds and some sequence of unions. At this point in time, we're going to call some objects good and some of them bad. Here are the criteria by which you are good. First of all, if you're the root of your tree, you're good. If you're a direct descendant of the root, that is, if your parent is the root of your tree, then you're also good. If not, though, if you're deeper in the tree, then you're good only if your parent's rank is in a strictly larger rank block than your own rank. So, as in the previous example: if your rank is 8 and your parent's is 18, then you're good. If your rank is 8, your parent's is 15, and your parent is not a root, then you're bad.

The role of this definition is to split the work that we do into two parts. First, we do some work visiting good nodes; second, we do some work visiting bad nodes. The work we do visiting good nodes will be very easy to bound, no problem. We'll have to do a separate global analysis to bound the total work that we do visiting bad nodes.
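As a sketch of the good/bad criteria (hypothetical names; `blocks` is a list of (lo, hi) rank-block boundaries as defined earlier):

```python
def block_index(r, blocks):
    """Index of the rank block containing rank r."""
    for i, (lo, hi) in enumerate(blocks):
        if lo <= r <= hi:
            return i
    raise ValueError("rank not covered by the given blocks")

def is_good(x, parent, rank, blocks):
    """Good: x is a root, a child of a root, or its parent's rank lies in a
    strictly later rank block than x's own rank."""
    p = parent[x]
    if p == x or parent[p] == p:  # x is the root, or x's parent is
        return True
    return block_index(rank[p], blocks) > block_index(rank[x], blocks)
```

With the blocks {0}, {1}, {2..4}, {5..16}, {17..65536}: a rank-8 object whose parent has rank 18 is good, while a rank-8 object whose non-root parent has rank 15 is bad, exactly as in the example above.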

This dichotomy is exactly the same as something we faced when analyzing Kruskal's algorithm, when we implemented it using the union-find data structure with eager unions. There, you recall, some parts of the work in Kruskal's algorithm were easy to bound just iteration by iteration; for example, every cycle check cost only constant time. But then there was a more complicated type of work, namely all of the leader-pointer updates, that we had to bound globally via a separate argument. The exact same thing is going to happen here. Good nodes we'll be able to bound operation by operation, whereas for the bad nodes we'll have a global analysis controlling the total work done over all of the operations.

So more precisely, I've set up the definitions so that the work done visiting good nodes is bounded by O(log* n) for every single operation. Indeed, how many good nodes could you possibly visit during a find operation, say, from some object x? You start at x and traverse parent pointers going all the way up to the root. Well, there's the root, and there's the descendant of the root, so that's 2; let's set those aside. What about the other good nodes that you encounter on your path? Well, by definition, when you visit a good node, the rank of its parent is in a bigger rank block than the rank of that node itself. That is, every time you traverse a parent pointer from a good node, you progress to a subsequent rank block. But there are only log* n rank blocks, so you can only progress to a subsequent rank block log* n times. So, the total number of good nodes you're going to see is O(log* n).
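A small illustration of that counting argument (hypothetical helper, not from the lecture): given the ranks along a root-bound path, the root and its child contribute at most two good nodes, and every other good node crosses into a later rank block, which can happen at most once per block.

```python
def good_nodes_on_path(path_ranks, blocks):
    """Count good nodes on a path, given the ranks of its nodes listed
    from x up to and including the root. The root and its direct child
    are always good; any other node is good exactly when its parent's
    rank sits in a strictly later rank block."""
    def block_of(r):
        return next(i for i, (lo, hi) in enumerate(blocks) if lo <= r <= hi)
    good = 0
    for i, r in enumerate(path_ranks):
        if i >= len(path_ranks) - 2:  # the root, or the root's child
            good += 1
        elif block_of(path_ranks[i + 1]) > block_of(r):
            good += 1
    return good
```

On a path with ranks [0, 1, 2, 3, 5, 8, 15, 17, 20] and the blocks {0}, {1}, {2..4}, {5..16}, {17..65536}, this counts 6 good nodes, within the bound of 2 plus the number of rank blocks.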

So now, let's express the total amount of work that we do over all m find and union operations using two quantities: the work on the good nodes, which we now know is just O(log* n) per operation, plus the visits to the bad nodes, which at the moment we have no idea how big it is. So now, let's proceed to the global bound on the total amount of work that we perform on bad nodes; this is really the crux of the whole theorem.

So, let's just review the definitions real quick. What does it mean that you're a bad node? First of all, you're not the root. Second of all, you're not a direct descendant of the root; that is, you have a grandparent. And third, it is not the case that your parent's rank is in a later rank block: your parent's rank is in exactly the same rank block as your own rank. That's what it means to be bad. So how much work do we spend on bad nodes? Well, let's analyze it one rank block at a time.

So, fix an arbitrary rank block; say for some integer k, its smallest rank is k+1 and its biggest rank is 2^k. Now, I told you what our two main building blocks were. The first one is the rank lemma; I'm going to ask you to remember that in a second. But first I want to put to use our other building block, which is that path compression increases the rank gap between objects and their parents. That's what we're going to use right now. Specifically, consider a find operation and a bad object x that it visits. By virtue of being bad, x is not the root and is not a direct descendant of the root, so the root is a strictly higher ancestor than x's parent. Therefore, x's parent will be changed in the subsequent path compression: it will be rewired to point directly to the root, a strict ancestor of its previous parent. Therefore, the rank of its new parent will be strictly bigger than the rank of its previous parent.
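The mechanics being invoked here, union by rank with path compression, can be sketched as follows (a minimal illustration, not the lecture's exact pseudocode):

```python
def find(parent, x):
    """Return x's root, compressing the path: every node visited is rewired
    to point directly at the root, a strict ancestor of its old parent, so
    a visited node's parent-rank can only jump upward."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(parent, rank, x, y):
    """Union by rank: point the root of smaller rank at the other root.
    A node that stops being a root has its rank frozen forever."""
    rx, ry = find(parent, x), find(parent, y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1
```

For instance, on the hand-built chain parent = [0, 0, 1, 2], a single find(parent, 3) rewires every node on the path to point at the root 0.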

That's going to keep happening every time that x is visited while it's bad: it keeps getting new parents, and those new parents keep having ranks strictly bigger than the previous one. Well, how many times can this happen before the rank of x's parent has grown so big that it lies in a subsequent rank block? Remember, x is a non-root, so its rank is frozen forever; it's always stuck in this rank block, whose biggest value is 2^k. So once its parent's rank gets updated, say, at least 2^k times, that rank has to be so big that it lies in the next rank block. At that point, x is no longer bad: its parent pointer has made so much progress that it reaches another rank block, and now we're going to call x good. And, of course, once x becomes good in this fashion, it will be good forevermore. It is not a root, and it will never be a root again; its rank is frozen forever, and its parent's rank can only go up. So, once you're good, once your parent's rank is sufficiently large, it will be sufficiently large for the rest of time. Alright, so we're almost there.

Let's just make sure we don't forget anything that we have done. So, the total work we're bounding in two ways. First of all, we visit O(log* n) good nodes in each operation, so over all m operations that's O(m log* n) for the good nodes. Plus, there's the number of visits over the m operations to the bad nodes. We're going to bound that work globally, but we're going to proceed rank block by rank block. We're fixing one block, with ranks k+1 up to 2^k. What we showed on the last slide is that for every object x whose final frozen rank lies somewhere in this rank block, the number of times it can be visited while it's bad, before it becomes good forevermore, is bounded above by 2^k. So, having used one of our key building blocks, that path compression increases the ranks of parents, now let's use the other building block, the rank lemma.

The rank lemma, if you recall, says that at any given moment in time and for any possible rank r, the number of objects that currently have rank r cannot be any bigger than n/2^r. So, let's use the rank lemma to upper bound how many nodes could possibly have their final frozen rank lying somewhere in this rank block, that is, somewhere between k+1 and 2^k. Let's just sum over all of the ranks in the rank block, starting at k+1 and going up to 2^k.

By the rank lemma, for a given value of i, there are at most n/2^i objects that eventually wind up with rank i, and by the usual geometric sum, this whole thing can be upper bounded by n/2^k. Now, this might look like a magical coincidence, but of course it isn't: we made a number of definitions in the analysis, and specifically, we structured the rank blocks so that this magic would happen. The number of inhabitants of a rank block, n/2^k, times the maximum number of visits to an inhabitant of the rank block while it's bad, 2^k, is independent of the rank block. We multiply these two things together, the number of visits per object, 2^k, and the number of objects, n/2^k, and what do we get? We get n. Now, this was only counting the visits to bad objects in a given rank block, but there aren't many rank blocks, remember: there are only log* n of them. So, the total amount of work spent visiting the bad nodes, summed over the rank blocks, is O(n log* n). Combining the bounds for the good nodes and the bad nodes, we get O((m+n) log* n).
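Spelled out, the arithmetic just described is (a sketch in the lecture's notation):

```latex
\sum_{i=k+1}^{2^k} \frac{n}{2^i} \;\le\; \frac{n}{2^k}
\qquad\text{(objects with final rank in the block)}

\frac{n}{2^k} \cdot 2^k \;=\; n
\qquad\text{(total bad-node visits charged to one block)}

\underbrace{O(n \log^* n)}_{\text{bad, over all blocks}} \;+\; \underbrace{O(m \log^* n)}_{\text{good}} \;=\; O\big((m+n)\log^* n\big)
```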

At the beginning, I mentioned that the interesting case is when m is big-Omega of n; in that case, this bound is just O(m log* n). Essentially, if you have a very sparse set of union operations, you can just apply this analysis to each of the directed trees separately. So, that's it, that's the full story, the complete account of the brilliant Hopcroft-Ullman analysis: O(log* n) amortized time per operation, with path compression. Brilliant as it is, you can do even better. That's the subject of the next couple of videos.