The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

来自 斯坦福大学 的课程

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

401 评分

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

从本节课中

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim RoughgardenProfessor

Computer Science

So let's start thinking about this alternative lazy union based approach to

the Union-Find data structure in some detail.

To get this to work as quickly as we want to we're going to need two optimizations.

This video will introduce the first of those optimizations, Union by Rank.

So let's have a quick slide making precise what we learned in the previous video.

So in our new implementation it's still the case

that each object has one extra field.

You can think of it logically as a pointer to some other object.

Really it's just recording the name of some other object but

the semantics have changed.

In our first implementation of union find with eager unions,

this field we called it a leader pointer and the invariate was that every

object's pointer points directly to the leader of its group.

In this Lazy Union Implementation, we are relaxing that variant.

So I'm going to change the name.

I'm not going to call it a leader pointer.

I'm just going to call it a parent pointer.

And the new relaxed invariant is that the collection of everybody's parent pointers

induces a collection of directed trees.

Each tree in this collection represents one of the groups

in the partition that we're maintaining.

Each of these trees has a root vertex,

that is a vertex which points back to itself.

It is these root vertices,

these root objects that we think of as the leaders of these components.

So as before, we're going to refer to a group by the name of its leader vertex and

that leader is, by definition, the root of the corresponding directed tree.

One way to think about things is that, when we had eager unions, we, again,

associated a directed tree with each of the groups, but we also insisted

that it was super shallow, bushy tree, that it had depth only one.

That is, all you had was a root, the leader,

and then everybody else who pointed directly to the leader.

In this lazy union implementation we're allowing the trees to grow deeper.

We don't insist that they have only one level below the root.

So as far as the initialization at birth in a union defined data structure just

consists of n degenerate trees each object initially points to itself and

is a root at the beginning.

In general how do you implement find from some object?

Well you just have to return the leader of its group and

we know you can get to the leader just by traversing parent pointers from x until

you get to a root until you get to some object that points back to itself.

That's what you return at the end of the find call.

How do you do a union?

Well given two objects x and y you have to find their respective roots so

you just invoke the find operation twice once for

x to get its root s1 once from y to get its root as two.

And then you install either s1 or s2 as a child of the other.

So you do one pointer update after the two finds.

So for now I'm going to leave Union under determined.

I'm not going to give you a rule to decide which of s1 and

s2 becomes the child of the other.

In the next quiz, I want you to think through what would be the worst case

running time of these operations if we make that choice arbitrarily,

if we just pick s1 s2 arbitrarily to become the child of the other.

All right so the correct answer unfortunately is the last one,

both find in union can actually blow up to linear time operations if you're

not careful about how you implement the union.

So why is this true?

Well first just let me point out that B is definitely not the correct answer.

Whatever finding union's worst case running time is,

it's definitely the same for both of them.

Remember, union reduces to calls to fine plus constant work.

So it's going to tell you that they're both going to have the same operation time

in the worst case.

Now, why is it linear?

The reason you're stuck with linear time is because the trees that you

build through a sequence of unions can in the worst case become very scraggly.

And to see what I mean imagine you just do a sequence of unions and

at each time you're unioning a group with just one object

into the group that you built up so far.

So if you union together a group of size two and group of size one,

we'll have you choose arbitrarily.

Maybe you make the group of size one the new root.

So now you just got a chain of the three objects.

Maybe you union in another singleton group,

and arbitrarily you choose the singleton group to be the new leader.

That gives you a chain of four objects and so on.

So if you do a linear number of unions, and you're not careful about who you make

a child of the other, you might well end up with a chain of linear length.

And then if you fines for the objects towards the bottom of that chain,

they're going to require linear work.

And again, union has a subroutine of fines so

its going to then take linear work as well.

So when you see this example, I hope your immediate reaction is well jeez,

we can certainly do better than that.

We can just be somewhat intelligent when we do a union which

of the old roots do we install as a child of the other one?

There's a rough analogy with how we made a natural optimization with our

eager union implementation of union find.

There we had to make some kind of choice between which of the two old leaders do

we retain?

We retain the one for the bigger group to minimize the number of leader updates.

So here, the natural thing to do is well look at what you did,

two trees is already deeper, and we want to keep the root of that deeper tree.

That's to avoid it from getting deeper still.

So we install the root of the shallower tree as a child of the deeper one.

Let's make that precise on the next slide with the notion of the rank of an object.

So the rank is just going to be a second field along with a parent

pointer that we maintain for each object.

It's just going to be some integer.

Let me tell you how I want you to think about the ranks at least for now.

Okay, so I'm going to give you some semantics, very natural semantics.

But I want to warn you, they're only going to hold valid for

the next couple of videos.

When we introduce something else called path compression these

semantics are going to break down.

But initially this is how I want you to think about ranks.

It's going to be the maximum number of hops required to get

from a leaf of x's tree up to x itself.

So for example if x is a root, this is going to be the worst case link,

the maximum link of any leaf to root path in the tree.

So just to make sure that this is clear let me draw a quick example.

So here in blue I've shown you one particular union fine data structure that

has two groups, now it's very easy to compute the rank of leafs, that's zero,

because the only path from a leaf to that node is the empty path.

There are two nodes that have rank 1.

Then the root of the bigger tree has rank 2.

And if you think about it, in general,

the rank of a node is going to be 1 plus the largest rank of any of its children.

For example, if you're in node X, you have a child Y and

their sum path from a leaf Z up to Y that requires five pointed traversals while

going from Z all the way up to U and X is going to require six pointed traversals.

And of course, when you initialize the data structure,

you want to set everybody's rank to zero, because everybody's just in a tree,

by themselves, at the beginning.

This notion of rank is going to be a super crucial concept throughout this entire

sequence of lectures on union fine so make sure you understand this definition.

I encourage you to draw out some of your own examples,

do some computations to get a better feel for the concept.

For the moment we're just going to use rank to avoid creating scraggly trees.

So let's go back to the bad example on the quiz.

Let's remember what the intuition was.

We wound up with this long chain and

therefore really bad worse case running time for our operations

because we were stupidly taking a tree that was already pretty deep and

installing it as a child under a single note under a super shallow tree.

Thereby producing a tree which is even deeper.

So the obvious way to avoid that is when faced with merging together a deep

tree and a shallow tree,

you install the shallow tree as a child under the deep one.

That's going to slow down this deepening process as much as possible.

So we can make that precise just to using this rank notion.

Notice that the rank is exactly quantifying the depth of a tree.

So the rank of the root of a tree by this invariant

is equal to the depth of the tree.

So if you want to make the shallower tree the child of the other one, that means you

want to make the smaller rank tree the child of the bigger rank tree.

This optimization is what is meant by Union by Rank.

This still remains the case where the two roots have equal rank, that is where

the two trees that were fusing have exactly the same maximum path length.

And in that case we do just arbitrarily choose one of them.

So in the pseudocode I've written here I've arbitrarily chosen y's root to be

the one that remains the root, x's root s1 is going to be installed as a child of s2.

But again, you can do it either way it's not going to matter.

Now we are not off the hook yet.

Remember this is a data structure where we're intending for

a invariant to hold these semantics for the ranks that's quantifying

worst case path link to the node and we made a modification to the data structure.

We rewired somebody's parent pointer.

So now it's our responsibility to check does the invariant still hold and

if doesn't still hold,

then we have to restore it hopefully using minimal extra work.

So in this quiz we're going to think through how ranks

change when we do a unions.

Remember the intonation, we're in the middle of a union of the objects x and y.

s1 and s2 refer the ranks that is the leaders of the groups of the trees that

contain x and y respectively.

The first thing I want you to think through and

convince yourself of is that for objects other than s1 and s2, other than the pair

of objects that are involved in the actual point of re-wiring, nobody's rank changes.

So make sure you think that through and convince yourself.

Ranks are invariant, except possibly for the objects S1 and S2.

In this quiz, we're going to drill down on how do the ranks of S1 and S2 change given

that we took one of them and rewired it's parent pointer to point to the other.

Okay so the correct answer to this quiz is the fourth one.

So remarkably in many cases there's no change at all to anybody's ranks including

S1 or S2.

If the ranks were different before the union both of the ranks stay the same.

The one exception is when you take the union of two ranks which are equal.

Then whichever one is the new root and

in the pseudo code that I showed you s2 is chosen to be the new root.

Its rank gets incremented, it goes up by one.

I think the easiest way to see this is is just with a couple pictures,

couple of examples.

So let me just draw two directed trees in the corresponding ranks.

And in this one s1 has strictly bigger rank.

It's rank is 2 so we install s2 as a child under s1.

So re-computing the ranks in the fused together tree, we see that again the rank

at the route at s1 is 2., same thing as it was before it didn't go up.

So, what's the general principle here?

Well, in general, suppose you have these two trees and

suppose the first ones route s1 has a bigger rank, say rank r.

What does that mean?

So, that means that there exists a path from a leaf to the root

s1 that requires a full r hops.

And in the second tree by virtue of its rank being less than r most r minus 1.

It says that every single path from a leaf to the root s2 of that

tree uses only r minus 1 hops.

So when you install s2 as a child under s1 the length of every leaf to root

path now the root is S1, it's only gone up by 1.

You just have to get to s2 and then 1 more hop you make it all the way to s1.

So if all those paths took almost r minus 1 before,

all the paths all the way to s minus 1 to s1 now, taken most are hops now.

So the worst case path link has not gone up.

That's exactly what happens in this example so it's true that when we install

s2 under s1 we have yet another path that requires two hops that goes through s2.

But we had one previously so the rank doesn't go up.

The worst case path length is exactly the same as before.

And this same kind of reasoning explains why the rank has to get bumped up by one

when you fuse together two trees that had the same rank.

So suppose S1 and S2 both have rank R,

well then both of them have a leaf to root path that requires a full R hops.

And now when you install S1 as a child under S2 that path where you needed R hops

to get from the leaf to S1, now to get all the way up to S2 you need one more hop.

So now suddenly there exists a path from a leaf to the new root S2 that

requires a full R plus 1 hops that's why the ranks gotta go up.