So, in this video, we're going to prove Tarjan's inverse Ackermann bound in the

performance of the union find data structure with union by rank and path

compression. Frankly, I hope you're kind of excited.

This is one of the crown jewels in the entire history of algorithms and data

structures. Let me remind you what the bound says.

You've got a union-find data structure, we're doing lazy unions, we're doing

union by rank, we're doing path compression.

And for an arbitrary sequence of m find a union operations. The total amount of

work you do over the sequence of operations is bounded above by m times

alpha of n, where n is the number of objects in the data structure and alpha

is the inverse Ackermann function that we defined in the previous video.

I want to give a shout out here to Dexter Kozen's beautiful book the Design and

Analysis of Algorithms. It is his analysis that I will be

following closely here. So, it's true that merely stating,

Tarjan's bound is non-trivial. We have this entire video defining the Ackermann

function and it's inverse so we could make sense of the statement.

That said, we're well positioned to approve Tarjan's bound.

In fact, the template of the proof is going to very much mirror what we already

did for the log star analysis by Hopcroft and Ullman.

So in that spirit, let's review what were the two basic workhorses, the two

building blocks that drove forward the Hopcroft-Ullman analysis.

So the first building block is the rank Lemma, which dates back all the way to

the videos pre-path compression. And remember the rank Lemma gives us

control on how many objects can exist with a given rank.

And in particular, that upper bound is decreasing exponentially with the rank,

it's n/2^r objects at most with the given rank r.

But, of course, we need a second building block to exploit the facts that we're

using the path compression optimization. So, how do we quantify the progress made

by path compression? Well, we argue that every time a node has

its parent pointer updated in path compression, it acquires a new parent

with strictly larger rank. So, you can think of the Hopcroft-Ullman

analysis as a clever and essentially optimal exploitation of these two

building blocks. How did we use them? Well, we define the

notion of rank blocks and each rank block has size to raise to the size of the

previous rank block. Why were rank blocks useful? Well, they

allowed us to measure progress as we followed parent pointers.

We defined an object to be good, if following it's parent, catapulted you

into a subsequent ranked block. Because there's only a log star number of

ranked blocks, you can only visit log star good nodes on any given find

operation. So, that gave us a per operation bound on

the log star for visits to good nodes. And then, for visits to bad nodes, we use

the second building block to argue that every time you visit a bad node, the rank

of its parent increases. And that can only happen so many times

before the rank of this object's parent is so much bigger than this object's rank

itself that the node has to be good. You have to make a lot of progress by

following its parent pointer. And the point is, if we want to do better

than the Hopcroft-Ullman analysis, we're not going to do better by taking these

same two building blocks and exploiting them in a better way, that can't be done.

So, what we need to do is have a better version of one of the two building

blocks. So, the key idea in Tarjan's analysis is

to have a stronger version of the second building block.

We're going to argue that path compression doesn't merely increase the

rank of nodes parents by one, but in fact, it increases typically the rank of

an object's parents by a lot. And what's kind of amazing is the link of

the proof we're about to do is really basically the same as that in the

Hopcroft-Ullman analysis. And the steps match up almost perfectly.

So, the bound is even better, the argument is even more ingenious.

It's even harder to imagine how one would come up with this idea oneself. But, as

far as checking it, as far as understanding it, the complexity level is

roughly the same as what we've already done in the log star analysis, so here we

go. So, in this slide, I'm going to give you

a definition. And the point of the definition is to

measure how much progress we make in terms of ranks when we follow a node's

parent pointer. So, the definition here is going to play exactly the same role

that the definition of rank blocks played in the Hopcroft-Ullman analysis.

So, what I'm going to define is a number, a statistic measuring the difference, the

gap, between the rank of an object and the rank of its parent. So, this

definition is only going to make sense for non-root objects x.

One thing I want you to remember from previous videos is that once an object is

not a root, it will never again be a root, and it's

rank will never again change. So, non-root objects have rank fixed

forevermore. So, we're going to use the notation delta

of x for the statistic. And the definition of delta of x is the largest

value of k such that a sub k, this is Ackermann function remember, such that a

sub k applied to the rank of this object x, is bounded above by the rank of x's

parent. So, the bigger the gap between the rank

of x's parent and x's rank, the bigger the value of delta of x.

So, let me mention, talk through some simple examples to make sure this makes

sense, and also review the Ackermann function a little bit.

So, first of all, let me just point out the delta of x is always non-negative for

any non-root object x. So, why is this true? Well remember, and

this goes all the way back to our union by rank discussion, the rank of an

object's parent is always at least one more than the rank of that object.

And the function a sub 0, recall we defined as the successor function.

So, for every object that's not a root, it's always true that it's parent has

rank at least one more than it. And that means we can always at least

take k to be zero and have the inequality be satisfied.

Now, when is it going to be true that an object x has a delta value at least equal

to one? Well, that is equivalent to stating we can take k to be 1 in this

inequality, and the inequality holds. So, let's remember what is the function a

sub one. In the previous video, we discovered that

was just the doubling function. So, we can take k=1 and have this

inequality satisfied if and only if the rank of the node's parent is at least

double the rank of that object. Similarly, an object x has a delta value

equal to at least 2 if and only if we can take k=2 on this right-hand side of the

definition and have this inequality be satisfied.

So, let's recall what the function a sub 2 is.

a sub 2 of r is just r*2^r. So, an object x has delta value equal to

2 if and only if its parent's rank is substantially bigger than its own rank,

in the sense that if it has rank r, its parent's ranks has to be at least r*2^r.