larger tree. So, we avoid this first situation here where we put the larger tree lower. In the weighted algorithm, we always put the smaller tree lower. Let's see how we implement that. Let's see a demo first. Okay, so again, we start out in

our normal starting position, where everybody's in their own tree. And when there's only two items to link, it works the same way as before. But

now, when we have eight to merge with four and three, we put the eight as the child, no matter which order the arguments came in, because it's in the smaller tree. So,

six and five: it doesn't matter, whichever one goes down. Nine and four: now nine is in the small tree and four is in the big one, so nine is going to be the one that goes down below. Two and one, then five and zero. Now, for five and zero, five is in

the bigger tree so zero goes below. Seven and two, two is in the bigger tree so

seven goes below. Six and one, they're in equal-size trees. And seven and three,

three is in the smaller tree so it goes below. So, the weighted algorithm always

makes sure that the smaller tree goes below. And again, we wind up with a single

tree representing all the objects. But this time, we have some guarantee that no

item is too far from the root and we'll talk about that explicitly in a second.

So, here's an example that shows the effect of doing the weighted quick union

where we always put the smaller tree down below for the same set of union commands.

This is with a hundred sites and 88 union operations. You can see in the top that the big tree has some nodes a fair distance from the root. In the bottom, for

the weighted algorithm all the nodes are within distance four from the root. The

average distance to the root is much, much lower. Let's look at the Java

implementation, and then we'll look in more detail at that quantitative

information. So, we use the same data structure, except now we need an extra array that, for each item, gives the number of objects in the tree rooted at that item. We maintain that in the union operation. The find implementation is identical to the one for quick union: you're just checking whether the roots are equal. For

the union implementation, we're going to modify the code to check the sizes and link the root of the smaller tree to the root of the larger tree in each case. And then, after changing the id link, we also change the size array. If we make i a child of j, then we have to increment the size of j's tree by the size of i's tree.

Or if we do the other way around, then we have to increment the size of i's tree by

the size of j's tree. So, that's the full code, in white, for implementing weighted quick union. Not very much code, but much, much better performance. In fact, we can

analyze the running time mathematically and show that the find operation takes time proportional to how far down in the tree the nodes are, and we can show that it's guaranteed that the depth of any node in the tree is at most the logarithm to the base two of N. We use the notation

Lg always for logarithm to the base two. So, if N is a thousand, that's going to be ten; if N is a million, that's twenty; if N is a billion, that's thirty. It's a very small number compared to N. So, let's look at the proof of that. We do some mathematical proofs in this course when they're critical, such as this

one. And why is it true that the depth of any node x is, at most, log base two of N?
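Stated in symbols (a compact restatement of the argument that follows), the claim and the key doubling fact are:

```latex
% Claim: in weighted quick-union on N objects,
\mathrm{depth}(x) \le \lg N \quad \text{for every node } x.
% Key fact: the depth of x increases by 1 only when x's tree T_1
% is merged into a tree T_2 with |T_2| \ge |T_1|, so
|T_1 \cup T_2| = |T_1| + |T_2| \ge 2\,|T_1|.
% Sizes start at 1 and never exceed N, so the tree containing x
% can at least double at most lg N times:
2^{\mathrm{depth}(x)} \le N \;\Longrightarrow\; \mathrm{depth}(x) \le \lg N.
```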

Well, the key to understanding that is to take a look at exactly when the depth of any node increases. When does it go down further in the tree? Well, x's depth

will increase by one, when its tree, T1 in this diagram, is merged into some other

tree, T2 in this diagram. Well, at that point, we said we only do that if the size of T2 was bigger than or equal to the size of T1. So, when the depth of x increases,

the size of its tree at least doubles. So, that's the key because that means that the

size of the tree containing x can double at most log N times because if you start

with one and double log N times, you get N and there's only N nodes in the tree. So,

that's a sketch of a proof that the depth of any node x is at most log base two of

N. And that has a profound impact on the performance of this algorithm. Now, the initialization still takes time proportional to N, but both the union and the connected (or find) operations take time proportional to log base two of N.
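Putting the pieces described above together, here is a minimal sketch of weighted quick-union in Java. The class name is illustrative; following the lecture, `id[]` holds the parent links and `sz[]` holds the tree sizes:

```java
public class WeightedQuickUnionUF {
    private final int[] id; // id[i] = parent of i
    private final int[] sz; // sz[i] = number of objects in tree rooted at i

    public WeightedQuickUnionUF(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // each object starts in its own tree
            sz[i] = 1;
        }
    }

    // chase parent pointers until reaching the root
    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    // two objects are connected iff they have the same root
    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // link the root of the smaller tree to the root of the
    // larger tree, and update the size of the larger tree
    public void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```

The size check in `union` is exactly the guarantee from the proof: the tree a node belongs to at least doubles each time that node's depth grows, so no node ends up deeper than lg N.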

And that is an algorithm that scales. If N grows from a million to a billion, that cost goes from twenty to thirty, which is quite acceptable. Now, this was very easy to implement, and we could stop, but usually what happens in the design of

algorithms is now that we understand what it is that gains performance, we take a

look and see, well, could we improve it even further? And in this case, it's very

easy to improve it much, much more. And that's the idea of path compression. And

this idea is that, well, when we're trying to find the root of the tree containing a given node, we're touching all the nodes on the path from that node to the root.

While we're doing that, we might as well make each one of those just point to the root. There's no reason not to. So, when we're trying to find the root of p, after we find it, we might as well just go back and make every node

on that path just point to the root. That's going to be a constant extra cost.
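A sketch of that two-pass idea in Java, along with the one-pass grandparent variant discussed next. This is a simplified illustration (the class name is made up, and the union shown here is unweighted for brevity; weighting combines with compression in the same way):

```java
public class PathCompressionDemo {
    private final int[] id; // id[i] = parent of i

    public PathCompressionDemo(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    // unweighted union, for brevity: link one root to the other
    public void union(int p, int q) {
        id[root(p)] = root(q);
    }

    // Two-pass variant: find the root, then go back over the
    // path and make every node point directly to the root.
    private int root(int i) {
        int r = i;
        while (r != id[r]) r = id[r]; // first pass: find the root
        while (i != r) {              // second pass: flatten the path
            int next = id[i];
            id[i] = r;
            i = next;
        }
        return r;
    }

    // One-pass variant: make every other node on the path point
    // to its grandparent on the way up.
    private int rootHalving(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]]; // the one extra line
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }
}
```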

We went up the path once to find the root; now we'll go up again to just flatten the tree out. And there would be no reason not to do that. Amazingly, it takes just one line of code to flatten the tree. Actually, to make it a one-liner, we use a simple variant where we make every other node in the path point to its

grandparent on the way up the tree. Now, that's not quite as good as totally