0:02

Now, we'll look at red black BSTs which is a simple data structure that allows us to

implement 2-3 tree with very little extra code beyond the basic binary search tree

code. So this is, and actually the version that we're going to, looking at is called

left-leaning red-black BSTs. On a personal note, I wrote a research paper on this

topic in 1979 with Leo Givas and we thought we pretty well understood these

data structures at that time and people around the world use them in implementing

various different systems. But just a few years ago for this course I found a much

simpler implementation of red-black trees and this is just the a case study showing

that there are simple algorithms still out there waiting to be discovered and this is

one of them that we're going to talk about. So, the idea is that we are, are

going to represent every two, three tree as a binary search tree. And in order to

get that done, we just need a simple representation for three notes. So, what

we are going to do is use internal left-leaning links to glue the three nodes

together. So, the larger of the two nodes in the tree node will always be the root

of a little binary search tree of size two for the three node and the link between

the two nodes, the left link that links the larger key to the smaller one we'll

color red. And that's to distinguish those links from the other links in the binary

tree so that we can tell when we're inserting things which nodes belong to

tree nodes and which ones don't. And you can see from this transformation that it's

easy to perform this, see this correspondence the middle link between A

and B, those are the keys that are less than B and larger than A. So, that takes

two comparisons to get to them the ones that are less than A, less than, less than

A, that's two comparisons for that and the ones that are greater than B are the right

link of B. So, just following those three cases, I see t hat this correspondence is

going to work. So, any 2-3 tree corresponds to a left leaning red-black

BST in this way. Just take the three nodes and split them into little binary search

tree of size two held together by a red link. And correspondingly given a

red-black BST then you can get the 2-3 tree if you wanted it. But just look at

the properties of looking at the properties over a left leaning red-black

BST. You know, with reference to what we know about 2-3 trees. First of all no node

has two red links connected to it cuz the only red links are internal to three

nodes. And those have to have ex, external links or tree links connecting them to

some other node. Every path from the root down to a null link has the same number of

black links that just follows directly from the corresponding property for 2-3

trees. A left-leaning red-black BST has perfect black balance. And all the red

links lean left. So, given a BST with some of the links colored red that has those

properties that's going to correspond to a 2-3 tree. And that's a key property is

this one-to-one correspondence between 2-3 trees and left-leaning red-black trees.

Given a 2-3 tree, we saw how to do it. Given a, a, given a red-black tree, we

just make the red links horizontal, and merge the nodes together to be three

nodes. So, all of the operations that we're going to look at for red-black trees

can be understood in terms of the corresponding operations on 2-3 trees. Now

the first, and really one of the most critical observations, is that search in a

red-black BST is exactly the same as for an elementary BST, we just ignore the

color. Now, it's going to be much faster because of better balance in the tree, but

in terms of the code, we don't have to change the code at all. Our regular search

code doesn't examine the color of a link and so we can just use it exactly as is.

And in fact, most of the other operations that we implemented on BSTs are also

identical. They d on't need the colors, but they can all benefit from the fact

that the trees are much better balanced. So this aspect of red-black BSTs is an

extremely nice one because of the operations that we implemented for regular

BSTs that involves some complicated code for floor and ceiling and rank and so

forth, and we don't have to change that code at all. We just during the insertion,

make sure that we, we [cough] maintain the properties the balance properties and by

doing that, we wind up with balance trees and we make all the operations quick and

we don't have to re-implement, we don't have to change it at all. So first before

we get to the code for insertion, we have to look at the representation. We don't

actually have explicit representation of links or links in trees or just references

to nodes. You could implement this by building explicit links but the an easier

thing to do is to know that every node is referenced by just one link in a tree the

one from it's parent. So, you can put the color of a link in the node that it

references. So, in this case we have a red link connecting E and C. What we do is put

a bit, a color bit in each in the node class. And then, if the link coming into a

node is red we set that to true. So this simple thing just tests is a node red. We

consider all null nodes to be black null links to be black, we don't have red links

dangling off, that would be incomplete pre-nodes. And [cough] otherwise if the

color's red, we return true otherwise return false to test whether a node is

red. So in this tree, the color of h.left is red, the color of h.right is black and

so forth. So, that's the way we represent colors by putting the, a color bit in the

node for the color of the length that points to it. [cough] Alright, so now,

there's a couple of elementary operations that we have to perform on red-black

trees, called rotations. And the idea is that during the construction of a tree, or

during an insertion operation, sometimes we wind up with red links that are leaning

in the wrong direction. And so we'll need what's called a left rotation, and the job

of that operation is to take a, a right leaning red link that is there for

whatever reason and reorient it to lean to the left. [cough] so, in this case, we

have the right link of E points to S and S is red so that's a right-leaning red link

and so now that's the before and what we want to do is reorient things so that it

leans to the left. And again, that has to be a local operation that only changes a

few links and just from the diagram it's not difficult to see that this little bit

of code will do the job. If we start with a right-leaning red link. So, first thing

we do is take the reference of h.right, and save that in x. So, that's the node

that's going to be the new root of the three nodes so to speak. And, and then,

x.left after the rotation is going to be h. And also whatever color h was, well, it

looks like it should be black. But actually this situation is where it could

be red. Then x is going to have that color cuz the link coming into h is going to be

the link coming into x. And then h's color is going to be black afterwards. And then,

we return x to link further up the tree which happens during our standard

recursive insert. So, that's a rotate left operation. Now, the property of this

operation that's very important is it maintains a symmetric order. The keys

between E and S are still there. We just changed the way we get to them. And the

keys less than E and so forth. And also we maintain perfect black balance because we

didn't change the black height, height of anything by doing this transformation. All

those subtrees, those three subtrees are exactly the same relative to the top and

bottom of the tree, as they were before the rotation. Now paradoxically and you'll

see why very soon it also turns out that to get the insertion done properly we

sometimes need to take a left-leaning red link and temporarily make it lean right.

And later on, we'll get it back to the left again. But anyway, that's a basic

operation that we sometimes need. And so that's just the symmetric code to the code

that we just did. Now again, now x is h.left. And h.left is going to be x.right

after the rotation. X's color is still going to be h's color. And h's color is

going to be red. And the right rotation implements this and again that's going to

maintain a, a symmetric order in perfect black balance we change the way the red

goes but we didn't change anything about the black. Okay, that's a right rotation.

Now, here's the third elementary operation that we're going to perform. It's called a

color flip. Sometimes during the insertion, we might wind up with a node

that's got two red links coming out of it. That's corresponds precisely to our

temporary four node when we're doing 2-3 trees. And what we wanted to do with the

temporary four node was to split it and pass the center node up to the root. Well,

you can see from this structure that we're all set to do that all we need to do

actually is not change any links, just change all the colors. And so, that is, we

change the link from E to A and from E to S to be black. That essentially splits the

four node. And then, we want to insert the E into its parent. And we just do that by

changing its link to be red. So, that's flipping the colors. And that's the way we

split a temporary four node in a left-linear red-black tree. And again,

that's just flipping colors. It doesn't change any links so it still, of course,

maintains symmetric order and perfect black balance. So, those are the three

basic operations we're going to use. Rotate left, rotate right and flip colors.

So, the basic s trategy is, with those operations, maintain one-to-one

correspondence with 2-3 trees when we do insertions. So, here's an example. If we

want to insert C into this red black tree which is a representation of this 2-3

tree, then C is going to be less than E greater than A. So, it will get added to,

as the right link of A and every time we add a node we just create a red link to

its parents and so, that's changing the two node into a three node. In this case

it's three nodes that's oriented the wrong way so we need to do a left rotate. After

we do the left rotate, we have a legal left-leaning red-black tree, and it

exactly corresponds to that 2-3 tree, so the insertion of C gives us exactly what

we would want, that correspondence with the 2-3 tree. We have to work through

other cases that can arise but there's not too many so we'll work through and we have

the basic operations, left rotate, right rotate, and flip colors. Alright, so

first, warm up, insert into a tree with exactly one node. Well, if it goes on the

left, then we just make a red link and add it on then we're done. If it goes on the

right, then we attach a new node with the red link on the right but we have to

rotate it to the left to make a legal three node. So, that's inserting to a tree

with the one node and make it a tree with two nodes. And that one generalizes to

help us insert into a two node at the bottom. So, we do the standard BST insert

color the new link red and then if that [cough] new three node happens to lean

right, rotated to the left. That's the case that we just did. So now, let's look

at the second warm-up. So, say, we have just two nodes in the tree, so it's we

have two nodes and that means it's a single three node. Then there's three

cases. So, one is that the new one is larger than both of the keys. If that's

true, then we attach the new node with the red link as always. And that gives us a

temporary four node. And what we want to do is split that four node and in this

case, since we are at the root that's all so that just flips the colors. Now, the

color of the root in our code will temporarily turn red and then we turn it

black again. So, that's inserting into a tree that's a single three node a node

that's larger than both of them, a key that is larger than both of them and we

get wind up with a four node. Well, let's look at the other two cases and these

understanding needs is crucial to understanding the whole algorithm. Let's

say, the new key is smaller than both of the keys in our three node. Now, we attach

a new link at the left of the smaller node. And now, we've gotta find BST. But

it has two red links in a row. And that's something that's not allowed. So, what

we're going to do is we're going to rotate the top link to the right. So that puts B

at the root. And now, it's got two red children. It reduces to this case. And we

flip the colors and we have a single four node. Sorry a, a red black. A tree that's

got three two-nodes and no red links so same situation as before. So, we had a

single temporary four note and we split it up into a two, two note not connected to a

four note. And then, so that's the case when it's smaller. Now, we have to look at

the third case, which is, when it's, the new node inserted this in between and

comes out of this link here. Again, we just had a red link and now we have a BST

with two red links along the path connected to A and that's not allowed. In

this case it's a bit trickier to affix the situation, what we do is we rotate the

bottom link left. So, and that gives us this and reduce it to the other situation

that we had before. And then we rotate the top link right and then, we flip the

colors. So, this one we used all three of our operations, rotate left rotate right

and flip the colors. And that gets us an insertion into a tree that has from a tree

that i s a single three node to a tree that is three two nodes that is containing

three keys. So that sort of operation is going to work in a big tree when we insert

into a new three node at the bottom. We do the standard BST insert, color the new

link red, and we do the rotations that we need, either one or two rotations to

balance the temporary four node, and then we flip colors to pass the red link up one

level and then remind me to rotate to that to make that one lean left. So, for

example if we insert h in to this tree here, it comes off as the left link of R

so that gives us a temporary four node that's not balanced so we need to rotate

the link from S to the right and that gives us now temporary four node that is

balanced and again, these are all local transformation it's not changing the rest

of the tree. Now, we flip colors and that gives us a, a good red-black tree, except

that, that one red link that we just is leaning the wrong way. So, now we need to

rotate left and then once we've done that, now we have a legal left-leaning red-black

tree. So [cough] that's a insertion into a three node at the bottom. So, here's

another one that involves, remember, we passed that red link up. There might, if

that gets passed up to a three node, then we have to continue moving up the tree and

just treat it in the same way as we just treated inserting at the bottom. We have a

new red link appearing into some three node. There's the three cases that could

happen and here's an, an a, an example. So, say, we're inserting P into this

left-leaning red black tree it goes to the right event so we get a temporary four

node that's got two red links both children are red in that thing so we want

to flip the colors. We flipped the colors and now our temporary 4-node is up higher

in the tree but it's not balanced so we are going to have to do two rotations to

make that balanced. First one is to make the bottom link left-leaning and then the

second one is to make the top link right-leaning so that we can have the

temporary four node balance. And then the last thing we do is flip the colors and

now that's the result of that insertion. It's a bunch of transformations but

they're all simple using our flip colors or left or right rotation. And [cough]

that one happened to be at the root. If that red link were, were way down in the

tree and there were another three node about it, we might have to do it again.

Again, exactly as what would happen in a 2-3 tree. So, let's do a demo of

constructing the red-black BST from our standard set of keys. So, we start with a

single key. Now, if we want to insert E, if it goes to the left, that's fine.

That's a legal left-leaning red-black tree. A would go to the left of E two

lefts in a row so we have to rotate that to right. And then we have to flip the

colors. And that's a legal red-black BST. So now, if we insert R into this one then

it goes on a red link to the left of X, S and that's fine, it's a red-black BST. And

now, if we insert C into this one, it goes less than E, greater than A it's a red

link connecting A and C but it's leaning the wrong way. So, we have to do a left

rotation, legal-red black BST. And you want to insert h that goes to the left of

R, two reds in a row, rotate the top. Rotate the top, our temporary four node is

balanced, flip colors. Now, we have a three node, but the red link is leaning

right so we have to rotate. And now, we have a legal red-block BST. Insert x into

that one that goes to the right of S, it's leaning the wrong way, rotate left. Insert

M into this one, goes to the right of h, leaning the wrong way, rotate left. Most

of the operations are simple ones like this happening at the bottom. Insert P,

that goes to the right of M that makes M a temporary four node that happens to be

balanced, so flip the colors. Flip the colors, now we h ave a temporary four node

that's out of balance so we need a double rotation. First rotate E to make that link

point lean to the left, then rotate R to make the, bring the temporary four node

into balance. And then, flip the colors and that's a legal red-black BST. Insert L

into that one. It goes to the right of H, leading the wrong way rotate left. [cough]

And that's an example of building a red-black BST from our standard set of

keys. Now, we're ready to look at the implementation for, of the code for

inserting into a left-leaning red-black tree. And the key to understanding this

code is to realize that the same code, code handles all of the cases. And the way

that it works is we are always reducing one case to another. We get this most

complicated case we did a left rotate on the bottom node and that, that transformed

it to this case where they're both leaning left. And then we did a right rotate on

the top node, and that transformed to the case where our temporary four node is

balanced. And then we flipped colors on that. So, for a particular insertion, we

can take advantage of this reduce one case to another by, in, in the way that we're

moving in the tree, not to get everything happen with just a, a few extra lines of

code in our standard binary search tree. So, in gray is our, standard insertion

code for binary search trees. And remember we took some pains to think about the

recursive implementation where when we go down a link we replace that link by

whatever the recursive routine gives us back and that strategy is going to pay off

in giving us a really simple code. [cough] and because in this implementation for

left-leaning red-black trees we're going to return the link whenever we're done,

and then that will get that link installed up in the node above whether it be left or

right. Typical implementations of red-black trees that do not use this

recursive strategy wind u p having lots of cases depending on whether left or right

or double rotate to the left or double rotate to the right can be critical of

this code because my own was this way for the first three editions of the book. And

it's only in this edition that we figured out how to make the code this simple.

Okay. So what are the things left to be done? Let's just check. When we insert a

new node all we want to do is create a new node with the, I've given, associating the

given value with a given key, as before but now we just make that node red. So,

that's adding a new node with a red link at the bottom inserting that into whatever

the two or three node it's attached to. And then we do the comparisons as, as

before and that, and that's all fine. Now, when it's returned then that's the point

at which we're going to check whether the left, the links are leaning to the left as

they are suppose to and whether or not there are any double links or not. So the

first thing is if, if is red h.right and not is red h.left? So, that means H is

h.right is red so that means the right link of H is leaning the wrong way. So, H

is a three node leaning the wrong way. So we just rotate left H. So, whenever we

find a right link, we're sitting on a right red link we just rotate it left and

return that. So, that would be in this case here, we'd rotate it left and reduce

it to that one. [cough] or in, in you know, in the case when we're just

inserting a new node and it's turns out to be the right red link attached to a black

one, if that handles that case. Now, if h.left is red and h.left is also red

that's this case here where we have two left-leaning red links. And then in that

case, we just rotate the top one right and that brings us to this one. So, notice,

we're in this case we do this rotation first, we're on this node and then , that

returns and we come up to deal with the situation on this node after the return,

and then we do that rotation. And then after that rotation, or if there were no

rotations at all, if the insertion happened over here then we'd test and flip

the colors. It's a little mind bending at first because of the recursive structure

but it won't take you long to convince yourself that this little bit of extra

code completes the implementation of left-leaning red-black trees. It's quite

remarkable, actually. So, let's look at a visualization. Watching the [unknown],

this is a balanced tree getting constructed in the worst case where

everything that comes in is in ascending order. A regular binary search tree will

just be all strung out in a single line and wouldn't have quadratic time for this

input but a left-leaning red-black tree actually when, whenever it becomes a power

of two is completely balanced as you can see from this example. Even though they

came in, in ascending order, the tree winds up being perfectly balanced. And

what about descending order. Well, it's left leaning and the process is a little

bit different and sometimes the left path can get long but not that long. The worse

that can happen is that it alternates red and black. And then after it gets to that

worse case it also winds up being completely balanced when we have a power

of two. Interesting to think just, just about this case and to prove to yourself

that it's always going to be perfectly balanced when it's descending. And this is

just for random insertions. The tree stays very balanced. It's guaranteed that the

longest path which is alternating red and black can be no more than twice as long as

the shortest path which is all blacks. And so in this case the longest path is ten

and the average path is seven for 255. Very close to log based two of N. So easy

to prove by correspondence with 2-3 trees that t he height is guaranteed to be less

than two log base two N. Every search in left-leaning red black three is guaranteed

to take less than two log base two of N cuz every path gets the same number of

black links so you never have two red links in a row. [cough] And actually, in

typical applications with any kind of randomness or even if there is a lot of

order its difficult to find situations orders of keys that build the trace of

height is bigger than actually one log N in, in a real application, its very close

to fully balanced all the time. So, that completes our summary for a symbol table

implementations with red-black BSTs. We have full code it's the regular BST code

with the couple of lines adding the calls and the basic operations. She rotate

right, rotate left. In color flip, we could guarantee logarithmic performance

not just research, insert, in delete code. Delete code is a bit more complicated but

it's on the book side and in the book. But also, since it's the compare-to interface,

and since it's a binary tree representation all the other comparable

operations extended operations for ordered symbol tables are going to be implemented

and take time proportional to the log N. A lot of people ask why use the name

red-black. Well we invented this data structure this way of looking at balance

trees at, at Xerox PARC which was the home of the personal computer and many other

innovations that we live with today entering graphic user interface and

internet and object oriented programmings and many other things. But one of the

things that was invented there, was the laser printing and we were very excited to

have nearby color laser printer that could print things out in color and out of the

colors, the red looked the best. So, that's why we picked the color red to

distinguish red links the types of links in three nodes. So, that's an answer to

the question for people t hat have been asking. Now, here's another war story

about red-black BSTs. As I mentioned they're widely used. And there was an

example not that long ago, where a telephone company contracted with a

database provider to build a database that could store customer information and the

provider implemented the database using red-black BSTs for search and insert. Now,

our, our original paper on red black trees was the way the paper was laid out, it

turned out that the delete implementation happened to be placed after all the

references. So, a lot of people didn't see the delete implementation. And also we

didn't have the simple left-leaning representation. So it was more complicated

and involved a lot more cases and so usually not all the cases were put in the

text books. So, people found deletion more difficult. In fact, that's what lead to

[unknown] analyze the situation then come up with a left-leaning variant. So, what

they did in this implementation was they just put in regular Hibbard deletion in

the binary search in the red-black BST. Not the deletion algorithm that's

guaranteed to keep the constant black height all the time. And so but, but they

still thought that it should be balanced and it shouldn't matter much. And they had

a complex error recovery process that, that got triggered if the height limit got

too big. And they rebuild the whole tree and, and then because of the way they did

this deletion, well, the end of the story was that they had extended the client had

extended outage because the implementer didn't use the full algorithm. And there

was a lawsuit and some legal testimony and I am happy to report that, that it was

clear that Hibbard deletion was the problem once the expert analyzed it and

the expert witness, who's a colleague of mine, said if implemented properly, the

height of a red-black BST with N keys is at most two log N. And so that's the st

ory of red-black BST's guaranteed logarithmic performance for all symbol

table operations.