Now we're going to take a look at what happens when we have significant numbers

of duplicate keys which is not at all unusual in practical applications. So, in,

in fact, often, the purpose of a sort is to bring items with equal keys together

for like the example that I gave where we had cities and time. There's a lot of

detailed data and the time and maybe the whole goal of the sort is to group them by

cities so we can ship out the data for each city, to each city and there's plenty

of other examples like that in data processing where we find maybe remove

duplicates from a mailing list or all the job applicants that we get, we might want

to sort them by the college attendant. So the sort does huge files with huge numbers

of duplicate keys. So, a sort, it's worthwhile to take a careful look at what

the implication of that is. So again, typical characteristics we have a huge

file but small number of different key values. So let's look at our algorithms in

that situation. So Mergesort, it doesn't matter that much what the key values are

like and it's actually, we can show that Mergesort always uses between one-half, N

log N and N log N compares. Quicksort actually, they're up until the 1990s the

most widely used implementation took quadratic time. For files with large

numbers of equal keys and that was actually found by applications user and,

and that's the standard Quicksort that was in all the textbooks almost all the

textbooks if you did not stop the partitioning on equal keys it would run in

quadratic time. So we want to do better than this. So the mistake happens if we

put all the items equal to the partitioning item on one side which is a

natural way to implement it and the consequence is if you have all the keys

equal, then partitioning doesn't really do anything. You just peels off one key to do

file size n then you get a sub file size n - one and then n - two and so forth and

the result is a quadratic tim e algorithm. Our implementation, we stopped the

partitioning scans on items equal to the partitioning item and then in that case,

when all the keys are equal, it's going to divide it exactly in the middle. And then

in that case, when all the keys are equal, it's going to divide at exactly in the

middle. And then in that case you get an N log N compares. But actually when you

think about it, why don't we just put all the items equal to the partitioning item

in place. That's, that's really a desirable way to look at it and let's take

a look at that option. So the goal is so called three way partitioning. So what we

want to do is get the array into three parts so then now we have two pointers

into the middle. One that is the boundary between the keys that are less than the

partitioning element and those that are equal of the partitioning element. Another

one that's the boundary between the keys that are equal of partitioning elements

and the one that is greater. And then in the middle are all the equal keys and

that's what we'd like to arrange. Now until the 1990s, conventional wisdom among

people implementing system was, wasn't worth doing this. But, but actually it's a

problem that Edsger Dijkstra had proposed in the 70s as an example of, of

programming problem. He was really interested in analyzing correctness of

programs and showing that this how you could convince yourself that this program

was operating as expected. But in the 1990s we figured out that really this was

going to be an effective way to sort. And this was taking a look at the Qsort that a

user found was broken and, and now, this method is incorporated into some plenty of

system sorts. So let's take a look at how it works with the demo its more

complicated than standard Quicksort partitioning. A bit more complicated

because there's more to do. So now we have our i pointer which is right to the left

of stuff we haven't seen ye t and then, we have two other pointers that maintain,

maintain these boundaries everything to the right of gt is known to be greater

than partitioning element. Everything to the left of lt is known to be less and

between lt and i is known to be equal. And all the method does is maintain this in

variants so let's do an example or two and see how that works. So we increment i and

then figure out what to do. So, now it's less than the partitioning element. So,