0:00

Previous videos covered an outstanding algorithm for the selection problem, the

problem of computing the Ith statistic of a given array. That algorithm which we

called the R select algorithm was excellent in two senses. First its super

practical, runs blazingly fast in practice. But also it enjoys a satisfying

theoretical guarantee. For every input array of length N at the expected running

time of R select is big O of N, where the expectation is over the random choices of

the pivots that R select makes during execution, now in this optional video

I'm going to teach you yet another algorithm for the selection

problem. Well why bother given that our select is so good? Well frankly, I just

can't help myself. The ideas of this algorithm are just too cool not to tell

you about, at least in optional video like this one. The selection algorithm

, we cover here is deterministic. That is, it uses no randomization

whatsoever. And it's still gonna run in linear time, big O of N time. But now, in

the worst case for every single input array. Thus, the same way Merge Short gets

the same asymptotic running time, big O of N log N, as quick sorts gets on

average. This deterministic algorithm will get the same running time O of N, as the R

select algorithm does on average. That said, the algorithm we're gonna cover

here, well, it's not slow. It's not as fast as R select in practice, both because

the hidden constants in it are larger. And also because it doesn't' operate in place.

For those of you who are feeling keen, you might wanna try coding up both the

randomized and the deterministic selection algorithms, and make your own measurements

about how much better the randomized one seems to be. But if you have an

appreciation for Boolean algorithms, I think you'll enjoy these lectures

nonetheless. So let me remind of the problem. This is the i-th order

statistic problem. So we're given an array, it has N distinct entries. Again,

the distinctness is for simplicity. And you're given a number I between one and N.

You're responsible for finding the i-th smallest number, which we call

the i-th order statistic. For example, if I is something like N over

two, then we're looking for the median. So let's briefly review the randomized

selection algorithm. We can think of the deterministic algorithm covered here as a

modification of the randomized algorithm, the R select algorithm. So when that

algorithm is passed in array with length N, and when you're looking for the

i-th order statistic, as usual, there's a trivial base case. But when

you're not in the base case, just like in Quick Sort, what you do is you're gonna

partition the array around pivot element P. Now, how are you gonna choose P? Well,

just like Quick Sort, in the randomized algorithm, you choose it uniformly at

random. So each of the N elements of the input array are equally likely to be

chosen. As the pivot. So, call that pivot p. Now, do the partitioning. Remember

partitioning puts all of the elements less than the pivot to the left of the pivot.

We call that the first part of the partitioned array. Anything big, bigger

than the pivot gets moved to the right of the pivot. We call that the second part of

the array. And let j denote the position of the pivot in this partitioned array.

Equivalently, let j be what order statistic that the pivot winds up

happening to be. Right? So, we happen to choose the minimum element then J's gonna

be equal to one. If we happen to choose the maximum element, J's gonna be equal to

n. And so on. So, there's always the lucky case, chance one in N, that we happen to

choose the Ith order statistic as our pivot. So, we're going to find that out

when we notice that J equals I. In that super lucky case, we just return the pivot

and we're done. That's what we're looking for in the first place. Of course, that's

so rare it's not worth worrying about, so really the two main cases depend on

whether the pivot that we randomly choose is bigger than what we are looking for or

if it's less than what we are looking for. So, if it's bigger than what we are

looking for, that means J is bigger than I, we're looking for the Ith smallest, we

randomly chose the J'th smallest. Then remember we know that the Ith smallest

element has to lie to the left of the pivot. Good element in that first part of

the partition array. So we recurs there. It's an array that has J-1 elements in it,

everything less than the pivot. And we're still looking for the Ith smallest among

them. In the other case, this was the case covered in a quiz a couple videos back, if

we guess a pivot element that is less than what we're looking for, well then we

should discard everything less than the pivot and the pivot itself. So we should

recurs on the second part of A, stuff bigger than the pivot. We know that's

where what we're looking for lies. And having thrown away J elements, the

smallest ones at that. We're rehearsing on a ray of [inaudible] and minus J, I'm

looking for the [inaudible] smallest element in that second part. So, that was

the randomized selection algorithm, and you'll recall the intuition for why this

works is random pivot should usually give pretty good splits. So the way the

analysis went is we should. Each iteration, each recursive call, with 50

percent probability, we get a 25/75 split or better. Therefore, on average, every

two recursive calls, we are pretty aggressively shrinking the size of the

recursive call. And for that reason, we should get, something like a linear time

bound. We do almost as well as if we picked the median in every single call,

just because random pivots are a good enough proxy of best case pivots, of. The

median. So now the big question is: suppose we weren't permitted to make use

of randomizations. Suppose this choose-a-random-pivot trick was not in our

tool box. What could we do? How are we going to deterministically choose a good

pivot? Let's just remember quickly what it means to be a good pivot. A good pivot is

one that gives us a balanced split, after we do the partitioning of the array. That

is, we want as close to a 50/50 split between the first and the second parts of

the partitioned array as possible. Now, what pivot would give us the perfect 50/50

split? Well, in fact, that would be the median. Well, that seems like a totally

ridiculous observation, because we canonically, are trying to find the

median. So previously we were able to be lazy, and we just picked a random pivot,

and used that as a pretty good proxy for the best case pivot. But now, we have to

have some subroutine that deterministically finds us a pretty good

approximation of the median. And the big idea in this linear time selection

algorithm, is to use what's called the median of medians as a proxy for the true

meaning of the input array. So when I say median of medians, you're not supposed to

know what I'm talking about. You're just supposed to be intrigued. Now, let me

explain a little bit further. Here's the plan, we're gonna have our new

implementation of chose pivot and it's gonna be deterministic. You will see no

randomization on this slide, I promise. So the high-level strategy is gonna be we're

gonna think about the elements of this array like sports teams, and we're gonna

run a tournament, a 2-round. Knockout tournament, and the winner of this

tournament is going to be who we return as the proposed pivot element. Then we'll

have to prove that this is a pretty good pivot element. So there's gonna be two

rounds in this tournament. Each element, each team is gonna first participate in a

world group, if you will. So they'll be, small groups of five teams each, five

elements each. And to win your first round, you have to be the middle element

out of those five. So that'll give us N over five first round winners. And then

the winner of that second round is going to the med-, be the median of those N over

five winners from the first round. Here are the details. The first step isn't

really something you actually do in the program, it's just conceptually. So

logically, we're going to take this array capital A, which has N elements, and we're

gonna think of it as comprising N over five groups with five elements each. So if

N is not a multiple of five, obviously, there'll be one extra group that has size

between one and four. Now for each of these groups of five, we're going to

compute the median, so the middle element of those five. Now for five elements, we

may as well just invoke a reduction to sorting; we're just gonna sort each group

separately, and then use the middle element, which is the median. It doesn't

really how you do the sorting. Because after all, there's only five elements. But

you know, let's use [inaudible] sort, what the heck. Now what we're going to do is

we're going to take our first round winners and we're gonna copy them over

into their own private array. Now this next step is the one that's going to seem

dangerously like cheating, dangerously like I'm doing something circular and not

actually defining a proper algorithm, so c you'll notice has linked over n over five.

We started with an array of link n. This is a smaller input. So let's recursively

compute the median of this array capital c. That is the second round of our

tournament amongst the n over five first-round winners, the n over five

middle elements of the sorted groups. We recursively compute the median, that's our

final winner, and that's what we return as the pivot element from this subroutine.

Now I realize it's very hard to keep track of both what's happening internal to this

juice pivot subroutine and what's happening in the calling function of our

deterministic selection algorithm. So let me put them both together and show them to

you, cleaned up, on a single slide. So, here is the proposed deterministic,

selection algorithm. So, this algorithm uses no randomization. Previously, the

only randomization was in choosing the pivot. Now we have a deterministic

subroutine for choosing the pivot, and so there's no randomization at all. I've

taken the liberty of in-lining true's pivot subroutine. So that is exactly what

lines one, two, and three are. I haven't written down the base case just to save

space I'm sure you can remember it, so if you're not in the base case. What did we

do before? The first thing we do is choose a random pivot. What do we do now? Well,

we have steps one through three. We do something much more clever to choose a

pivot. And this is exactly what we said on the last slide. We break the array into

groups of five. We sort each group, for example, using merge sort. We copy over

the middle element of each of the n over five groups into their own array capital

c. And, then, we recursively compute the median of c. So, when we recurs on select

that we pass the input c. C has n over five elements so that's the new link.

That's a smaller link than what we start with so it's a legitimate recursive call

refining the median of n over five element. So, that's gonna be the n over

tenth order statistic. As usual. Well to keep things clear I'm ignoring stuff like

fractions, in your real implementation you'd just round it up or down. As

appropriate. So steps one through three are the new [inaudible] step routine that

replaces the randomized selection that we had before. Steps four through seven are

exactly the same as before. We've changed nothing. All we have done is ripped out

that one line where we chose the pivot randomly and pasted in these lines one

through three. That is the only change to the randomized selection algorithm. So,

the next quiz is a standard check that you understand this algorithm, at least, not

necessarily why it?s fast; but, at least, just how it actually works. And I only ask

you to identify how many recursive calls there are, each time. So, for example in

[inaudible] there's two recursive calls, in quick-sort there's two recursive calls,

in r-select there's one recursive call. How many recursive calls do you have each

time, outside of the base case in the d-select algorithm? All right, so the

correct answer is two. There are two recursive calls in deselect, and maybe the

easiest way to answer this question is not to think too hard about it and literally

just inspect the code and count, right namely there's one recursive call in line

three, and there's one recursive call in either six or seven, so quite literally,

you know there's seven lines of code, and two of the ones that get executed have a

recursive call so the answer is two. Now what's confusing is that in the random, a

couple things, first in the randomized selection algorithm, we only have one

recursive call. We have the recursive call. In line six or seven, we didn't have

this in line three. That one in line three is new compared to the randomized

procedure. So we're kind of used to thinking of one recursive call using the

divide and conquer approach to selection, here we have two. Moreover. Conceptually.

The roll of these two recursive calls are different. So the one in line six or seven

is the one we're used to. That's after you've done the partitioning so you have a

smaller sub-problem and then you just recursively find the residual or statistic

in the residual array. That's sort of the standard divide and conquer approach.

What's sort all crazy. Is this second use of a recursive call which is part of

identifying a good pivot element for this outer recursive call and this is so

counter-intuitive, many students in my experience don't even think that this

algorithm will hold, sort of, they sort of expect it to go into an infinite loop. But

again, that sort of over thinking it. So let's just compare this to an algorithm

like merge sort. What does merge sort do? Well it does two recursive calls and it

does some other stuff. Okay. It does linear work. That's what it does to merge.

And then there are two recursive calls on smaller sub problems, right? No issue. We

definitely feel confident that merge [inaudible] is gonna terminate because the

sub problems keep getting smaller. What does deselect do, if you squint? So don't

think about the details just [inaudible] high level. What is the work done in

deselect? Well. There are two recursive calls, there's [inaudible] one's in line

three, one's in line six or seven, but there's two recursive calls on sm, smaller

sub problem sizes. And there's some other stuff. There's some stuff in steps one and

two and four, but whatever. Those are recursive calls. It does some work. Two

recurs have caused the smaller sub-problems, TI's got to terminate. We

don't know what the run time is, but it's got to terminate, okay? So if you're

worried about this terminating, forget about the fact that the two recurs of

cause have different semantics and just remember, if ever, you only recurs on

smaller sub-problems, you're definitely going to terminate. Now, of course who

knows what the running time is? I owe you an argument on why it would be anything

reasonable, that's going to come later. In fact what I'm gonna prove to you is not

only does this selection algorithm terminate, run in finite time, it actually

runs in linear time. No matter what the input array is. So where as with R select,

we could only discuss its expected running time being linear. We showed that with

disastrously bad choices for pivots, R selects can actually take quadratic time.

Under no circumstances will deselect ever take, ever take quadratic time. So for

every input array it's big O of N time. There's no randomization because we don't

randomly do anything in choose pivot, so there's no need to talk about average

running time; just the worst case running time over all inputs is O of N. That said,

I want to reiterate the warning I gave you at the very beginning of this video which

is, if you actually need to implement a selection algorithm, you know, this one

wouldn't be a disaster. But it is not the method of choice, so I don't want you to

be misled. As I said there are two reasons for this. The first is that the constants

hidden in the begon notation are larger for V select than for R select. That will

be somewhat evident from the analyses that we give for the two algorithms. The second

reason is, recall we made a big deal about how partitioning works in place and

therefore quicksort and r select also work in place, that is, with no real additional

memory storage. But in this deselect algorithm we do need this extra array c to

copy over the middle elements, the first round winners. And so the extra memory, as

usual, slows down the practical performance. One final comment. So for

many of the algorithms that we cover, I hope I explain them clearly enough that

their elegance shines through and that for many of them you feel like you could have

up with it yourself, if only you'd been in the right place at the right time. I think

that's a great way to feel and a great way to appreciate some of these very cool

algorithms. That said, linear time selection, I don't blame you if it, if you

feel like you never might have come up with this algorithm. I think that's a

totally reasonable way to feel after you see this code. If it makes you feel

better, let me tell you about who came up with this algorithm. It's quite old at

this point, about 40 years, from 1973. And the authors, there are five of them and at

the time this was very unusual. So, Manuel Blum. Bob Floyd. Vaughn Pratt. Ron Rivest.