0:01

Now we'll look at a problem related to sorting, called selection, that's also well solved by Quicksort partitioning. This is a simpler problem.

We're given an array of n items, and our task is to find the k-th smallest. There are lots of important applications for this. If we want to find the minimum item, that's k = 0; the maximum item, that's k = n - 1; the median, that's k = n/2. And there are many kinds of applications from people processing data who want to find the top k, or the median, or other order statistics. So that's what selection is all about.

Now, here's an example where we want to use theory as a guide. What kind of efficiency might we expect in a selection algorithm? Well, first of all, it's easy to see that we can solve selection in n log n time. How would we do that? We just sort the array, and then if we want the smallest, we look in the first position; for the largest, we look in the last position; for the median, we look in the middle. In fact, if k is small, the running time is going to be proportional to n, because if you're looking for the smallest, you can just go through the array and find the smallest in one pass, and if k is two, you can find it in two passes. So you can imagine trying to look for a selection algorithm that takes time proportional to n. And the lower bound is n as well, because you have to look at everything: if you don't look at everything, you might miss the one item that you're looking for. So from these observations, it's clear that what we'd like is a selection algorithm that takes linear time. But at this point, the question is: is there a linear-time algorithm that works for every k? Or is selection possibly as hard as sorting?
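As an aside, the sort-based approach just mentioned can be sketched in Java like this (a hypothetical illustration; the class and method names are my own, and it works on int arrays for simplicity):

```java
import java.util.Arrays;

public class SortSelect {
    // Return the kth smallest item (k = 0 gives the minimum)
    // by sorting a copy of the array and indexing into it.
    // The cost is dominated by the sort: n log n.
    public static int select(int[] a, int k) {
        int[] copy = a.clone();
        Arrays.sort(copy);
        return copy[k];
    }

    public static void main(String[] args) {
        int[] a = { 55, 12, 93, 7, 41, 88, 23 };
        System.out.println(select(a, 0));            // minimum: prints 7
        System.out.println(select(a, a.length - 1)); // maximum: prints 93
        System.out.println(select(a, a.length / 2)); // median:  prints 41
    }
}
```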

This kind of question plagued a lot of people in the late 60s and early 70s, as these types of problems emerged in computing applications. So it's an interesting question to think about, for sure.

Well, in his original paper in 1961, Hoare gave a solution to the selection problem based on partitioning. The idea is just a version of Quicksort, in a way. We do our partitioning so that we get entry a[j] into place in the array: nobody to the left is larger, nobody to the right is smaller. But then, when we're doing selection, we just go into one subarray or the other, depending on where j is. If j = k, we're done: we've found the k-th smallest. If k is to the left of j, then we just do the left subfile, which means setting hi to j - 1. And if k is to the right of j, we just do the right subfile, setting lo to j + 1. That's all this code does; we could do a recursive call, but this just does it by resetting the values of the parameters. Do one partition, then check whether your k-th element is going to be in the left part or the right part, and reset lo or hi accordingly. If it's equal, you've found it, and you return it; and you keep going until you get to a point where you have only one element. That's a Quicksort-like implementation solving the selection problem. Notice again that it depends on the random shuffle at the beginning; that's going to be important for performance.

All right. So there needs to be a mathematical analysis to characterize the running time of this program, and the fact is that quick-select, this method, takes linear time on the average. We won't give the full proof.
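The scheme just described might look like the following sketch in Java. This is not the lecture's exact code, but the same idea: shuffle, partition, then reset lo or hi to narrow in on k (the class name and the use of int arrays rather than Comparable items are my own simplifications):

```java
import java.util.Random;

public class QuickSelect {
    private static final Random rand = new Random();

    // Return the kth smallest item: one partition per step,
    // then continue in the side that contains index k.
    public static int select(int[] a, int k) {
        shuffle(a); // the random shuffle is what makes the performance guarantee probabilistic
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1; // kth smallest is in the right subfile
            else if (j > k) hi = j - 1; // kth smallest is in the left subfile
            else            return a[k]; // a[j] is in place and j == k
        }
        return a[k];
    }

    // Standard 2-way partitioning: put a[lo] into its final position j,
    // with nothing larger to its left and nothing smaller to its right.
    private static int partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, lo, j);
        return j;
    }

    // Uniformly random shuffle (Fisher-Yates).
    private static void shuffle(int[] a) {
        for (int i = 1; i < a.length; i++)
            swap(a, i, rand.nextInt(i + 1));
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = { 55, 12, 93, 7, 41, 88, 23 };
        System.out.println(select(a.clone(), 3)); // the median: prints 41
    }
}
```

Note that unlike Quicksort, which recurs into both subarrays, this loop only ever continues into one side, which is where the savings come from.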

It's actually quite a bit more complicated than the one we just did for Quicksort. But intuitively, we can see kind of what happens: each partitioning maybe splits the array approximately in half. That means, if it did so exactly, you'd have n + n/2 + n/4 and so forth, which adds up to about 2n compares, so linear cost. Actually, it doesn't cut it exactly in half each time, only on average, so you need a fuller analysis, like the one we did for Quicksort. The bottom line of that analysis gives the number of compares required as a function of n and of k, in terms of this formula here, and if you plug in k = n/2, you get the result that the number of compares required to find the median, which is the highest value this formula can take, is 2 + 2 natural log 2, times n. So: linear time to find the k-th smallest, for any value of k.

Now again, this is a method that's linear time on the average. It's actually going to be quadratic in the worst case, but the chance of that happening with a random shuffle is less than the chance that we'll be struck by lightning. It's a probabilistically guaranteed fast algorithm.

Now, from a theoretical standpoint, that's a little unsatisfying, and in 1973 there was a famous paper that found a compare-based selection algorithm that is guaranteed to solve the problem in linear time. This was a real landmark in the theory of algorithms, because for a long time it wasn't known: we knew we could get linear time on the average, but could we get a worst-case guarantee? And this paper found such a construction. Now, in practice, the constants in this construction are rather high, so the method is not really used in practice. And so there is still the goal of a fast, guaranteed linear-time selection algorithm that maybe somebody in this class will invent someday. This is another example where we use theory as a guide: it's still worthwhile to look for a practical linear-time worst-case algorithm. Maybe somebody in this class will invent that, but until something like that is discovered, use quick-select, based on Quicksort partitioning, and you can get linear-time selection when you don't need a full sort. That's selection: a simple problem, like sorting, that is well solved with Quicksort partitioning.
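For reference, the cost estimates from the analysis above, written out. The first line is the idealized halving argument; the second is my reconstruction of the formula the lecture refers to on the slide (the standard average-case result for quick-select), with its worst value at the median:

```latex
% Idealized halving: each partition works on half the previous subarray
C(n) \;\approx\; n + \tfrac{n}{2} + \tfrac{n}{4} + \cdots \;<\; 2n

% Full average-case analysis: expected compares to find the kth smallest of n
C_{n,k} \;\sim\; 2n \;+\; 2k \ln(n/k) \;+\; 2(n-k)\ln\bigl(n/(n-k)\bigr)

% The maximum is at the median, k = n/2
C_{n,\,n/2} \;\sim\; (2 + 2\ln 2)\,n \;\approx\; 3.39\,n
```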