Now we'll look at a problem related to sorting called selection, which is also well solved by Quicksort partitioning. It's a simpler problem: we're given an array of n items with a total order, and our task is to find the k-th smallest. There are lots of important applications for this. If we want the minimum item, that's k = 0; the maximum item, that's k = n - 1; the median, that's k = n/2. And there are many kinds of applications where people processing data want the top k, or the median, or other order statistics. That's what selection is all about.

Now, here's an example where we want to use theory as a guide. What kind of efficiency might we expect in a selection algorithm? Well, first of all, it's easy to see that we can solve selection in n log n time. How would we do that? We just sort the array, and then if we want the smallest we look at the first position, for the largest we look in the last position, and for the median we look in the middle. In fact, if k is small, the running time is going to be proportional to n, because if you're looking for the smallest you can find it in one pass through the array, and if k is 2 you can find it in two passes. So you can imagine trying to find a selection algorithm that takes time proportional to n, and also the lower bound is n, because you have to look at everything: if you don't look at everything, you might miss the one item that you're looking for. So from these observations it's clear that what we'd like is a selection algorithm that takes linear time. But at this point the question is: is there a linear-time algorithm that works for every k, or is selection possibly as hard as sorting? This kind of question occupied a lot of people in the late '60s and early '70s as these types of problems emerged from computing applications. So it's an interesting question to think about, for sure.

Well, in his original paper in 1961, Hoare gave a solution to the selection problem based on partitioning, and the idea is just a version of Quicksort, in a way. We do our partitioning so that we get entry a[j] into place in the array: no larger entry to its left, no smaller entry to its right. But then, when we're doing selection, we just go into one subarray or the other depending on where j is. If j = k, we're done: we've found the k-th smallest. If k is to the left of j, then we just work on the left subarray, which means setting hi to j - 1. And if k is to the right of j, we just work on the right subarray by setting lo to j + 1. That's all this code does. We could make a recursive call, but this version just resets the values of the parameters. Do one partition, then check whether your k-th element is going to be in the left part or the right part and reset lo or hi accordingly. If it's equal, then you've found it and you return it, and you keep going until you get to a point where you have only one element. That's a Quicksort-like implementation solving the selection problem. Notice again that it depends on the random shuffle at the beginning, and that's going to be important for performance.

Alright. So there needs to be a mathematical analysis to characterize the running time of this program, and the fact is that quick-select, this method, takes linear time on average. We won't give the full proof.
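To make the description concrete, here is a minimal sketch of quick-select along these lines. It is not the course's exact code: it uses plain int keys and 0-indexed k, the class name QuickSelect and the helper methods are just for illustration, and the shuffle and partition are the standard ones you would pair with Quicksort.

```java
import java.util.Random;

public class QuickSelect {

    // Return the k-th smallest element (k = 0 gives the minimum).
    public static int select(int[] a, int k) {
        shuffle(a);                        // random shuffle: the basis of the probabilistic guarantee
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);  // a[j] is now in its final sorted position
            if      (j < k) lo = j + 1;    // k-th smallest lies in the right subarray
            else if (j > k) hi = j - 1;    // k-th smallest lies in the left subarray
            else            return a[k];   // found it
        }
        return a[k];
    }

    // Standard Quicksort partitioning around a[lo].
    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;   // scan right for an item >= v
            while (v < a[--j]) if (j == lo) break;   // scan left for an item <= v
            if (i >= j) break;                       // pointers crossed
            swap(a, i, j);
        }
        swap(a, lo, j);                              // put the partitioning item in place
        return j;
    }

    // Fisher-Yates shuffle.
    private static void shuffle(int[] a) {
        Random rnd = new Random();
        for (int i = 1; i < a.length; i++)
            swap(a, i, rnd.nextInt(i + 1));
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = { 9, 4, 1, 7, 3, 8, 2, 6, 5, 0 };
        System.out.println(select(a, a.length / 2));  // the median (k = n/2): prints 5
    }
}
```

Note that only one side of each partition is ever visited again, which is exactly why the cost works out to be linear rather than n log n.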
The full proof is actually quite a bit more complicated than the one we just did for Quicksort. But intuitively we can see roughly what happens: each partitioning step splits the array approximately in half. If it split exactly in half each time, you'd have n + n/2 + n/4 and so forth, which adds up to about 2n compares, so linear cost. It doesn't actually cut exactly in half each time, only on average, so you need a fuller analysis like the one we did for Quicksort. The bottom line of that analysis gives the expected number of compares as a function of n and k; that formula is reproduced below, and if you plug in k = n/2 you get the result that the number of compares required to find the median, the highest value the formula can take, is (2 + 2 ln 2) n. So, linear time to find the k-th smallest for any value of k.

Now again, this is a method that's linear time on average. It's actually going to be quadratic in the worst case, but the chance of that happening with a random shuffle is less than the chance that you'll be struck by lightning. It's a probabilistically guaranteed fast algorithm.

From a theoretical standpoint that's a little unsatisfying, and in 1973 a famous paper gave a compare-based selection algorithm that is guaranteed to solve the problem in linear time in the worst case. This was a real landmark in the theory of algorithms, because for a long time it was not known whether that was possible: we knew we could get linear time on average, but could we guarantee it in the worst case? That paper found such a construction. In practice, though, the constants in the construction are rather high, so the method is not really used. And so there is still the goal of a fast, guaranteed linear-time selection algorithm that maybe somebody in this class will invent someday. This is another example where we use theory as a guide: it's still worthwhile to look for a practical linear-time worst-case algorithm. Maybe somebody in this class will invent that, but until something like that is discovered, use quick-select based on Quicksort partitioning, and you get linear-time selection when you don't need a full sort. So selection is a simpler problem than sorting, and it is well solved by Quicksort partitioning.
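For reference, here is the compare-count formula the lecture points to, as it appears in the standard average-case analysis of quick-select, together with the k = n/2 substitution that gives the quoted constant:

$$
C_{n,k} \;\approx\; 2n \;+\; 2k\ln\frac{n}{k} \;+\; 2(n-k)\ln\frac{n}{n-k},
\qquad
C_{n,\,n/2} \;\approx\; 2n + n\ln 2 + n\ln 2 \;=\; (2 + 2\ln 2)\,n \;\approx\; 3.38\,n .
$$

Each of the two logarithmic terms becomes n ln 2, about 0.693 n, when k = n/2, which is why the median is the hardest case and still costs only a constant times n compares on average.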