In this video I'll explain the mathematical analysis of the randomized linear-time selection algorithm that we studied in the previous video. Specifically, I'm going to prove the following guarantee for that algorithm: for every single input array of length n, the running time of this randomized selection algorithm will, on average, be linear. Pretty amazing if you think about it, because that's barely more than the time it takes just to read the input. In particular, this linear-time algorithm is even faster than sorting. So this shows that selection is a fundamentally easier problem than sorting: you don't need to reduce to sorting, you can solve it directly in O(n) time.

I want to reiterate the same points I made about quicksort. The guarantee is the same: it's a general-purpose subroutine, and we make no assumptions about the data. This theorem holds no matter what the input array is. The expectation, the average in the theorem statement, is only over the coin flips made by the algorithm, made inside its code, code of our own devising.

Before we plunge into the analysis, let me just make sure you remember what the algorithm is. It's like quicksort: we partition around a pivot, except we only recurse once, not twice. We're given an array with some length n, and we're looking for the ith order statistic, the ith smallest element. The base case is obvious. If you're not in the base case, you choose a pivot p uniformly at random from the input array, just like we did in quicksort, and you partition around the pivot, again just like we did in quicksort. That splits the array into a first part, the elements less than the pivot, and a second part, the elements bigger than the pivot.

Now we have a couple of cases. There's the case which is very unlikely, so we don't really worry about it: if we're lucky enough to guess the pivot as the ith order statistic, exactly what we're looking for. That's when the new position j of the pivot element happens to equal i. Then, of course, we just return the pivot; it's exactly what we wanted. In the general case, the pivot winds up in a position j which is either bigger than the i we're looking for, that's when the pivot is too big, or its position j is less than i, that's when the pivot is too small.

If the pivot's too big, if j is bigger than i, then what we're looking for lies on the left-hand side, amongst the elements less than the pivot. So that's where we recurse. We've thrown out both the pivot and everything to the right of it, which leaves us with an array of j minus 1 elements, and we're still looking for the ith smallest among these j minus 1 smallest elements. And then the final case, this is what we went through in the quiz in the last video: if we choose a pivot that's smaller than what we're looking for, that's when j is less than i, then it means we're safe to throw out the pivot and everything less than it, and we recurse on the second part, the elements bigger than the pivot. Having thrown out the j smallest elements, we recurse on an array of length n minus j, and we're looking for the (i minus j)th smallest element among those that remain.

So that's randomized selection. Let's discuss why it's linear time on average. The first thought that you might have, and this would be a good thought, would be that we should proceed exactly the same way that we did in quicksort.
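To keep the algorithm concrete while we analyze it, here is a minimal Python sketch of the RSelect procedure described above, assuming 1-indexed order statistics. The function names and the particular in-place partition helper are my own illustrative choices, not code from the course.

```python
import random

def rselect(arr, i):
    """Return the i-th smallest element (1-indexed) of arr, in expected O(n) time."""
    a = list(arr)                  # work on a copy so the caller's array is untouched
    if len(a) == 1:                # base case: a single element is its own 1st order statistic
        return a[0]
    p = random.randrange(len(a))   # choose a pivot uniformly at random
    a[0], a[p] = a[p], a[0]        # move the pivot to the front before partitioning
    j = partition(a)               # j = final (1-indexed) position of the pivot
    if j == i:                     # lucky case: the pivot is the i-th order statistic
        return a[j - 1]
    elif j > i:                    # pivot too big: recurse on the j-1 elements less than it
        return rselect(a[:j - 1], i)
    else:                          # pivot too small: throw out the j smallest elements
        return rselect(a[j:], i - j)

def partition(a):
    """Partition a around a[0] in linear time; return the pivot's final 1-indexed position."""
    pivot = a[0]
    boundary = 1                   # a[1:boundary] holds the elements seen so far that are < pivot
    for k in range(1, len(a)):
        if a[k] < pivot:
            a[k], a[boundary] = a[boundary], a[k]
            boundary += 1
    a[0], a[boundary - 1] = a[boundary - 1], a[0]   # put the pivot in its rightful place
    return boundary

# Example: the 3rd smallest element of [10, 8, 2, 4] is 8.
# print(rselect([10, 8, 2, 4], 3))
```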
You recall that when we analyzed quicksort, we set up these indicator random variables X_ij, determining whether or not a given pair of elements got compared at any point in the algorithm. Then we realized that the total number of comparisons is just the sum, over all pairs, of these X_ij's, we applied linearity of expectation, and it boiled down to just figuring out the probability that a given pair of elements gets compared. You can analyze this randomized selection algorithm in exactly the same way, and it does give you a linear-time bound on average. But it's a little messy; it winds up not being quite as clean as the quicksort analysis. Moreover, because of the special structure of the selection problem, we can proceed in an even more slick way here than we did with quicksort. So again we'll have some constituent random variables, and we'll again apply linearity of expectation, but the definition of those random variables is going to be a little bit different than it was in quicksort.

First, a preliminary observation: the workhorse for this randomized selection procedure is exactly the same as it was in quicksort, namely the partition subroutine. Essentially all of the work that gets done outside of the recursive call just partitions the input array around some pivot element, and as we discussed in detail in a separate video, that takes linear time. Usually when we say something is linear time we just use big-O notation; here I'm going to explicitly use a constant c for the operations outside the recursive call. That'll make it clear that I'm not hiding anything up my sleeve when we do the rest of the analysis.

Now what I want to do on this slide is introduce some vocabulary, some notation, which will allow us to cleanly track the progress of this recursive selection algorithm. And by progress I mean the length of the array on which it is currently operating. Remember, we're hoping for a big win over quicksort, because here we only make one recursive call, not two; we don't have to recurse on both sides of the pivot, just on one of them. So it stands to reason that we can think of the algorithm as making more and more progress as its successive recursive calls operate on arrays of smaller and smaller length.

The notion that will be important for this proof is that of a phase. This quantifies how much progress we've made so far, with higher-numbered phases corresponding to more and more progress. We'll say that the RSelect algorithm, at some point in the middle of its execution, is in phase j if the array that the current recursive call is working on has length between (3/4) raised to the j+1 times n and the larger number (3/4) raised to the j times n. For example, think about the case where j equals zero. That says phase-zero recursive calls operate on arrays with size between 75 percent of n and n. So certainly the outermost recursive call is going to be in phase zero, because the input array has size n. Then, depending on the choice of the pivot, you may or may not get out of phase zero in the next recursive call. If you choose a good pivot and wind up recursing on something that has at most 75 percent of the original elements, you will no longer be in phase zero. If you recurse on something that has more than 75 percent of the input array you started with, then you're still going to be in phase zero even in the second recursive call.
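Written as a formula (my notation for the definition just given, with the boundary convention chosen arbitrarily since it doesn't affect the analysis): a recursive call operating on a subarray of length m is in phase j exactly when

```latex
\left(\tfrac{3}{4}\right)^{j+1} n \;<\; m \;\le\; \left(\tfrac{3}{4}\right)^{j} n .
```

For j = 0 this reads (3/4)n < m <= n, matching the phase-zero example above.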
So overall, the phase number j quantifies the number of times we've made 75 percent progress relative to the original input array. The other piece of notation that's going to be important is what I'm going to call X_j. For a phase j, X_j simply counts the number of recursive calls in which the randomized selection algorithm is in phase j. So this is going to be some integer; it could be as small as zero for some of the phases, if you think about it, or it could be larger.

So why am I doing this? Why am I making these definitions of phases and of these X_j's? What's the point? Remember, the point was that we want to be able to cleanly talk about the progress the randomized selection algorithm makes through its recursion, and what I want to show you now is that in terms of these X_j's, counting the number of recursive calls in each phase, we can derive a relatively simple upper bound on the number of operations that our algorithm requires. Specifically, the running time of our algorithm can be bounded above by the running time spent in a given phase, summed over all of the possible phases.

So we're going to start with a big sum over all the phases j. We look at the number of recursive calls that we have to endure in phase j; that's X_j, by definition. Then we look at the work that we do outside of the recursive call in each recursive call during phase j. Now, in a given recursive call, outside of its own recursive call, we do c times m operations, where m is the length of its input array, and during phase j we have an upper bound on the length of that input array: by definition, it's at most (3/4) raised to the j times n. That is, we multiply by this constant c, which we inherit from the partition subroutine, and then for the input length we can put in an upper bound of (3/4) raised to the j times n.

Just to review where all of these terms come from: the (3/4) raised to the j times n is an upper bound on the array size during phase j, by the definition of a phase. If we multiply that by c, that's the amount of work we do on each phase-j subproblem. How much work do we do in phase j overall? We just take the work per subproblem, that's what's circled in yellow, and multiply it by the number of such subproblems we have. And, of course, we don't want to forget any of our subproblems, so we make sure we sum over all of the phases j, to ensure that we count the work done in every one of the subproblems.

So that's the upshot of this slide: we can upper bound the running time of our randomized algorithm very simply in terms of phases and the X_j's, the number of subproblems that we have to endure during phase j. This upper bound on our running time is important enough to give it notation; we'll call it star, and it will be the starting point of our final derivation when we complete the proof of this theorem.

Now don't forget, we're analyzing a randomized algorithm, so the left-hand side of this inequality, the running time of RSelect, is a random variable: it's a different number depending on the outcome of the random coin flips of the algorithm. Depending on the random pivots it chooses, you will get different running times. Similarly, the right-hand side of this inequality is also a random variable. That's because the X_j's are random variables: the number of subproblems in phase j depends on which pivots get chosen.
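In symbols, the upper bound we're calling star is (my transcription of the slide):

```latex
\text{running time of RSelect} \;\le\; \sum_{\text{phases } j} X_j \cdot c \cdot \left(\tfrac{3}{4}\right)^{j} n . \qquad (\star)
```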
So what we care about is the expectations of these quantities, their average values. We're going to start modestly and, as usual, extend our modest accomplishments to much more impressive ones using linearity of expectation. Our first modest goal is just to understand the average value of an X_j, the expected value of X_j. We're going to do that in two steps. On the next slide, I'm going to argue that to analyze the expectation of X_j, it's sufficient to understand the expectation of a very simple coin-flipping experiment. Then we'll analyze that coin-flipping experiment. Then we'll have the dominoes all set up in a row, and on the final slide we'll knock them down and finish the proof.

So let's try to understand the average number of recursive calls we expect to see in a given phase. Again, just so we don't forget, X_j is defined as the number of recursive calls during phase j, where a recursive call is in phase j if and only if the current subarray length lies between (3/4) raised to the j+1 times n and the larger number (3/4) raised to the j times n. So again, for example, phase zero is just the recursive calls in which the array length is between 75 percent and 100 percent of the original number of elements.

What I want to do next is point out that a very simple sufficient condition guarantees that we proceed from a given phase on to the next phase; that is, a condition guaranteeing termination of the current phase. And it's an event that we've discussed in previous videos, namely that the pivot we choose gives a reasonably balanced split, 25-75 or better. So recall how partitioning works: we choose a pivot p, it winds up wherever it winds up, the stuff to the left of it is less than p, and the stuff to the right of it is bigger than p. By a 25-75 split or better, what I mean is that each of the two parts, the first part and the second part, has at most 75 percent of the elements of the input array; equivalently, both have at least 25 percent and at most 75 percent.

The key point is that if we wind up choosing a pivot that gives us a split at least this good, the current phase must end. Why must the current phase end? Well, with a 25-75 split or better, no matter which case of the algorithm we wind up in, we're guaranteed to recurse on a subproblem that has at most 75 percent of what we started with. That guarantees that whatever phase we're in now, we're going to be in a higher-numbered phase when we recurse.

Now, I want you to remember something we talked about before, which is that you've got a decent chance, when you pick a random pivot, of getting something that gives you a 25-75 split or better. In fact, the probability is 50 percent. Right? If you have an array containing the integers from one to 100 inclusive, anything from 26 to 75 will do the trick. That will ensure that at least the 25 smallest elements are excluded from the right recursive call and at least the 25 largest elements are excluded from the left recursive call.

So this is why we can reduce our analysis of the number of recursive calls during a given phase to a simple experiment involving flipping coins. Specifically, the expected number of recursive calls that we're going to see in a given phase j is no more than the expected number of coin flips in the following experiment. You've got a fair coin: 50 percent heads, 50 percent tails.
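In symbols (my notation), the reduction this slide sets up is:

```latex
\Pr[\text{25-75 split or better}] \;=\; \tfrac{1}{2},
\qquad\text{and hence}\qquad
E[X_j] \;\le\; E[\text{number of fair coin flips up to and including the first heads}].
```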
You commit to flipping it until you see a head, and the question is: how many coin flips does it take, up to and including the first head that you see? The minimum is one coin flip: if you get a head the first time, it's one. If you get a tail and then a head, it's two. If it's tail, tail, head, it's three, and so on; you always stop when you hit that first head.

So what's the correspondence? Well, think of it this way: you're in phase j, and if you get a good pivot, one that gives you a 25-75 split, call that heads. It guarantees that you exit phase j, just like it guarantees that you get to terminate the coin-flipping experiment. Now, if you get a pivot which doesn't give you a 25-75 split, you may or may not pass to a higher phase, but in the worst case you don't: you stay in phase j if you get a bad split, and that's like getting a tail in the coin-flipping experiment, and you have to try again. This correspondence gives us a very elementary way to think about the progress that our randomized selection algorithm is making. There's one recursive call at every step of our algorithm, and each time we choose either a good pivot or a bad pivot, each with 50-50 probability. A good pivot means we get a 25-75 split or better; a bad pivot means, by definition, we get a split worse than 25-75.

So what have we accomplished? We've reduced the task of upper bounding the expected number of recursive calls in a phase j to understanding the expected number of times you have to flip a fair coin before you get one heads. On the next slide I'll give you the classical and precise answer for this coin-flipping experiment.

So let me use capital N to denote the random variable we were just talking about, the number of coin flips you need up to and including the first heads. It's not very important, but you should know that these random variables have their own name: this is a geometric random variable with parameter one half. You can use a few different methods to compute the expected value of a geometric random variable such as this one, and brute force using the definition of expectation works fine, as long as you know how to manipulate infinite sums. But for the sake of variety, let me give you a very sneaky proof of what its expectation is. The sneaky approach is to write the expected value of this random variable in terms of itself, and then solve for the unknown, solve for the expectation.

So let's think about it. How many coin flips do you need? Well, for sure you're going to need one; that's the best-case scenario. And now two things can happen: either you get heads, which happens with 50 percent probability, and you stop; or you get tails, which happens with 50 percent probability, and you start all over again, flipping coins until you get the first heads. On average, how many flips does that take? Well, by the definition of capital N, you expect an additional E[N] coin flips in the case where you get tails and have to start all over. So in the equation on the slide, the one represents the first coin flip; the one half is the probability that you can't stop, that you have to start all over, the probability of tails; and the E[N] appears because this is a memoryless process: when you start anew on the second coin flip, having gotten a tail, it's as if you're back at time one all over again.
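Written out, the slide's self-referential equation for the geometric random variable N is:

```latex
E[N] \;=\; 1 \;+\; \tfrac{1}{2}\cdot E[N].
```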
So now we have a trivial equation in terms of the unknown expected value of N, and the unique value that the expected value of capital N could have, in light of this equation, is two. So, on average, if you flip a fair coin and stop when you get heads, you're going to see two coin flips.

To make sure you haven't lost the forest for the trees, let me remind you why we were talking about this coin-flipping analysis in the first place. Recall that on the previous slide we argued that the expected value of X_j, the number of recursive calls in a given phase j, is bounded above by the expected number of coin flips until the first heads. So this exact calculation of two for the coin flips gives us an upper bound of two on the number of recursive calls, on average, in any given phase j.

So now that we've got all our ducks lined up in a row, let's wrap up the proof on this final slide. Inherited from the first part of the proof, we have an upper bound on the expected running time of the RSelect algorithm; this is what we were calling star on the earlier slide. In star, it looked a little messy, but we had a sum over the phases j, and we had two things that were independent of j: the constant c and the original input length n. So let me just yank the c and the n out front. Then we have this residual sum over the phases j of (3/4) raised to the j, and remember that comes from our upper bound on the subproblem size during phase j; and of course we have to keep track of how many phase-j subproblems we solve, which by definition is X_j.

Now, star was written in terms of random variables, so we're going to go ahead and take expectations. Again, I've said this over and over, but don't forget where the expectation comes from: it's over the random pivot choices that our code makes. So the expected running time of the algorithm is at most the expectation of this star quantity. Like I said earlier, pretty much every time we do an analysis of a random process, we wind up using linearity of expectation at some point, and here is where we do it. Linearity of expectation says the expectation of a sum is just the sum of the expectations. So we yank the c and the n outside of the expectation, we yank the sum over phases outside of the expectation, we yank the (3/4) raised to the j outside of the expectation, and then we're left with just the expected value of X_j, the average number of recursive calls we expect to see in phase j.

Now, on the previous two slides we figured out an upper bound on how many recursive calls we expect to see in each phase. First, by the reduction to the coin-flip analysis, this is at most the expected number of coin flips N, which on the previous slide we argued is exactly two. Bringing that two out in front of the sum, since it no longer depends on j, we get at most 2cn times the sum over phases j of (3/4) raised to the j. Now, this kind of sum we have seen previously in the course: it came up when we were analyzing the master method and summed our running-time upper bounds over the levels of the recursion tree. If we weren't in case one, if we were in case two or three, we had geometric sums that were nontrivial. They require a certain formula to evaluate, so let me remind you of that formula here, where it's three quarters being raised to the power j.
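Chaining these steps together (my transcription of the slide's derivation, starting from star):

```latex
E[\text{running time of RSelect}]
\;\le\; E\!\left[c\,n \sum_{j\ge 0} \left(\tfrac{3}{4}\right)^{j} X_j\right]
\;=\; c\,n \sum_{j\ge 0} \left(\tfrac{3}{4}\right)^{j} E[X_j]
\;\le\; 2\,c\,n \sum_{j\ge 0} \left(\tfrac{3}{4}\right)^{j}.
```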
So this sum has value at most one over one minus the number being raised to the power j; in this case that's three quarters, one minus three quarters is one quarter, take the reciprocal, and you get four. The upshot is that the expected number of operations that this randomized selection algorithm uses to find the ith order statistic in a given input array is at most eight times c times n, where c is the hidden constant in the linear running time of partition. And that completes the proof. The input array was arbitrary, and we showed that the expected running time, over the random choices of the algorithm, is linear in n; that is, only a constant factor larger than what is required just to read the input. Pretty amazing.
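Spelled out, the final computation is:

```latex
\sum_{j\ge 0}\left(\tfrac{3}{4}\right)^{j} \;\le\; \frac{1}{1-\tfrac{3}{4}} \;=\; 4,
\qquad\text{so}\qquad
E[\text{running time of RSelect}] \;\le\; 2\,c\,n \cdot 4 \;=\; 8\,c\,n \;=\; O(n).
```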