0:01

So let's review the story so far.

We've been discussing the QuickSort algorithm.

Here again is its high level description.

So in QuickSort you call two subroutines first, and

then you make two recursive calls.

So the first subroutine ChoosePivot, we haven't discussed yet at all.

That'll be one of the main topics of this video.

But the job of the ChoosePivot subroutine is to somehow select

one of the n elements in the input array, to act as a pivot element.

Now what does it mean to be a pivot?

Well that comes into play in the second subroutine,

the partition subroutine, which we did discuss quite a bit in a previous video.

So what a partition does is it rearranges the elements in the input array, so that

it has the following property, so that the pivot p winds up in its rightful position.

That is, it's to the right of all of the elements less than it, and

it's to the left of all of the elements bigger than it.

The stuff less than it's to the left in some jumbled order.

The stuff bigger than it's to the right in some jumbled order.

That's what's listed here as the first part and

the second part of the partitioned array.

Now, once you've done this partitioning, you're good to go.

You just recursively sort the first part to get them in the right order,

you call QuickSort again to recursively sort the right part, and

bingo, the entire array is sorted.

You don't need a combine step, you don't need a merge step.

Where we'll recall in a previous video,

we saw that the partition array can be implemented in linear time.

And moreover, it works in place with essentially no additional storage.

We also, in an optional video, formally proved the correctness of QuickSort, and

remember QuickSort is independent of how you implement the ChoosePivot subroutine.

So what we're going to do now is discuss the running time of the QuickSort

algorithm, and this is where the choice of the pivot is very important.

1:52

The key point to realize at this juncture, is that we are not currently in a position

to discuss the running time of the QuickSort algorithm.

The reason is we do not have enough information.

The running time of QuickSort depends crucially on how you choose the pivot.

It depends crucially on the quality of the pivot chosen.

2:15

You'd be right to wonder what I mean by a pivot's quality.

And basically what I mean, is a pivot is good

if it splits the partitioned array into roughly two equal sized subproblems.

And it's bad, it's of low quality, if we get very unbalanced subproblems.

So to understand both, what I mean, and the ramifications of having good

quality and bad quality pivots, let's walk through a couple of quiz questions.

2:39

This first quiz question is meant to explore a sort of worst case execution of

the QuickSort algorithm.

What happens when you choose pivots that are very poorly suited for

the particular input array?

Let me be more specific.

Suppose we use the most naive ChoosePivot implementation,

like we were discussing in the partition video.

So remember, here we just pluck out the first element of the array and

we use that as the pivot.

So suppose that's how we implement the ChoosePivot subroutine, and

moreover, suppose that the input array to QuickSort

is an array that's already in sorted order.

So for example, if it just had the numbers one through eight, it would be one, two,

three, four, five, six, seven, eight, in order.

My question for you is, what is the running time

of this recursive QuickSort algorithm on an already sorted array,

if we always use the first element of a subarray as the pivot?

3:34

Okay, so this is a slightly tricky, but actually a very important question.

So the answer is the fourth one.

So it turns out, that QuickSort, if you pass it an already sorted array and

you're using the first element as pivot elements, it runs in quadratic time.

And remember for a sorting algorithm, quadratic is bad.

It's bad in the sense that we can do better.

MergeSort runs in time n log n, which is much better than n squared.

And if we we're happy with an n squared running time,

we wouldn't have to resort to these sort of relatively exotic sorting algorithms.

We could just use Insertion sort, and we'd be fine.

We'd get that same quadratic running time.

Okay, so now I owe you an explanation.

Why is it that QuickSort can actually run in quadratic time, in this unlucky case,

of being passed an already sorted input array?

4:20

Well to understand, let's think about what pivot gets chosen, and

what are the ramifications of that pivot choice for how the array gets partitioned,

and then what the recursion looks like.

So, let's just think of the array as being the numbers 1 through n, in sorted order.

What is going to be our pivot?

Well, by definition we're choosing the first element of the pivot, so

the pivot's just going to be 1.

Now we're going to invoke the partition subroutine.

And if you go back to the Pseudocode of the partition subroutine, you'll notice

that if we pass an already sorted array, it's going to do essentially nothing.

Okay? So it's just going to advance the index j,

until it falls off the end of the array, and it's just going to return back to us,

the same array that it was passed as input.

So partition subroutine,

if given an already sorted array, returns an already sorted array.

So we have just a pivot 1, in the first position.

And then the numbers 2 through n, in order, in the remainder of the positions.

5:11

So if we draw our usual picture of what a partitioned array looks like,

with everything less than the pivot to the left,

everything bigger than the pivot to the right.

Well, since nothing is less than the pivot, this stuff is going to be empty.

This will not exist.

And to the right of the pivot, this will have length n- 1, and

moreover, it will still be sorted.

5:37

So once partition completes,

we go back to the outer call of QuickSort, which then calls itself recursively twice.

Now in this case, one of the recursive calls is just vacuous.

There's just an empty array, there's nothing to do.

So really there's only one recursive call,

and that happens on a problem of size only one less.

So this is about the most unbalanced split we could possibly see, right,

where one side has 0 elements, one side's n- 1.

Splits don't really get any worse than that.

And this is going to keep happening over, and over, and over again.

We're going to recurse on the numbers 2 through n.

We're going to choose the first element, the 2, as the pivot.

Again, we'll feed it to partition.

We'll get back the exact same subarray that we handed it in.

We get to the numbers 2 through n, in sorted order.

We exclude the pivot 2,

we recurse on the numbers 3 through n, a subarray of length n- 2.

The next recursion level, we recurse on an array of size of length n- 3,

then n- 4, then n- 5, and so on.

Until finally, after I did recursion depth of n,

roughly, we got down to just the last element n, the base case kicks in, and

we return that, and QuickSort completes.

So that's how QuickSort is going to execute on this particular input with

these particular pivot choices, so what running time does that give to us?

6:50

Well, the first observation is that in each recursive call,

we do have to invoke the partition subroutine.

And the partition subroutine does look at every element in the array it has passed

as input.

So if we pass partition in array of length k, it's going to do at least k operations,

because it looks at each element at least once.

So the runtime is going to be bounded below by the work we do in the outermost

call, which is on an array of length n, plus the amount we do in the second

level of recursion, which is on a subarray of length (n- 1) + (n- 2) +, blah, blah,

blah, blah, blah, all the way down to + 1, for the very last level of the recursion.

So this is a lower bound on our running time,

and this is already Theta of n squared.

7:36

So, one easy way to see why this sum n + (n- 1) +, etc., etc.,

leads to a bound of n squared, is to just focus on the first half of the terms.

So, the first n over two terms in the sum are all of magnitude at least n over 2, so

the sum is at least n squared over 4.

It's also evident that this sum is at most, n squared.

So, overall,

the running time of QuickSort on this bad input is going to be quadratic.

Now having understood what the worst case performance for

the QuickSort algorithm is, lets move on to discuss it's best case running time.

Now we don't generally care about the best case performance of algorithms for

it's own sake.

The reason that we want to think about QuickSort in the best case,

first of all it'll give us better intuition for how the algorithm works.

Second of all, it'll draw a line in the sand.

Its average case running time certainly can't be better than the best case, so

this will give us a target for what we're shooting for

in our subsequent mathematical analysis.

8:31

So what were the best case?

What was the highest quality pivot we could hope for?

Well again, we think of the quality of the pivot as the amount of

balance that it provides between the two sub problems.

So ideally, we choose a pivot which gave us two sub-problems,

both of size n over 2 or less.

And there's a name for

the element that would give us that perfectly balanced split.

It's the median element of the array, okay, the element where exactly half of

the elements are less than it and half of the elements are bigger than it.

That would give us an essentially perfect 50, 50 split of the input array.

So, here's the question.

Suppose we had some input and we ranked QuickSort, and

everything just worked in our favor, in the magically, in the best possible way.

That is, in every single recursive invocation of QuickSort,

on any sub array of the original input array.

Suppose, we happen to get, as our pivot the median element.

That is, suppose in every single recursive call.

We wind up getting a perfect 50/50 split of the input array before we recurse.

This question asks you to analyze the running time of this algorithm in

this magical best case scenario.

9:59

That is the running time QuickSort requires in this

magical special case on a array of length n.

As usual, you have a recurrence in two parts.

There's the work that gets done by the recursive cause and

there's the work that gets done now.

Now by assumption, we wind up picking the median as the pivot.

So there's going to be two recursive calls,

each of which will be on an input of size at most N over two.

And, we can write this, this is because the pivot equals the median.

10:34

So that's what gets done by the two recursive calls.

And then how much work do we do outside of the recursive calls?

Well, we have to do the truth pivot subroutine.

And I guess, strictly speaking, I haven't said how that was implemented.

But let's assume that choose pivot does only a linear amount of work.

And then, as we've seen,

the partition subroutine only does a linear amount of work, as well.

So let's say O(n), for work outside of the recursive calls.

10:59

And what do we know?

We know this implies, say by using the master method, or

just by using the exact same argument as for Merge Sort, this gives us a running

time balunt of (nlogn) And again something I haven't really been emphasizing but

which is true is that actually we can write theta (n log n).

And that's because in the recurrence, in fact, we know that the work done outside

of the recursive calls is exactly theta (n), okay?

Partition needs really linear time, not just O(n) time.

In fact the work done outside of the recursive calls is theta (n).

That's because the partition serpentine does indeed look at

every entry in the array that it passed, all right, and as a result,

we didn't really discuss this so much in the master method.

But as I mentioned in passing, if you have recurrences which are tight in this sense,

then the result of the master method

can also be strengthened to be theta instead of just beta.

But those are just some extra details.

The upshot of this quiz is that even in the best case, even if we magically get

prefect pivots throughout the entire trajectory of quick sort.

The best we can hope for is an n log n upper bound,

it's not going to get any better than that.

So the question is how do we have a principled way of choosing pivots so that

we get this best case or something like it that's best case n log n running time.

So that's what the problem that we have to solve next.

So the last couple quizzes have identified a super important question, as far as

the implementation of Quicksort, which is how are we going to choose these pivots?

We now know that they have a big influence on the running time of our algorithm.

It could be as bad as n squared or

as good as m log n, and we really want to be on the m log n side.

So the key question: how to choose pivots.

13:05

By which I mean for every time we recursively call quick sort and

we are pass some subarray of length k.

Among the K candidate pivot elements in the sub-array,

we're going to choose each one with probability of one over k.

And we're going to make a new random choice every time we have qa recursive

call, and we're going to see how the algorithm does.

This is our first example of a randomized algorithm.

This is an algorithm where, if you feed it exactly the same input,

it'll actually run differently, on different execution.

And that's because there's randomness internal to the code of the algorithm.

Now, it's not necessarily intuitive.

The randomization should have any purpose in the computation, in software design and

algorithm design.

But, in fact, and this has been sort of one of the real breakthroughs in algorithm

design, mostly in the '70s, in realizing how important this is,

that the use of randomization can make algorithms more elegant.

Simpler, easier to code, faster, or just simply you can solve

problems that you could not solve as easily without the use of randomization.

It's really one thing that should be in your toolbox as an algorithm designer,

randomization.

Quick Sort will be the first [INAUDIBLE] application, but

we'll see a couple more later in the course.

14:18

Now by the end of this sequence of video's I'll have given you a complete rigorous

argument about why this works.

Why with random pivots, quick sort always runs very quickly, on average.

But, you know, before moving into anything to formal let's develop a little bit of

intuition or at least kind of a daydream.

About why on Earth could this possibly work, why on Earth could this possibly be

a good idea, to have randomness internal to our pro sort implementation.

Well, so first just very high level, what would be sort of the hope, or the dream.

The hope would be, random pivot's are not going to be perfect,

I mean you're not going to just sort of guess the median, or

you only have a one in chance of figuring out which one the median is,

but the hope is that most choices of a pivot will be good enough.

15:12

The first claim is that, you know in our last quiz we said suppose we get lucky and

we always pick the median in every single recursive call.

And we observed we'd do great.

We'd get end log in running time.

So now let's observe that actually to get the end log in running time,

it's not important that we magically get the median every single recursive call.

If we get any kind of reasonable pivot, by which a pivot that gives us some kind of

approximately balanced with the problems, again, we're going to be good.

So the last quiz really wasn't particular to getting the exact median.

Near medians are also fine.

To be concrete, suppose we always pick a pivot which guarantees us a split

of 25 to 75, or better.

That is, both recursive calls should be called on arrays of size,

at most, 75% of the one we started with.

So precisely if we always get a 25, 75 split or better in every recursive call I

claim that the running time of quick sort in that event will be big O of n log n.

Just as it was in the last quiz where we're actually assuming something much

stronger where we're getting a median.

Now this is not so obvious, the fact that 25, 75 splits guarantee analog and

running time.

For those of you that are feeling keen you might want to try to prove this.

You can prove this using a recursion tree argument, but

because you don't have balanced sub problems you have to work a little bit

harder than you do in the cases covered by the master method.

16:41

The second part of the intuition is to realize that actually we don't have to

get all that lucky to just be getting a 25, 75 split.

That's actually a pretty modest goal and

even this modest goal is enough to get n log n running time, right?

So suppose our array contains the numbers, the integers between 1 and

100, so it is an array of length 100.

Think for a second, which of those elements is going to give us

a split that's 25, 75 or better?

So, if we pick any element between 26 and

75 inclusive, will be totally good, right?

If we pick something that's at least 26,

that means the left subproblem is going to have at least the elements 1 through 25.

That'll have at least 25% of the elements.

If we pick something less than 75 then the right sub-problem will have all of

the elements 76 through 100 after we partition, so

that'll also have at least 25% of the elements.

So anything between 26 and 75 gives us a 75-25 split or better.

17:39

But that's a full half of the elements, so

it's as good as just flipping a fair coin hoping to get heads.

So with 50% probability,

we get a split that's good enough to get this n log n bound.

And so again, the high level hope is that often enough, half of the time,

we get these good enough splits, 25-75 split or better, so

that would seem to suggest n log n running time on average is a legitimate hope.

18:07

So that's the high level intuition, but if I were you I would certainly not

be content with this somewhat hand wavy explanation that I've given you so far.

What I've told you is sort of the hope the dream,

why there is at least a chance this might work.

But the question remains, and

I would encourage such skepticism, which is does this really work?

18:27

And to answer that we're going to have to do some actual mathematical analysis, and

that's what I'm going to show you.

I'm going to show you a complete rigorous analysis of the quick sort algorithm with

random pivots, and we'll show that yes in fact, it does really work.

And this highlights what's going to be a recurring them of this course, and

a recurring theme in the study and understanding of algorithms.

Which is that quite often there's some fundamental problem you're trying to code

with a solution, you come up with a novel idea, it might be brilliant, and

it might suck.

You have no idea.

Now, obviously, you code up the idea, run it on some concrete instances and

get a feel for whether it seems like a good idea or not.

But if you really want to know fundamentally what makes the idea good or

what makes the idea bad, really,

you need to turn to mathematical analysis to give you a complete explanation.

And that's exactly what we're going to do with QuickSort,

and then we'll explain in a very deep way why it works so well.

19:25

So under no assumptions about the data, that is, for

every input array of a given length, say n, the average running

time of QuickSort implemented with random pivots is big O of n log n.

And again in fact it's theta of n log n but

we'll just focus on the big O of n log n part.

19:45

So this is a very, very cool theorem about this randomized QuickSort algorithm.

One thing I want to be clear so that you don't under sell this guarantee in

your own mind, this is a worst case guarantee with respect to the input.

Okay so notice at the beginning of this theorem what do we say?

For every input array of length n, all right?

So, we have absolutely no assumptions about the data.

This is a totally general purpose sorting separating which you can

use whatever you want even if you have no idea where the data is coming from.

And these guarantees are still going to be true.

20:20

This of course is something I held forth about at some length

back in our guiding principles video, when I argued that if you can get away with it,

what you really want is general purpose algorithms.

Which make no data assumption, so they can be used over and

over again in all kinds of different contexts and

that still have great guarantees, QuickSort is one of those.

So basically if you have a data set and it fits in the main memory of your machine,

again sorting is a four free sub routine in particular QuickSort.

The QuickSort implementation is for free.

So this just runs so blazingly fast, doesn't matter what the array is,

maybe you don't even know why you want to sort it.

But go ahead, why not?

Maybe it will make your life easier, like it did for example

in the closest pair algorithm for those of you who watch those two optional videos.

21:04

Now the word average does appear in this theorem and

as I've been harping on, this average is not over any assumptions on the data.

We're certainly not assuming the the input array is random in any sense.

The input array can be anything, so where is the average then coming from?

The averaging is coming only from randomness which is internal to

our algorithm.

Randomness that we put in the code in ourselves, that we're responsible for.

21:28

So remember, randomized algorithms have the interesting property that even if you

run it on the same input over and over again,

you're going to get different executions.

So the running time of a randomized algorithm can vary

as you run it on the same input over and over again.

The quizzes have taught us that the running time of QuickSort on a given input

fluctuates from anywhere between the best case of n log

n to the worst case of n squared.

So what this theorem is telling us is that for every possible input array,

while the running time does indeed fluctuate between an upper

bound of n squared and a lower bound of n log n.

The best case is dominating.

On average it's n log n, on average it's almost as good as the best case.

That's what's so amazing about QuickSort.

Those n squared that can pop up once in a while, doesn't matter.

You're never going to see it,

you're always going to see this n log n like behavior in randomized QuickSort.

So for some of you I'll see you next in a video on probability review,

that's optional.

For the rest of you I'll see you in the analysis of this theorem.