And actually, this is just an empirical validation of running, again,

1 million sorts just to show that the actual behavior is like that.

And that's interesting mathematics.

What is this curve?

That's a new function.

It's not a normal distribution, and it's definitely worthy of mathematical study.

After all, this is probably one of the world's most heavily used algorithms.

We might want to understand this property of it.

Maybe that'll help us understand something else in the future.
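The experiment described above can be sketched in a few lines of Python: run quicksort on many random permutations and record the number of compares each run uses, which gives the empirical distribution of the cost. (The trial count and input size here are small placeholders for illustration; the talk describes on the order of a million sorts.)

```python
import random

def quicksort_compares(a):
    """Sort a in place with basic quicksort, returning the number of key compares."""
    count = 0
    def sort(lo, hi):
        nonlocal count
        if hi <= lo:
            return
        # Lomuto-style partition with the first element as pivot.
        pivot, i = a[lo], lo
        for j in range(lo + 1, hi + 1):
            count += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]
        sort(lo, i - 1)
        sort(i + 1, hi)
    sort(0, len(a) - 1)
    return count

# Empirical distribution of compare counts over many random shuffles.
counts = []
for _ in range(1000):
    a = list(range(100))
    random.shuffle(a)
    counts.append(quicksort_compares(a))
```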

So the bottom line from all of this work, and

this is just a summary of the 50 years of research on Quicksort,

is that really a lot is known about the performance of Quicksort.

And there are plenty of interesting research problems in the, literally,

thousands of papers that have been written

about Quicksort in the 50 years since it was discovered.

And that's not the end of the story, I have two more comments.

The first one is about prediction, and

this is something that we teach at the very beginning of the algorithms book:

that just using a hypothesis like this makes it simple to predict performance,

even of extremely complicated algorithms in extremely complicated scenarios.

So if that's our hypothesis, what we're going to do is run the experiment for

some input size N, some big N.

And maybe run it a bunch of times to get a good average on the running time.

If we wanted to, that's enough information to solve for a.

Whatever our time is for whatever N is, then a's the only unknown,

and we can go ahead and compute a.
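That step can be written out as a minimal Python sketch, assuming the running-time model T(N) = a N log N from the discussion; the sample measurement below is a made-up placeholder, not a figure from the talk:

```python
import math

def solve_for_a(n, t_seconds):
    """Solve T = a * n * log2(n) for the constant a, given one measured time."""
    return t_seconds / (n * math.log2(n))

# Hypothetical measurement: suppose sorting N = 1,000,000 keys took 0.35 s.
a = solve_for_a(1_000_000, 0.35)
```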

But actually, we usually don't need to even do that.

We can just predict the time for some constant times N, just by doing the math.

So say we wanted to predict the time for

10N knowing that we have the running time for N.

Well, if you just do the math, if we take a(10N) log(10N) over aN log N,

the a's cancel, we don't even have to know the constant.

It says that the running time is going to increase by a factor of 10,

times 1 plus 1 over log base 10 of N.

And that correction gets smaller as N gets bigger.

So if you increase the problem size by a factor of 10,

you're going to increase the running time by a factor of about 10.

And that's a very accurate prediction, and that's useful.
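The ratio argument above can be checked numerically with a short sketch; both the constant a and the base of the logarithm cancel, so neither needs to be known:

```python
import math

def growth_ratio(n, factor=10):
    """Predicted ratio T(factor*N) / T(N), assuming T(N) = a * N * log(N).
    The constant a and the log base cancel, so they never appear."""
    return (factor * n * math.log(factor * n)) / (n * math.log(n))

# For factor = 10 this equals 10 * (1 + 1/log10(N)), which tends to 10
# as N grows.
```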

And we now teach that to first-year students, and

in fact we teach this simple idea to nearly 70% of all Princeton students.

That if you run a program for a big input, and you have a decent model for

its performance, you can predict the running time.

So for example, run it for 100,000, 100 times.

And again we can run as many experiments as we want nowadays, and

it takes 4 seconds.

So that means that I'm going to predict that it's going to take

just a little more than 40 seconds for 1 million.
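The arithmetic behind that prediction can be sketched as follows, again assuming the T(N) = a N log N model; the 4-second measurement at N = 100,000 is the figure from the example:

```python
import math

def predict_time(n_small, t_small, n_big):
    """Extrapolate the running time at n_big from one measurement at n_small,
    assuming T(N) = a * N * log(N); the constant a cancels in the ratio."""
    return t_small * (n_big * math.log10(n_big)) / (n_small * math.log10(n_small))

# 4 s at N = 100,000 scales by the ratio 10 * (1 + 1/log10(100000)) = 12,
# i.e. a bit more than the naive factor of 10, for N = 1,000,000.
t_million = predict_time(100_000, 4.0, 1_000_000)
```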