0:00

Having completed our first analysis of an algorithm, namely an upper bound on the running time of the Merge Sort algorithm, what I want to do next is take a step back and be explicit about three assumptions, three biases, that we made when we did this analysis of Merge Sort and interpreted the results. These three assumptions we will adopt as guiding principles for how to reason about algorithms, and how to define a so-called fast algorithm, for the rest of the course.

So the first guiding principle is that we used what's often called worst-case analysis. By worst-case analysis, I simply mean that our upper bound of 6n log n + 6n applies to the number of lines of code executed for every single input array of length n. We made absolutely no assumptions about the input, where it comes from, or what it looks like, beyond what the input length n was. Put differently, if, hypothetically, we had some adversary whose sole purpose in life was to concoct some malevolent input designed to make our algorithm run as slowly as possible, the worst this adversary could do is upper-bounded by this same number, 6n log n + 6n.

Now, this sort of worst-case guarantee popped out so naturally from our analysis of Merge Sort that you might well be wondering, what else could you do? Well, two other methods
of analysis, which do have their place, although we won't really discuss them in this course, are quote-unquote average-case analysis, and also the use of a set of prespecified benchmarks. By average-case analysis, I mean you analyze the average running time of an algorithm under some assumption about the relative frequencies of different inputs. So, for example, in the sorting problem, one thing you could do, although it's not what we do here, is assume that every possible input array is equally likely, and then analyze the average running time of an algorithm. By benchmarks, I just mean that one agrees up front on some set, say ten or twenty, of benchmark inputs, which are thought to represent practical or typical inputs for the algorithm.

Now, both average-case analysis and benchmarks are useful in certain settings, but for them to make sense, you really have to have domain knowledge about your problem. You need to have some understanding of which inputs are more common than others, which inputs better represent typical inputs than others. By contrast, in worst-case analysis, by definition, you're making absolutely no assumptions about where the input comes from. So, as a
result, worst-case analysis is particularly appropriate for general-purpose subroutines: subroutines that you design without having any knowledge of how they will be used or what kinds of inputs they will be used on. And, happily, another bonus of doing worst-case analysis, as we will in this course, is that it's usually much more mathematically tractable than trying to analyze the average performance of an algorithm under some distribution over inputs, or to understand the detailed behavior of an algorithm on a particular set of benchmark inputs. This mathematical tractability was reflected in our Merge Sort analysis, where we had no a priori goal of analyzing the worst case per se; it's simply what naturally popped out of our reasoning about the algorithm's running time. The
second and third guiding principles are closely related. The second one is that, in this course, when we analyze algorithms, we won't worry unduly about small constant factors or lower-order terms. We saw this philosophy at work very early on, in our analysis of Merge Sort, when we discussed the number of lines of code that the merge subroutine requires. We first upper-bounded it by 4m + 2 for an array of length m, and then we said, eh, let's just think about it as 6m instead; let's have a simpler, sloppier upper bound and work with that. So that was already an example of not worrying about small changes in the constant factor.

Now, the question you should be wondering about is, why do we do this, and can we really get away with it? So let me tell you about the justifications for this guiding principle. The first motivation is clear, and we used it already in our Merge Sort analysis: it's simply way easier mathematically if we don't have to precisely pin down what the constant factors and lower-order terms are.

The second justification is a little less obvious, but is extremely important. I claim that, given the level at which we're describing and analyzing algorithms in this course, it would be totally inappropriate to obsess unduly about exactly what the constant factors are. Recall our discussion of the merge subroutine. We wrote that subroutine down in pseudocode, and we gave an upper bound of 4m + 2 on the number of lines of code executed, given an input of length m. We also noted that it was somewhat ambiguous exactly how many lines of code we should count it as, depending on how you count loop increments and so on. So even there, small constant factors could creep in, given the underspecification of the pseudocode. Depending on how that pseudocode gets translated into an actual programming language like C or Java, you'll see the number of lines of code deviate even further, not by a lot, but again by small constant factors. And when such a program is then compiled down into machine code, you'll see even greater variance, depending on the exact processor, the compiler, the compiler optimizations, the programming implementation, and so on. So, to summarize, because we're going to describe algorithms at a level that transcends any particular programming language, it would be inappropriate to specify precise constants. The precise constants are ultimately determined by more machine-dependent aspects, like who the programmer is, what the compiler is, what the processor is, and so on.

And now the third justification is, frankly, that we're just going to be able to get away with it. That is, one might be concerned that
ignoring things like small constant factors leads us astray, that we wind up deriving results which suggest that an algorithm is fast when it's really slow in practice, or vice versa. But for the problems we discuss in this course, we'll get extremely accurate predictive power, even though we won't be keeping track of lower-order terms and constant factors. When the mathematical analysis we do suggests that an algorithm is fast, indeed it will be; when it suggests that it's not fast, indeed that will be the case. So we lose a little bit of granularity of information, but we don't lose what we really care about, which is accurate guidance about which algorithms are going to be faster than others.

The first two justifications, I think, are pretty self-evident. This third justification is more of an assertion, but it's one we'll be backing up over and over again as we proceed through this course.
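To see this predictive power concretely, here is a rough sketch in Python (my own illustration, not the course's pseudocode): it counts the element comparisons made by straightforward implementations of the two sorts on one random input. The exact counts depend on implementation details, which is precisely why we ignore constants, but the n log n method wins by orders of magnitude, just as the analysis predicts.

```python
import random

def insertion_sort(a):
    """Sort a copy of `a`; return (sorted list, number of element comparisons)."""
    a, comps = a[:], 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comps += 1
            if a[j - 1] <= a[j]:
                break
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a, comps

def merge_sort(a):
    """Sort a copy of `a`; return (sorted list, number of element comparisons)."""
    if len(a) <= 1:
        return a[:], 0
    mid = len(a) // 2
    left, c1 = merge_sort(a[:mid])
    right, c2 = merge_sort(a[mid:])
    merged, comps, i, j = [], c1 + c2, 0, 0
    while i < len(left) and j < len(right):
        comps += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    return merged, comps

random.seed(1)
xs = [random.random() for _ in range(4096)]
_, mc = merge_sort(xs)      # roughly n log2 n comparisons
_, ic = insertion_sort(xs)  # roughly n^2 / 4 comparisons on random input
print(mc, ic)
```

The constants in either implementation could easily be doubled or halved by coding choices, but the gap between the two counts swamps any such factor.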

Now, don't get me wrong: I'm not saying constant factors aren't important in practice. Obviously, for crucial programs, the constant factors are hugely important. If you're running the sort of crucial loop that, you know, your startup's survival depends on, by all means optimize the constant like crazy. The point is just that understanding tiny constant factors in the analysis is an inappropriate level of granularity for the kind of algorithm analysis we're going to be doing in this course.

Okay, let's move on to the third and final guiding principle. The third principle is that we're going to use what's called asymptotic analysis, by which I mean we will focus on the case of large input sizes, the performance of an algorithm as the size n of the input grows large, that is, tends to infinity. Now, this focus on large input sizes was already evident when we interpreted our bound on Merge Sort. How did we describe the bound on Merge Sort? We said, oh, well, it needs a number of operations proportional to n log n, a constant factor times n log n. And we very cavalierly declared that this was better than any algorithm with a quadratic dependence of its running time on the input size.
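That comparison can be sanity-checked numerically. The sketch below (Python, taking the lecture's stylized operation counts at face value; the function names are just for illustration) finds the smallest n at which the Merge Sort bound 6n log₂ n + 6n drops below ½n², the rough insertion-sort estimate used later in the lecture.

```python
import math

def merge_sort_bound(n):
    # Upper bound from the lecture: 6n log2(n) + 6n lines of code.
    return 6 * n * math.log2(n) + 6 * n

def insertion_sort_estimate(n):
    # Rough (and rather generous) quadratic estimate: (1/2) n^2.
    return 0.5 * n ** 2

# Find the crossover: the smallest n where the n log n bound is the smaller one.
crossover = next(n for n in range(2, 10_000)
                 if merge_sort_bound(n) < insertion_sort_estimate(n))
print(crossover)  # the two curves cross at n = 90
```

Below the crossover the quadratic estimate is smaller, thanks to its small leading constant; above it, the n log n bound wins, and the gap only widens as n grows.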

So, for example, we argued that Merge Sort is a better, faster algorithm than something like insertion sort, without actually discussing the constant factors at all. Mathematically, we were saying that the running time of Merge Sort, which we can represent as the function 6n log₂ n + 6n, is better than any function with a quadratic dependence on n, even one with a small constant, like, say, ½n², which might roughly be the running time of insertion sort. And this is a mathematical statement that is true if and only if n is sufficiently large. Once n grows large, it is certainly true that the expression on the left is smaller than the expression on the right, but for small n, the expression on the right is actually going to be smaller, because of its smaller leading constant. So, in saying that Merge Sort is superior to insertion sort, the bias is that we're focusing on problems with a large n.

So the question you should have is, is that reasonable? Is it a justified assumption to focus on large input sizes? And the answer is certainly yes. The reason we focus on large input sizes is
because, frankly, those are the only problems that are at all interesting. If all you need to do is sort 100 numbers, use whatever method you want, and it's going to happen instantaneously on modern computers. You don't need to know, say, the divide-and-conquer paradigm if all you need to do is sort 100 numbers.

One thing you might be wondering is whether, with computers getting faster all the time according to Moore's Law, it even matters to think about algorithmic analysis, if eventually all problem sizes will just be trivially solvable on super-fast computers. But, in fact, the opposite is true. Moore's Law, with computers getting faster, actually means that our computational ambitions will naturally grow: we naturally focus on ever-larger problem sizes, and the gulf between an n² algorithm and an n log n algorithm will become ever wider.

A different way to think about it is in terms of how much bigger a problem size you can solve as computers get faster. If you are using an algorithm with a running time proportional to the input size, then when computers get faster by a factor of four, you can solve problems that are a factor of four larger. Whereas if you are using an algorithm whose running time is proportional to the square of the input size, then when a computer gets faster by a factor of four, you can only solve a problem of double the size. And we'll see even starker examples of this gulf between different algorithmic approaches as time goes on. So to drive this point
home, let me show you a couple of graphs. What we're looking at here is a graph of two functions. The solid curve is the upper bound that we proved on Merge Sort; this is going to be 6n log₂ n + 6n. And the dotted curve is an estimate, a rather generous estimate, of the running time of insertion sort, namely ½n². And we see in the graph exactly the behavior we discussed earlier: for small n, down here, because ½n² has a smaller leading constant, it's actually the smaller function, and this is true up to a crossing point of maybe 90 or so. Beyond n = 90, the quadratic growth of the n² term overwhelms the fact that it had a smaller constant, and it starts being bigger than the other function, 6n log₂ n + 6n. So in the regime below 90, this predicts that insertion sort will be better, and in the regime above 90, it predicts that Merge Sort will be faster.

Now here's what's interesting. Let's scale the x-axis, let's look well beyond this crossing point of 90, increasing it an order of magnitude, up to arrays of size 1500. And I want to emphasize, these are still very small problem sizes; if all you need to do is sort arrays of size 1500, you really don't need to know divide-and-conquer or anything else we'll talk about. That's a pretty trivial problem on modern computers. So what we're seeing is that, even for very modest problem sizes here, arrays of size, say, 1500, the quadratic dependence in the insertion-sort bound more than dwarfs the fact that it had a lower constant factor. So in this larger regime, the gulf between the two algorithms is growing. And of course, if I increased it another 10x or 100x or 1000x, to get to genuinely interesting problem sizes, the gap between these two algorithms would be even bigger; it would be huge.

That said, I'm not
saying you should be completely ignorant of constant factors when you implement algorithms. It's still good to have a general sense of what these constant factors are. So, for example, in highly tuned versions of Merge Sort, which you'll find in many programming libraries, because of the difference in constant factors, the algorithm will actually switch from Merge Sort over to insertion sort once the problem size drops below some particular threshold, say seven elements or something like that. So for small problem sizes, you use the algorithm with smaller constant factors, insertion sort, and for larger problem sizes, you use the algorithm with the better rate of growth, namely Merge Sort.

So, to review: our first guiding principle is that we're going to pursue worst-case analysis.
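The library optimization just described can be sketched as follows. This is an illustrative hybrid in Python, not any real library's code; the cutoff of seven comes from the lecture's example, and real libraries tune it empirically.

```python
CUTOFF = 7  # illustrative threshold from the lecture; real libraries tune this

def insertion_sort(a, lo, hi):
    # Sort a[lo:hi] in place; small constants make this fast on tiny ranges.
    for i in range(lo + 1, hi):
        x, j = a[i], i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x

def hybrid_merge_sort(a, lo=0, hi=None):
    # Merge sort that hands small subproblems to insertion sort.
    if hi is None:
        hi = len(a)
    if hi - lo <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    # Merge the two sorted halves a[lo:mid] and a[mid:hi].
    merged, i, j = [], lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i]); i += 1
        else:
            merged.append(a[j]); j += 1
    merged.extend(a[i:mid]); merged.extend(a[j:hi])
    a[lo:hi] = merged
```

The asymptotic running time is unchanged, since the insertion sort only ever sees constant-size subarrays; the switch just trims the constant factor at the leaves of the recursion.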

We're going to look at bounds on the performance, on the running time, of an algorithm which make no domain assumptions, no assumptions about which input of a given length the algorithm is provided. The second guiding principle is that we're not going to focus on constant factors or lower-order terms; that would be inappropriate, given the level of granularity at which we're describing algorithms. And the third is that we're going to focus on the rate of growth of algorithms for large problem sizes.

Putting these three principles together, we get a mathematical definition of a fast algorithm: namely, we're going to pursue algorithms whose worst-case running time grows slowly as a function of the input size.

So let me tell you how you should interpret what I just wrote down in this
box. On the left-hand side is clearly what we want: algorithms which run quickly if we implement them. And on the right-hand side is a proposed mathematical surrogate for a fast algorithm. The left-hand side is not a mathematical definition; the right-hand side is, as will become clear in the next set of lectures. So we're identifying fast algorithms with those that have good asymptotic running time, running time which grows slowly with the input size.

Now, what would we want from a mathematical definition? We'd want a sweet spot. On one hand, we want something we can actually reason about; this is why we zoom out and squint and ignore things like constant factors and lower-order terms. We can't keep track of everything; otherwise we'd never be able to analyze anything. On the other hand, we don't want to throw out the baby with the bathwater; we want to retain predictive power. And this definition, worst-case analysis using the asymptotic running time, turns out to be the sweet spot for reasoning about the problems we're going to talk about in this course. We'll be able to prove lots of theorems, we'll be able to establish a lot of performance guarantees for fundamental algorithms, but at the same time we'll have good predictive power: what the theory advocates will in fact be algorithms that are known to be best in practice.

So, the final explanation I owe you is, what do I mean by the running
time grows slowly with respect to the input size? Well, the answer depends a little bit on the context, but for almost all of the problems we're going to discuss, the holy grail will be what's called a linear-time algorithm: an algorithm whose number of instructions grows proportionally to the input size. We won't always be able to achieve linear time, but that's, in some sense, the best-case scenario. Notice that linear time is even better than what we achieved with our Merge Sort algorithm; Merge Sort runs a little bit superlinearly, n times log n, in the input size. If possible, we'd like to be linear time. It's not always going to be possible, but that is what we will aspire to for most of the problems we'll discuss in this course.

Looking ahead, the next series of videos is going to have two goals. First of all, on the analysis side, I'll describe formally what I mean by asymptotic running time. I'll introduce "Big Oh" notation and its variants, explain its mathematical definitions, and give a number of examples. On the design side, we'll get more experience applying the divide-and-conquer paradigm to further problems. See you then.
