0:01
In fact, the order-of-growth classifications are so important that they've
led to an enormous amount of research in recent years, and I'll talk briefly
about that now. Life is a little bit more complicated than the last example
made it seem, and one problem is that the input can cause the performance of
the algorithm to vary widely. So often we have to think about different ways of
analyzing the algorithm depending on the input. The running time is going to be
somewhere between the best case and the worst case. The best case is a lower
bound on the cost: it guarantees that the running time is never less than that.
Then there's the worst case, which is the most difficult input. If we analyze
that, then we can guarantee that the running time of the algorithm is never
bigger than that. And in a lot of situations we might consider our input to be
random. We need some way to model what we mean by random for the problem that
we're solving, but there are a lot of situations where we can do that, and then
we have a way to predict performance even when the input might vary widely.
For 3-sum, for example, the running time is kind of always the same.
With the tilde notation, the only variability in that algorithm is the
number of times the counter is incremented, and that's in the low-order terms,
so it doesn't show up in our analysis. For binary search, you might find the
item right away, in which case the running time is constant, and we can show
that the average case and the worst case are both about lg N (log base 2 of N).
In other examples there can be even more variability. So we have these
different types of analysis, depending on the input.
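
To make that variability concrete, here is a minimal binary-search sketch in Java (the class and method names are just illustrative, not from the lecture): in the best case the first probe hits the key, and in the worst case the interval is halved down to nothing, about lg N times.

public class BinarySearch {
    // Search a sorted array for key; return its index, or -1 if absent.
    public static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if      (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;   // best case: found on the very first probe
        }
        return -1;             // worst case: ~lg N iterations, then not found
    }
}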
But the question is: what about the actual problem that the client is trying
to solve? We have to understand that, too, in order to understand the
performance of the algorithm. There are two approaches that are successful
here. One is to design for the worst case, to make sure that your algorithm
always runs quickly, and that's definitely ideal. The other, if you can't do
that, is to randomize and then depend on some kind of probabilistic guarantee.
We'll see examples of both of these as we go through the course.
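
Here is a minimal sketch of that randomization idea in Java (the quicksort client is an assumption, not something from this lecture): the Knuth shuffle produces a uniformly random permutation in linear time, so an algorithm whose worst case depends on the input order gets a probabilistic guarantee instead.

import java.util.Random;

public class Shuffle {
    // Knuth (Fisher-Yates) shuffle: uniformly random permutation in O(N) time.
    public static void shuffle(int[] a) {
        Random rand = new Random();
        for (int i = a.length - 1; i > 0; i--) {
            int r = rand.nextInt(i + 1);   // uniform index in [0, i]
            int tmp = a[i]; a[i] = a[r]; a[r] = tmp;
        }
    }
}

A client would call Shuffle.shuffle(a) before, say, quicksort, making the bad orderings a matter of very low probability rather than of adversarial input.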
Now, those kinds of considerations, and the idea of order of growth, lead to a
discussion of what's called the theory of algorithms. Here our goals are: we
have a problem to solve, like the 3-sum problem, we want to know how difficult
it is, and we want to find the best algorithm for solving it. The approach that
computer scientists use for this is to try to suppress as many details as
possible in the analysis: analyze the running time only to within a constant
factor, which is what order of growth is getting at, and also not worry about
the input model at all. So we focus on worst-case design, and we can talk about
the performance of algorithms just in terms of the order of growth. It's
actually possible to do that in a very rigorous way that has taught us a lot
about the difficulty of solving problems. Our goal is to find an optimal
algorithm: one where we can guarantee, to within a constant factor, a certain
performance for any input, because we've analyzed the worst case, and where we
also have a proof that no algorithm could provide a better performance
guarantee. I'll give a couple of easy examples of this.
There are some commonly used notations for this, called big-theta, big-O, and
big-omega, and the definitions are given here. Big-theta notation is just a way
to describe the order of growth: Θ(N²) is shorthand for anything that's bounded
above and below by a constant times N², and that's what we really use to
classify algorithms. Then there's big-O notation, which gives upper bounds on
performance: when we say O(N²), we mean that the quantity is less than some
constant times N² as N grows. And big-omega is used for lower bounds: Ω(N²)
means greater than some constant times N² as N grows. Those are the three
notations we're able to use to classify algorithms, and I'll show how in the
following examples.
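
Spelled out a little more formally (this is the standard statement of those definitions):

    f(N) = Θ(g(N))  means  c₁·g(N) ≤ f(N) ≤ c₂·g(N) for all N ≥ N₀, for some constants c₁, c₂ > 0
    f(N) = O(g(N))  means  f(N) ≤ c·g(N) for all N ≥ N₀, for some constant c > 0
    f(N) = Ω(g(N))  means  f(N) ≥ c·g(N) for all N ≥ N₀, for some constant c > 0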
Examples from our 1-sum, 2-sum, and 3-sum problems are easy to articulate. Our
goals are to establish the difficulty of the problem and to develop an optimal
algorithm. The 1-sum problem is: is there a 0 in the array? An upper bound on
the difficulty of the problem comes from some specific algorithm. For example,
the brute-force algorithm that looks at every array entry takes O(N) time: it's
less than a constant times N, for some constant. So the running time of the
optimal algorithm has to be O(N); that specific algorithm provides an upper
bound on the running time of the optimal algorithm. But in this case it's also
easy to develop a lower bound, a proof that no algorithm can do better. For
1-sum you have to examine all entries in the array: if you miss one, that one
might be 0. So the optimal algorithm has to have a running time of at least
some constant times N, and we say the running time is Ω(N). Now, in this case
the upper bound and the lower bound match, to within a constant factor, so
that's a proof that the brute-force algorithm for 1-sum is optimal: its
running time is Θ(N), both Ω(N) and O(N).
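
Here is a minimal sketch of that brute-force 1-sum scan in Java (the class name is just illustrative):

public class OneSum {
    // Count the zeros in the array by examining every entry: O(N) time.
    // Since any correct algorithm must also look at every entry (a missed
    // entry might be 0), this Θ(N) algorithm is optimal.
    public static int count(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            if (a[i] == 0) count++;
        return count;
    }
}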
For that simple problem it was easy to get the optimal algorithm. For more
complicated problems, it's going to be more difficult to get upper bounds and
lower bounds, and particularly upper bounds and lower bounds that match. For
example, let's look at 3-sum. As an upper bound for 3-sum, our first
brute-force algorithm was a proof that the running time of the optimal
algorithm is O(N³), but we found an improved algorithm whose running time is
O(N² lg N), so that's a better upper bound. As for a lower bound: we have to
examine all entries, because again we might miss one that makes a 3-sum equal
to zero, and that's a proof that the running time of the optimal algorithm is
Ω(N). But nobody knows a higher lower bound for 3-sum. So there's a gap between
the upper bound and the lower bound, and an open problem: is there an optimal
algorithm for 3-sum? We don't know what it is. We don't even know whether
there's an algorithm whose running time is less than O(N²), and we don't know
a lower bound higher than linear. So that's an example of an open problem in
the theory of algorithms: we don't know how difficult it is to solve the 3-sum
problem.
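
For reference, here is a sketch of the kind of improvement behind that O(N² lg N) upper bound: sort the array, then for each pair binary-search for the value that completes a zero-sum triple. (This sketch assumes the array entries are distinct.)

import java.util.Arrays;

public class ThreeSumFast {
    // Count triples i < j < k with a[i] + a[j] + a[k] == 0.
    public static int count(int[] a) {
        int n = a.length;
        Arrays.sort(a);                                      // N lg N
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {                // ~N^2/2 pairs
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));  // lg N
                if (k > j) count++;                          // count each triple once
            }
        return count;                                        // O(N^2 lg N) overall
    }

    public static void main(String[] args) {
        int[] a = { 30, -40, -20, -10, 40, 0, 10, 5 };
        System.out.println(count(a));                        // prints 4
    }
}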
Now, this point of view has been extremely successful in recent decades. We
have a new problem, we develop some algorithm, we prove some lower bound; if
there's a gap, we look for a new algorithm that will lower the upper bound, or
we try to find a way to raise the lower bound. Usually it's very difficult to
prove nontrivial lower bounds. A trivial lower bound, like having to look at
every input item, is not so hard; nontrivial lower bounds, like the one we
talked about for the union-find problem, are much more difficult. In the last
several decades, people have learned about the computational difficulty of
problems by finding steadily decreasing upper bounds, that is, algorithms with
better worst-case running times, for lots and lots of important problems.
Plenty of optimal algorithms have been found, and plenty of gaps still remain.
It's a fascinating field of research that many people are engaged in.
Now, there are a couple of caveats about the context of this course. The first
is that maybe it's overly pessimistic to focus on the worst case. We've got
data out there and problems to solve, and maybe it's not worst-case data; in
lots of fields of engineering and science, we don't focus on the worst case.
The worst case for this course would be for lightning to strike and for it to
be over, and we don't plan for that. Something similar is true for algorithms:
maybe we should be focusing on understanding the properties of the input and on
finding algorithms that are efficient for that input. The other caveat is that,
in order to really predict performance and compare algorithms, we need a closer
analysis than to within a constant factor. So we talked about the tilde
notation alongside the big-theta, big-O, and big-omega notations that are used
in the theory of algorithms. There's so much published research in the theory
of algorithms that a lot of people make the mistake of interpreting big-O
results, which are supposed to give improved upper bounds on the difficulty of
a problem, as approximate models for the running time, and that's really a
mistake. So in this course, we're going to focus on approximate models by
making sure that we use the tilde notation. We'll try to give specific results
for certain quantities of interest, and any unspecified constant in the running
time will have to do with properties of the machine and the system, so that
we'll be able to use these results to predict performance and to compare
algorithms.
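
As a concrete contrast between the two styles, using the brute-force 3-sum count from earlier in the lecture as the example:

    number of triples examined:  N(N-1)(N-2)/6  ~  N³/6    (tilde: an approximate model)
    same quantity in big-O:      O(N³)                     (an upper bound only, no coefficient)

The tilde form, times a machine-dependent constant, predicts the running time; the big-O form by itself does not.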