0:04

All right, now you've gotten warmed up with a project and now we want to think

about performance, and this is a common theme throughout the entire course.

We'll be writing code but then we'll also be talking about analyzing it,

thinking about if it's good enough, and what that even means.

So, by the end of this first video, you'll have a better sense of what it means to

talk about the performance of an algorithm and why it's important.

And then we'll also describe some factor that impact the performance of

an algorithm.

So, first of all, before we delve into an example,

let's remind ourselves that an algorithm is a strategy for solving a problem.

So when we talked about the performance of an algorithm,

we're talking about how good that strategy is for

coming at the solution of the problem that we're thinking about.

So, let's think about an example that comes from the project.

And in the project we're analyzing a whole bunch of text, and

we're trying to understand how readable it is.

That's the first part of the project.

And Christine suggested that we used the Flesch score and

she talked about that in some of the earlier videos.

And so we have a way of computing some numerical representation

of the readability of a text, based on the number of words,

the number of sentences, and the number of syllables.

Okay, so that's great.

We read in a whole bunch of text, we compute this Flesch score and

then we have a description of how readable the text is.

But the question is, how long does it take?

So, from when we've read in the text to when we have the Flesch score available

for our user.

And if we want to answer that question, we might just time it.

So, we might start a stopwatch and press go when our algorithm to compute

the Flesch score starts, and then stop the stopwatch when we've gotten the answer.

The issue, though, is that this time that we get

might not be a great representation of how good our algorithm is.

So we're asking how good is the algorithm for computing flesh score?

And so, before we get to answering that question, let's think about what might be

the problem with just looking at the stopwatch time.

So if we think about how much time it takes for

our code to run on a particular input, what might impact that time?

Well, so, for example, the computer that we're running on,

will have a huge part to play in how quickly the code runs.

So, if we run on really old hardware, then we expect

our code to take longer than if we run it on a really snazzy new machine.

Or if the computer's really bogged down with a whole lot of other programs,

then we expect it to be a little bit slower than one that

is devoting all of its resources to just the one program that we're running.

2:40

Other factors that might come into play,

include the compiler that we use to compile our code into machine language.

But also decisions that we make about what libraries to use and what

optimizations were made in those libraries for the particular commands that we want.

So, just running our code on a particular input and timing it

doesn't really give us a big picture of our overall strategy for the code.

So you can think about the problem solving paradigm, as we're given a problem,

compute the Flesch score of given text.

And we've got a strategy for computing that Flesch score that maybe says,

read in all the text and compute how many words there are.

Read in all the text and compute the number of sentences.

Read in all the text and compute the number of syllables.

And then once we've got those numbers,

then we do something with it and compute the Flesch score.

And so, that high-level description is like our algorithm, and

then we've coded it up in Java, or you've done it in the project, and

we have a particular implementation.

And what we'd like to do is talk about the performance of the algorithm and

sometimes that's tied into the implementation as well.

And so, just the time for running the specific code on a specific machine

on a specific input doesn't give us the whole picture.

So, should we just give up trying to measure performance?

Well we shouldn't because performance is so important.

Performance is the difference between running a really cool algorithm

that let's us answer very hard questions in very short amount of time.

And therefore, processing more and more data that then leads

to analyze even bigger trends and answering more important questions.

And if we don't have handle on the size of the data that we can

process with our algorithms, then maybe we think we have a really cool solution but

if we try to run it, then the program's not gonna stop until the end of time.

And so, having a sense of how our solutions interact with larger data and

more complicated data is at the heart of solving these problems.

So we don't want to give up, but what can we do?

Well, maybe instead of measuring the time, we're going to count operations.

Because hopefully we have a sense of,

5:01

perhaps if we have some sort of basic operation, then

the amount of time that it takes to make that operation, well, we can't control it.

But, we'll know that our algorithm takes not too many operations.

And so, it won't take that long.

Okay, so, we are deciding that we're just going to focus on what we can control,

which is the number of operations we asked the computer to do,

rather than how long the computer takes to do those operations.

And, so we might go ahead and try to count them.

In our example of the Flesch score,

we want to start at the first part of our text, and go through it.

And for example, we're going to count the number of syllables in the document.

And so we want to count the number of operations that happens.

The thing is, though, the number of operations that we have to do in

order to count the syllables, depends on our input.

So, if we're looking at the whole American tax code, that's

going to be a whole lot of operations to count all the syllables in there.

As opposed to, if we just looked at a short snippet,

like in the example that Christine did, where it's still a complicated text, but

we're just looking at a small part of it.

And so the number of operations that we execute will be smaller.

6:10

Now, notice that we're performing the exact same algorithm,

whether we're looking at the whole tax code or just a small part of it.

And so our measure of performance should be about the algorithm

shouldn't be talking about the input.

So what we need to do, is focus on how the performance scales,

which means how does the number of operations grow as our input grows?

So for example, we'll ask questions like, if the list,

which is the string of text that we're processing,

is twice as long, how many more operations will it take us to process it?

Okay, so, we've moved away from just timing using the clock time, and

we've gone to looking at the number of operations our algorithm takes, and

we're not just going to look at the number as an absolute number.

We're thinking about how it grows and changes as the input size changes.

Okay, but is size all that matters?

For example, if I have two collections of text, say the American tax code and

then something else that is equally long, that are standing right in front of me and

I try to process both of these texts one after the other.

Will the number of operations that's required to process the IRS tax code,

be exactly the same as the number of operations required to process

something else?

Maybe, maybe not.

This is a worthwhile question to consider, and so

we're going to take that into account as well, when we think about performance.

So, our three strategies for talking about performance include,

#1, count the operations instead of the time.

So we abstract away from the machine considerations and the software

considerations that aren't part of the algorithm design problem that we faced.

#2, is focus on how the performance scales.

Because we care about really big inputs and so

we wanna know about what happens eventually as the output gets bigger and

bigger and bigger, how does our algorithm handle it?

And #3, go beyond the input size because what we'd like to know,

is how the algorithm responds in all sorts of situations.

And so we'd like our performance analysis to be able to capture not just the size

of the input but

also what might happen because of internal structure to the input.

Okay so, in the next video what we're going to do,

is delve right in to counting operations for specific codes snippets.