SPEAKER: Now that we have learned how to create parallel programs using the async-finish algorithmic notation, or concretely using the Java ForkJoin framework, we should pay attention to how we reason about the parallelism in a program. So let me go back to our standard divide-and-conquer example. In finish-async notation, if we had statements S1, S2, S3, S4, what we would do is write S1, then a finish containing async S2 followed by S3, and then S4. That enables S2 and S3 to run in parallel with each other. Now, in the ForkJoin parallel programming framework, what you learned is that you can get the same effect by writing S1; fork S2; S3; join S2; S4. So that's the concrete implementation of async-finish parallelism using the ForkJoin programming framework.

Now we want to introduce an interesting concept called the computation graph to model parallel programs such as this. The idea is that we think in terms of the program executing dynamically. This graph is purely a mental abstraction; it does not actually get constructed or built when your program is running. But logically, we can think of statement S1 executing. After that, if you look at the ForkJoin version, it does a fork. So we have a fork edge from S1 to S2. And immediately after the fork, S1 can continue on to S3, so we call that a continue edge. After S3, the same task wants to continue on to S4, but there's a join in between. So for that, we have a different kind of edge, called a join edge, from S2 to S4. With these three kinds of edges, we can model the execution of this parallel program, and in fact any parallel program, as a graph that is conceptually built while the program executes. Each vertex or node of this directed graph represents a sequential subcomputation, something we refer to as a step. And each edge represents an ordering constraint. If you just had a normal sequential program with no fork and join, the graph would just be a straight line of continue edges.
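The fork/join sequence described above (S1; fork S2; S3; join S2; S4) can be sketched with Java's ForkJoin framework. This is a minimal illustration, not code from the lecture; the class name `ForkJoinSketch` and the `step` trace helper are invented for demonstration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

public class ForkJoinSketch {
    // Records the order in which the four steps actually execute.
    static final Queue<String> trace = new ConcurrentLinkedQueue<>();

    static void step(String name) { trace.add(name); }

    public static List<String> run() {
        trace.clear();
        ForkJoinPool.commonPool().invoke(new RecursiveAction() {
            @Override protected void compute() {
                step("S1");                                           // S1 runs first
                ForkJoinTask<?> s2 =
                    ForkJoinTask.adapt(() -> step("S2")).fork();      // fork edge: S1 -> S2
                step("S3");                                           // continue edge: S1 -> S3
                s2.join();                                            // join edge: S2 -> S4
                step("S4");                                           // S4 runs after both S2 and S3
            }
        });
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running this prints S1 first and S4 last; S2 and S3 can appear in either order between them, which is exactly the parallelism the computation graph predicts.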
But with parallelism, we see we have these fork edges and join edges. Now, here's the interesting property of the graph. If we want to reason about which statements can execute in parallel, we ask: is there a path of directed edges from one statement to another? For example, there's a path from S2 to S4, so that tells us that S2 and S4 cannot run in parallel with each other. But between S2 and S3, parallel execution is possible, because there's no path of directed edges between S2 and S3 in either direction. So that gives the basic property of what can run in parallel.

And we can use it in multiple ways. It's a very handy abstraction for debugging, for example. If by mistake, in S3, we were trying to read some field sum computed by S2, and in S2, we were writing to that field sum, then we would have an error, because the read and the write could happen in parallel: S2 and S3 are not connected in the computation graph, so there's no path of edges between them. That's a very pernicious kind of bug in parallel programming, called a data race. A data race happens exactly when a read and a write access, or two write accesses, to the same location can occur in different steps of the computation graph that may execute in parallel with each other. And you already know how to tell if two steps may execute in parallel.

Another very interesting property of computation graphs is that we can use them to reason about the performance of your parallel program. So let's just say, in abstract units, S1 took one unit of time, S2 takes 10, S3 takes 10, and S4 takes one. There are two important metrics that we will work with to reason about performance. The first is called the "WORK", and that's simply the sum of the execution times of all the nodes. So in this case, it would be 1 plus 10 plus 10 plus 1, which is 22. Now, another metric that's really important is called the "SPAN", and that's the length of the longest path through the graph.
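The reachability test described above ("is there a path of directed edges from one statement to another?") can be sketched directly on the four-node example graph. This is an illustrative sketch, with invented class and method names, showing that two steps may run in parallel exactly when neither can reach the other:

```java
import java.util.List;
import java.util.Map;

public class ComputationGraph {
    // Directed edges of the example graph: fork, continue, and join edges.
    static final Map<String, List<String>> edges = Map.of(
        "S1", List.of("S2", "S3"),   // fork edge S1 -> S2, continue edge S1 -> S3
        "S2", List.of("S4"),         // join edge S2 -> S4
        "S3", List.of("S4"),         // continue edge S3 -> S4
        "S4", List.of()
    );

    // True if there is a path of directed edges from src to dst.
    static boolean hasPath(String src, String dst) {
        if (src.equals(dst)) return true;
        for (String next : edges.getOrDefault(src, List.of()))
            if (hasPath(next, dst)) return true;
        return false;
    }

    // Two steps may execute in parallel iff neither can reach the other.
    static boolean mayRunInParallel(String a, String b) {
        return !hasPath(a, b) && !hasPath(b, a);
    }

    public static void main(String[] args) {
        System.out.println(mayRunInParallel("S2", "S3")); // true: no path either way
        System.out.println(mayRunInParallel("S2", "S4")); // false: path S2 -> S4
    }
}
```

A data-race detector can be built on exactly this check: a read and a write (or two writes) to the same location are a race precisely when `mayRunInParallel` holds for their steps.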
This longest path is also referred to as the critical path. Sometimes you might say "hey, you're on my critical path" when you're trying to get some work done. So to find the longest path, we take a look at the graph. We see that there's a path through S2 of length 12 (1 plus 10 plus 1), and a path through S3 also of length 12. So there are two paths with the longest length, and that can happen. The span is 12, because it's the length of the longest path. And these two metrics help us reason about the parallelism in the program. So one of the things that we will build on is this notion of ideal parallelism, which is work divided by span. It gives you a very concrete measure of how much parallelism there is in a computation graph. In this case, it's 22 divided by 12, which is quite modest: just under a factor of 2. For a sequential program, the ideal parallelism would just be 1, because the span would be the same as the work. But what we will see is that for a rich set of parallel algorithms, the ideal parallelism can be really large, on the order of thousands or millions, and that allows us to take the same program and execute it on a wide range of different multicore processors and other forms of parallel computers.
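As a sketch, the WORK and SPAN of this graph can be computed with a simple recursive longest-path calculation over the same four nodes. The class and method names are illustrative, not from the lecture:

```java
import java.util.List;
import java.util.Map;

public class WorkSpan {
    // Execution time of each step, in abstract units, from the example.
    static final Map<String, Integer> time =
        Map.of("S1", 1, "S2", 10, "S3", 10, "S4", 1);

    // Directed edges of the example computation graph.
    static final Map<String, List<String>> edges = Map.of(
        "S1", List.of("S2", "S3"),
        "S2", List.of("S4"),
        "S3", List.of("S4"),
        "S4", List.of()
    );

    // WORK: the sum of the execution times of all nodes.
    static int work() {
        return time.values().stream().mapToInt(Integer::intValue).sum();
    }

    // SPAN starting at a node: its own time plus the longest path below it.
    static int span(String node) {
        int longestSuffix = 0;
        for (String next : edges.get(node))
            longestSuffix = Math.max(longestSuffix, span(next));
        return time.get(node) + longestSuffix;
    }

    public static void main(String[] args) {
        System.out.println("WORK = " + work());       // 1 + 10 + 10 + 1 = 22
        System.out.println("SPAN = " + span("S1"));   // 1 + 10 + 1 = 12
        System.out.println("ideal parallelism = "
            + (double) work() / span("S1"));          // 22/12, just under 2
    }
}
```

Both length-12 paths (through S2 and through S3) are explored by the recursion, and the maximum over them gives the span of 12.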