0:08

We have seen several operations on collections and

how to implement them in parallel so far.

We have seen the map that applies a given function to each element of the list,

producing a new list of the same length.

0:21

We have seen fold and its variants; fold takes a list of elements,

and combines all the elements with a given operation.

And now we consider scan.

The scan operation on ordered collections of elements has aspects of both map and fold.

1:26

So we start with our initial element and then, to obtain the next one,

we'll just add the first element of the list because our operation is addition.

We will then add the second element of the list.

And finally, the last element of our list.
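In Scala, this running example can be checked directly, because the standard library already provides scanLeft on lists (same list, initial element, and operation as in the lecture):

```scala
// scanLeft with initial element 100 and addition:
// each output element extends the running sum by one more input element.
val result = List(1, 3, 8).scanLeft(100)(_ + _)
// result == List(100, 101, 104, 112)
```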

1:55

When we apply scanLeft with the initial element a0 and some binary operation f,

the result is the list b0, b1, b2, and b3 that's computed as follows.

b0 is just a given initial element, and

then b1 is computed by taking b0, combining it using f with a1.

2:23

Then combining b1 with a2 gives b2, and finally combining b2 with a3 gives b3.

Now throughout this segment, we will assume that operation f is associative.

Even in that case, there exists a dual operation,

scanRight which is different from scanLeft.

2:41

If for example, we take the same List(1, 3,

8) and apply scanRight with initial element 100, and

an associative operation of addition, the result will be the following list.
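Again using the standard library, the scanRight version of the same example can be checked; note that the initial element now appears at the end:

```scala
// scanRight folds from the right: the initial element 100 ends up last,
// and each earlier element adds one more input element from the right.
val result = List(1, 3, 8).scanRight(100)(_ + _)
// result == List(112, 111, 108, 100)
```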

3:30

Such that the first element is given by the initial element,

and then subsequent elements are obtained using this equation.

So now we can give a sequential definition of scanLeft.

Let's consider scanLeft, whose input is an array, and

we have then the initial element a0 and binary operation f.

The results should be written into the output array out.

4:14

The first element of the output is a0 itself.

And then we compute subsequent elements in a loop.

The current value that we are writing is stored in the variable a and

the index is i.

So after a0, the first element computed will be f(a, inp(0)), where a is initially a0. That element will be written to out(1).

And then we continue in the loop.

You can see that there is one more write to the output array than there are reads from the input array.
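A sketch of this sequential definition, assuming the signature described here (an input array inp, initial element a0, operation f, and output array out):

```scala
// Sequential scanLeft on arrays: out must have length inp.length + 1.
def scanLeft[A](inp: Array[A], a0: A, f: (A, A) => A, out: Array[A]): Unit = {
  out(0) = a0        // the first element of the output is a0 itself
  var a = a0         // current value being written
  var i = 0          // current index into the input
  while (i < inp.length) {
    a = f(a, inp(i)) // combine the accumulated value with inp(i)
    i = i + 1
    out(i) = a       // one more write than reads: out gets indices 0..inp.length
  }
}
```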

4:47

Can we make scanLeft a parallel operation?

Remember, we are assuming that f is associative.

We have a somewhat ambitious goal of coming up with an algorithm that still runs in O(log n), given infinite parallelism.

5:03

At first, this task seems almost impossible, because

the value of the last element in sequence is computed from the previous element.

And for every element,

it looks like the natural way is indeed what we gave in the sequential algorithm.

But even if we parallelize the individual applications of f,

we would not be able to parallelize the traversal of the array itself.

So this would still give us a linear algorithm, even with infinite parallelism.

So, we need to perform the computation in a different order; the idea is to give up reusing all intermediate results.

And in fact, we will do more work and more applications of f than the simple sequential version needs.

However, this will allow us to improve parallelism and

in terms of the parallel running time, more than compensate for

the fact that we are applying f a few more times than in the sequential algorithm.

6:32

Then reduce is going to be used in order to simply reduce a given segment of the array, and we can also use map, which can apply an operation on a given array segment and write the results into the output array.

7:26

So the element in position i in the output is the result of

reducing the segment of the input array up to that position.

Therefore, the resulting array will be

obtained using a map over the input array where

this function fi given to map is in fact going

to reduce the array segment from 0 to i.

The invocation of map will fill in the output array with elements, starting from index 0 and ending with, and including, index inp.length - 1.

8:08

We then just need to write the final element of the output array.

That final element is computed by taking the element before the final and

combining it with the corresponding element of the input array.
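As a sketch, the whole first solution might look as follows; the loop plays the role of the parallel map over output positions, and reduceSeg1 is a sequential stand-in for reduce (both names are assumptions following the lecture's description):

```scala
// Sequential stand-in: reduce of inp(left until right), seeded with a0.
def reduceSeg1[A](inp: Array[A], left: Int, right: Int, a0: A, f: (A, A) => A): A = {
  var a = a0; var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

// out(i) = reduce of inp(0 until i): no intermediate results are reused.
def scanLeft[A](inp: Array[A], a0: A, f: (A, A) => A, out: Array[A]): Unit = {
  var i = 0
  while (i < inp.length) {  // a parallel map would fill these positions independently
    out(i) = reduceSeg1(inp, 0, i, a0, f)
    i += 1
  }
  val last = inp.length - 1 // the final element combines the previous output element
  out(last + 1) = f(out(last), inp(last))
}
```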

If map and reduce are implemented in parallel, and they each have O(log n) parallel complexity, then because map applies all these individual operations in parallel, you can see that the overall depth is going to remain O(log n).

In the previous solution,

we did not reuse any computation across different elements of the output array.

9:10

Now recall that reduce itself proceeds by applying the operations in a tree, in order to obtain a parallel implementation for an associative operation. So the idea is to save these intermediate results, that is, to save this tree, and make use of it when computing the output collection. The fact that it is a tree is also good for making use of these values.

9:35

To keep our function simple,

we'll first assume that the input collection itself is also a tree.

This is going to be a different kind of tree compared to the tree that

saves the intermediate results.

9:49

Here are the definitions of our trees.

The input collection will be given using tree of this kind

where all the values are stored in leaves, and

what nodes have is just references to left and right subtrees.

In contrast, the tree storing intermediate values will have an additional res field,

even for the internal nodes of the tree.

We will do that by having a common res field in the super class,

and this field will then exist not only in the leaves but also in the nodes.

11:18

Let's take a simple example.

This is a balanced tree that stores four integers, 1, 3, 8, and 50.

And if we compute reduceRes on this tree, we obtain a new tree

that has the values of the corresponding sums if we give plus as the operation.
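A sketch of these two tree types together with a sequential reduceRes, following the definitions just described (the names Tree, TreeRes, and reduceRes are the natural ones for this segment, not confirmed signatures):

```scala
// Input trees: values are stored only in the leaves.
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

// Trees of intermediate results: every node carries a res field.
sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

// Sequential reduceRes: fills in res for every subtree.
def reduceRes[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = (reduceRes(l, f), reduceRes(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}
```

On the example tree of 1, 3, 8, 50 with addition, the subtree results are 4 and 58, and the root result is 62.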

11:56

Of course, we would like to do this computation in parallel, but

all we need to do in order to accomplish that is to insert this

parallel keyword in front of the two recursive invocations.

The resulting function, we will call upsweep.
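The same function with the two recursive invocations wrapped in parallel; here parallel is a sequential stand-in, since the actual parallel construct is not shown in this segment:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

// Stand-in: a real implementation would evaluate both arguments in parallel.
def parallel[A, B](a: => A, b: => B): (A, B) = (a, b)

// upsweep is reduceRes with the recursive calls made parallel.
def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = parallel(upsweep(l, f), upsweep(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}
```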

12:37

The computation of scanLeft, given the result tree, is called downsweep.

Downsweep takes an initial element a0, which plays an important role here, the tree of results, and the binary operation f.

13:05

To understand how downsweep works,

the key fact to remember is that a0 is supposed to denote the reduce

of all elements that come to the left of the current tree, t, that we are given.

At the very beginning, this is the initial element, 100.

As we move down the tree, there are some elements that precede it.

So, for example, for this subtree here.

13:42

When we have the Leaf, then, we simply need to apply

operation f to the given element a0 and the element in the Leaf.

The interesting case is of course case of Node.

We are going to again recursively do a downsweep on the left and right subtree.

This will give us two new trees, left and right, and then we will combine them.

The key question is,

what is the initial element that we are passing to these two subtrees?

Now, what are the things that are to the left of our left subtree here?

Well, they are the things that are to the left of the entire tree.

So, we are passing the same element a0.
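A sketch of downsweep on trees following this description; parallel is again a sequential stand-in, and the type names match the earlier definitions:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

def parallel[A, B](a: => A, b: => B): (A, B) = (a, b) // sequential stand-in

// a0 denotes the reduce of all elements to the left of the current tree.
def downsweep[A](t: TreeRes[A], a0: A, f: (A, A) => A): Tree[A] = t match {
  case LeafRes(a) => Leaf(f(a0, a))
  case NodeRes(l, _, r) =>
    // left subtree: same a0; right subtree: a0 combined with the left result
    val (tL, tR) = parallel(downsweep(l, a0, f), downsweep(r, f(a0, l.res), f))
    Node(tL, tR)
}
```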

14:46

Let's see how this works on our example tree.

Given that initially a0 is 100,

then this value of 100 will be passed to the left subtree.

This will then be passed to the leaf and

we will compute the result, 101.

On the other hand, to the right subtree, or this subtree, we will be passing the result of combining 100 with the value of the left subtree. We will be passing 101 here. Therefore, when we reach the leaf 3, we will compute 104.

Now, let's step back and look what happens in the right subtree of the big tree.

There we are going to pass the combination of a0,

which is 100, and left.res.

And left.res is already stored up front in our result tree.

So, we did not need to wait for this computation in order to know what 4 is

because this was computed in a separate pass.

That means that we will be passing 104 down here as the initial element.

16:05

This will also become the initial element for the left part of this subtree, and the result will be 112. For the right part, we will combine this 104 with the result here, so we will be passing 112, and the result will be 162.

So we see that we have computed precisely the suffix of scanLeft.

And all we need to do is to add this initial element at

the very beginning of the tree.

17:12

Prepend is just an operation on a binary tree.

If we do not worry for the moment about balancing, then for

leaf, we just create a node that contains a new leaf,

namely the element we are prepending, and for the node, we are prepending

only in the left subtree, leaving the right subtree as it was.
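A sketch of this prepend, without rebalancing, on the same leaf-valued trees:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

// Descend along the left spine; the leftmost leaf becomes a node whose
// left child is a new leaf holding the prepended element.
def prepend[A](x: A, t: Tree[A]): Tree[A] = t match {
  case Leaf(v)    => Node(Leaf(x), Leaf(v))
  case Node(l, r) => Node(prepend(x, l), r)
}
```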

17:34

And this completes the definition of scanLeft, because upsweep and downsweep are parallel, we invoke each of them only once, and prepend is just a logarithmic operation.

If we have an approximately balanced tree, we have a good parallel running time.
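Putting the pieces together, a sketch of the complete scanLeft on trees; parallel is a sequential stand-in, and toList is a small helper added only to inspect the result:

```scala
sealed abstract class Tree[A]
case class Leaf[A](a: A) extends Tree[A]
case class Node[A](l: Tree[A], r: Tree[A]) extends Tree[A]

sealed abstract class TreeRes[A] { val res: A }
case class LeafRes[A](override val res: A) extends TreeRes[A]
case class NodeRes[A](l: TreeRes[A], override val res: A, r: TreeRes[A]) extends TreeRes[A]

def parallel[A, B](a: => A, b: => B): (A, B) = (a, b) // sequential stand-in

def upsweep[A](t: Tree[A], f: (A, A) => A): TreeRes[A] = t match {
  case Leaf(v) => LeafRes(v)
  case Node(l, r) =>
    val (tL, tR) = parallel(upsweep(l, f), upsweep(r, f))
    NodeRes(tL, f(tL.res, tR.res), tR)
}

def downsweep[A](t: TreeRes[A], a0: A, f: (A, A) => A): Tree[A] = t match {
  case LeafRes(a) => Leaf(f(a0, a))
  case NodeRes(l, _, r) =>
    val (tL, tR) = parallel(downsweep(l, a0, f), downsweep(r, f(a0, l.res), f))
    Node(tL, tR)
}

def prepend[A](x: A, t: Tree[A]): Tree[A] = t match {
  case Leaf(v)    => Node(Leaf(x), Leaf(v))
  case Node(l, r) => Node(prepend(x, l), r)
}

// upsweep builds the tree of partial reduces, downsweep turns it into
// the suffix of the scan, and prepend adds the initial element in front.
def scanLeft[A](t: Tree[A], a0: A, f: (A, A) => A): Tree[A] =
  prepend(a0, downsweep(upsweep(t, f), a0, f))

def toList[A](t: Tree[A]): List[A] = t match {
  case Leaf(v)    => List(v)
  case Node(l, r) => toList(l) ++ toList(r)
}
```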

17:54

scanLeft on trees that we just presented was formulated

to make the code as simple as possible.

To make the implementation more efficient,

in practice we would not store individual elements in the leaves of the tree, but

use chunks of elements represented using arrays, for example.

I leave as an exercise to define scanLeft on trees whose leaves store arrays.

Now we will go further and

examine parallel scan that is applied to a collection represented in an array.

So we will have just one big array to start with.

Interestingly, even in this case

we will use a tree to store our intermediate results.

The tree of intermediate results looks very similar to the tree we had before.

You can see that the node stores left and

right sub-trees, as well as the value that we have computed for them.

On the other hand, there is a small change in the leaf.

Because we want to be able to process arrays efficiently, we want to stop subdividing when the chunks that remain to be processed are small enough, and we will represent these chunks using, in this case, from and to. We will not store the actual content of those chunks; we just store these indices into the big array and pass a reference to that array. So we are not really storing values at all; we are just storing an indication of where those values can be found.
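A sketch of this tree of results for arrays; only the boundary indices from and to are stored in the leaves (the name TreeResA is an assumption, to distinguish it from the earlier TreeRes):

```scala
// Tree of results over an array segment: leaves hold only the segment
// boundaries (from, to) into a shared input array, never the elements.
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int, override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A, r: TreeResA[A]) extends TreeResA[A]
```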

19:23

Given this definition of the tree of results for the array, here is then the implementation of upsweep. Upsweep has a base case and a recursive case.

And the base case, as usual, is invoked when the segment we are processing is sufficiently small.

19:42

The boundaries of the segment are given by from and to, and the input array is inp. Given a small enough segment, we are just going to apply reduce sequentially.

We will then store the value of the reduce, as well as the indices from and to.

We will see the implementation of reduceSeg1, which for

the purpose of this algorithm can be done sequentially.

So what do we do in the recursive case?

We split the array segment in two, approximately in the middle, then we invoke upsweep on the two smaller segments. We will get the two resulting trees, and then we also need to compute the result of our bigger tree by applying the function f.

So this computation looks very similar to the original one.

It's just that we are starting from an array as input, which we implicitly view

as a tree.

So here, then, is the reduceSeg1 implementation that works on array segments. Because we use it here in the base case, we just have this sequential implementation.

Even though this implementation takes a0, the initial element, the way we have in fact invoked it was with the first element of the segment as the initial element, and then reducing starting from the subsequent index. So that has the same effect as a reduce without an initial element, taking all the elements from and including from up to to - 1.
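A sketch of upsweep on array segments together with the sequential reduceSeg1; the threshold cutoff value and the parallel stand-in are assumptions:

```scala
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int, override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A, r: TreeResA[A]) extends TreeResA[A]

def parallel[A, B](a: => A, b: => B): (A, B) = (a, b) // sequential stand-in

val threshold = 2 // assumed cutoff for switching to sequential processing

// Sequential reduce of inp(left until right), seeded with a0.
def reduceSeg1[A](inp: Array[A], left: Int, right: Int, a0: A, f: (A, A) => A): A = {
  var a = a0; var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

def upsweep[A](inp: Array[A], from: Int, to: Int, f: (A, A) => A): TreeResA[A] =
  if (to - from < threshold)
    // seed with inp(from) and reduce the rest: same effect as a
    // reduce without an initial element over inp(from until to)
    Leaf(from, to, reduceSeg1(inp, from + 1, to, inp(from), f))
  else {
    val mid = from + (to - from) / 2
    val (tL, tR) = parallel(upsweep(inp, from, mid, f), upsweep(inp, mid, to, f))
    Node(tL, f(tL.res, tR.res), tR)
  }
```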

Here's the downsweep on the array.

It's going to take this tree of results that we just computed.

And then we have the case for Leafs and for Nodes.

When we've reached a Leaf, we are going to do a sequential computation, as usual. Now all we have are indices into the array, but we have the input array as well.

So we are going to do sequential scan left for this segment between from and to.

And we will write the result to the corresponding indices of the output array.

Looking at this sequential scanLeft on a segment, you can see that there is no point doing anything if left is not less than right.

22:21

What about the recursive case? In the recursive case we have two subtrees, left and right, so we are just going to do downsweep for these two subtrees: here we are invoking it on the left subtree, here on the right subtree. This is all done in parallel, as before, as we are processing the tree of results.
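A sketch of downsweep on arrays, with the sequential scanLeftSeg used in the base case; parallel is again a sequential stand-in, and the names follow the lecture's description:

```scala
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int, override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A, r: TreeResA[A]) extends TreeResA[A]

def parallel[A, B](a: => A, b: => B): (A, B) = (a, b) // sequential stand-in

// Sequential scanLeft on a segment: writes into out(left + 1 .. right),
// one index more than the input, and does nothing unless left < right.
def scanLeftSeg[A](inp: Array[A], left: Int, right: Int, a0: A,
                   f: (A, A) => A, out: Array[A]): Unit =
  if (left < right) {
    var i = left; var a = a0
    while (i < right) { a = f(a, inp(i)); i += 1; out(i) = a }
  }

def downsweep[A](inp: Array[A], a0: A, f: (A, A) => A,
                 t: TreeResA[A], out: Array[A]): Unit = t match {
  case Leaf(from, to, _) =>
    scanLeftSeg(inp, from, to, a0, f, out) // sequential work at the leaves
  case Node(l, _, r) =>
    // right subtree gets a0 combined with the left subtree's stored result
    parallel(downsweep(inp, a0, f, l, out),
             downsweep(inp, f(a0, l.res), f, r, out))
    ()
}
```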

22:46

The initial element for the right subtree is a0 combined with the result of reducing the left subtree. Now that we have seen upsweep and downsweep, we can implement the entire scanLeft on arrays.

We first invoke upsweep on the input array between zero and inp.length.

23:12

Downsweep is going to write the results into the output array. The writing itself is performed in the base case.

Let's examine again this scanLeft on a segment. You can see that the output is written to the elements that are one more than the input, because of this increment here. With that in mind, let's now go back to the overall algorithm.

23:45

The only thing that remains to do is to prepend the element a0, which in this case, because the indices were set up appropriately, amounts to simply writing a0 to index 0 of the output array.

And this completes the description of the parallel scan on the array.
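Finally, a sketch that assembles the whole parallel scan on arrays from the pieces above; threshold and parallel are the same assumed stand-ins as before:

```scala
sealed abstract class TreeResA[A] { val res: A }
case class Leaf[A](from: Int, to: Int, override val res: A) extends TreeResA[A]
case class Node[A](l: TreeResA[A], override val res: A, r: TreeResA[A]) extends TreeResA[A]

def parallel[A, B](a: => A, b: => B): (A, B) = (a, b) // sequential stand-in

val threshold = 2 // assumed cutoff for switching to sequential processing

def reduceSeg1[A](inp: Array[A], left: Int, right: Int, a0: A, f: (A, A) => A): A = {
  var a = a0; var i = left
  while (i < right) { a = f(a, inp(i)); i += 1 }
  a
}

def upsweep[A](inp: Array[A], from: Int, to: Int, f: (A, A) => A): TreeResA[A] =
  if (to - from < threshold)
    Leaf(from, to, reduceSeg1(inp, from + 1, to, inp(from), f))
  else {
    val mid = from + (to - from) / 2
    val (tL, tR) = parallel(upsweep(inp, from, mid, f), upsweep(inp, mid, to, f))
    Node(tL, f(tL.res, tR.res), tR)
  }

def scanLeftSeg[A](inp: Array[A], left: Int, right: Int, a0: A,
                   f: (A, A) => A, out: Array[A]): Unit =
  if (left < right) {
    var i = left; var a = a0
    while (i < right) { a = f(a, inp(i)); i += 1; out(i) = a }
  }

def downsweep[A](inp: Array[A], a0: A, f: (A, A) => A,
                 t: TreeResA[A], out: Array[A]): Unit = t match {
  case Leaf(from, to, _) => scanLeftSeg(inp, from, to, a0, f, out)
  case Node(l, _, r) =>
    parallel(downsweep(inp, a0, f, l, out),
             downsweep(inp, f(a0, l.res), f, r, out))
    ()
}

def scanLeft[A](inp: Array[A], a0: A, f: (A, A) => A, out: Array[A]): Unit = {
  downsweep(inp, a0, f, upsweep(inp, 0, inp.length, f), out) // fills out(1..n)
  out(0) = a0 // prepending a0 amounts to writing it at index 0
}
```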