0:00

In this video, we'll apply the divide and

conquer algorithm design paradigm to the problem of multiplying matrices.

This will culminate in the study of Strassen's matrix multiplication algorithm.

And this is a super cool algorithm for two reasons.

First of all, Strassen's algorithm is completely non-trivial.

It's totally non-obvious, very clever,

not at all clear how Strassen ever came up with it.

The second cool feature is it's for such a fundamental problem.

So for as long as computers have been in use, from the time they were invented up until today, a large fraction of their cycles has been spent multiplying matrices.

It just comes up all the time in important applications.

So let me first just make sure we're all clear on what the problem is of

multiplying two matrices.

0:48

The ideas we'll talk about are also relevant for multiplying non-square matrices, but we're not going to discuss that in this video.

The entries in these matrices, you could think of them as whatever you want.

Maybe they're integers, maybe they're rationals.

Maybe they're from some finite field.

It depends on the application but

the point is they're entries that we can add and multiply.

1:30

So if zij were this red square, this red entry over in the z matrix, it would be derived from the corresponding row of the x matrix and the corresponding column of the y matrix.

And recall what I mean by dot product, that just means you take the products of

the individual components and then add up the results.

1:51

So ultimately the zij entry boils down to a sum of n terms, where each of the constituent products is just the xik entry, the ik entry of the matrix x, times the ykj entry, the kj entry of the matrix y, where k ranges from 1 to n.

So that's how zij is defined for a given pair of indices i and j.

2:15

One thing to note is that where we often use n to denote the input size,

here we're using n to denote the dimension of each of these matrices.

The input size is not n; the input size is quite a bit bigger than n. Specifically, each of these is an n x n matrix that contains n squared entries.

2:32

So since presumably we have to read the input, which has size n squared, and we have to produce the output, which also has size n squared, the best we could really hope for from a matrix multiplication algorithm is a running time of n squared. So the question is how close to that we can get.

2:47

Before we talk about algorithms for matrix multiplication, let me just make sure we're all crystal clear on exactly what the problem is. So let's actually spell out what would be the result of multiplying two 2 x 2 matrices.

So we can parameterize two generic 2 x 2 matrices by just giving the first one entries a, b, c, and d, where these four entries could be anything.

And then, we're multiplying by a second 2 x 2 matrix.

Let's call its entries e, f, g, and h.

Now what's the result of multiplying these?

Where again, it's going to be a 2 x 2 matrix where each entry is just

the corresponding dot product of the relevant row of the first matrix and

column of the second matrix.

So to get the upper left entry, we take the dot product of the top row of the first matrix and the left column of the second matrix, so that results in ae + bg.

To get the upper right entry, we take the dot product of the top row of the first matrix with the right column of the second matrix, so that gives us af + bh. Then, filling in the other entries in the same way, we get ce + dg and cf + dh.
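As a quick sanity check on these formulas, here's a short Python snippet (my own illustration, not part of the lecture) that hard-codes the four dot products for the 2 x 2 case:

```python
# A hypothetical helper (not from the lecture) that multiplies two 2 x 2
# matrices using the four dot products derived above:
# [[a, b], [c, d]] * [[e, f], [g, h]] = [[ae+bg, af+bh], [ce+dg, cf+dh]]
def mult_2x2(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return [[a*e + b*g, a*f + b*h],
            [c*e + d*g, c*f + d*h]]

print(mult_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```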

Okay, so that's multiplying two matrices, and

we've already discussed the definition in general.

Now, suppose you had to write a program to actually compute the result of

multiplying two n x n matrices.

One natural way to do that would just be to return to the definition,

which defines each of the n squared entries in the z matrix as a suitable sum

of products of entries of the x and y matrices.

So in the next quiz I'd like you to figure out exactly what would be the running time

of that algorithm as a function of the matrix dimension n.

Where, as usual, we count the addition or multiplication of two individual entries as a constant time operation.

4:47

So the correct response to this quiz is the third answer.

That the running time of the straightforward iterative algorithm is cubic in the matrix dimension n.

To see this, just recall the definition of matrix multiplication.

The definition tells us that each entry zij of

the output matrix z is defined as the sum from k = 1 to n of xik times ykj.

That is the dot product of the ith row of the x matrix and

the jth column of the y matrix.

I'm certainly assuming that we have the matrices represented in a way that lets us access a given entry in constant time.

And under that assumption remember each of these products only takes constant time.

And so then to compute zij, we just have to add up these n products, so that's going to be theta of n time to compute a given zij, and then there are n squared entries that we have to compute.

There's n choices for i, n choices for j.

So that gives us n squared times n, or cubic running time overall, for the natural algorithm, which is really just a triple for loop that computes each entry of the output matrix separately using a dot product.
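That triple for loop can be sketched in Python as follows (a minimal illustration, assuming the matrices are given as lists of lists and n is their common dimension):

```python
def matmul_naive(x, y):
    """Straightforward iterative algorithm: a triple for loop, Theta(n^3)."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for i in range(n):             # n choices for i
        for j in range(n):         # n choices for j
            for k in range(n):     # Theta(n) work per entry z[i][j]
                z[i][j] += x[i][k] * y[k][j]
    return z
```

The innermost loop is exactly the dot product of the ith row of x with the jth column of y.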

So the question, as always for the keen algorithm designer, is: can we do better? Can we beat n cubed time for multiplying two matrices?

6:09

And we might be especially emboldened with the progress that we've already seen in

terms of multiplying two integers.

We applied the divide and conquer paradigm to the problem of multiplying two integers.

And we had both a naive recursive algorithm and

a seemingly better algorithm due to Gauss, which made only three recursive calls.

Now we haven't yet analyzed the running time of that algorithm.

But as we'll see later, that does indeed beat

the quadratic running time of the grade school algorithm.

So it's very natural to ask, can we do exactly the same thing here?

There's the obvious n cubed algorithm which follows straight from

the definition, perhaps analogous to Gauss we could have some clever divide and

conquer method which beats cubic time.

So that's what we're going to explore next.

6:50

Let's recall the divide and conquer paradigm, what does it mean to use it?

Well, we first have to identify smaller subproblems. So if we want to multiply two n x n matrices, we have to identify multiplications of smaller matrices that we can solve recursively.

Once we've figured out how we want to divide the given problem into smaller ones, then in the conquer step we simply invoke our own algorithm recursively.

That's going to recursively multiply the smaller matrices together.

And then in general, we'll have to combine the results of the recursive calls to get the solution for the original problem.

In our case, to get the product of the original two matrices

from the product of whatever submatrices we identify.

So how would we apply the divide and conquer paradigm to matrices?

So we're given two n x n matrices, and we have to somehow identify

smaller pairs of square matrices that we can multiply recursively.

7:41

So the idea I think is fairly natural.

So we start with a big n by n matrix x, right, so there's n rows and n columns.

And we have to somehow divide it into smaller pieces.

Now the first thing you might think of is to split it into its left half and its right half, analogous to what we've been doing with arrays. But then we'd break X into two matrices which are no longer square, which are n over 2 in one dimension and n in the other dimension.

And we want to recursively call a subroutine that multiplies

square matrices.

So what seems like the clear thing to do, is to divide X into quadrants.

Okay, so we have four pieces of X, each is going to be n over 2 by n over 2

corresponding to the different quarters of this matrix.

So let's call these different quadrants or blocks in matrix terminology A, B, C,

and D.

All of these are n over 2 by n over 2 matrices.

As usual, for simplicity, I'm assuming that n is even, and as usual it doesn't really matter. And we can do the same trick with Y.

9:03

So what I mean by that,

is that the product of X and Y can be expressed in terms of its quadrants.

And each of its four quadrants, each of its four corners

can be written as a suitable arithmetic expression of the quadrants of X and Y.

So here's exactly what those formulas are.

They're exactly analogous to when we just multiplied a pair of 2 by 2 matrices.

9:26

So I'm not going to formally prove this fact. I'm sure many of you have seen it before or are familiar with it. And if you haven't, it's actually quite easy to prove. It's not obvious, in the sense that you can't necessarily see it off the top of your head, but if you go back to the definition, it's quite easy to verify.

But indeed, when you multiply X and Y, you can express the quadrants, or blocks, of the product in terms of the blocks of X and Y.

9:46

So what we just did is completely analogous to when talking about

integer multiplication, and we wanted to multiply two integers, x and y, and

we broke them into pairs of n over 2 digits.

And then we just took the expansion, and we observed how that expansion could be written in terms of products of n over 2 digit numbers.

It's the same thing going on here, except with matrices.

So now we're in business as far as a recursive approach.

We want to multiply x and y.

They're n by n matrices.

We recognize that we can express that product, x times y, in terms of products of n over 2 by n over 2 matrices, things we're able to multiply recursively, plus additions. And additions are clearly easy: to add two matrices with, say, n squared entries each takes time linear in the number of entries. So it's going to be n squared time to add two matrices that are n by n.

So this immediately leads us to our first recursive algorithm.

10:43

And now our first recursive algorithm is simply to evaluate all of these

expressions in the obvious way.

So specifically in step one,

we recursively compute all of the necessary products.

10:54

And observe that there are eight products that we have to compute.

Eight products of n over 2 by n over 2 matrices.

There are four blocks in this expansion of x times y. Each of the blocks is the sum of two products, and none of the products reoccurs; they're all distinct.

So naively, if you want to evaluate this,

we have to do eight different products of n over 2 by n over 2 matrices.
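The first recursive algorithm can be sketched in Python like this (my own illustration, assuming for simplicity that n is a power of 2 so the quadrant splits always come out even):

```python
def add(x, y):
    """Entrywise sum of two equal-size matrices: Theta(n^2) time."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def split(m):
    """Split an n x n matrix (n even) into its four n/2 x n/2 quadrants."""
    n = len(m) // 2
    return ([row[:n] for row in m[:n]], [row[n:] for row in m[:n]],
            [row[:n] for row in m[n:]], [row[n:] for row in m[n:]])

def combine(tl, tr, bl, br):
    """Reassemble four quadrants into one matrix."""
    return [r1 + r2 for r1, r2 in zip(tl, tr)] + \
           [r1 + r2 for r1, r2 in zip(bl, br)]

def matmul_recursive(x, y):
    """Naive divide and conquer: 8 recursive products, still Theta(n^3)."""
    n = len(x)
    if n == 1:  # base case: 1 x 1 matrices
        return [[x[0][0] * y[0][0]]]
    A, B, C, D = split(x)
    E, F, G, H = split(y)
    # The eight distinct products of n/2 x n/2 matrices.
    return combine(
        add(matmul_recursive(A, E), matmul_recursive(B, G)),  # AE + BG
        add(matmul_recursive(A, F), matmul_recursive(B, H)),  # AF + BH
        add(matmul_recursive(C, E), matmul_recursive(D, G)),  # CE + DG
        add(matmul_recursive(C, F), matmul_recursive(D, H)),  # CF + DH
    )
```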

11:33

Now the question you should be asking is: is this a good algorithm? Was it good for anything, this recursive approach of splitting x and y into blocks, expanding the product in terms of those blocks, and then recursively computing each of the products? And I want to say it's totally not obvious. It is not clear what the running time of this recursive algorithm is.

I'm going to go ahead and give you a spoiler which is going to follow from

the master method that we'll talk about in the next lecture.

But it turns out that with this kind of recursive algorithm, where you do eight recursive calls, each on a problem with dimension half as much as what you started with, and then do quadratic work outside the recursive calls, the running time is going to be cubic.

So exactly the same as with the straightforward iterative algorithm that follows from the definition. That one was clearly cubic; this one, although it's not obvious, turns out to be cubic as well. So no better, no worse than the straightforward iterative algorithm.

12:29

So in case you're feeling disappointed that we went through all this work on this seemingly clever divide and conquer approach for matrix multiplication, and came out at the end no better than the iterative algorithm.

Well, there's really no reason to despair.

Because remember, back in integer multiplication we had a straightforward recursive algorithm where we had to do four recursive calls, products of n over 2 digit numbers. But then we had Gauss's trick, which said that if we use more cleverly chosen products and additions and subtractions, then we can get away with only three recursive calls.

And as we'll see later, that does indeed yield a significant savings in running time for integer multiplication.

But so far we've done nothing analogous to Gauss's trick for this matrix multiplication problem.

All we did was the naive expansion, in terms of submatrices, and the naive evaluation of the resulting expressions.

So, the $64,000 question is then, can we do something clever,

to reduce the number of recursive calls, from 8 down to something lower?

And that is where Strassen's Algorithm comes in.

13:27

So at a high level, Strassen's Algorithm has two steps, just like the first recursive algorithm that we discussed. In step one, it recursively computes only seven, cleverly chosen, products of n over 2 by n over 2 matrices.

13:54

Step two, then, is to actually produce the product of x and y, to produce each of those four blocks that we saw, with suitable additions and subtractions of these seven products.

And again, these are much less straightforward than in the first

recursive algorithm.

14:12

And so, while the additions and subtractions involved will be a little more numerous than they were in the naive recursive algorithm, that only changes the work in that part of the algorithm by a constant factor.

So we'll still spend only theta of n squared work adding and subtracting things, and we get a huge win in decreasing the number of recursive calls from 8 to 7.

Now, just in case you have the intuition that shaving off one of eight recursive calls should only decrease the running time of an algorithm by one-eighth, by 12.5%: in fact, it has a tremendously more amplified effect.

Because we do one less recursive call over and over and

over again as we keep recursing in the algorithm.

So it makes a fundamental difference in the eventual running time of the algorithm

as we'll explore in detail in the next set of lectures

when we discuss the master method.

So again a bit of a spoiler alert.

What you're going to see in the next set of lectures is that indeed Strassen's Algorithm does beat cubic time; it's better than n cubed time.

You'll have to watch the next set of lectures if you want to know just what

the running time is.

But I'm going to tell you now that saving the eighth recursive call is what changes the algorithm from cubic to subcubic.
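As a preview of that master method analysis, the two recurrences can be written out as follows (a spoiler sketch in my own notation, not from the lecture slides):

```latex
% Naive recursive algorithm: 8 subproblems of half the dimension,
% plus Theta(n^2) work for the additions.
T(n) = 8\,T(n/2) + \Theta(n^2) \;\implies\; T(n) = \Theta(n^3)

% Strassen: only 7 subproblems, same Theta(n^2) work outside the calls.
T(n) = 7\,T(n/2) + \Theta(n^2) \;\implies\; T(n) = \Theta(n^{\log_2 7}) = O(n^{2.81})
```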

15:20

Now, 1969 was, obviously, quite a bit before my time.

But by all accounts, from people I've talked to who were around then and from

what the books say, Strassen's Algorithm totally blew people's minds at the time.

Everybody was just assuming that there's no way you could do better than

the iterative algorithm, the cubic algorithm.

It just seemed, intuitively, that matrix multiplication fundamentally required all of the calculations that are spelled out in the definition.

So Strassen's Algorithm is an early glimpse of the magic and

of the power of clever algorithm design.

It shows that if you really apply serious ingenuity, even to super fundamental problems, you can come up with fundamental savings over the more straightforward solutions. So

16:01

those are the main points I wanted to make about Strassen's algorithm: how you can beat cubic time by saving a recursive call, with suitably chosen clever products and clever additions and subtractions.

Maybe a few of you are wondering what are these cleverly chosen products,

can you really do this?

And I don't blame you.

There's no reason to believe me just because I sort of spelled out this high

level idea.

It's not obvious this should work.

You might want to actually see the products.

So for those of you who do, this last slide is for you.

16:30

So here is Strassen's Algorithm in its somewhat gory detail.

So let me tell you what the seven products are that we're going to form.

I'm going to label them P1 through P7 and they're

all going to be defined in terms of the blocks of the input matrices x and y.

So let me just remind you that we think of x in terms of its blocks A, B, C, D, and we think of y in terms of its blocks E, F, G, H. And remember, A through H are all n over 2 by n over 2 sub-matrices.

So here are the seven products P1 through P7: P1 = A(F-H), P2 = (A+B)H, P3 = (C+D)E, P4 = D(G-E), P5 = (A+D)(E+H), P6 = (B-D)(G+H), and P7 = (A-C)(E+F).

17:49

So I hope you'll agree that these are indeed only seven products.

And we could compute these with seven recursive calls.

So we've pre-processed with a little bit of additions and subtractions.

We have to compute F minus H, A plus B, C plus D, and so on.

We compute all these new matrices from the blocks.

And then, with seven recursive calls, we can do the seven products that operate on n over 2 by n over 2 matrices.

Now the question is: why is this useful? Why on earth would we want to know the values of these seven products?

So the amazing other part of the algorithm is that from just these seven products we can, using only additions and subtractions, recover all four of the blocks of x times y.

So x times y, you recall, we expressed in terms of blocks.

18:36

So we previously computed this to be AE+BG in the upper left corner, and similar expressions for the upper right, lower left, and lower right blocks.

So this we already know.

So the content of the claim is that these four blocks also arise

from the seven products in the following way.

18:57

So the claim here is that these two different expressions for

x times y are exactly the same, and they're the same block by block.

So in other words, the claim is that this crazy expression P5+P4-P2+P6, where those are four of the products that we listed above, is precisely AE+BG.

Similarly, we're claiming that P1+P2 is exactly AF+BH, which is actually easy to see, and that P3+P4 is CE+DG. That's also easy to see.

And then the other complicated one is that P1+P5-P3-P7 is exactly the same as CF+DH. So all four of those hold.

So let me show you part of the derivation, just so you believe me, because I don't know why you'd believe me otherwise.

19:45

Let's just look at the proof of one of the cases of the upper left corner.

So that is let's just expand out this crazy expression, P5+P4-P2+P6.

What do we get?

Well, from P5 we get, AE+AH+DE+DH,

then we add P4, so that's going to give us,

+DG-DE, then we subtract P2,

so that gives us a -AH-BH, and then we add in P6, so

that gives us a BG+BH-DG-DH.

Okay, so what happens next? Well, now we look for cancellations. So we cancel the AH's, we cancel the DE's, we cancel the DH's, we cancel the DG's, we cancel the BH's, and holy cow, what do we get?

We get AE+BG, that is we get exactly

what we were supposed to in the upper left block of x times y.

So we just actually verified that this equation holds for the upper left block. It's quite easy to see that it holds for the upper right and lower left blocks, and then a comparable calculation verifies it for the lower right block.
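To double-check the claim concretely, here's a small Python snippet (my own check, not from the lecture, using the standard Strassen products; for simplicity the blocks are scalars, which is a valid special case since the identities are purely algebraic):

```python
# Scalar stand-ins for the blocks A..D of x and E..H of y.
a, b, c, d = 1, 2, 3, 4
e, f, g, h = 5, 6, 7, 8

# The seven standard Strassen products.
P1 = a * (f - h)
P2 = (a + b) * h
P3 = (c + d) * e
P4 = d * (g - e)
P5 = (a + d) * (e + h)
P6 = (b - d) * (g + h)
P7 = (a - c) * (e + f)

# The four identities from the claim.
assert P5 + P4 - P2 + P6 == a*e + b*g   # upper left
assert P1 + P2 == a*f + b*h             # upper right
assert P3 + P4 == c*e + d*g             # lower left
assert P1 + P5 - P3 - P7 == c*f + d*h   # lower right
```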

So summarizing: because this claim holds, because we actually can recover the four blocks of x times y from these seven products, Strassen's algorithm works in the following way.

You compute the seven products P1 through P7 using seven recursive calls.

Then you just compute the four blocks using some extra additions and

subtractions as shown in the claim.

So it's seven recursive calls on n over 2 by n over 2 matrices, plus theta of n squared work due to the necessary additions and subtractions. And as you'll see in the master method lecture, that is indeed sufficient for subcubic time.
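Putting the pieces together, here's a sketch of the whole algorithm in Python (my own illustration, assuming n is a power of 2 and using the standard seven products, which the lecture only partially spells out):

```python
def add(x, y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

def sub(x, y):
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(x, y)]

def split(m):
    """Split an n x n matrix (n even) into its four n/2 x n/2 quadrants."""
    n = len(m) // 2
    return ([row[:n] for row in m[:n]], [row[n:] for row in m[:n]],
            [row[:n] for row in m[n:]], [row[n:] for row in m[n:]])

def combine(tl, tr, bl, br):
    """Reassemble four quadrants into one matrix."""
    return [r1 + r2 for r1, r2 in zip(tl, tr)] + \
           [r1 + r2 for r1, r2 in zip(bl, br)]

def strassen(x, y):
    """Seven recursive calls plus Theta(n^2) additions and subtractions."""
    n = len(x)
    if n == 1:
        return [[x[0][0] * y[0][0]]]
    A, B, C, D = split(x)
    E, F, G, H = split(y)
    p1 = strassen(A, sub(F, H))
    p2 = strassen(add(A, B), H)
    p3 = strassen(add(C, D), E)
    p4 = strassen(D, sub(G, E))
    p5 = strassen(add(A, D), add(E, H))
    p6 = strassen(sub(B, D), add(G, H))
    p7 = strassen(sub(A, C), add(E, F))
    # Recover the four blocks of x times y via the claim.
    return combine(
        add(sub(add(p5, p4), p2), p6),   # AE + BG
        add(p1, p2),                     # AF + BH
        add(p3, p4),                     # CE + DG
        sub(sub(add(p1, p5), p3), p7),   # CF + DH
    )
```

In practice, implementations stop the recursion at some small base dimension and switch to the iterative algorithm, since the constant factors of the additions dominate for tiny matrices.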

22:05

And indeed, this sort of illustrates the difference between checking somebody's proof and coming up with a proof.

Given that I've told you the magical seven products and how from them you can recover the four desired blocks of x times y, it's really just mechanical to verify that it works.

It's a totally different story how you come up with P1 through P7 in the first place.

So how did Strassen come up with them?

Honestly, your guess is as good as mine.