0:00

When implementing a deep neural network, one of the debugging tools I often use to check the correctness of my code is to pull out a piece of paper and just work through the dimensions of the matrices I'm working with. So let me show you how to do that, since I hope this will make it easier for you to implement your deep nets as well.

Capital L is equal to 5. Counting quickly, and not counting the input layer, there are five layers here: four hidden layers and one output layer. So if you implement forward propagation, the first step will be z[1] = w[1]x + b[1]. Let's ignore the bias terms b for now and focus on the parameters w.

Now, this first hidden layer has three hidden units, so this is layer 0, layer 1, layer 2, layer 3, layer 4, and layer 5. Using the notation from the previous video, we have that n[1], the number of hidden units in layer 1, is equal to 3. And here we would have n[2] = 5, n[3] = 4, n[4] = 2, and n[5] = 1. So far we've only seen neural networks with a single output unit, but in later courses we'll talk about neural networks with multiple output units as well. And finally, for the input layer, we also have n[0] = n_x = 2.

So now let's think about the dimensions of z, w, and x. Here z[1] is the vector of activations for this first hidden layer, so z[1] is going to be 3 by 1; it's going to be a three-dimensional vector. So I'm going to write it as an n[1] by 1 matrix, that is, 3 by 1 in this case. Now, how about the input features x? We have two input features, so x in this example is 2 by 1, but more generally it would be n[0] by 1.

So what we need is for the matrix w[1] to be something that, when we multiply it by an n[0] by 1 vector, gives us an n[1] by 1 vector, right? You have, sort of, a three-dimensional vector equals something times a two-dimensional vector. So by the rules of matrix multiplication, this has got to be a 3 by 2 matrix, because a 3 by 2 matrix times a 2 by 1 matrix, or a 2 by 1 vector, gives you a 3 by 1 vector. More generally, this is going to be an n[1] by n[0] dimensional matrix. So what we figured out here is that the dimensions of w[1] have to be n[1] by n[0], and more generally, the dimensions of w[l] must be n[l] by n[l-1].
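As a quick sanity check on that rule, here's a minimal numpy sketch using the layer sizes from this example network (the variable names are just for illustration):

```python
import numpy as np

n0, n1 = 2, 3  # input features, hidden units in layer 1 (from the example)

x = np.zeros((n0, 1))    # input: n[0] by 1
W1 = np.zeros((n1, n0))  # w[1] must be n[1] by n[0], i.e. 3 by 2
b1 = np.zeros((n1, 1))   # bias: n[1] by 1

z1 = W1 @ x + b1         # (3, 2) @ (2, 1) -> (3, 1)
print(z1.shape)          # (3, 1)
```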

So, for example, the dimensions of w[2] would have to be 5 by 3, or n[2] by n[1], because we're going to compute z[2] as w[2] times a[1], and again, let's ignore the bias for now. So a[1] is going to be 3 by 1, and we need z[2] to be 5 by 1, and so w[2] had better be 5 by 3.

And similarly, the dimension of w[3] is really the dimension of the next layer comma the dimension of the previous layer, so this is going to be 4 by 5. Then w[4] is going to be 2 by 4, and w[5] is going to be 1 by 2, okay? So the general formula to check, when you're implementing the matrix for layer l, is that its dimension be n[l] by n[l-1].

Now let's think about the dimensions of the vectors b. Here z[1] is going to be a 3 by 1 vector, so you have to add b[1], another 3 by 1 vector, in order to get a 3 by 1 vector as the output. Or in the second example, w[2]a[1] is going to be 5 by 1, so b[2] has to be another 5 by 1 vector in order for the sum of these two things to itself be a 5 by 1 vector. So the rule is that in the example on the left, b[1] is n[1] by 1, right, that's 3 by 1, and in the second example, b[2] is n[2] by 1. And so the more general case is that b[l] should be n[l] by 1 dimensional.

So hopefully these two equations help you double-check that the dimensions of your matrices w, as well as your vectors b, are correct. And of course, if you're implementing backpropagation, then dw should have the same dimension as w, and db should have the same dimension as b.
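One way to put both rules to work in code is to let the layer sizes drive the parameter shapes. Here's a minimal sketch assuming numpy and small random initialization (the helper name is just for illustration):

```python
import numpy as np

def init_params(layer_dims):
    """layer_dims = [n0, n1, ..., nL]; returns W[l] of shape (n[l], n[l-1])
    and b[l] of shape (n[l], 1) for each layer l = 1..L."""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

# The example network from the video: n[0]=2, then 3, 5, 4, 2, 1.
params = init_params([2, 3, 5, 4, 2, 1])
print(params["W2"].shape)  # (5, 3), i.e. n[2] by n[1]
print(params["b5"].shape)  # (1, 1), i.e. n[5] by 1
```

The gradients dW[l] and db[l] you compute in backpropagation would then reuse these same shapes.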

Now, the other key set of quantities whose dimensions to check are z, x, as well as a[l], which we haven't talked too much about here. But because a[l] is equal to g(z[l]), applied element-wise, z and a should have the same dimension in these types of networks.

Now let's see what happens when you have a vectorized implementation that looks at multiple examples at a time. Even for a vectorized implementation, the dimensions of w, b, dw, and db will stay the same. But the dimensions of z and a, as well as x, will change a bit in your vectorized implementation.

So previously, we had z[1] = w[1]x + b[1], where z[1] was n[1] by 1, w[1] was n[1] by n[0], x was n[0] by 1, and b[1] was n[1] by 1. Now, in a vectorized implementation, you would have Z[1] = W[1]X + b[1], where Z[1] is obtained by taking the z[1]'s for the individual examples, so there's z[1](1), z[1](2), up to z[1](m), and stacking them as columns; this gives you Z[1]. So the dimension of Z[1], instead of being n[1] by 1, ends up being n[1] by m, where m is the size of your training set.

The dimensions of W[1] stay the same, so it's still n[1] by n[0]. And X, instead of being n[0] by 1, is now all your training examples stacked horizontally, so it's now n[0] by m. And so you notice that when you take an n[1] by n[0] matrix and multiply it by an n[0] by m matrix, together they give you an n[1] by m dimensional matrix, as expected. Now, the final detail is that b[1] is still n[1] by 1, but when you take W[1]X and add b[1] to it, then through Python broadcasting, b[1] will get duplicated into an n[1] by m matrix and added element-wise.
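Here's a small sketch of that broadcasting behavior in numpy, with the example layer sizes and m = 4 training examples (a hypothetical batch size, just for illustration):

```python
import numpy as np

n0, n1, m = 2, 3, 4    # input size, layer-1 size, number of examples

W1 = np.ones((n1, n0)) # (3, 2)
X = np.ones((n0, m))   # (2, 4): m examples stacked as columns
b1 = np.ones((n1, 1))  # (3, 1): still n[1] by 1

Z1 = W1 @ X + b1       # b1 broadcasts across the m columns
print(Z1.shape)        # (3, 4), i.e. n[1] by m
print(Z1[0, 0])        # 3.0: each entry is 1*1 + 1*1 from W1 @ X, plus 1 from b1
```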

So on the previous slide, we talked about the dimensions of w, b, dw, and db. Here, what we see is that whereas z[l] as well as a[l] are of dimension n[l] by 1, we now have that Z[l] as well as A[l] are n[l] by m. And a special case of this is when l is equal to 0, in which case A[0], which is just your training set input features X, is going to be n[0] by m, as expected.
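Putting it all together, here's a sketch of vectorized forward propagation over the whole example network, checking that every Z[l] and A[l] comes out n[l] by m (the variable names and the choice of tanh as the activation g are assumptions for illustration):

```python
import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]  # n[0] .. n[5] from the example network
m = 10                           # hypothetical number of training examples

A = np.random.randn(layer_dims[0], m)  # A[0] = X, shape n[0] by m
for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1])  # n[l] by n[l-1]
    b = np.zeros((layer_dims[l], 1))                       # n[l] by 1
    Z = W @ A + b       # n[l] by m; b broadcasts over the m columns
    A = np.tanh(Z)      # same shape as Z, activation applied element-wise
    assert Z.shape == A.shape == (layer_dims[l], m)

print(A.shape)  # (1, 10), i.e. n[5] by m
```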

And of course, when you're implementing backpropagation, as we'll see later, you end up computing dZ as well as dA, and these will of course have the same dimensions as Z and A.

So I hope this little exercise helps clarify the dimensions of the various matrices you'll be working with. When you implement backpropagation for a deep neural network, working through your code and making sure all the matrix dimensions are consistent will go some way toward eliminating possible causes of bugs. If you keep straight the dimensions of these various matrices and vectors, hopefully that will help you eliminate some bugs; it certainly helps me get my code right.

So, we've now seen some of the mechanics of how to do forward propagation in a neural network. But why are deep neural networks so effective, and why do they do better than shallow representations? Let's spend a few minutes in the next video discussing that.
