In the earlier videos from this week, as well as from the videos from the past several weeks, you've already seen the basic building blocks of forward propagation and back propagation, the key components you need to implement a deep neural network. Let's see how you can put these components together to build your deep net. Here's a network of a few layers. Let's pick one layer. And look into the computations focusing on just that layer for now. So for layer L, you have some parameters wl and bl and for the forward prop, you will input the activations a l-1 from your previous layer and output a l. So the way we did this previously was you compute z l = w l times al - 1 + b l. And then al = g of z l. All right. So, that's how you go from the input al minus one to the output al. And, it turns out that for later use it'll be useful to also cache the value zl. So, let me include this on cache as well because storing the value zl would be useful for backward, for the back propagation step later. And then for the backward step or for the back propagation step, again, focusing on computation for this layer l, you're going to implement a function that inputs da(l). And outputs da(l-1), and just to flesh out the details, the input is actually da(l), as well as the cache so you have available to you the value of zl that you computed and then in addition, outputing da(l) minus 1 you bring the output or the gradients you want in order to implement gradient descent for learning, okay? So this is the basic structure of how you implement this forward step, what we call the forward function as well as this backward step, which we'll call backward function. So just to summarize, in layer l, you're going to have the forward step or the forward prop of the forward function. Input al- 1 and output, al, and in order to make this computation you need to use wl and bl. And also output a cache, which contains zl, right? And then the backward function, using the back prop step, will be another function that now inputs da(l) and outputs da(l-1). So it tells you, given the derivatives respect to these activations, that's da(l), what are the derivatives? How much do I wish? You know, al- 1 changes the computed derivatives respect to deactivations from a previous layer. Within this box, right? You need to use wl and bl, and it turns out along the way you end up computing dzl, and then this box, this backward function can also output dwl and dbl, but I was sometimes using red arrows to denote the backward iteration. So if you prefer, we could draw these arrows in red. So if you can implement these two functions then the basic computation of the neural network will be as follows. You're going to take the input features a0, feed that in, and that would compute the activations of the first layer, let's call that a1 and to do that, you need a w1 and b1 and then will also, you know, cache away z1, right? Now having done that, you feed that to the second layer and then using w2 and b2, you're going to compute deactivations in the next layer a2 and so on. Until eventually, you end up outputting a l which is equal to y hat. And along the way, we cached all of these values z. So that's the forward propagation step. Now, for the back propagation step, what we're going to do will be a backward sequence of iterations in which you are going backwards and computing gradients like so. So what you're going to feed in here, da(l) and then this box will give us da(l- 1) and so on until we get da(2) da(1). You could actually get one more output to compute da(0) but this is derivative with respect to your input features, which is not useful at least for training the weights of these supervised neural networks. So you could just stop it there. But along the way, back prop also ends up outputting dwl, dbl. I just used the prompt as wl and bl. This would output dw3, db3 and so on. So you end up computing all the derivatives you need. And so just to maybe fill in the structure of this a little bit more, these boxes will use those parameters as well. wl, bl and it turns out that we'll see later that inside these boxes we end up computing the dz's as well. So one iteration of training through a neural network involves: starting with a(0) which is x and going through forward prop as follows. Computing y hat and then using that to compute this and then back prop, right, doing that and now you have all these derivative terms and so, you know, w would get updated as w1 = the learning rate times dw, right? For each of the layers and similarly for b rate. Now the computed back prop have all these derivatives. So that's one iteration of gradient descent for your neural network. Now before moving on, just one more informational detail. Conceptually, it will be useful to think of the cache here as storing the value of z for the backward functions. But when you implement this, and you see this in the programming exercise, When you implement this, you find that the cache may be a convenient way to get to this value of the parameters of w1, b1, into the backward function as well. So for this exercise you actually store in your cache to z as well as w and b. So this stores z2, w2, b2. But from an implementation standpoint, I just find it a convenient way to just get the parameters, copy to where you need to use them later when you're computing back propagation. So that's just an implementational detail that you see when you do the programming exercise. So you've now seen what are the basic building blocks for implementing a deep neural network. In each layer there's a forward propagation step and there's a corresponding backward propagation step. And has a cache to pass information from one to the other. In the next video, we'll talk about how you can actually implement these building blocks. Let's go on to the next video.