
In the earlier videos from this week, as well as from the videos from the past several weeks, you've already seen the basic building blocks of forward propagation and back propagation, the key components you need to implement a deep neural network. Let's see how you can put these components together to build a deep net. Here's a network with a few layers. Let's pick one layer and look at the computations, focusing on just that layer for now.

So for layer l, you have some parameters W[l] and b[l], and for the forward prop you will input the activations a[l-1] from the previous layer and output a[l]. The way we did this previously was you compute z[l] = W[l] a[l-1] + b[l], and then a[l] = g(z[l]). So that's how you go from the input a[l-1] to the output a[l]. And it turns out that for later use it will be useful to also cache the value z[l], so let me include that in a cache as well, because storing the value z[l] will be useful for the back propagation step later.
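As a rough illustration, here is a minimal sketch of that forward step for a single layer in numpy. The function name linear_activation_forward and the choice of a ReLU activation are assumptions made for this sketch, not something fixed in the video.

```python
import numpy as np

def relu(z):
    # element-wise ReLU; the video just calls the activation g(z)
    return np.maximum(0, z)

def linear_activation_forward(a_prev, W, b):
    # forward step for layer l:
    #   z[l] = W[l] a[l-1] + b[l]
    #   a[l] = g(z[l])
    z = W @ a_prev + b
    a = relu(z)
    # cache z[l]; also stash a[l-1], W[l], b[l], since (as noted later in the
    # video) keeping the parameters in the cache is convenient for back prop
    cache = (a_prev, W, b, z)
    return a, cache
```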

And then for the backward step, or for the back propagation step, again focusing on the computation for this layer l, you're going to implement a function that inputs da[l] and outputs da[l-1]. And just to flesh out the details, the input is actually da[l] as well as the cache, so you have available to you the value of z[l] that you computed, and in addition to outputting da[l-1], you will output, you know, the gradients you want in order to implement gradient descent for learning. OK, so this is the basic structure of how you implement this forward step, which I'm going to call a forward function, as well as this backward step, which we shall call a backward function.

So just to summarize: in layer l, you're going to have, you know, the forward step, or the forward prop function, which inputs a[l-1] and outputs a[l], and in order to make this computation you need to use W[l] and b[l]; you also output a cache, which contains z[l]. And then the backward function, used in the back prop step, will be another function that now inputs da[l] and outputs da[l-1]. So it tells you: given the derivatives with respect to these activations, that's da[l], what are the derivatives, or how much do I wish, you know, a[l-1] changed? It computes the derivatives with respect to the activations from the previous layer. Within this box, you need to use W[l] and b[l], and it turns out along the way you end up computing dz[l], and then this box, this backward function, can also output dW[l] and db[l]. I was sometimes using red arrows to denote the backward iteration, so if you prefer, we could draw these arrows in red.
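To make that concrete, here is a minimal sketch of the backward function for one layer, again assuming a ReLU activation and the cache layout from the forward sketch above; the gradient formulas are the standard back prop equations from the earlier videos.

```python
def linear_activation_backward(da, cache):
    # backward step for layer l: input da[l] and the cache, output da[l-1]
    # plus the gradients dW[l], db[l] used in the gradient descent update
    a_prev, W, b, z = cache
    m = a_prev.shape[1]                         # number of training examples
    dz = da * (z > 0)                           # dz[l] = da[l] * g'(z[l]) for ReLU
    dW = (dz @ a_prev.T) / m                    # dW[l] = (1/m) dz[l] a[l-1]^T
    db = np.sum(dz, axis=1, keepdims=True) / m  # db[l] = (1/m) sum of dz[l]
    da_prev = W.T @ dz                          # da[l-1] = W[l]^T dz[l]
    return da_prev, dW, db
```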

If you can implement these two functions, then the basic computation of the neural network will be as follows. You're going to take the input features a[0], feed that in, and that will compute the activations of the first layer, let's call that a[1], and to do that you needed W[1] and b[1], and then you'll also, you know, cache away z[1]. Now having done that, you feed that to the second layer, and then using W[2] and b[2] you're going to compute the activations of the next layer, a[2], and so on, until eventually you end up outputting a[L], which is equal to ŷ. And along the way, we cached all of these values z. So that's the forward propagation step.
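Putting the per-layer forward function from the earlier sketch into a loop, the whole forward pass might look like the following. Representing the parameters as a list of (W, b) pairs is an assumed layout, and for simplicity every layer reuses the same activation, whereas in practice the output layer would typically use a different one, such as a sigmoid.

```python
def model_forward(x, parameters):
    # parameters: a list of (W, b) pairs, one per layer (assumed layout)
    a = x                        # a[0] = x
    caches = []
    for W, b in parameters:
        a, cache = linear_activation_forward(a, W, b)
        caches.append(cache)     # cache z[l] (and W[l], b[l]) along the way
    y_hat = a                    # the final activation a[L] is y_hat
    return y_hat, caches
```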

Now, for the back propagation step, what we're going to do will be a backward sequence of iterations in which you're going backwards and computing gradients, like so. So you're going to feed in here da[L], and then this box will give us da[L-1], and so on, until we get down to da[1]. You could actually get one more output to compute da[0], but this is the derivative with respect to your input features, which is not useful, at least for training the weights of a supervised neural network, so you could just stop it there. Along the way, back prop also ends up outputting dW[l] and db[l], right? This uses the parameters W[l] and b[l]; this would output dW[3] and db[3], and so on. So you end up computing all the derivatives you need, dW[l] and db[l] for every layer. And it turns out, as we'll see later, that inside these boxes we end up computing the dz's as well.
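A matching sketch of the backward sequence, looping over the cached values from the last layer back to the first, could look like this; the initial da[L] fed in would come from the derivative of the loss with respect to ŷ.

```python
def model_backward(da_last, caches):
    # backward sequence: feed in da[L] and walk the layers in reverse,
    # computing dW[l], db[l] and passing da[l-1] back to the previous layer
    grads = []
    da = da_last
    for cache in reversed(caches):
        da, dW, db = linear_activation_backward(da, cache)
        grads.append((dW, db))
    grads.reverse()              # grads[l-1] now holds (dW[l], db[l]) for layer l
    return grads
```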

So one iteration of training for a neural network involves starting with a[0], which is x, and going through forward prop as follows, computing ŷ, and then using that to compute the loss, and then back prop, doing that, and now you have all these derivative terms. And so, you know, W will get updated as W minus the learning rate times dW, for each of the layers, and similarly for b. Now that we've computed the back prop and have all these derivatives, that's one iteration of gradient descent for your neural network.
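And a minimal sketch of that update step, assuming the same list-of-pairs layout for the parameters and gradients as in the sketches above.

```python
def update_parameters(parameters, grads, learning_rate=0.01):
    # gradient descent update for every layer:
    #   W[l] := W[l] - alpha * dW[l],   b[l] := b[l] - alpha * db[l]
    return [(W - learning_rate * dW, b - learning_rate * db)
            for (W, b), (dW, db) in zip(parameters, grads)]
```

One iteration of gradient descent would then chain the pieces: run model_forward, compute da[L] from the derivative of the loss with respect to ŷ, run model_backward, and call update_parameters with the resulting gradients.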

Now, before moving on, just one more implementational detail. Conceptually, it will be useful to think of the cache here as storing the value of z for the backward functions. But when you implement this, and you'll see this in the programming exercise, you find that the cache may be a convenient way to get the values of the parameters W[1], b[1] into the backward function as well. So in the programming exercise, you actually store in the cache z as well as W and b; so it stores z[2], W[2], b[2], and so on. From an implementational standpoint, I just find this a convenient way to, you know, get the parameters copied to where you need to use them later, when you're computing back propagation. So that's just an implementational detail that you'll see when you do the programming exercise.

So you've now seen the basic building blocks for implementing a deep neural network: in each layer there's a forward propagation step, and there's a corresponding backward propagation step, and a cache to pass information from one to the other. In the next video, we'll talk about how you can actually implement these building blocks. Let's go on to the next video.