0:00

In the last video we described what a deep L-layer neural network is and also talked about the notation we use to describe such networks. In this video you'll see how you can perform forward propagation in a deep network. As usual, let's first go over what forward propagation looks like for a single training example x, and then later on we'll talk about the vectorized version, where you want to carry out forward propagation on the entire training set at the same time. So, given a single training example x, here's how you compute the activations of the first layer. For this first layer, you compute z[1] = W[1] x + b[1], where W[1] and b[1] are the parameters that affect the activations in layer 1 of the neural network. Then you compute the activations for that layer to be a[1] = g(z[1]). The activation function g depends on what layer you're at, so we index it as g[1], the activation function for layer 1. If you do that, you've now computed the activations for layer 1.

How about layer 2, the next layer? Well, you would then compute z[2] = W[2] a[1] + b[2]. So the z value of layer 2 is the weight matrix times the output of layer 1, that is the value a[1], plus the bias vector for layer 2, and then a[2] equals the activation function applied to z[2]: a[2] = g[2](z[2]). Okay, so that's it for layer 2, and so on and so forth until you get to the output layer, layer 4, where you have z[4] equal to the parameters for that layer times the activations from the previous layer, plus that layer's bias vector, and then similarly a[4] = g(z[4]). That's how you compute your estimated output, ŷ. Just one thing to notice:

x here is also equal to a[0], because the input feature vector x is also the activations of layer 0. So if we scratch out x and put a[0] in its place, then all of these equations basically look the same. The general rule is that z[l] = W[l] a[l-1] + b[l], and the activations for that layer are the activation function applied to those z values: a[l] = g[l](z[l]). That's the general forward propagation equation. So we've done all this for a single training example.
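The layer-by-layer computation above can be sketched in NumPy. This is a hypothetical sketch, not code from the course: the layer sizes (3, 4, 4, 3, 1), the ReLU hidden activations, and the sigmoid output are all assumptions made just to illustrate z[l] = W[l] a[l-1] + b[l] and a[l] = g[l](z[l]) for a single example x.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical 4-layer network: 3 input features, hidden layers of 4, 4, 3 units, 1 output.
W1, b1 = rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.standard_normal((4, 4)) * 0.01, np.zeros((4, 1))
W3, b3 = rng.standard_normal((3, 4)) * 0.01, np.zeros((3, 1))
W4, b4 = rng.standard_normal((1, 3)) * 0.01, np.zeros((1, 1))

x = rng.standard_normal((3, 1))  # a single training example; note x = a[0]

z1 = W1 @ x + b1                 # z[1] = W[1] x + b[1]
a1 = relu(z1)                    # a[1] = g[1](z[1])
z2 = W2 @ a1 + b2                # z[2] = W[2] a[1] + b[2]
a2 = relu(z2)
z3 = W3 @ a2 + b3
a3 = relu(z3)
z4 = W4 @ a3 + b4                # output layer
y_hat = sigmoid(z4)              # estimated output, shape (1, 1)
```

Note that each bias b[l] is a column vector with one entry per unit in layer l, matching the number of rows of W[l].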

How about doing it in a vectorized way for the whole training set at the same time? The equations look quite similar to before. For the first layer you would have Z[1] = W[1] X + b[1] and then A[1] = g(Z[1]). Bearing in mind that X is equal to A[0] (these are just the training examples stacked in different columns), you could take this, scratch out X, and put A[0] there. The next layer looks similar: Z[2] = W[2] A[1] + b[2] and A[2] = g(Z[2]).

We're just taking these vectors z or a and so on, and stacking them up: this is the z vector for the first training example, the z vector for the second training example, and so on down to the m-th training example, stacking these in columns and calling the result capital Z. Similarly for capital A, just as in capital X, where all the training examples are column vectors stacked left to right. At the end of this process you end up with Ŷ = g(Z[4]), which is also equal to A[4]: the predictions on all of the training examples, stacked horizontally.

So just to summarize the notation, I'm going to modify this up here: our notation allows us to replace lowercase z and a with their uppercase counterparts (the lowercase z already looks like a capital Z anyway), and that gives you the vectorized version of forward propagation that you carry out on the entire training set at a time, where A[0] is X.
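The vectorized equations can be sketched the same way. Here a hypothetical two-layer slice of a network (sizes chosen arbitrarily for illustration) shows how stacking the m examples as columns of X makes each Z[l] and A[l] come out with one column per example:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

rng = np.random.default_rng(0)
m = 5                                    # number of training examples
X = rng.standard_normal((3, m))          # examples stacked as columns; X = A[0]

# Hypothetical two-layer slice, just to show the shapes.
W1, b1 = rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))

Z1 = W1 @ X + b1        # shape (4, m): one z column per training example
A1 = relu(Z1)
Z2 = W2 @ A1 + b2       # shape (1, m)
Y_hat = sigmoid(Z2)     # predictions for all m examples, stacked horizontally
```

The (n[l], 1) bias vectors broadcast across all m columns, which is what lets one matrix expression handle the whole training set.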

Now, if you look at this vectorized implementation, it looks like there is going to be a for loop here, right? For l equals 1 through 4, that is, for l = 1 through capital L, you compute the activations for layer 1, then layer 2, then layer 3, then layer 4. So it seems there is a for loop here, and I know that when implementing neural networks we usually want to get rid of explicit for loops, but this is one place where I don't think there's any way to implement it other than with an explicit for loop. So when implementing forward propagation, it is perfectly okay to have a for loop that computes the activations for layer 1, then layer 2, then layer 3, then layer 4. As far as I know, there isn't any way to do this without a for loop that goes from 1 to capital L, the total number of layers in the neural network. So this is one place where it's perfectly okay to have an explicit for loop.
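That explicit loop over layers might look like the following sketch, again under assumed layer sizes; the helper name forward_propagation and the ReLU/sigmoid choices are illustrative assumptions, not the course's code:

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, Ws, bs):
    """One pass over all L layers; the explicit loop over layers is the one loop we keep."""
    A = X                                          # A[0] = X
    L = len(Ws)                                    # total number of layers
    for l in range(L):
        Z = Ws[l] @ A + bs[l]                      # Z[l] = W[l] A[l-1] + b[l]
        A = sigmoid(Z) if l == L - 1 else relu(Z)  # A[l] = g[l](Z[l])
    return A                                       # = Y_hat

rng = np.random.default_rng(1)
sizes = [3, 4, 4, 3, 1]                            # hypothetical layer sizes, so L = 4
Ws = [rng.standard_normal((sizes[l], sizes[l - 1])) * 0.01
      for l in range(1, len(sizes))]
bs = [np.zeros((sizes[l], 1)) for l in range(1, len(sizes))]

X = rng.standard_normal((3, 5))                    # 5 training examples
Y_hat = forward_propagation(X, Ws, bs)             # shape (1, 5)
```

Each iteration is fully vectorized across the m examples, so the only Python-level loop is the unavoidable one over the L layers.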

Â that's it for the notation for deep

Â neural networks as well as how to do for

Â propagation in these networks if the

Â pieces we've seen so far looks a little

Â bit familiar to you that's because what

Â we've seen is taking a piece very

Â similar to what you've seen in the

Â neural network with a single hidden

Â layer and just repeating that more times

Â now turns out that we implemented deep

Â neural network one of the ways to

Â increase your odds of having above free

Â implementation is to think very

Â systematic and carefully about the

Â matrix dimensions you're working with so

Â when I'm trying to develop my own code I

Â often pull a piece of paper and just

Â think carefully through so the

Â dimensions of the matrix I'm working

Â with let's see how you could do that in

Â the next video
