0:00

In the last video, we described what a deep L-layer neural network is and also talked about the notation we use to describe such networks. In this video, you'll see how you can perform forward propagation in a deep network. As usual, let's first go over what forward propagation looks like for a single training example x, and then later on we'll talk about the vectorized version, where you carry out forward propagation on the entire training set at the same time. So, given a single training example x, here's how you compute the activations of the first layer. For this first layer, you compute z[1] = W[1]x + b[1], where W[1] and b[1] are the parameters that affect the activations in layer 1, this being layer 1 of the neural network. Then you compute the activations for that layer as a[1] = g(z[1]). The activation function g depends on what layer you're at, so maybe we index it as g[1], the activation function for layer 1. If you do that, you've now computed the activations for layer 1.

How about layer 2? For that layer, you would then compute z[2] = W[2]a[1] + b[2]. So the z value for layer 2 is the weight matrix times the output of layer 1, that is, the value a[1], plus the bias vector for layer 2, and then a[2] equals the activation function applied to z[2]. Okay, so that's it for layer 2.

And so on and so forth, until you get to the output layer, which is layer 4. There you would have z[4] equal to the parameters for that layer times the activations from the previous layer, plus that layer's bias vector, and then similarly a[4] = g(z[4]), and that's how you compute your estimated output y-hat. One thing to notice: x here is also equal to a[0], because the input feature vector x is also the activations of layer 0. So let's scratch out the x and put a[0] there instead.

Then all of these equations basically look the same. The general rule is that z[l] = W[l]a[l-1] + b[l], and the activations for that layer are the activation function applied to those z values, a[l] = g[l](z[l]). That's the general forward propagation equation.

about for doing it in a vectorized way

for the whole training set at the same

time the equations look quite similar as

before for the first layer you would

have Capital Z 1 equals W 1 times

capital X plus B 1 and then a 1 equals G

of Z 1 right and bearing in mind that X

is equal to a 0 these are just you know

the training examples stacked in

different columns you could take this

let me scratch out X so you can put a 0

there and then so the next layer looks

similar

Z 2 equals W 2 A 1 plus B 2 and a 2

equals G of Z 2

We're just taking these z vectors, a vectors, and so on and stacking them up: the z vector for the first training example, the z vector for the second training example, and so on down to the m-th training example, stacking them as columns and calling the result capital Z. Similarly for capital A, and just as with capital X, all the training examples are column vectors stacked left to right. At the end of this process, you end up with Y-hat, which is equal to g(Z[4]); this is also equal to A[4], and that's the predictions for all of your training examples stacked horizontally.
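
To see what this column stacking means in code, here is a small sketch (again with assumed shapes) checking that column i of the vectorized Z[1] = W[1]X + b[1] is the same vector you would get by running the single-example formula on the i-th example; the bias vector is broadcast across the columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_1, m = 3, 4, 5                 # assumed: 3 features, 4 units in layer 1, 5 examples
W1 = rng.standard_normal((n_1, n_x))
b1 = rng.standard_normal((n_1, 1))
X = rng.standard_normal((n_x, m))     # training examples stacked as columns, X = A[0]

Z1 = W1 @ X + b1                      # vectorized: b1 is broadcast across the m columns

for i in range(m):                    # column i of Z1 equals the single-example z for x^(i)
    assert np.allclose(Z1[:, [i]], W1 @ X[:, [i]] + b1)
```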

So, just to summarize the notation, I'm going to modify this up here: our notation lets us replace the lowercase z and a with their uppercase counterparts Z and A, and that gives you the vectorized version of forward propagation that you carry out on the entire training set at a time, where A[0] is X.

Now, if you look at this implementation of vectorization, it looks like there's going to be a for loop here, right? Something like: for l equals 1 to 4, or for l equals 1 through capital L, you compute the activations for layer 1, then layer 2, then layer 3, then layer 4. So there seems to be a for loop here, and I know that when implementing neural networks we usually want to get rid of explicit for loops, but this is one place where I don't think there's any way to implement it other than with an explicit for loop. So when implementing forward propagation, it is perfectly okay to have a for loop that computes the activations for layer 1, then layer 2, then layer 3, then layer 4. I don't think there is any way to do this without a for loop that goes from 1 to capital L, that is, from 1 through the total number of layers in your network. So in this place, it's perfectly okay to have an explicit for loop.
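
Putting this together, a minimal sketch of that explicit loop might look like the following. The parameter dictionaries and the choice of activations are assumptions made for illustration, but the loop from 1 to L mirrors the equations Z[l] = W[l]A[l-1] + b[l] and A[l] = g[l](Z[l]).

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation(X, W, b, L):
    """Vectorized forward pass with an explicit loop over the layers.

    X is the input matrix of shape (n[0], m), i.e. A[0]; W and b are dicts
    mapping layer index l = 1..L to W[l] and b[l] (an assumed layout).
    """
    A = X                                # A[0] = X
    for l in range(1, L + 1):            # the explicit for loop over layers 1..L
        Z = W[l] @ A + b[l]              # Z[l] = W[l] A[l-1] + b[l]
        g = sigmoid if l == L else relu  # assumed: ReLU hidden layers, sigmoid output
        A = g(Z)                         # A[l] = g[l](Z[l])
    return A                             # A[L] = Y-hat
```

With parameter dictionaries like the ones in the earlier sketch, forward_propagation(X, W, b, L=4) would return the predictions for all the columns of X at once.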

So, that's it for the notation for deep neural networks, as well as how to do forward propagation in these networks. If the pieces we've seen so far look a little bit familiar to you, that's because what we've seen is taking a piece very similar to what you saw in the neural network with a single hidden layer and just repeating it more times. Now, it turns out that when we implement a deep neural network, one of the ways to increase your odds of having a bug-free implementation is to think very systematically and carefully about the matrix dimensions you're working with. So when I'm trying to develop my own code, I often pull out a piece of paper and just think carefully through the dimensions of the matrices I'm working with. Let's see how you can do that in the next video.