In the last video, you saw what a single hidden layer neural network looks like. In this video, let's go through the details of exactly how this neural network computes its outputs. What you'll see is that it's like logistic regression, repeated a lot of times. Let's take a look. So this is what a two-layer neural network looks like.

Let's go more deeply into exactly what this neural network computes. Now, we said before that logistic regression, the circle in logistic regression, really represents two steps of computation: first you compute z as follows, and second you compute the activation a as a sigmoid function of z. So a neural network just does this a lot more times.
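To make those two steps concrete, here is a minimal numpy sketch of a single logistic regression unit. The names sigmoid, w, b, and x and the 3-feature shape are illustrative assumptions, not code from the video:

```python
import numpy as np

def sigmoid(z):
    # element-wise logistic function
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)   # input feature vector (3 features)
w = np.random.randn(3, 1)   # parameter vector
b = 0.1                     # bias, a real number

z = w.T @ x + b             # step 1: z = w^T x + b
a = sigmoid(z)              # step 2: a = sigmoid(z)
```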

Let's start by focusing on just one of the nodes in the hidden layer, and let's look at the first node in the hidden layer; I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, think of it as the left half of this node, is that it computes z = w^T x + b. The notation we'll use is this: these are all quantities associated with the first hidden layer, so that's why we have a bunch of square brackets there, and this is the first node in the hidden layer, so that's why we have the subscript 1 over there, giving z_1^[1] = w_1^[1]T x + b_1^[1]. So first it does that, and then the second step is that it computes a_1^[1] = sigmoid(z_1^[1]), like so. For both z and a, the notational convention is that in a_i^[l], the l in the superscript square brackets refers to the layer number, and the i subscript refers to the node in that layer. So the node we're looking at is layer 1, that is, the hidden layer, node 1, and that's why the superscript and subscript are both 1. So that little circle, that first node in the neural network, represents carrying out these two steps of computation.

Now let's look at the second node in the neural network, the second node in the hidden layer. Similar to the logistic regression unit on the left, this little circle represents two steps of computation. The first step is that it computes z, still layer 1 but now the second node: z_2^[1] = w_2^[1]T x + b_2^[1]. And then a_2^[1] = sigmoid(z_2^[1]). Again, feel free to pause the video if you want, but you can double-check that the superscript and subscript notation is consistent with what we have written here above in purple. So we've talked through the first two hidden units in the neural network; hidden units three and four also represent similar computations.
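As a sketch of what these per-node computations look like before we vectorize, here is an explicit loop over the four hidden units, continuing the numpy setup from the sketch above; keeping the weights as a list of per-node vectors is a hypothetical layout, not from the video:

```python
n_x, n_h = 3, 4                        # 3 input features, 4 hidden units
x = np.random.randn(n_x, 1)            # input feature vector
# one parameter vector w_i^[1] and one bias b_i^[1] per hidden unit
w = [np.random.randn(n_x, 1) for _ in range(n_h)]
b = np.random.randn(n_h)

z1 = np.zeros(n_h)
for i in range(n_h):
    # each node is a little logistic-regression-like unit:
    # z_i^[1] = w_i^[1]T x + b_i^[1]
    z1[i] = (w[i].T @ x).item() + b[i]
a1 = sigmoid(z1)                       # a_i^[1] = sigmoid(z_i^[1])
```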

So now let me take this pair of equations and this pair of equations, and let's copy them to the next slide. So here's our network, and here are the first and second pairs of equations that we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. And let's make sure this notation is clear: this is the vector w_1^[1], transposed and multiplied by x. That's what the superscript T there represents; this is a vector transpose.

Now, as you might have guessed, if you're actually implementing a neural network, doing this with a for-loop seems really inefficient. So what we're going to do is take these four equations and vectorize them. I'm going to start by showing how to compute z as a vector; it turns out you can do it as follows.

Let me take these w's and stack them into a matrix. Then you have w_1^[1] transpose, so that's a row vector (the column vector transposed gives you a row vector), then w_2^[1] transpose, w_3^[1] transpose, and w_4^[1] transpose. And so, by stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic regression units there, and each of the logistic regression units has a corresponding parameter vector w; by stacking those four vectors together, you end up with this 4 by 3 matrix. So if you then take this matrix and multiply it by your input features x1, x2, x3, you end up with, by how matrix multiplication works, w_1^[1]T x, w_2^[1]T x, w_3^[1]T x, w_4^[1]T x.

And then let's not forget the b's. We now add to this a vector of b's: b_1^[1], b_2^[1], b_3^[1], b_4^[1]. And so you see that each of the four rows of this outcome corresponds exactly to each of the four quantities that we had above. In other words, we've just shown that this thing is equal to z_1^[1], z_2^[1], z_3^[1], z_4^[1], as defined here. And maybe not surprisingly, we're going to call this whole thing the vector z^[1], which is obtained by stacking up these individual z's into a column vector.
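Here is how that stacking might look in numpy, reusing the hypothetical w, b, and x from the loop sketch above and checking that the matrix form reproduces the per-node numbers:

```python
# stack the four row vectors w_i^[1]T into a 4x3 matrix W^[1]
W1 = np.vstack([wi.T for wi in w])   # shape (4, 3)
b1 = b.reshape(n_h, 1)               # shape (4, 1) column vector

z1_vec = W1 @ x + b1                 # z^[1] = W^[1] x + b^[1], shape (4, 1)

# each row of z1_vec matches the per-node z_i^[1] from the for-loop
assert np.allclose(z1_vec.flatten(), z1)
```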

When we're vectorizing, one of the rules of thumb that might help you navigate this is that when we have different nodes in a layer, we stack them vertically. So that's why, when you have z_1^[1] through z_4^[1], those correspond to four different nodes in the hidden layer, and so we stack these four numbers vertically to form the vector z^[1]. And to introduce one more piece of notation: this 4 by 3 matrix here, which we obtained by stacking the lowercase vectors w_1^[1], w_2^[1], and so on, we're going to call this matrix capital W^[1]. Similarly, this vector we're going to call b^[1], b superscript square bracket 1, and so this is a 4 by 1 vector.

So now we've computed z using this vector-matrix notation. The last thing we need to do is also compute these values of a, and it probably won't surprise you to see that we're going to define a^[1] as just stacking together those activation values a_1^[1] through a_4^[1]. So just take these four values and stack them together in a vector called a^[1], and this is going to be sigmoid of z^[1], where there's now a vectorized implementation of the sigmoid function that takes in the four elements of z and applies the sigmoid function element-wise to them. So just a recap: we figured out that z^[1] = W^[1] times the vector x plus the vector b^[1], and a^[1] is sigmoid of z^[1].
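Continuing the sketch, one call to the same sigmoid helper gives all four activations at once, since numpy applies np.exp to every entry:

```python
a1_vec = sigmoid(z1_vec)                  # a^[1] = sigmoid(z^[1]), shape (4, 1)
assert np.allclose(a1_vec.flatten(), a1)  # matches the per-node loop
```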

Let's just copy this to the next slide. And what we see is that, for the first layer of the neural network, given an input x, we have z^[1] = W^[1] x + b^[1], and a^[1] = sigmoid(z^[1]). The dimensions of this are: 4 by 1 equals a 4 by 3 matrix times a 3 by 1 vector, plus a 4 by 1 vector b, and this is 4 by 1, the same dimensions. And remember that we said x = a^[0] (just like y-hat is also equal to a^[2]), so if you want, you can actually take this x and replace it with a^[0], since a^[0] is, if you will, an alias for the vector of input features x.

Now, through a similar derivation, you can figure out that the representation for the next layer can also be written similarly, where what the output layer does is that it has associated with it the parameters W^[2] and b^[2]. So W^[2] in this case is going to be a 1 by 4 matrix, and b^[2] is just a real number, a 1 by 1. And so z^[2] is going to be a real number, that is, a 1 by 1 matrix, since it's a 1 by 4 thing times a^[1], which is 4 by 1, plus b^[2], which is 1 by 1, and so this gives you just a real number. If you think of this last output unit as just being analogous to logistic regression, which had parameters w and b, then w really plays an analogous role to W^[2] transpose, or W^[2] is really w transpose, and b is equal to b^[2]. That is, if you cover up the left half of this network and ignore all that for now, then this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, we're writing them as W^[2] and b^[2], with dimensions 1 by 4 and 1 by 1.
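Here is a sketch of that output layer in the same running numpy example; W2 and b2 are randomly initialized placeholders, not values from the video:

```python
W2 = np.random.randn(1, n_h)   # W^[2]: 1 by 4, one unit reading 4 activations
b2 = np.random.randn(1, 1)     # b^[2]: a real number, kept as 1 by 1

z2 = W2 @ a1_vec + b2          # (1x4)(4x1) + (1x1) -> 1x1, a real number
a2 = sigmoid(z2)               # y-hat, the network's output
```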

So just a recap: for logistic regression, to implement the output, or to implement a prediction, you compute z = w^T x + b and y-hat = a = sigmoid(z). When you have a neural network with one hidden layer, what you need to implement to compute its output is just these four equations. You can think of this as a vectorized implementation of computing the output of, first, these four logistic regression units in the hidden layer, which is what this does, and then the logistic regression unit in the output layer, which is what this does.
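Putting it together, here is one way those four lines might look in numpy, wrapped in a function; the signature and shapes are illustrative assumptions:

```python
def forward(x, W1, b1, W2, b2):
    """Compute y-hat for a single example x of shape (3, 1)."""
    z1 = W1 @ x + b1     # z^[1] = W^[1] x + b^[1]      (4, 1)
    a1 = sigmoid(z1)     # a^[1] = sigmoid(z^[1])       (4, 1)
    z2 = W2 @ a1 + b2    # z^[2] = W^[2] a^[1] + b^[2]  (1, 1)
    a2 = sigmoid(z2)     # y-hat = a^[2]                (1, 1)
    return a2

y_hat = forward(x, W1, b1, W2, b2)   # using the parameters built above
```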

I hope this description made sense, but the takeaway is that to compute the output of this neural network, all you need is those four lines of code. So now you've seen how, given a single input feature vector x, you can, with four lines of code, compute the output of this neural network. Similar to what we did for logistic regression, we'll also want to vectorize across multiple training examples, and we'll see that, by stacking up training examples in different columns of a matrix, with just a slight modification to this, you'll also, similar to what you saw with logistic regression, be able to compute the output of this neural network not just on one example at a time, but on, say, your entire training set at a time. So let's see the details of that in the next video.