0:00

when you implement back-propagation for your neural network, you need to be able to compute the slope, or the derivative, of the activation functions. So let's take a look at our choices of activation functions and how you can compute the slope of these functions. Here you can see the familiar sigmoid activation function, and for any given value of z, maybe this value of z, this function will have some slope or some derivative, corresponding to, if you draw a little line there, the height over the width of this little triangle. So if g(z) is the sigmoid function, then the slope of the function is d/dz g(z), and we know from calculus that this is the slope of g(z) at z.

If you are familiar with calculus and know how to take derivatives, and you take the derivative of the sigmoid function, it is possible to show that it is equal to this formula. Again, I'm not going to do the calculus steps here, but if you're familiar with calculus, feel free to pause the video and try to prove this yourself: the derivative is equal to just g(z) times (1 minus g(z)).
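For reference, here is one way the short calculus argument can go (a sketch of the omitted steps, not something worked through in the video):

```latex
% Sigmoid and its derivative
\[
g(z) = \frac{1}{1 + e^{-z}}
\qquad
g'(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
      = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
      = g(z)\left(1 - g(z)\right)
\]
```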

So let's just sanity check that this expression makes sense.

First, if z is very large, say z is equal to 10, then g(z) will be close to 1, and so the formula we have on the left tells us that d/dz g(z) must be close to g(z) times (1 minus g(z)), which is equal to 1 times (1 minus 1), and is therefore very close to 0. This is indeed correct, because when z is very large the slope is close to 0. Conversely, if z is equal to minus 10, so that's way out there on the left, then g(z) is close to 0, so the formula on the left tells us that d/dz g(z) will be close to g(z) times (1 minus g(z)), which is 0 times (1 minus 0), and so this is also very close to 0, which is correct.

Finally, if z is equal to 0, then g(z) is equal to 1/2, that's the sigmoid function right here, and so the derivative is equal to 1/2 times (1 minus 1/2), which is equal to 1/4. That actually turns out to be the correct value of the derivative, or the slope, of this function when z is equal to 0.
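If you want to check these three numbers yourself, here's a minimal sketch in Python with NumPy (my own illustration, not code from the video) comparing the formula against a finite-difference estimate of the slope:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # The formula from the video: g(z) * (1 - g(z))
    g = sigmoid(z)
    return g * (1 - g)

eps = 1e-6
for z in [10.0, -10.0, 0.0]:
    # Central finite-difference approximation of the slope at z
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    print(z, sigmoid_prime(z), numeric)
# At z = 0 both values are 0.25; at z = 10 and z = -10 both are close to 0.
```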

Finally, just to introduce one more piece of notation: sometimes, instead of writing this derivative out, the shorthand for the derivative is g'(z), g prime of z. In calculus the little dash on top is called a prime, so g'(z) is a shorthand, in calculus, for the derivative of the function g with respect to the input variable z. Then, in a neural network, we have a = g(z), right, a equals this, and so this formula also simplifies to a times (1 minus a). So sometimes in an implementation you might see something like g'(z) = a(1 - a), and that just refers to the observation that g prime, which just means the derivative, is equal to this over here. The advantage of this formula is that if you've already computed the value of a, then by using this expression you can very quickly compute the value of the slope g'(z) as well. All right, so that was the sigmoid activation function.
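To make that shortcut concrete, here's a small sketch (my own example, not code from the course) of caching the forward-pass activation and reusing it for the derivative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: compute and cache the activation a = g(z)
z = np.array([-2.0, 0.0, 3.0])
a = sigmoid(z)

# Backward pass: reuse the cached a, so g'(z) = a * (1 - a)
# costs one multiply instead of re-evaluating the exponential.
g_prime = a * (1 - a)
print(g_prime)
```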

Let's now look at the tanh activation function. Similar to what we had previously, the definition of d/dz g(z) is the slope of g(z) at a particular point z, and if you look at the formula for the hyperbolic tangent function, and if you know calculus, you can take the derivative and show that it simplifies to this formula: 1 minus (tanh(z)) squared.
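For reference, a brief sketch of that derivative (again, the calculus steps are not worked through in the video):

```latex
% tanh and its derivative
\[
g(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
\qquad
g'(z) = \frac{\left(e^{z} + e^{-z}\right)^{2} - \left(e^{z} - e^{-z}\right)^{2}}{\left(e^{z} + e^{-z}\right)^{2}}
      = 1 - \tanh^{2}(z)
\]
```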

4:19

Using the same shorthand we had previously, we call this g'(z) again. If you want, you can sanity check that this formula makes sense. For example, if z is equal to 10, tanh(z) will be very close to 1; this function goes from plus 1 to minus 1. Then g'(z), according to this formula, will be about 1 minus 1 squared, so very close to 0. So when z is very large, the slope is close to 0. Conversely, if z is very small, say z is equal to minus 10, then tanh(z) will be close to minus 1, and so g'(z) will be close to 1 minus (negative 1) squared, so it's close to 1 minus 1, which is also close to 0. And finally, if z is equal to 0, then tanh(z) is equal to 0 and the slope is actually equal to 1, which really is the slope of this function when z is equal to 0. So just to summarize: if a is equal to g(z), so if a is equal to this tanh of z, then the derivative g'(z) is equal to 1 minus a squared.

So once again, if you've already computed the value of a, you can use this formula to very quickly compute the derivative as well.
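As with the sigmoid, a minimal sketch (my own illustration) of reusing the cached tanh activation for the derivative:

```python
import numpy as np

# Forward pass: cache a = tanh(z)
z = np.array([-10.0, 0.0, 10.0])
a = np.tanh(z)

# Backward pass: g'(z) = 1 - a**2, reusing the cached activation
g_prime = 1 - a ** 2
print(g_prime)  # approximately [0., 1., 0.], matching the sanity checks above
```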

Finally, here's how you compute the derivatives for the ReLU and Leaky ReLU activation functions. For the ReLU, g(z) is equal to max(0, z), so the derivative turns out to be 0 if z is less than 0, 1 if z is greater than 0, and technically undefined if z is exactly equal to 0. But if you're implementing this in software, it might not be a hundred percent mathematically correct, yet it will work just fine: if z is exactly 0, you can set the derivative to be equal to 1, or set it to be equal to 0, and it kind of doesn't matter. If you're an expert in optimization, technically g prime then becomes what's called a sub-gradient of the activation function g(z), which is why gradient descent still works. But you can think of it as: the chance of z being exactly 0.000000... is so small that it almost doesn't matter what you set the derivative to be equal to when z is equal to 0. So in practice, this is what people implement for the derivative of the ReLU.

And if you are training a neural network with the Leaky ReLU activation function, then g(z) is going to be max(0.01z, z), and so g'(z) is equal to 0.01 if z is less than 0, and 1 if z is greater than 0. Once again, the gradient is technically not defined when z is exactly equal to 0, but if you implement a piece of code that sets the derivative, that sets g'(z), either to 0.01 or to 1, either way it doesn't really matter; when z is exactly 0, your code will work just fine.
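Here's a minimal sketch of those two derivatives in code (my own illustration; returning 1 at exactly z = 0 is one of the arbitrary-but-fine choices just discussed):

```python
import numpy as np

def relu_prime(z):
    # 0 for z < 0, 1 for z >= 0 (the value at exactly 0 is an arbitrary choice)
    return np.where(z >= 0, 1.0, 0.0)

def leaky_relu_prime(z, slope=0.01):
    # slope (e.g. 0.01) for z < 0, 1 for z >= 0
    return np.where(z >= 0, 1.0, slope)

z = np.array([-3.0, 0.0, 2.0])
print(relu_prime(z))        # [0. 1. 1.]
print(leaky_relu_prime(z))  # [0.01 1.   1.  ]
```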

So armed with these formulas, you can compute the slopes, or the derivatives, of your activation functions. Now that we have these building blocks, you're ready to see how to implement gradient descent for your neural network. Let's go on to the next video to see that.