
when you implement back-propagation for your neural network, you need to compute the slope, or the derivative, of the activation functions. So let's take a look at our choices of activation functions and how you can compute the slope of each of them.

Here is the familiar sigmoid activation function. For any given value of z, say this value of z, the function will have some slope or derivative corresponding to, if you draw a little line there, the height over the width of a little triangle. So if g(z) is the sigmoid function, then the slope of the function is d/dz g(z), and we know from calculus that this is the slope of g at z. If you are familiar with calculus and know how to take derivatives, then if you take the derivative of the sigmoid function, it is possible to show that it is equal to this formula. Again, I'm not going to do the calculus steps here, but if you're familiar with calculus, feel free to pause the video and try to prove this yourself. So this is equal to g(z) times (1 minus g(z)). Let's just sanity check that this expression makes sense.

First, if z is very large, say z = 10, then g(z) will be close to 1, and so the formula we have on the left tells us that d/dz g(z) must be close to g(z) times (1 minus g(z)), which is 1 times (1 minus 1), and therefore very close to 0. This is indeed correct, because when z is very large, the slope is close to 0. Conversely, if z = -10, so we're way out there on the left, then g(z) is close to 0, and the formula on the left tells us that d/dz g(z) will be close to g(z) times (1 minus g(z)), which is 0 times (1 minus 0), so this is also very close to 0, which is correct.

Finally, if z = 0, then g(z) = 1/2; that's the midpoint of the sigmoid function right here. So the derivative is equal to 1/2 times (1 minus 1/2), which is equal to 1/4, and that actually turns out to be the correct value of the derivative, or the slope, of this function at z = 0.

Finally, just to introduce one more piece of notation: sometimes, instead of writing d/dz g(z), the shorthand for the derivative is g'(z). In calculus the little dash on top is called "prime", so g'(z) is shorthand for the derivative of the function g with respect to the input variable z. Then, in a neural network we have a = g(z), so this formula also simplifies to a times (1 minus a). So sometimes in an implementation you might see something like g'(z) = a(1 - a), and that just refers to the observation that g', the derivative, is equal to this expression over here. The advantage of this formula is that if you've already computed the value of a, then by using this expression you can very quickly compute the value of the slope g'(z) as well. All right, so that was the sigmoid activation function.
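As a concrete sketch of the shortcut g'(z) = a(1 - a) — written here in Python with NumPy, which is an assumption, since the video itself shows no code:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    """Slope g'(z) = a * (1 - a), reusing the already-computed activation a = g(z)."""
    return a * (1.0 - a)

# Sanity checks from the lecture:
a0 = sigmoid(0.0)                          # g(0) = 0.5
print(sigmoid_derivative(a0))              # 0.25, the slope at z = 0
print(sigmoid_derivative(sigmoid(10.0)))   # close to 0 for large z
```

Note that `sigmoid_derivative` takes a, not z: that is exactly the point of the formula — during back-propagation the activation a is already cached from the forward pass, so no extra exponential is needed.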

Let's now look at the tanh activation function. Similar to what we had previously, the definition of d/dz g(z) is the slope of g(z) at a particular point z, and if you look at the formula for the hyperbolic tangent function, and you know calculus, you can take the derivative and show that it simplifies to this formula: 1 minus (tanh(z)) squared.

Â 4:19

Using the same shorthand we had previously, we call this g'(z) again. If you want, you can sanity check that this formula makes sense. For example, if z = 10, then tanh(z) will be very close to 1 — tanh goes from minus 1 to plus 1 — and g'(z), according to this formula, will be about 1 minus 1 squared, which is very close to 0. So when z is very large, the slope is close to 0. Conversely, if z is very small, say z = -10, then tanh(z) will be close to -1, and so g'(z) will be close to 1 minus (negative 1) squared, which is 1 minus 1, also very close to 0. Finally, if z = 0, then tanh(z) = 0, and the slope is actually equal to 1, which is indeed the slope of the function at z = 0.

So just to summarize: if a = g(z), that is, if a equals tanh of z, then the derivative g'(z) is equal to 1 minus a squared. Once again, if you've already computed the value of a, you can use this formula to very quickly compute the derivative as well.
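A matching sketch of the tanh shortcut g'(z) = 1 - a², again in an assumed Python/NumPy setup:

```python
import numpy as np

def tanh_derivative(a):
    """Slope of tanh, g'(z) = 1 - a**2, where a = np.tanh(z) is already computed."""
    return 1.0 - a ** 2

# Sanity checks from the lecture:
print(tanh_derivative(np.tanh(0.0)))    # 1.0, the slope at z = 0
print(tanh_derivative(np.tanh(10.0)))   # close to 0 for large z
```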

Finally, here's how you compute the derivatives for the ReLU and leaky ReLU activation functions. For ReLU, g(z) = max(0, z), so the derivative turns out to be 0 if z is less than 0, and 1 if z is greater than 0. It's technically undefined if z is exactly equal to 0, but if you're implementing this in software, then — while it might not be one hundred percent mathematically correct — it works just fine: if z is exactly 0, you can set the derivative to be 1, or set it to be 0; it kind of doesn't matter. If you're an expert in optimization, technically g' then becomes what's called a sub-gradient of the activation function g(z), which is why gradient descent still works. But you can think of it this way: the chance of z being exactly 0.000000000 is so small that it almost doesn't matter what you set the derivative to be when z is equal to 0. So in practice, this is what people implement for the derivative of z.
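A sketch of the ReLU derivative in the same assumed Python/NumPy setup. Here the z == 0 case is arbitrarily assigned a slope of 1, which, as just discussed, is a harmless convention:

```python
import numpy as np

def relu_derivative(z):
    """g'(z) for g(z) = max(0, z): 0 where z < 0, 1 where z >= 0.
    Assigning 1 at exactly z == 0 is an arbitrary but common convention."""
    return np.where(z >= 0, 1.0, 0.0)

print(relu_derivative(np.array([-2.0, 0.0, 3.0])))   # [0. 1. 1.]
```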

Finally, if you are training a neural network with the leaky ReLU activation function, then g(z) is going to be max(0.01z, z), and so g'(z) = 0.01 if z is less than 0, and 1 if z is greater than 0. Once again, the gradient is technically not defined when z is exactly equal to 0, but if you implement a piece of code that sets the derivative g'(z) either to 0.01 or to 1, either way it doesn't really matter: when z is exactly 0, your code will work just fine.
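And the leaky ReLU version, under the same assumptions, with slope 0.01 on the negative side:

```python
import numpy as np

def leaky_relu_derivative(z, slope=0.01):
    """g'(z) for g(z) = max(slope * z, z): slope where z < 0, 1 where z >= 0.
    The value at exactly z == 0 is a convention; here we pick 1."""
    return np.where(z >= 0, 1.0, slope)

# Slope is 0.01 left of zero and 1 at and right of zero:
print(leaky_relu_derivative(np.array([-5.0, 0.0, 2.0])))
```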

So armed with these formulas, you can compute the slopes, or the derivatives, of your activation functions. Now that we have these building blocks, you're ready to see how to implement gradient descent for your neural network. Let's go on to the next video to see that.
