0:00

Welcome back. In this video, we'll talk about how to compute the derivatives you need in order to implement gradient descent for logistic regression. The key takeaways will be what you need to implement, that is, the key equations you need in order to implement gradient descent for logistic regression.

In this video, I want to do this computation using the computation graph. I have to admit, using the computation graph is a little bit of overkill for deriving gradient descent for logistic regression, but I want to start explaining things this way to get you familiar with these ideas so that, hopefully, they will make a bit more sense when we talk about full-fledged neural networks. With that, let's dive into gradient descent for logistic regression.

To recap, we had set up logistic regression as follows: your prediction y_hat is defined as y_hat = a = sigma(z), where z = w_transpose x + b. If we focus on just one example for now, then the loss with respect to that one example is L(a, y) = -(y log(a) + (1 - y) log(1 - a)), where a is the output of logistic regression and y is the ground truth label.

Let's write this out as a computation graph, and for this example, let's say we have only two features, x1 and x2. In order to compute z, we'll need to input w1, w2, and b, in addition to the feature values x1 and x2. These things, in a computation graph, get used to compute z, which is w1 x1 + w2 x2 + b; let's draw a rectangular box around that. Then we compute y_hat, or a = sigma(z); that's the next step in the computation graph. And then, finally, we compute the loss, L(a, y), and I won't copy the formula again.
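As a concrete sketch of the computation graph above (the function and variable names here are my own, matching the two-feature example), the forward propagation steps might look like this in Python:

```python
import math

def forward(w1, w2, b, x1, x2, y):
    """Forward propagation for logistic regression on a single
    two-feature example, following the computation graph."""
    z = w1 * x1 + w2 * x2 + b          # linear step: z = w1*x1 + w2*x2 + b
    a = 1.0 / (1.0 + math.exp(-z))     # sigmoid: y_hat = a = sigma(z)
    # cross-entropy loss for one example
    loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return z, a, loss
```

Each line corresponds to one node in the graph, computed left to right.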

In logistic regression, what we want to do is modify the parameters, w and b, in order to reduce this loss. We've described the forward propagation steps of how you actually compute the loss on a single training example; now let's talk about how you can go backwards to compute the derivatives.

Here's a cleaned-up version of the diagram. Because what we want to do is compute derivatives with respect to this loss, the first thing we want to do when going backwards is to compute the derivative of this loss with respect to the variable a. In the code, you'd just use da to denote this variable. It turns out that, if you're familiar with calculus, you can show that this ends up being da = -(y/a) + (1 - y)/(1 - a).

The way you get that is to take the formula for the loss and, if you're familiar with calculus, compute the derivative with respect to the variable lowercase a, and you get this formula. But if you're not familiar with calculus, don't worry about it. We'll provide the derivative formulas, and whatever else you need, throughout this course. If you are an expert in calculus, I encourage you to look up the formula for the loss from the previous slide and try taking the derivative with respect to a using calculus, but if you don't know enough calculus to do that, don't worry about it.

Now, having computed this quantity da, the derivative of your final output variable with respect to a, you can then go backwards. It turns out that you can show that dz, which is the variable name we'll use in code, is the derivative of the loss with respect to z, dL/dz. For L, you could write the loss with a and y included explicitly as parameters or not; either type of notation is equally acceptable. We can show that this is equal to a - y.

Just a couple of comments, only for those of you who are experts in calculus; if you're not an expert in calculus, don't worry about it. It turns out that dL/dz can be expressed as dL/da times da/dz, and it turns out that da/dz is a(1 - a), and dL/da we previously worked out over here. If you take these two quantities, dL/da, which is this term, together with da/dz, which is this term, and just multiply them, you can show that the expression simplifies to a - y. That's how you derive it, and this is really the chain rule that I briefly alluded to before. Feel free to go through that calculation yourself if you're knowledgeable in calculus, but if you aren't, all you need to know is that you can compute dz as a - y, and we've already done that calculus for you.
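If you'd like a sanity check on that chain-rule result without redoing the calculus, here's a small numerical comparison (the values of z and y are arbitrary, just for illustration): it checks the analytic dz = a - y against a finite-difference estimate of dL/dz.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_of_z(z, y):
    """Loss as a function of z alone, for one example with label y."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

z, y = 0.3, 1.0
a = sigmoid(z)

analytic_dz = a - y                  # the chain-rule result: dL/dz = a - y

eps = 1e-6                           # small step for a central difference
numeric_dz = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)

print(abs(analytic_dz - numeric_dz))   # should be tiny
```

The two numbers agree to many decimal places, which is exactly what the chain-rule derivation predicts.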

Then, the final step in this computation is to go back and compute how much you need to change w and b. In particular, you can show that the derivative with respect to w1, which in the code you'd call dw1, is equal to x1 times dz. Similarly, dw2, which is how much you want to change w2, is x2 times dz, and db is equal to dz.

If you want to do gradient descent with respect to just this one example, what you would do is the following: you would use this formula to compute dz, then use these formulas to compute dw1, dw2, and db, and then perform these updates. w1 gets updated as w1 minus the learning rate alpha times dw1; w2 gets updated similarly; and b gets set to b minus the learning rate times db. This will be one step of gradient descent with respect to a single example.
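Putting the pieces together, one full gradient descent step on a single example might be sketched as follows (the function name and the choice of alpha are my own placeholders, not part of the lecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def single_example_step(w1, w2, b, x1, x2, y, alpha):
    """One gradient descent update for logistic regression on one example."""
    # Forward pass
    z = w1 * x1 + w2 * x2 + b
    a = sigmoid(z)
    # Backward pass: the derivatives derived above
    dz = a - y
    dw1 = x1 * dz
    dw2 = x2 * dz
    db = dz
    # Parameter updates with learning rate alpha
    w1 = w1 - alpha * dw1
    w2 = w2 - alpha * dw2
    b = b - alpha * db
    return w1, w2, b
```

Repeating this step on the same example would keep nudging w1, w2, and b in the direction that reduces the loss on that example.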

You've now seen how to compute derivatives and implement gradient descent for logistic regression with respect to a single training example. But to train a logistic regression model, you have not just one training example but a training set of m training examples.

In the next video, let's see how you can take these ideas and apply them to learning not just from one example, but from an entire training set.
