In the last video you learned about gradient checking.

In this video, I want to share with you some practical tips or

some notes on how to actually go about implementing this for your neural network.

First, don't use grad check in training, only to debug.

So what I mean is that computing d theta approx of i, for

all the values of i, is a very slow computation.

So to implement gradient descent, you'd just use backprop

to compute the derivative d theta.

And it's only when you're debugging that you would compute this

to make sure it's close to d theta.

But once you've done that, then you would turn off the grad check, and

don't run this during every iteration of gradient descent,

because that's just much too slow.
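The idea above can be sketched as a small debugging helper. This is a minimal illustration, not code from the course: the function name grad_check, the toy cost function, and the epsilon value are all assumptions made here for the example.

```python
import numpy as np

def grad_check(J, theta, dtheta, eps=1e-7):
    """Compare the backprop gradient dtheta against a two-sided
    numerical estimate. Run once while debugging, never inside
    the training loop -- it costs one pass over J per component."""
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus = theta.copy()
        minus = theta.copy()
        plus[i] += eps
        minus[i] -= eps
        # Two-sided difference: (J(theta+eps) - J(theta-eps)) / (2*eps)
        dtheta_approx[i] = (J(plus) - J(minus)) / (2 * eps)
    # Relative distance between the two gradients; roughly 1e-7 is
    # great, around 1e-3 suggests a bug.
    num = np.linalg.norm(dtheta_approx - dtheta)
    den = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
    return num / den, dtheta_approx

# Toy check: J(theta) = sum(theta^2), so the true gradient is 2*theta.
theta = np.array([1.0, -2.0, 3.0])
diff, approx = grad_check(lambda t: np.sum(t ** 2), theta, 2 * theta)
```

In training you would call only backprop; grad_check stays out of the gradient descent loop and is invoked by hand when something looks wrong.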

Second, if an algorithm fails grad check,

look at the individual components and try to identify the bug.

So what I mean by that is if d theta approx is very far from d theta,

what I would do is look at the different values of i to see which are the values of

d theta approx that are really very different than the values of d theta.

So for example, if you find that the values of d theta approx that

are very far off all correspond to db[l] for some layer or

some layers, but the components for dW are quite close, right?

Remember, different components of theta correspond to different components

of b and w.

When you find this is the case, then maybe you find that the bug is in how

you're computing db, the derivative with respect to the parameter b.

And similarly, vice versa, if you find that the values of d theta

approx that are very far from d theta all came from dW,

or from dW in a certain layer,

then that might help you hone in on the location of the bug.
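This component-by-component comparison can be sketched as follows. The bookkeeping here is hypothetical: the slice map assumes you recorded which indices of the flattened theta vector each dW[l] and db[l] occupies when you concatenated them, and the names and tolerance are illustrative assumptions.

```python
import numpy as np

# Hypothetical index map: when W1, b1, W2, b2 are flattened into theta,
# record which slice of indices each parameter block occupies.
slices = {"dW1": slice(0, 6), "db1": slice(6, 9),
          "dW2": slice(9, 12), "db2": slice(12, 13)}

def localize_bug(dtheta_approx, dtheta, slices, tol=1e-5):
    """Per parameter block, measure how far backprop's dtheta is from
    the numerical dtheta_approx, and return only the suspicious blocks."""
    suspects = {}
    for name, sl in slices.items():
        a, d = dtheta_approx[sl], dtheta[sl]
        denom = np.linalg.norm(a) + np.linalg.norm(d) + 1e-12
        ratio = np.linalg.norm(a - d) / denom
        if ratio > tol:
            suspects[name] = ratio
    return suspects

# Simulated example: inject a mismatch into the db1 block only,
# mimicking a bug in how db for layer 1 is computed.
rng = np.random.default_rng(0)
dtheta = rng.standard_normal(13)
dtheta_approx = dtheta.copy()
dtheta_approx[slices["db1"]] += 0.5
```

Here localize_bug would flag only "db1", pointing you at the db computation for that layer rather than the whole backprop pass.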

This doesn't always let you identify the bug right away, but

sometimes it gives you some guesses about where to track down the bug.