And so they're a measure of how much we would like to change the neural network's

weights, in order to affect these intermediate values of the computation.

So as to affect the final output of the neural network h(x) and

therefore affect the overall cost.

In case this last part, this partial derivative intuition, doesn't make sense, don't worry.

The rest of this can be followed without really talking about partial derivatives.

But let's look in more detail at what backpropagation is doing.

For the output layer, we first set this delta term, delta(4)1, assuming we're doing forward propagation and

backpropagation on this training example i.

It's set to y(i) minus a(4)1.

So this is really the error, right?

It's the difference between the actual value of y minus what was

the value predicted, and so we're gonna compute delta(4)1 like so.
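As a small sketch of that step (not code from the lecture; the variable names `y_i` and `a4` are illustrative), the output-layer error for a network with a single output unit in layer 4 could look like this:

```python
def output_delta(y_i, a4):
    """Output-layer error term for one training example.

    y_i is the label y(i), a4 is the activation a(4)1 produced by
    forward propagation on example i; names are illustrative.
    """
    # delta(4)1 = y(i) - a(4)1
    return y_i - a4
```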

Next, we're gonna propagate these values backwards.

I'll explain this in a second, and end up computing the delta terms for

the previous layer.

We're gonna end up with delta(3)1.

Delta(3)2.

And then we're gonna propagate this further backward,

and end up computing delta(2)1 and delta(2)2.

Now the backpropagation calculation is a lot like

running the forward propagation algorithm, but doing it backwards.

So here's what I mean.

Let's look at how we end up with this value of delta(2)2.

So we have delta(2)2.

And similar to forward propagation, let me label a couple of the weights.

So this weight, which I'm going to draw in magenta, let's say that weight is theta(2)1 2.

And this one down here, which I'll highlight in red, that is going to be, let's say, theta(2)2 2.

So if we look at how delta(2)2 is computed, how it's computed at this node,

it turns out that what we're going to do is take this value and

multiply it by this weight, and add it to that value multiplied by that weight.

So it's really a weighted sum of these delta values,

weighted by the corresponding edge strength.

So concretely, let me fill this in: this delta(2)2 is going to be equal to

theta(2)1 2, that magenta weight, times delta(3)1,

plus the thing I had in red,

that's theta(2)2 2 times delta(3)2.

So it's really literally this red weight times this value,

plus this magenta weight times this value.

And that's how we wind up with that value of delta.
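As a hedged sketch of that weighted sum (the numeric values and variable names are illustrative, not from the lecture; note the full algorithm also multiplies in a sigmoid-derivative term, which this intuition omits):

```python
# Backward weighted sum for one hidden unit.
theta_2_12 = 0.4   # magenta weight: into unit 1 of layer 3 from unit 2 of layer 2
theta_2_22 = -0.1  # red weight: into unit 2 of layer 3 from unit 2 of layer 2
delta_3_1 = 0.5    # delta(3)1, already computed
delta_3_2 = 0.3    # delta(3)2, already computed

# delta(2)2 = theta(2)1 2 * delta(3)1 + theta(2)2 2 * delta(3)2
delta_2_2 = theta_2_12 * delta_3_1 + theta_2_22 * delta_3_2
```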

And just as another example, let's look at this value.

How do we get that value?

Well it's a similar process.

If this weight, which I'm gonna highlight in green,

if this weight is equal to, say, theta(3)1 2,

then we have that delta(3)2 is going to be equal to that green weight,

theta(3)1 2, times delta(4)1.

And by the way, so far I've been writing the delta values only for

the hidden units, but excluding the bias units.

Depending on how you define the backpropagation algorithm, or

depending on how you implement it, you know, you may end up implementing

something that computes delta values for these bias units as well.

The bias units always output the value of plus one, and they are just what they are,

and there's no way for us to change the value.

And so, depending on your implementation of backprop, the way I usually implement it,

I do end up computing these delta values for the bias units as well, but

we just discard them; we don't use them.

Because they don't end up being part of the calculation needed to

compute a derivative.
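Vectorized, that step might look like the sketch below (assumed conventions and made-up numbers, not the lecture's code): the backward weighted sum naturally produces an entry for the bias unit too, and that entry is simply dropped.

```python
import numpy as np

# Theta2 maps layer 2 (including its bias unit, column 0) to layer 3.
Theta2 = np.array([[0.1, 0.4, -0.2],   # row: weights into unit 1 of layer 3
                   [0.3, -0.1, 0.5]])  # row: weights into unit 2 of layer 3
delta3 = np.array([0.5, 0.3])          # deltas for layer 3 (no bias unit)

# The weighted sum pulls the deltas back through every incoming weight,
# including the column for the bias unit...
delta2_with_bias = Theta2.T @ delta3
# ...but the bias unit always outputs +1 and can't be changed, so its
# delta is discarded; only the non-bias entries are kept.
delta2 = delta2_with_bias[1:]
```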

So hopefully that gives you a little better intuition

about what backpropagation is doing.

In case all of this still seems sort of magical,

sort of like a black box, in a later video, the putting-it-together video, I'll try to

give a little bit more intuition about what backpropagation is doing.

But unfortunately this is a difficult algorithm to try to visualize and

understand what it is really doing.

But fortunately, I've been, I guess

many people have been, using it very successfully for many years.

And if you implement the algorithm you can have a very effective learning algorithm.

Even though the inner workings of exactly how it works can be harder to visualize.