0:00

To extend the learning rule for a linear neuron to a learning rule we can use for

multilayer nets of nonlinear neurons, we need two steps.

First, we need to extend the learning rule to a single nonlinear neuron.

And we're going to use logistic neurons, although many other kinds of nonlinear

neurons could be used instead. We're now going to generalize the learning

rule for a linear neuron to a logistic neuron, which is a nonlinear neuron.

So, a logistic neuron computes its logit, z, which is its total input: its bias

plus the sum over all its input lines of the value on an input line, xi, times

the weight on that line, wi. It then gives an output y that's a smooth

nonlinear function of that logit. As shown in the graph here, that function

is approximately zero when z is big and negative, approximately one when z is big

and positive, and in between, it changes smoothly and nonlinearly.
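As a minimal sketch of the neuron just described (the function name here is my own label, not from the lecture):

```python
import math

def logistic_neuron(x, w, b):
    """Forward pass of a logistic neuron.

    z is the logit: the bias plus the sum over input lines of
    the value on each line times the weight on that line.
    y is the smooth nonlinear (sigmoid) function of that logit.
    """
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    y = 1.0 / (1.0 + math.exp(-z))
    return z, y
```

For a big negative logit the output is approximately zero, for a big positive logit approximately one, and at z = 0 it is exactly 0.5.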

The fact that it changes continuously gives it nice derivatives, which make

learning easy. So to get the derivatives of a logistic

neuron with respect to the weight, which is what we need for learning, we first

need to compute the derivative of the logit itself, that is the total input with

respect to our weight. That's very simple: the logit is just the bias plus the sum over

all the input lines of the value on each input line times the weight.

So, when we differentiate with respect to wi, we just get xi.

So, the derivative of the logit with respect to wi is xi, and similarly, the

derivative of the logit with respect to xi is wi.

The derivative of the output with respect to the logit is also simple if you express

it in terms of the output. So, the output is y = 1/(1 + e^-z), and dy

by dz is just y(1 - y). That's not obvious.
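Since the result isn't obvious, a quick finite-difference check (my own illustration, not part of the lecture) confirms that dy by dz really is y(1 - y):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7
y = sigmoid(z)

# The derivative expressed in terms of the output.
analytic = y * (1 - y)

# Central-difference approximation of dy/dz at the same point.
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
```

The two values agree to many decimal places, which is a handy sanity check whenever you derive a gradient by hand.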

For those of you who like to see the math, I've put it on the next slide.

The math is tedious but perfectly straightforward so you can go through it

by yourself. Now that we've got the derivative of the output

with respect to the logit and the derivative of the logit with respect to the

weight, we can figure out the derivative of the output with respect to the

weight. We just use the chain rule again.

So, dy by dw is dz by dw times dy by dz. And dz by dw, as we just saw, is xi, and dy by

dz is y(1 - y). And so, we now have the learning rule for a
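This chain-rule step, dy by dwi = xi times y(1 - y), can also be checked numerically; the names below are my own, not from the lecture:

```python
import math

def output(x, w, b):
    # Logistic neuron: squash the logit z through the sigmoid.
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-z))

x, w, b = [0.5, -1.0], [0.2, 0.3], 0.1
y = output(x, w, b)

# Chain rule: dy/dwi = (dz/dwi) * (dy/dz) = xi * y * (1 - y)
i = 0
analytic = x[i] * y * (1 - y)

# Finite-difference check: nudge the same weight up and down.
eps = 1e-6
w_hi = list(w); w_hi[i] += eps
w_lo = list(w); w_lo[i] -= eps
numeric = (output(x, w_hi, b) - output(x, w_lo, b)) / (2 * eps)
```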

logistic neuron. We've got dy by dw, and all we need to do

is use the chain rule once more, and multiply it by de by dy.

And we get something that looks very like the delta rule.

So, the way the error changes as we change a weight, de by dwi, is just the sum

over all the training cases, n, of the value on the input line, xin, times the

residual, the difference between the target and the actual

output of the neuron. But it's got this extra term in it, which

comes from the slope of the logistic function, which is yn(1 - yn).

So, a slight modification of the delta rule gives us the gradient descent learning

rule for training a logistic unit.
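Putting the pieces together, the whole rule can be sketched as a batch gradient-descent loop. This is my own illustration assuming a squared error over the training cases; the function name and hyperparameters are not from the lecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_unit(cases, epochs=2000, lr=0.5):
    """Train a single logistic unit by batch gradient descent.

    cases: list of (inputs, target) pairs.
    Each weight change sums, over all training cases, the input value
    times the residual (target minus output) times the slope of the
    logistic function, y * (1 - y) -- the modified delta rule.
    """
    n = len(cases[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        dw = [0.0] * n
        db = 0.0
        for x, t in cases:
            y = sigmoid(b + sum(xi * wi for xi, wi in zip(x, w)))
            delta = (t - y) * y * (1 - y)  # residual times logistic slope
            for i in range(n):
                dw[i] += x[i] * delta
            db += delta
        for i in range(n):
            w[i] += lr * dw[i]  # step down the error surface
        b += lr * db
    return w, b
```

Trained on a small linearly separable problem such as logical OR, the unit's output ends up on the correct side of 0.5 for every case.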