And so what this means is that even after, say, one gradient descent update,

you're going to update, say, this first blue weight with some learning rate times this,

and you're going to update the second blue weight with some learning rate times this.

And what this means is that even after one gradient descent update, those two blue

weights, those two blue-colored parameters, will end up the same as each other.

So they'll be some nonzero value, but this value will equal that value.

And similarly,

even after one gradient descent update, this value will equal that value.

There'll still be some non-zero values,

just that the two red values are equal to each other.

And similarly, the two green weights.

Well, they'll both change values, but

they'll both end up with the same value as each other.

So after each update, the parameters corresponding to the inputs going into

each of the two hidden units are identical.

That's just saying that the two green weights are still the same, the two red

weights are still the same, the two blue weights are still the same, and what that

means is that even after one iteration of, say, gradient descent.

You find that your two hidden units are still computing exactly the same

functions of the inputs.

You still have a1(2) = a2(2).

And so you're back to this case.

And as you keep running gradient descent, the blue weights, the two blue weights,

will stay the same as each other.

The two red weights will stay the same as each other, and

the two green weights will stay the same as each other.
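The symmetry described above can be checked numerically. Here is a minimal sketch (the tiny 2-2-1 network, the loss, and all variable names are illustrative assumptions, not taken from the lecture): both hidden units start with identical incoming weights, we run several gradient descent updates by hand, and the two rows of the hidden-layer weight matrix remain identical throughout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(2,))        # one training example (hypothetical)
y = 1.0                          # its label

# Symmetric initialization: both hidden units get identical incoming weights,
# and the output weights from both hidden units are also identical.
W1 = np.full((2, 2), 0.5)        # row i = weights into hidden unit i
W2 = np.full((2,), 0.5)          # weights from hidden units to output

lr = 0.1
for _ in range(10):              # several gradient descent updates
    a1 = sigmoid(W1 @ x)         # both hidden activations are equal
    a2 = sigmoid(W2 @ a1)
    # Backprop for squared-error loss 0.5 * (a2 - y)**2
    d2 = (a2 - y) * a2 * (1 - a2)
    dW2 = d2 * a1
    d1 = d2 * W2 * a1 * (1 - a1) # both hidden-unit deltas are equal
    dW1 = np.outer(d1, x)        # both gradient rows are equal
    W1 -= lr * dW1
    W2 -= lr * dW2

# The two hidden units' incoming weights stay identical after every update.
print(np.allclose(W1[0], W1[1]))  # → True
```

Because the two hidden units compute the same activation, they receive the same error signal, so their gradients (and hence their updated weights) stay equal forever; this is why random initialization is needed to break the symmetry.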