0:00

In this video, we're going to get a geometrical understanding of what happens

when a perceptron learns. To do this, we have to think in terms of a

weight space. It's a high-dimensional space in which

each point corresponds to a particular setting for all the weights.

In this space, we can represent the training cases as planes and learning

consists of trying to get the weight vector on the right side of all the

training planes. For non-mathematicians, this may be

tougher than previous material. You may have to spend quite a long time

studying the next two parts. In particular, if you're not used to

thinking about hyperplanes and high-dimensional spaces, you're going to have

to learn that. To deal with hyperplanes in a

14-dimensional space, for example, what you do is you visualize a 3-dimensional

space and you say "fourteen" to yourself very loudly.

Everybody does it. But remember that when you go from a

13-dimensional space to a 14-dimensional space, you're creating as much extra

complexity as when you go from a 2D space to a 3D space.

14-dimensional space is very big and very complicated.

1:35

Assuming we've eliminated the threshold, we can represent every training case as a

hyperplane through the origin in weight space.

So, points in the space correspond to weight vectors and training cases

correspond to planes. And, for a particular training case, the

weights must lie on one side of that hyperplane, in order to get the answer

correct for that training case. So, let's look at a picture of it so we

can understand what's going on. Here's a picture of weight space.

2:35

We're going to consider a training case in which the correct answer is one.

And for that kind of training case, the weight vector needs to be on the correct

side of the hyperplane in order to get the answer right.

It needs to be on the same side of the hyperplane as the direction in which the

training vector points. For any weight vector like the green one,

that's on that side of the hyperplane, the angle with the input vector will be less

than 90 degrees. So, the scalar product of the input vector

with the weight vector will be positive. And since we already got rid of the

threshold, that means the perceptron will give an output of one.

It'll say yes, and so we'll get the right answer.

Conversely, if we have a weight vector like the red one, that's on the wrong side

of the plane, the angle with the input vector will be more than 90 degrees, so

the scalar product of the weight vector and the input vector will be negative.

Because the scalar product is less than zero, the perceptron will say no,

or zero, and in this case, we'll get the wrong answer.
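
The link between the angle and the sign of the scalar product can be checked numerically. The sketch below is only an illustration and is not taken from the lecture: the 2D vectors and the names `dot`, `angle_degrees`, `x`, `green`, and `red` are made-up stand-ins for the input vector and the two weight vectors in the picture.

```python
import math

def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def angle_degrees(a, b):
    """Angle between two non-zero vectors, in degrees."""
    cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

x = [1.0, 2.0]       # hypothetical input vector whose correct answer is one
green = [2.0, 1.0]   # hypothetical weight vector on the good side of the plane
red = [-1.0, -1.0]   # hypothetical weight vector on the bad side of the plane

for name, w in [("green", green), ("red", red)]:
    print(name, "angle:", round(angle_degrees(w, x), 1),
          "scalar product:", dot(w, x))
```

With these values, the green vector makes an angle below 90 degrees with the input and has a positive scalar product, while the red vector makes an angle above 90 degrees and has a negative one.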

3:49

So, to summarize, on one side of the plane, all the weight vectors will get the

right answer. And on the other side of the plane, all

the possible weight vectors will get the wrong answer.

Now, let's look at a different training case, in which the correct answer is

zero. So here, we have the weight space again.

We've chosen a different input vector, and for this input vector, the right answer is

zero. So again, the input case corresponds to a

plane shown by the black line. And in this case, any weight vector that

makes an angle of less than 90 degrees with the input vector will give us a positive

scalar product, causing the perceptron to say yes or one, and it will get the answer

wrong. Conversely, weight vectors on the other side of

the plane will have an angle of greater than 90 degrees.

And they will correctly give the answer of zero.
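
The two cases differ only in which side of the plane counts as the good side. A minimal sketch of that idea follows. The function names `perceptron_output` and `is_on_good_side` and the example vectors are assumptions made for illustration, as is the choice to output one when the weight vector lies exactly on the plane.

```python
def perceptron_output(w, x):
    """Binary threshold unit with the threshold already eliminated."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Assumption: a weight vector exactly on the plane (z == 0) outputs 1.
    return 1 if z >= 0 else 0

def is_on_good_side(w, x, target):
    """True if weight vector w gets training case (x, target) right."""
    return perceptron_output(w, x) == target

x = [1.0, 2.0]            # hypothetical input vector
w_toward = [2.0, 1.0]     # less than 90 degrees from x
w_away = [-1.0, -1.0]     # more than 90 degrees from x

print(is_on_good_side(w_toward, x, target=1))  # True
print(is_on_good_side(w_toward, x, target=0))  # False
print(is_on_good_side(w_away, x, target=0))    # True
```

The plane itself is the same for both targets. Changing the target from one to zero just swaps which half of weight space is good.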

So, as before, the plane goes through the origin, it's perpendicular to the input

vector, and on one side of the plane, all the weight vectors are bad, and on the

other side, they're all good. Now, let's put those two training cases

together in one picture of weight space. Our picture of weight space is getting a

little bit crowded. I've moved the input vector over so we

don't have all the vectors in quite the same place.

And now, you can see that there's a cone of possible weight vectors.

And any weight vector inside that cone will get the right answer for both

training cases. Of course, there doesn't have to be any

cone like that. It could be that there are no weight vectors

that get the right answers for all of the training cases.

But if there are any, they'll lie in a cone.
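
In two dimensions the cone is just a range of directions, so it can be found by brute force. The sketch below is an illustration under stated assumptions: the training cases in `two_cases` and `conflicting` are invented, the helper names are hypothetical, and scanning whole-degree directions is a crude stand-in for the geometric picture rather than anything a learning algorithm does.

```python
import math

def perceptron_output(w, x):
    """Binary threshold unit with the threshold already eliminated."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Assumption: a weight vector exactly on a plane (z == 0) outputs 1.
    return 1 if z >= 0 else 0

def gets_all_right(w, cases):
    """True if w is on the good side of every training plane."""
    return all(perceptron_output(w, x) == target for x, target in cases)

def feasible_directions(cases):
    """Whole-degree directions in 2D weight space that satisfy every case."""
    good = []
    for deg in range(360):
        w = [math.cos(math.radians(deg)), math.sin(math.radians(deg))]
        if gets_all_right(w, cases):
            good.append(deg)
    return good

# Hypothetical training cases: (input vector, correct answer).
two_cases = [([1.0, 2.0], 1), ([2.0, 0.5], 0)]
# The same input with contradictory answers: no weight vector can satisfy both.
conflicting = [([1.0, 2.0], 1), ([1.0, 2.0], 0)]

cone = feasible_directions(two_cases)
print("cone spans", cone[0], "to", cone[-1], "degrees")
print("conflicting cases:", feasible_directions(conflicting))
```

For the first pair of cases the good directions form one unbroken range, which is the cone. For the contradictory pair the list is empty, which is the situation where no cone exists.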

So, what the learning algorithm needs to do is consider the training cases one at a

time and move the weight vector around in such a way that it eventually lies in this

cone. One thing to notice is that if you get a

good weight vector, that is, something that works for all the training cases, it'll

lie in the cone. And if you had another one, it'll lie in

the cone. And so, if you take the average of those

two weight vectors, that will also lie in the cone.

That means the problem is convex. The average of two solutions is itself a

solution. And in general in machine learning, if you

can get a convex learning problem, that makes life easy.
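
The averaging claim can be checked on a small example. This is a sketch under assumptions, not part of the lecture: the two training cases and the weight vectors `w_a` and `w_b` are invented values chosen so that both weight vectors get both cases right.

```python
def perceptron_output(w, x):
    """Binary threshold unit with the threshold already eliminated."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Assumption: a weight vector exactly on a plane (z == 0) outputs 1.
    return 1 if z >= 0 else 0

def gets_all_right(w, cases):
    """True if w gives the correct answer for every training case."""
    return all(perceptron_output(w, x) == target for x, target in cases)

# Hypothetical training cases: (input vector, correct answer).
cases = [([1.0, 2.0], 1), ([2.0, 0.5], 0)]

w_a = [-1.0, 1.0]   # one hypothetical solution
w_b = [-1.0, 2.5]   # another hypothetical solution
w_avg = [(a + b) / 2 for a, b in zip(w_a, w_b)]

print(gets_all_right(w_a, cases), gets_all_right(w_b, cases))
print(w_avg, gets_all_right(w_avg, cases))
```

The reason this works in general is that each training case only constrains the sign of a scalar product, and the scalar product of the average with an input is the average of the two scalar products, so it keeps the same sign.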