For example, say I have a bunch of 2D data points like this.

Say they all more or less lie on a straight line, right?

We can kind of see that.

These guys all lie more or less along a line that's going to be something like that.

I can imagine re-describing that data by mapping

them onto that line and then saying how far they are along that line.

I can map this guy down onto the line.

I can say the origin maps down

there and I could then say this data point is that far along the line.

So he's that far along the line, there.

And he's this far away from the line.

So I've got two dimensions here.

How far I am along the line and how far I am from the line.

And these guys are all slightly different distances from the line.

Now, it's a little bit of an argument in stats as to whether we

do the distance that way, vertically,

or that way, as a perpendicular distance from the line.

But it's sort of a theoretical argument.
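As a small sketch of that difference, here's a comparison of the two distances for a made-up line and point (the line, point, and numbers are illustrative, not from the lecture): the vertical residual is what ordinary least squares uses, while the perpendicular distance is what an orthogonal (total least squares) fit uses.

```python
import numpy as np

# A hypothetical line y = m*x + c and one data point (made-up values).
m, c = 0.5, 1.0
p = np.array([2.0, 3.0])

# Vertical (y-direction) residual, as in ordinary least squares.
vertical = p[1] - (m * p[0] + c)

# Perpendicular distance, as in an orthogonal (total least squares) fit:
# distance from point (x0, y0) to the line m*x - y + c = 0.
perpendicular = abs(m * p[0] - p[1] + c) / np.sqrt(m**2 + 1)

# The perpendicular distance is the vertical one scaled by cos(theta),
# where theta is the line's slope angle, so it's always the shorter of the two.
print(vertical, perpendicular)
```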

But notice that this distance from the line is

effectively a measure of how noisy this data cloud is.

If they were all tight on the line,

they'd all be very small distances away,

and if they were all quite spread,

they'd be quite big distances away.

So this distance-from-the-line-ness

is, in fact, the noise.

And that's information that isn't very useful to us.

So we might want to collapse it.

Except that that noise dimension tells me how good this line fit is.

If the best-fit line was all skewed,

was all wrong, I'd get a much bigger number for the noisiness.

And if the best-fit line was as good as possible,

I'd get the minimum possible number for the noisiness.

So that noise dimension contains information that's going to tell me how good my fit is.

So when I'm doing data science,

it tells me how good my fit to my data is.
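A minimal sketch of that idea, using some made-up noisy data lying near a line (all the names, directions, and numbers here are illustrative): summing the squared perpendicular distances gives a noisiness score, which is small for a direction close to the data and large for a badly skewed one.

```python
import numpy as np

# Synthetic data scattered around the line y = 2x (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
points = np.stack([x, 2 * x + 0.05 * rng.standard_normal(50)], axis=1)
points -= points.mean(axis=0)  # centre the cloud on the origin

def noisiness(points, direction):
    """Sum of squared perpendicular distances from the line through
    the origin in the given direction."""
    u = direction / np.linalg.norm(direction)
    along = points @ u                       # how far along the line
    residual = points - np.outer(along, u)   # component away from the line
    return np.sum(residual ** 2)

good = noisiness(points, np.array([1.0, 2.0]))   # direction close to the data
bad = noisiness(points, np.array([1.0, -1.0]))   # a badly skewed direction
print(good, bad)  # the good fit gives a much smaller noisiness
```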

And the way I've defined these two directions along the line and away from the line,

they are orthogonal to each other.

So I can use the dot product to do the projection to

map the data from the X-Y space onto the space of the line,

along the line and away from the line,

which is what we learned to do in the last little segment.
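For instance, with a hypothetical unit vector along the line and its orthogonal partner (the specific vectors and point here are made up for illustration), the dot product gives the new coordinates directly, and mapping back recovers the original point:

```python
import numpy as np

# Hypothetical orthonormal basis: along the line, and away from it.
u = np.array([1.0, 2.0]) / np.sqrt(5)    # unit vector along the line
v = np.array([-2.0, 1.0]) / np.sqrt(5)   # unit vector away from the line

point = np.array([3.0, 4.0])             # a point in the original X-Y space

# Because u and v are orthonormal, dot products give the coordinates
# in the new basis directly.
along = point @ u   # how far along the line
away = point @ v    # how far from the line

# Mapping back recovers the original point exactly.
reconstructed = along * u + away * v
print(along, away, reconstructed)
```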

Now if we're thinking about a neural network in

machine learning that recognises faces, say,

maybe I'd want to make some transformation of

all the pixels in a face into a new basis that describes the nose shape,

the skin hue, the distance between the eyes,

those sorts of things and discard the actual pixel data.

So the goal of the learning process of

the neural network is going to be to somehow derive a set of

basis vectors that extract the most information-rich features of the faces.

So in this video we've talked about the dimensionality of a vector space in

terms of the number of independent basis vectors that it has.

We found a test for independence: a set of vectors is

independent if none of them can be written as a linear combination of the others.
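One way to check that test numerically (a sketch using NumPy's rank computation, with made-up vectors rather than anything from the lecture): if any vector in the set is a linear combination of the others, the matrix they form has rank smaller than the number of vectors.

```python
import numpy as np

# Three candidate vectors in R^3 (illustrative values).
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
c = np.array([2.0, 3.0, 0.0])  # c = 2a + 3b, so this set is dependent

# Rank below the number of vectors means some vector is a
# linear combination of the others.
rank = np.linalg.matrix_rank(np.stack([a, b, c]))
print(rank)  # 2, less than 3: dependent

# Replacing c with a vector outside the plane of a and b restores independence.
d = np.array([0.0, 0.0, 1.0])
print(np.linalg.matrix_rank(np.stack([a, b, d])))  # 3: independent
```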

We've talked more importantly about what that means in terms of mapping from one space to

another and how that is going to be useful in data science and machine learning.