It's going to be really useful,

if we can make a transformation matrix whose column vectors,

make up a new basis,

all of which are perpendicular,

or what's called orthogonal to each other.

In this video, we're going to look at how we do this,

and why it's useful.

First, I want to define a new operation on

a matrix that we haven't seen before called transpose.

This is where we interchange all of the elements of the rows and columns of our matrix A.

If I've got a matrix A with elements i j,

I'm going to say the transpose of it,

is where I interchange all of those i's and j's.

So, if I have a matrix say 1, 2, 3, 4,

if A is that,

then A transposed, will be where I interchange the elements on all the rows and columns.

So, the element in position 1,1, if I interchange its indices, will stay where it is,

it'll stay being one, and the four will stay too.

The ones on the leading diagonal stay where they are,

but when I do this,

I flip the elements that are not on the leading diagonal.

So, the two and the three will swap there.

So, that's the transpose of A.
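We can sketch that in code; here's a minimal Python illustration (the helper name transpose is just for this example):

```python
# Transpose: interchange the elements of the rows and columns.
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*M)]

A = [[1, 2],
     [3, 4]]
At = transpose(A)
# The leading diagonal (1 and 4) stays where it is; the 2 and 3 swap.
print(At)  # [[1, 3], [2, 4]]
```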

Now let's imagine I have a matrix A here,

which is an n by n matrix,

which has a series of column vectors,

which are going to be the basis vectors of the new transformed vector space.

And each of those is going to be,

so Ai1, Ai2,

all the way to Ain because I've got n of those column vectors,

each of which has n rows.

So, that's if you like a1, a2,

all the way to an those vectors.

And I'm going to say two things about these vectors.

First, they have unit length.

So, they have little hats,

and they're orthogonal to each other.

That is, a1 dot a2,

is equal to nought.

In fact, ai dot aj, is equal to nought,

if i isn't j,

and because they're of unit length,

ai dot ai will be equal to one,

if i equals j.

Now, let's say I pre-multiply A by its transpose.

What I'll then have is a series of row vectors, all of those individual a hats,

but transposed, pushed over, into being rows.

So, this will be a1 hat, as a row,

that will be a2 hat,

all the way down to an hat.

Now what happens when I multiply A transposed by A?

Well, what I'll get, is I'll get here another matrix which will also be square, A transpose A,

and I've got that row times that column being the first element,

but that's just the dot product of a1 hat with itself, that's one.

So, I get a one here.

The next element a1 hat by a2 hat, well, that's zero.

And I'm going to get that all the way across.

If I do a2 hat times a1 hat, that's

also going to be zero, because it's the dot product of two orthogonal vectors.

If I want to do a2 hat with a2 hat, it's going to be one,

because they are both unit vectors, they are the same.

So, a unit vector dotted with itself gives me one,

and so on and so on.

And what I'm going to build up,

is I'm going to build up the identity matrix,

where I've got zeros all the way down,

zeros all the way down there,

and so I've got zeros all the way,

and I'm going to just have the identity matrix.

So, what I've found here,

is that in the case where A is

composed of vectors that are normal to each other and have unit length,

when they're orthonormal, that's what that's called,

orthonormal, let's write that down.

Orthonormal, that's what it means:

they're all of unit length and they are all perpendicular to each other.

Then in that case,

A transpose times A is the identity.

So, A transpose, is actually the inverse of A, that's really fun.

So, in the case where I have an orthonormal basis vector set composing A,

then the transpose is the inverse.
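As a quick sanity check, here's an illustrative sketch in Python: take a 2 by 2 rotation matrix, whose columns form an orthonormal basis, and verify that the transpose times the matrix builds up the identity. The matrix and helper here are made up just for this example.

```python
import math

# Columns of a rotation matrix form an orthonormal basis (illustrative example).
t = math.pi / 6
A = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

At = [list(row) for row in zip(*A)]  # the transpose: rows become columns
I = matmul(At, A)                    # ones on the diagonal, zeros elsewhere
print([[round(x, 10) for x in row] for row in I])  # [[1.0, 0.0], [0.0, 1.0]]
```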

So, I don't have to go to all the faff of finding

the inverse and doing all of that hard work,

I can just write it down immediately, just by doing the transpose.

And A, in that case,

is then what's called an orthogonal matrix.

One thing also to know about an orthogonal matrix,

is that because all the basis vectors are of unit length here,

it must scale space by a factor of one,

it doesn't make the space any bigger or smaller.

So, the determinant of an orthogonal matrix must be either plus or minus one.

So, we can write that down,

the determinant of A, is equal to plus or minus one.

The minus one arises if the new basis vector set flip space around,

that is if they make it left handed,

from the right handed it was originally.
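To make that concrete, here's a small illustrative check in Python: a rotation keeps the determinant at plus one, while a reflection, which makes space left handed, gives minus one. The two matrices are hypothetical examples.

```python
# Determinant of a 2x2 matrix: ad - bc.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

rotation = [[0.0, -1.0],
            [1.0,  0.0]]   # rotate 90 degrees: space stays right handed
reflection = [[0.0, 1.0],
              [1.0, 0.0]]  # swap the axes: space is flipped to left handed

print(det2(rotation))    # 1.0
print(det2(reflection))  # -1.0
```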

Notice that if AT,

A transpose, is the inverse,

I should also be able to multiply A by A transpose and get the identity.

So, I could pre- or post- multiply and still get this,

the identity.

And that means by the same logic that the rows of

the orthogonal matrix are also orthonormal to each other,

and we saw in the last video that actually the inverse is

the matrix that does the reverse transformation.

So, the transpose of an orthogonal matrix,

is itself another orthogonal matrix, which is really neat.

Now, remember that in the last module on vectors,

we said that transforming a vector onto a new coordinate system,

was just taking the projection or dot product

of that vector onto each of the new basis vectors,

as long as they were orthogonal to each other.

So, if I've got a vector,

let's call him r there,

and I want to project him into a new set of axes,

let's call that one being,

some e1 and that one being e2.

If these are orthogonal to each other,

then I can project into the new vector space we said,

just by taking the dot product of r with e2,

and the dot product of r with e1,

and then we'd have its components in the new set of axes.

If you want to pause and think about that for a moment,

in light of all that we've learned here about

matrices where we can do it all in one go,

then just pause and look at this just for a moment.
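If it helps, here's a minimal Python sketch of that idea, with an assumed orthonormal basis e1, e2 (the axes rotated by 45 degrees, chosen just for illustration): projecting r component by component with dot products gives the same answer as one multiply by the transpose matrix.

```python
import math

# An assumed orthonormal basis: the axes rotated by 45 degrees.
t = math.pi / 4
e1 = [math.cos(t), math.sin(t)]
e2 = [-math.sin(t), math.cos(t)]

def dot(u, v):
    """Dot product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

r = [3.0, 4.0]
# Projection one component at a time: r dot e1, r dot e2...
components = [dot(r, e1), dot(r, e2)]
# ...is the same as one matrix multiply by A transpose,
# whose rows are the basis vectors e1 and e2.
At = [e1, e2]
via_matrix = [dot(row, r) for row in At]
print(components == via_matrix)  # True
```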

Now, in data science,

really what we're saying here is that wherever possible

we want to use an orthonormal basis vector set,

when we transform our data.

That means that A,

the transformation matrix will be an orthogonal matrix and therefore,

the transpose will be the inverse,

it will be really easy to compute.

It means that the transformation will be reversible

because space doesn't get collapsed in any dimension,

it means the projections are just the dot products,

lots of things are nice and lovely.

And if I arrange the basis vectors in the correct order,

then the determinant will be one.

An easy way to check if they aren't in the right order,

is to calculate the determinant, and if it is minus one,

then you've transformed from right to left handed.

Just exchange a pair of the vectors and then you will get a determinant of one,

and you'll still have a right handed set, which will be a little bit nicer and easier.
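Here's an illustrative version of that check in Python, with made-up example matrices: if the determinant comes out as minus one, exchanging a pair of the column vectors flips it back to plus one.

```python
# Determinant of a 2x2 matrix: ad - bc.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[0.0, 1.0],
     [1.0, 0.0]]   # basis vectors in the "wrong" order: left handed
swapped = [[row[1], row[0]] for row in A]  # exchange the pair of columns

print(det2(A))        # -1.0
print(det2(swapped))  # 1.0
```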

So, what we've done in this video,

is look at the transpose and that has led us to

find out about the most convenient basis vector set of all,

the orthonormal basis vector set, which together make up an orthogonal matrix,

whose inverse is its transpose.