0:00

So if you look at, if we break down the

components of the singular value decomposition we can take, look at

the u, the kind of the u and the v

matrix of the left singular values and the right singular values.

So you, if you look at the original data matrix here on

the left, so that's the image plot here that, that we saw before.

0:19

And then on the, in the middle plot here, I plot the, the left, the left,

the first, sorry, the first left singular vector.

And it kind of roughly shows basically the mean of all the, of that kind of data set.

So if you look, if you plot, if you plot it across the rows, you can see that the

first singular vector is kind of, has a negative value.

For rows 40 through about 18 or so or 17.

And then it has a positive value for the remaining rows.

So that shows the clear separation in the means of the two sets of of rows, right.

And if you look at the the, the, the

right sing, the first right singular vector you can see

that this also shows the change in the mean between

the the first five columns and the second five columns.

And so the nice thing

about the singular value decomposition here is that

it immediately, it immediately picked up on the shift

in the means, both kind of in the

row, from the row dimension and the column dimension.

1:27

Without you having to tell it anything.

It was totally unsupervised, and it just kind of picked it up automatically.

So remember, we made a plot that was very, very similar to this before, but that

was be, when we knew that there was this kind of pattern in the data set.

So here the, the singular value decomposition

picked up the pattern right away.

And, and, and in the first kind

of singular vet, left and right singular vectors.

1:51

Another kind of component of the singular value

decomposition is, is known as the variance explained.

And this comes from the singular values that are in the d matrix.

Remember the d matrix, is really a diagonal matrix.

It only has elements that are on the diagonal of that matrix.

And you can think of each singular value as representing the percent of

the total variation in your data

set that's explained by that particular component.

And so, and the components are typically ordered so that the first one explains

the most variation as possible, and the

second one explains the second most, et cetera.

And so you can plot the kind of proportion of

variance explained as we have in the in this plot below.

And so here on the left hand side I've just plotted the raw singular values

and you can see that they kind of decrease in value as you go across the columns.

But of course the raw singular

value doesn't really have much meaning cause it's not on an interpretable scale.

So if I divide by the total sum of

all the singular values, then on the right hand side

I've got the, I can interpret it as the

proportion of the variance explained, and you can see that's

exactly the same plot but the y axis has

changed, and so you can see that the first singular

value if you recall it captures kind of the shift

and the mean in between the rows and the columns.

That,

that captures about 40% of the variation in your data.

Alright, so almost almost half of the variation in the

data is explained by a single kind of singular value.

Or you can think of it as a single dimension.

And then, the remaining variation in the data is explained by the other components.

3:25

So, just to show that the relationship between the

singular value decomposition and the principle components is close, see,

I, I, here, I plotted, I've calculated the SVD

and the, and the PCA of the same data matrix.

And I've plotted the principle components on the, X axis, and the

for, sorry, the principle component, first principle component on the X axis.

And the first right singular val, vector on the y-axis.

And you can see that they fall exactly in a line that

they're exactly the same things.

So the svd and the principle components, and

principle components analysis is essentially the same thing.

And so if you hear, you know, you're in a cocktail party and you hear two people

talking about the svd and pca you can

rest assured that they're basically doing the same thing.

4:18

a, kind of a matrix that has either zeros or ones.

And so the idea is that there's basically

there's only one pattern in this matrix, right?

And so the first five columns are zeros.

And the second five columns are wants that's it, nothing special and so if I

plot the data you can see on the left hand side that the, left part,

the first five are red with the further kind of second five are yellowish white.

4:44

So now if I take the SVD of this, kind of,

relatively boring matrix, you can see that I can plot the

singular values and you can see that there's one singular value

that's very high and the rest are kind of zero, basically.

And what you'll see is that the first singular

value explains 100% of the variation in the data set.

So how can that be?

Well, if you think about this data set, even though there is, kind of,

40 rows andten columns, there's really only one dimension to this data set, which

is that, if you're in the first five columns, your mean is, you're equal

to 0, and if you're in the second five columns, you're equal to 1.

So even though you have lots of so-called observations, there's really only

kind of a small fraction of information that's actually useful in this matrix.

And so that's captured by the SVD.

By the fact that you can explain 100%

of the variation with a singular, a single component.

5:34

So let's add a second pattern to the data set.

So we can add a pattern that's kind of that's

kind of goes across the rows and also goes across the columns.

And so one pat, the first pattern to be,

going to become a block pattern that we saw before.

So the first half of this is going to have one

mean and one half of this is going to have another mean.

And the second

column, the second pattern, basically, is going to alternate between the columns.

So you can see what that looks like over here.

So on the left hand side that's the data and you can see that there's one pattern.

That's kind of like, that's kind of a block pattern so the

left five columns are low and the right five columns are higher.

And then you can see that, within that, kind of nested within

that, is a kind of pattern that's kind of every other column.

And so, so the, every other column is, one is higher than the other.

And so if you plot, if we, if you plot the kind

of, what the truth is, right, we can see that on the,

in the middle plot here, the first five columns have a have

a lower mean, and the second five columns have a higher mean.

6:31

And that's the first pattern, and then the second pattern shows that the first column

has a mean of 0, the second column has a mean of 1.

The third column as

a mean of 0 and the fourth column et cetera.

So there's, there's two patterns here.

One is a shift pattern and the second is

this kind of alternating pattern, so that's the truth.

But of course, we rarely know the truth, so

we need to learn the truth from the data.

So if you, the idea is if you were presented

with the data set that we've shown here on the left.

7:11

So if we run the SVD on this new matrix with the two patterns.

I got the data here on the left and the middle pan,

the middle panel here, I've got the first, the first right singular vector.

And you can see it roughly picks up the block pattern.

Right?

So the first, kind of, five are, are

somewhat lower, and the second five are somewhat higher.

Right? So it's not as pretty as the truth was.

But it's somewhat discernible, that there are two different kind of means here.

The second sorry, the second right singular

vector is on the right-hand panel here.

And you can see that, it's a little bit harder to see,

but it does try to pick up the alternating mean pattern, right?

It's not as obvious as it was when we were plotting the truth, of course.

But you can see that every other point is either higher or lower.

Now of course since we know what the truth is

it's a little bit easier to, kind of, talk about this.

8:05

But but in general you can see that the two patterns are roughly confounded

with each other because it's a little bit hard to set the two patterns apart.

So even though they're clearly in the, we

know that the truth is, is, is two separate

patterns, the, the first and the second right

singular vectors are also known as the principle components.

8:23

They kind of mix the two patterns together,

so each of them has a block pattern.

And each of them has a kind of alternating pattern.

And so unfortunately, just, just like with most real data, the truth is a

little, is a little bit hard to discern than if you'd known it in advance.

8:40

Now if you look at the variance explained in this problem

you can see that on the right-hand panel which is the

percent of the variance explained, that the first component explains over

50% of the variation, the total variation in the data set.

And that's basically because the shift pattern you know, with

the first five columns, the second five columns is so strong.

It represents a large amount of variation in the data set.

So you capture that shift

pattern, you kind of capture a lot of that variation.

You see the second component only captures about 18 or so

percent of the variation and it kind of trails off from there.

And so that's going to tell, that kind of tells you roughly,

you know, how many components are in this data set.

But that alternating pattern which kind of, which is kind of every other

column is a little bit harder to pick up, as you can see.