0:00

Hello, and welcome back. We are now ready for last fundamental

step in Sparse Modeling that this dictionary learning.

Let's go into that. Remember what we are trying to solve.

We are solving this equation, and this is the dictionary.

This is the data, and we already talked a lot about alpha, the Sparse

Representation of the data using the dictionary.

So what should the dictionary be? The assumption in image processing, the

assumption is that images are behave in a sparse fashion.

And we have already seen examples of that using, for example, JPEG or even these

exaggerated example of putting all the images as part of the dictionary.

So how do we design a dictionary? The call is if we believe in sparsity we

should decide a dictionary that encourages sparsity, the more sparse, the

vector alpha. The better we are going to be because we

are trying to create sparse models. So the idea is that we are going to have

to learn, or we are going to select the dictionary that helps us to sparsify the

signal. And there are basically two directions in

doing that. One is to take.

Off the shelf dictionaries. There are very good dictionaries out

there. We have seen JPEG is an outstanding

algorithm. So we could use the cosine as a

dictionary. We could use Fourier, we can use

Wavelength. There are many, many off the shelf

dictionaries. The issue with those dictionaries at.

They're not adapted to the signal. So they will be, kind of, universal.

They will be very good for the signals that we are going to work with, but maybe

they're not, kind of, the best possible. So the other direction is basically to

learn the dictionary. Let's start with a lot of examples of

data. And let us learn the dictionary.

And then we can for example, use that dictionary for other images.

And that's what we're going to discuss now.

We're going to be in the dictionary learning regime and not in this regime

which is the most classical signal processing.

And we were here when we were doing. Image compression and we describe JPEG.

But now it's our turn to describe this. To present a dictionary that will help us

to sparsify the signal. So we are going to design.

D. And the basic idea is very, very simple

in concept, and also not very difficult to implement, as we're going to see.

So now, instead of having one signal, we have multiple signals.

Each one of them is a column here. So each column here is for example an

image patch which is 64-dimensional, eight by eight.

We have one dictionary that we are going to train for all these images.

And then every one of these images, patches, or signals have, has its own

Sparse Representation. So for example the first one has this

one. The second one has the next one.

So we have moved from a single column here and a single column here to a

matrix. What's the dimensions of these matrix.

It's N. So that's, again, the size that we had

before. And the number of signals, P.

What is D? Again, we have N and K.

So, N here and K in destination. There is nothing new here.

And here, we have K in this direction And p in this direction, one per signal.

And the goal now is to design this dictionary.

What do we want? We want basically a Sparse Representation

for all the signals. So we are basically summing over p all

the sigmas. This is P.

Okay? And we have one dictionary for all the

signals and we want a Sparse Representation for each one so, see that

J runs from one to P, a unique dictionary and all of them have to be good

representations. And they have to be sparse at the same

time. Before that, we didn't have this

summation and we were just talking about one.

Now we have a summation and we want a dictionary that is good for all of them.

Now, let me just make one comment. I'm going to describe how to do that when

we have already picked signals and we are going to learn the dictionary.

Now, as the number of signals increases, we can do a similar type of training of

learning that I am going to show you next online.

Online means. That we are basically.

going to learn and adapt the dictionary as the images are coming.

So we don't have to have a huge memory to save all the images.

We're going to do that as the signals are coming.

I'm not going to describe that during this week.

The algorithms. The implementations that I mentioned that

you could download from the web. Do that in an online fashion so they can

deal with millions zillions of images with absolutely no problem.

So, now my goal is to show you how we do this learning.

And look at here. We have to optimize, not only for the

sparse code as we were doing before, we're going to optimize also for the

dictionary. So we're going to learn the dictionary,

and the Sparse Representation of the signals, at the same time.

And the basic idea. There are a number of techniques.

I'm going to explain you what's called the K-SVD Algorithm that is kind of an

extension of K means for clustering. The basic idea is very, very simple to

illustrate. We have all the signals.

I wrote them as x here. And we're going to learn a dictionary.

So we initialize the dictionary, any way you want.

One way of initializing is just picking, randomly, some of these data points.

So you pick a few data points and you put as columns here.

Remember that I mentioned, here is k. We're going to learn K atoms.

So you just. Here, there are P signals, and P is

larger than K. Otherwise, as we talked before, just put

the signals in the dictionary. So just randomly pick, here, K of the

signals and put them as dictionary. That gives us a, an initialization.

Now, the next step is we're going to basically do Sparse Coding with that

dictionary. That we already know how to do.

So we basically go every signal and we solve as Sparse Coding problem.

This is going to be the code for the first signal, the code for the second

signal. Again, red means active, is part of the

active set, and non-zero co-efficient. So we did the sparse coding.

We went all around and we basically for each one of these, of them did Sparse

Coding. It's one of these blocks I'm going to

explain a bit more in the next slide. I'm just giving you the general idea.

The next part is we're going to have to activate the dictionary.

And the basic idea is we go in the other direction.

For doing this Sparse Coding study, we went into this direction we sparsely code

every signal. In the dictionary, we're going to to

update one item at a time. Using the codes that we have already

produced. So we are going to update one atom at a

time. Now after we have updated, we are going

to let the signals to be encoded again and reiterate, so we.

Start with a dictionary, we do Sparse Coding.

We update the dictionary, we do Sparse Coding again, we update the dictionary

and either we converge or we do that for a prefixed number of interrations

depending on our computational capabilities.

Now, what I have to do is explain each one of these blocks, and you're going to

see how simple they are. This we already know, basically.

This is Sparse Coding. Sparse Coding is very, very simple

because we have discussed a number of times.

And look like I put here, P. Because either we do zero with basically

matching pursuit or orthogonal matching pursuit, or we do the relaxation, P equal

one. So given the dictionary, we do the Sparse

Coding. We are going to go in this direction for

every signal we get the Sparse Code. Okay?

So we go in that direction. This is just one example.

And for every signal, we have to basically solve a Sparse Coding problem

for that signal, and absolutely nothing else.

And that's, again, is here. So we don't have the sum.

Because we do one at, at time. So, we don't have to compute the whole

matrix A, this is the whole matrix A. We don't compute it all at once, we just

go one at a time as we have been doing before and that basically solves us, the

whole matrix A. So, now we have this Sparse Code with

this dictionary. Now it's time to update the dictionary,

which is the new part. And again, we'll solve this by any

Pursuit algorithm as I mentioned before. Either the L0 with Matching Pursuit, or

L1 with any of the convex solvers. So now it's time to do the dictionary.

Let us pick one of the adults. As I say we're going to update one atom

at a time. How do we update this atom?

The concept is very simple before I run you through.

We're going to basically pick all the signals that have used that atom.

This signal is using it. This signal is using it, this signal is

using, and so on. So we basically are going to say, which

one are the signals have nonzero entry in the alpha vector corresponding to that

atom. And we're going to make that atom even

better for those syncs. So, basically we go with peak one.

And we look at all the signals that are using it.

Okay? Just that are using that atom.

We have one here, another here, another here, another here, and so on.

Remember, this is our data. And this is our matrix, this Sparse Code.

So we take all of them, and we forget about everybody else.

We say, you signals didn't use this atom, so I'm not going to pay attention to you

at this moment. I'm going to pay attention to you later

on. Remember, we iterated this.

And now the goal is, as I say, we're going to make this atom even better.

Remember please, this atom has been used by these signals.

But that's not the only atom that those signals use.

So for example, the first one uses this and this.

The second one uses this and this. So, I'm keeping all the signals that have

used that atom but they have also used other atoms but I, I'm not going to touch

that. The next thing I do is actually I remove

the influence of dots. Let me just explain that once again.

For example the first signal was a combination of this atom, and this atom.

So I go and such drug the contribution of this atom.

The same for the second signal. I go and subtract the contribution of

this atom. The same for the third, the same for the

fourth. So, for every signal, I basically go and

subtract the contribution of all the other atoms, but the one that I'm trying

to change. And then I ge, I have an error.

Because I'm not considering, now, this atom.

I'm subtracting the contribution of everybody else.

And I just keep. That contribution of this atom, and that

basically gives us kind of an error. Okay.

And that's the error we're going to try to redesign the atom to minimize that

error, that error. What's the contribution of this atom when

we use it for those signals, lets see if we can change the atom to make that

contribution even better, meaning that error even smaller.

And so what we have here is this the error, this is the residual, this is what

was the contribution of this atom. And now I want to change it, I want to

redesign it and also I want to redesign the coefficient.

So, this I have. This is the amount of information energy

that this atom was contributing. And now I'm going to basically.

Try to redesign it to make it even better.

So this is what we have to optimize for and this is very easy.

It's what's called a singular value decomposition.

It's a standard tool in linear algebra. It has a clause formula.

So, once again, we pick an atom, we can go one atom at a time or we can go

randomly, we pick an atom, we pick all the signals that have used that atom.

We subtract the contribution of every other atom.

And then we say, how can I improve this atom in such a way that it's contribution

to the signal is even better? And that's an SVD.

So, we run an SVD for every single of the atoms.

We have updated the dictionary. And now, as we explained in one of the

previous slides, we let, again, all these thing has to be encoded again with a new

dictionary. Then we update the dictionary and we run

a few iterations. So very simple, Sparse Coding, SBD.

Sparse Coding, SBD. That's all what we have to do.

Multiple of those. As many Sparse Codes as signals.

As many SVDs are, as atoms in the dictionary.

We do that, we have a new dictionary. So, what we have so far is, we started

from the need of doing signal modeling. That basically led us to propose Sparse

Modeling as a technique. We had to discuss some theory.

Why is this possible? And then we had to discuss, how do we do

the Sparse Coding, and how do we train the dictionary?

And that was the topic of the last two presentations.

Now is, will this work? Let us try with real images.

And let's see if this works. Let me just pay attention to one point

before we go into that in order to close this video about dictionary learning.

Some people have basically. Decided that instead of training the

dictionary, as we have just said, we're going to make the dictionary be a subset.