We're going to use a very simple diagram to present the model. Again, we are searching for a model for a signal X that has N dimensions. What is N? For example, if we consider 8x8 blocks, as in JPEG, we have that N is equal to 64. Why 64? We take an 8x8 block, or image patch, take one row after another, and concatenate them, and we get N = 64. This is the signal we are looking to model.
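As a quick sketch of that vectorization (assuming NumPy and a made-up toy patch):

```python
import numpy as np

# A toy 8x8 image patch (assumed data, just for illustration).
patch = np.arange(64, dtype=float).reshape(8, 8)

# Stack the rows one after another to get the N = 64 signal x.
x = patch.reshape(-1)          # row-major flattening
print(x.shape)                 # -> (64,)
```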
Now, the next step, and actually one of the main components of sparse modeling, is the dictionary. The dictionary is a matrix which is N by K. N is the dimension of the signal we are looking to model, and K is the size of the dictionary. We have K columns, and every column is called an atom. It's kind of an image, kind of a patch, because every column is 64-dimensional, and we have K such columns. Very often K is larger than N. That's not necessary, but it happens very often, for example when we are going to do image denoising, as in the example we are using. If K is larger than N, we say that the dictionary is overcomplete. If K is equal to N, the dictionary is complete; for example, the Fourier or discrete cosine transform bases are complete dictionaries. If K is less than N, we say that the dictionary is undercomplete.
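As a rough sketch of what such dictionaries look like in code (assuming NumPy; the DCT construction and the extra random atoms are just for illustration):

```python
import numpy as np

# 1-D DCT-II atoms for length-8 signals (columns are atoms).
n = np.arange(8)
D1 = np.cos(np.pi / 8 * (n[:, None] + 0.5) * n[None, :])
D1 /= np.linalg.norm(D1, axis=0)          # unit-norm columns

# 2-D DCT dictionary for 8x8 patches: 64 x 64, complete (K = N).
D_complete = np.kron(D1, D1)

# An overcomplete dictionary (K > N) just has more columns; here we
# append 64 random unit-norm atoms purely for illustration.
extra = np.random.default_rng(0).standard_normal((64, 64))
extra /= np.linalg.norm(extra, axis=0)
D_overcomplete = np.hstack([D_complete, extra])
print(D_complete.shape, D_overcomplete.shape)   # (64, 64) (64, 128)
```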
Now we have the dictionary. The second element of sparse modeling is this vector alpha. What we have is a vector alpha of dimension K, and when we multiply the matrix D, the dictionary, by the vector alpha, we produce the signal that we're trying to model. What is particular about this vector alpha is that its number of non-zero entries is very small; we represent those by the red dots. We have, at most, L non-zero entries. And when we multiply this matrix, the dictionary D, by alpha, what we are doing is taking a linear combination of the atoms corresponding to the non-zero entries. So we are combining: this is the second one, meaning it's picking the second atom; this one is picking its corresponding atom; and this one is picking its corresponding atom. We basically have a linear combination, according to the coefficients here, of those L atoms. And because there are only a few non-zero coefficients, this vector is very sparse, and from that comes the concept of sparse modeling. So what we have is basically a signal represented as the product of the dictionary and a vector alpha with only a few non-zero entries, a very sparse vector. Now, this model is, first of all, very, very simple: once we have the dictionary D and alpha, we obtain the signal just by a simple product.
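A minimal sketch of that synthesis step (assuming NumPy; the random dictionary and coefficients are made up just to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, L = 64, 128, 3                       # signal dim, dictionary size, sparsity

D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms

alpha = np.zeros(K)
support = rng.choice(K, size=L, replace=False)   # pick L atoms at random
alpha[support] = rng.standard_normal(L)          # their coefficients

x = D @ alpha                              # the modeled signal: a simple product
print(np.count_nonzero(alpha), x.shape)    # -> 3 (64,)
```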
This dictionary, we're going to discuss it a lot in this video and in the future videos, but for now it's a fixed dictionary of dimensions N by K, as we have seen in the previous slide.
Now, the model is extremely rich. Let's just do some numbers. If we have a dictionary of size K, meaning K atoms, and we pick L of them, we have K-choose-L possibilities. So if L is three, we can pick any three atoms out of K. Each one of those selections defines a subspace, a very low-dimensional subspace: once we make that selection, the signal is just a linear combination of the corresponding atoms, and that's a linear subspace. But we have many of them; we have K-choose-L of them, a lot. So the model is extremely, extremely rich, and hopefully that will help us to represent basically any signal of interest.
So, the model is simple, the model is very rich.
And also, the model is actually familiar to us.
This is not the first time that we see such a model, taking a signal and representing it as a linear combination of only a few atoms of a dictionary. Let's think for a minute, or for a few seconds: when have we already seen this model?
Remember what we have learned in the last few weeks: we have seen this model in JPEG. When, or how? In JPEG, D is basically the cosine basis. In JPEG we took an 8x8 block and we did a transform into the cosine domain, so D is the cosine basis and alpha is basically the coefficients of the cosine transform. And we saw that, for example, when we quantize, a lot of these discrete cosine transform coefficients became zero. So alpha was 64-dimensional, but a lot of it was zero; it was a very, very sparse vector, and it achieved an extremely good representation of the 8x8 patch with just a few coefficients. So JPEG is a great example of the power of sparsity.
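Here is a tiny sketch of that effect (assuming SciPy's DCT routines, a made-up smooth patch, and an assumed quantization step):

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth toy 8x8 patch (made-up data) and an assumed quantization step.
patch = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8)) * 255
q = 16

coeffs = dctn(patch, norm='ortho')          # alpha: the DCT coefficients
quantized = np.round(coeffs / q) * q        # quantization zeroes most of them
print(np.count_nonzero(quantized), 'of 64 coefficients survive')

recon = idctn(quantized, norm='ortho')      # still a good approximation
print(np.abs(recon - patch).max())
```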
Let me just give you another extreme example of how we can represent signals in a very sparse fashion.
Consider that D includes all the signals of interest. So if we're talking about images, this is a huge matrix: K basically goes over all the possible images. If that happens, then every time you want to represent a signal x, an image, you only go and pick the corresponding atom, the corresponding column in D, and that means that our sparsity is one. This extreme example is only to illustrate that we can do sparse representations of signals. Of course, it's not very practical, because you cannot put all the patches or all the images in one dictionary; that would be huge, and it wouldn't be very useful. But JPEG is, as we said, probably the most successful image processing algorithm, and it's based on sparse representations. So that's what we want: we want a sparse representation of signals.
And the basic idea, once again, is that the signal is a linear combination of a few atoms, because alpha is sparse. How do we measure sparsity? That is the next question. The way we are going to measure sparsity is with this quantity, which is called the Lp norm for a given p. Remember, alpha is nothing else than a K-dimensional vector, and what we are doing here with this formula is taking every entry of alpha, taking its absolute value, raising it to the power p, and adding all of them up. That's the Lp norm of the vector.
Let's illustrate this for different values of p. If p is equal to two, we don't really get what we want, because what we want is to penalize by an equal amount every time one of the entries of alpha is non-zero, and when it's zero, not to penalize at all. At zero, that happens: zero means no penalization. But then the penalization increases quadratically with the magnitude of the entry. We don't want that; we want a kind of equal penalization: as long as it's non-zero, it gets penalized the same. What happens when p is equal to one? At least the penalization is now directly, linearly proportional to the magnitude of the entry. It's not quadratic, so that's better, but it still doesn't look like what we want. We are going to see later on that this is actually a very good penalty function, this L1, the Lp norm for p = 1. It doesn't look like it yet, but we are going to see that, under certain conditions, this is exactly what we are looking for. Before we get into that, though, let's keep looking for the ideal case. If p now becomes less than one, then we start to have what we really want: a penalty of zero if the coefficient is zero, and then the penalty starts to flatten out, meaning that as long as the coefficient is non-zero, we get roughly the same penalty. If we further decrease p, getting close to zero, we get exactly what we want: zero penalty at zero, and equal penalty for every entry that is non-zero. So the way we are going to measure sparsity is with this function for p equal to zero.
And that's often written. As in this fashion with a zero here, and
a zero there. This is what's called a L0 pseudonorm.
It's not really a norm, and that's why it's called the pseudonorm, although very
often, it's called also the norm, with the understanding that it's not exactly a
norm, and this is how we are going to measure sparcity.
We are going to count the number of non-zero elements in the vector alpha.
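A small sketch of how these penalties behave on a sample vector (assuming NumPy; the vector itself is made up):

```python
import numpy as np

alpha = np.array([0.0, 3.0, 0.0, -0.5, 0.0, 0.01])

for p in (2, 1, 0.5, 0.1):
    # Sum of |alpha_i|^p: the per-entry penalty flattens out as p -> 0.
    print(p, np.sum(np.abs(alpha) ** p))

# The L0 "pseudonorm": simply count the non-zero entries.
print('L0:', np.count_nonzero(alpha))     # -> 3
```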
So this is what we basically have here: a signal represented as a linear combination of atoms from the dictionary, where we're not allowed to pick more than L of those atoms.
So, back to our problem of image denoising, the maximum a posteriori estimation for image denoising. What we have is that we want to represent our signal very closely: this is the measurement, and this is what we want it to be. We want a close representation, meaning X has to be as close as possible to Y, as long as it belongs to the right model. Then we replace X by D alpha, because we know that we're now going to represent the signal as the dictionary multiplied by alpha, but not by just any alpha: we require that no more than L entries of alpha are non-zero. To give an illustrative representation: of course, we get the signal back as D times alpha, where alpha is the optimal one. So, in a very simple pictorial representation, what we have is D alpha minus Y. This is D, which has dimensions N by K; this is alpha, which has dimension K; we subtract Y, and that's our approximation error. But we are not allowed to pick any alpha; we're only allowed to pick an alpha that has at most L non-zeros, which means that we are going to approximate Y by at most L columns, a linear combination of at most L columns of D. And from now on, the representation is given by alpha: instead of X we use alpha, and if we want to recover the signal, we basically multiply by D.
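A minimal sketch of that approximation for one fixed choice of support (assuming NumPy; the support here is hand-picked, whereas a real sparse coder would have to search for it, for example greedily):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, L = 64, 128, 3

D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
y = rng.standard_normal(N)                   # noisy measurement (toy data)

support = [5, 17, 90]                        # an assumed choice of L atoms
coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)

alpha = np.zeros(K)
alpha[support] = coeffs                      # sparse representation
x_hat = D @ alpha                            # estimate using at most L atoms
print(np.count_nonzero(alpha), np.linalg.norm(x_hat - y))
```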
Now, what we have obtained is a very low-dimensional representation of our signal, and a low-dimensional approximation of Y. The idea is that, if we reduce the dimension of Y, we are basically getting rid of the noise, because we are not allowed to use all of the atoms; we are only allowed to use very few, projecting the signal into a low dimension and therefore reducing the noise. It's the same as we saw with the points on the plane: if we say that we are only allowed to represent all these points with a straight line, we fit a straight line and basically project the points onto it, and by that we are denoising the points, because we go into a much lower dimension. In this case we just go from two to one, but that illustrates the concept: we go to a lower-dimensional space, and by that we get rid of the noise. To represent the noise we would need many additional atoms, and that's why we are restricting alpha to be sparse.
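A tiny sketch of that two-to-one analogy (assuming NumPy; the noisy points are synthetic), projecting noisy 2-D points onto their best-fit line through the mean:

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.uniform(-1, 1, 100)
points = np.stack([t, 2 * t], axis=1)        # points on the line y = 2x
noisy = points + 0.1 * rng.standard_normal(points.shape)

# Best-fit direction via the leading principal component.
centered = noisy - noisy.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]

# Project onto the 1-D subspace: most of the noise is removed.
denoised = noisy.mean(axis=0) + np.outer(centered @ direction, direction)
print(np.linalg.norm(noisy - points), np.linalg.norm(denoised - points))
```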
Now, are we done? We are only at the very beginning of this week, so for sure we are not done. Let me illustrate a few of the problems that we still need to solve. We were talking about this: minimize the error under the constraint that you are not allowed to pick more than L non-zero entries in alpha. Is this the problem we need to solve? Maybe we can solve it in this other way: let us constrain the error here, saying we don't want more than a given error, and find the sparsest possible vector that achieves it. Or we can have a combination of the sparsity and the error, and try to optimize for an alpha that minimizes the sum of the two terms with a coefficient lambda. So we have three possibilities here.
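In symbols, those three variants might read roughly as follows (a sketch; epsilon and lambda, the error tolerance and the trade-off weight, are notation assumed here, not fixed by the lecture):

```latex
\min_{\alpha} \|D\alpha - y\|_2^2 \ \text{ s.t. } \ \|\alpha\|_0 \le L
\qquad
\min_{\alpha} \|\alpha\|_0 \ \text{ s.t. } \ \|D\alpha - y\|_2^2 \le \epsilon
\qquad
\min_{\alpha} \|D\alpha - y\|_2^2 + \lambda \|\alpha\|_0
```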
Sometimes the three are equivalent, sometimes they are not. Which one should we be using? Another problem, in this case a theoretical one: will we always find an alpha?
So, I give you a Y and I give you a dictionary. Will I be able to find, for example, an alpha that is sparse enough and represents this signal up to a certain error? Is that possible? And if I find it, will it be unique? Will there be only one alpha? Will every such alpha have the same support, the same non-zero entries? When is that possible?
And of course, what about the dictionary? Which dictionary should I use in order to get alpha as sparse as possible? This is something that we are going to discuss quite a lot during this week. These are some of the questions that this very nice sparse modeling and these sparse representations open up. So what have we done?
We basically started with image denoising as an example, and we said we need to model the signal; we need to understand what we mean by image denoising, what type of clean signal we are looking for. That basically opened the question that we need to define models. Having said that, we described a model, which is the sparse model: the signal is a linear combination of a few elements, a few atoms of a given dictionary. Once we define that, are we done? Of course not. We have all these questions we just mentioned, theoretical and computational: how do I compute that alpha, and, of course, how do I compute a dictionary that is appropriate for the type of signals we are looking at? These are some of the questions that we are going to address during the next few videos. And I'm looking forward to that.
This is, as I said, a very, very exciting topic, and a very, very active topic in image processing right now.
Thank you very much. Looking forward to seeing you in the next
video.