0:26

As already mentioned, the problem is NP-hard, and therefore MP, a greedy

algorithm, is used. It chooses one column vector at

a time from the dictionary to represent the data,

so that the residual error undergoes the largest decrease.

0:52

An extension of the algorithm is orthogonal matching pursuit.

where the entries of the sparse representation vector selected so

far are simultaneously recalculated at each iteration step.

MP and OMP as they're typically referred to,

are rather simple to understand, and they're widely used.

1:25

Extensions of them also exist in the literature.

So let us now look into the algorithmic details.

Reformulations of the optimization objective function and finding surrogate

functions or norms are two important steps in optimization.

So here's a problem we've been looking at.

We want to find a sparse solution x that satisfies this constraint.

Very often there is noise present in

the data, which is actually almost always the case.

Then we cannot guarantee that this solution will belong to this hyperplane.

2:14

We followed actually similar formulations when we

talked about recovery in weeks six and seven.

We looked, for example, at minimizing C of x,

the L2 norm of this, subject to a noise constraint.

2:30

C here is a high-pass filter, therefore the norm of Cx

is the energy of the signal at high frequencies.

We can further swap the constraint and the

objective and end up with a problem like this.

So now we minimize the noise, subject to a constraint on the sparsity of the signal.

So we follow this formulation, for example, when we have a notion of how sparse

the signal should be. The solution is clearly a function of S in this case.

3:19

Of course, the approaches, that is, the steps that will result in

solving this problem and that problem, are not necessarily the same.

And actually that's part of the reason one introduces this reformulation,

so that one ends up with an easier problem to solve.

4:18

Now, since the L1 norm is convex, we

can turn this constrained optimization problem into an unconstrained one.

We can relax it by bringing the norm up to the objective, weighted by a

parameter lambda, called the regularization parameter, or the Lagrange multiplier.

And lambda is chosen so that the solution of

this optimization problem, which is a function of

lambda, satisfies the constraint, so the L1

norm of the solution should be less than or equal to S.

We have looked at relaxed problems like this before,

in weeks six and seven, when we talked about recovery.

There, for example, we solved the constrained

least squares problem, the result of this minimization.

The difference, of course, being that here

we use the L2 norm for the stabilizing function, while here we use the L1 norm.

When it comes to considerations about lambda however, they are similar.

For example, when lambda becomes very small, goes to zero,

we are solving a least squares problem, just minimizing this norm.

While, on the other hand, when lambda becomes very large,

we end up with x equals zero as a solution.

As mentioned already, when the L2 norm is used,

then we can end up with a closed form solution.

So the solution to this problem, if you recall, is simply equal to the

inverse of (A transpose A plus lambda C transpose C), times A transpose b.
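This closed-form solution can be sketched in a few lines of NumPy; the matrices below are illustrative stand-ins, not values from the lecture:

```python
import numpy as np

def regularized_ls(A, b, C, lam):
    """Closed-form minimizer of ||Ax - b||^2 + lam * ||Cx||^2,
    i.e. x = (A^T A + lam C^T C)^(-1) A^T b."""
    return np.linalg.solve(A.T @ A + lam * C.T @ C, A.T @ b)

# Illustrative data (not from the lecture)
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
C = np.eye(4)   # identity stabilizer for simplicity
lam = 0.1

x = regularized_ls(A, b, C, lam)
```

Solving the normal equations with `solve` avoids forming the matrix inverse explicitly, which is the standard numerical practice.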

6:23

Before we proceed, let's look at the Ax equals b representation.

It should be clear that b here can be written as a weighted sum of the

columns of A, which are also called atoms, with the weights being the entries of x.

So, in this particular example, x is a sparse vector:

it only has two nonzero entries, 1 and 2, and the rest are 0. This

means that b is the first column, corresponding to the entry 1 and weighted by

1, plus the seventh column here, which is weighted by 2.
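This "b as a weighted sum of atoms" picture can be checked directly; the dictionary below is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical 4x8 dictionary A; x has only two nonzero entries
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8))

x = np.zeros(8)
x[0] = 1.0   # first atom, weight 1
x[6] = 2.0   # seventh atom, weight 2 (zero-based index 6)

# b is the weighted sum of the selected columns (atoms) of A
b = A @ x
```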

So we have a vector a here, and we

project b onto a; this is an orthogonal projection here.

7:37

Typically, the columns of A are normalized, so a transpose a

here, or the norm of a, is equal to 1, and therefore

the length of the projection along the direction of

a, which is what x star is, is equal to a transpose b.

7:55

So we see that the norm of the projection equals

the amplitude of x star times the norm of a, which is equal

to 1, and therefore this is equal to the amplitude of x star.

So one can pose the question: let's say we

have only one column of A available to represent b; which one should we choose?

8:28

And the intuitive answer is that we should choose the column

that results in the largest projection of b onto that column.

So, for this particular example, if we only have

one column of A available to represent b, then we

should choose a7, because the projection of b onto

a7 is the largest one, and it's equal to 2.
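A small numeric sketch of this selection rule, with a made-up unit-norm dictionary (the lecture's example is not reproduced here):

```python
import numpy as np

# Three unit-norm atoms in the plane (hypothetical example)
A = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
b = np.array([1.0, 2.0])

# With unit-norm atoms, the projection length of b onto a_j is a_j^T b,
# so the best single atom is the one with the largest |a_j^T b|
projections = A.T @ b
best = int(np.argmax(np.abs(projections)))   # third atom: 0.6*1 + 0.8*2 = 2.2
```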

9:07

We illustrate here pictorially the matching pursuit algorithm.

It's an iterative greedy algorithm, according to which, at each

iteration step, we try to reduce the residual error the most.

And we do that by finding the largest projection

of the residual onto the columns of matrix A.

So in this particular example, we assume that A

has three columns, A1, A2, A3, as shown here.

At the first step, the residual is set equal to the observation vector b.

9:51

So x1 star times a1 is the projection; therefore

the residual now becomes b minus the projection.

So the residual here is orthogonal to a1.

The solution vector is updated, and the projection coefficient, x1 star,

becomes the first entry of the solution vector.

10:27

We find that the largest projection is along the vector a3.

So, x3 star is the projection coefficient which is

updating the solution vector x, and then the residual was this

and now x3 star a3 subtracted from that so that's the new value of the residual.

10:49

We move the updated residual vector to the origin,

and keep iterating by projecting now to the remaining vectors.

We've excluded the ones that were already utilized.

We keep iterating that way, and we stop

when the constraint on the sparsity of x is

met; so if we want a sparse vector with

sparsity 8, let's say, we stop after 8 iterations.
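The iteration just described can be sketched as follows; this is a minimal version assuming unit-norm columns, demonstrated on an illustrative orthonormal dictionary:

```python
import numpy as np

def matching_pursuit(A, b, sparsity):
    """Greedy MP sketch: at each step pick the column with the largest
    projection of the residual, record the coefficient, update the residual.
    Assumes the columns of A are unit norm."""
    m, n = A.shape
    x = np.zeros(n)
    r = b.copy()                      # residual starts at the observation b
    for _ in range(sparsity):
        proj = A.T @ r                # projections of the residual onto all atoms
        j = int(np.argmax(np.abs(proj)))
        x[j] += proj[j]               # update the solution entry
        r = r - proj[j] * A[:, j]     # new residual, orthogonal to a_j
    return x, r

# Illustrative example (not from the lecture): orthonormal atoms
A = np.eye(3)
b = np.array([0.0, 3.0, 1.0])
x, r = matching_pursuit(A, b, sparsity=2)
```

With orthonormal atoms, two iterations recover the two nonzero coefficients exactly and drive the residual to zero.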

Let us now describe the orthogonal matching pursuit algorithm; as we

will see, it's a simple variation of the matching pursuit algorithm.

11:47

We also utilize the set omega, which includes

all the column vectors that have already been utilized.

So clearly when we initialize, the set is empty.

So, while the sparsity constraint is not met, we find the projection of the

residual onto all the columns that do not belong to omega, that is, that have not been utilized yet.

12:35

So, I look at all the vectors that are included in the set omega, that have

been utilized before, plus the new vector that was chosen in this step.

And I simultaneously update the values of

the coefficients by solving this minimization.
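A compact sketch of OMP along these lines; the small two-dimensional dictionary at the bottom is hypothetical:

```python
import numpy as np

def orthogonal_matching_pursuit(A, b, sparsity):
    """OMP sketch: like MP, but after each new atom is chosen, all
    coefficients over the selected set omega are recomputed jointly
    by least squares, and the residual is formed from that joint fit."""
    m, n = A.shape
    omega = []                       # indices of atoms used so far
    r = b.copy()
    for _ in range(sparsity):
        proj = A.T @ r
        proj[omega] = 0.0            # only consider atoms not yet in omega
        omega.append(int(np.argmax(np.abs(proj))))
        # jointly re-fit all selected coefficients
        coef, *_ = np.linalg.lstsq(A[:, omega], b, rcond=None)
        r = b - A[:, omega] @ coef
    x = np.zeros(n)
    x[omega] = coef
    return x, r

# Hypothetical unit-norm dictionary and observation
A = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
b = np.array([1.0, 2.0])
x, r = orthogonal_matching_pursuit(A, b, sparsity=2)
```

Note the difference from MP: the least squares re-fit makes the residual orthogonal to the span of all selected atoms, not just the most recent one.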

13:06

I can solve the reformulated problem with the same

algorithm; so, I minimize the norm subject to a constraint

on the error, and all that needs to change

is that the input is now epsilon instead of S.

And I iterate as long as the norm of the

residual is above the threshold; everything else remains the same.
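The only change from the sparsity-constrained version is the stopping rule; a sketch, again with a hypothetical orthonormal example:

```python
import numpy as np

def omp_error_stop(A, b, eps, max_iter=None):
    """Same OMP loop, but iterate while ||r|| > eps instead of running
    for a fixed sparsity S (the reformulated, error-constrained problem)."""
    m, n = A.shape
    if max_iter is None:
        max_iter = n
    omega, r, coef = [], b.copy(), np.zeros(0)
    while np.linalg.norm(r) > eps and len(omega) < max_iter:
        proj = A.T @ r
        proj[omega] = 0.0            # skip atoms already selected
        omega.append(int(np.argmax(np.abs(proj))))
        coef, *_ = np.linalg.lstsq(A[:, omega], b, rcond=None)
        r = b - A[:, omega] @ coef
    x = np.zeros(n)
    x[omega] = coef
    return x, r

# Illustrative example (not from the lecture)
A = np.eye(3)
b = np.array([0.0, 3.0, 1.0])
x, r = omp_error_stop(A, b, eps=1e-6)
```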