0:22

In order to restrict the set of possible solutions, we regularize the problems.

Prior knowledge is used, to constrain the solution space.

When this prior knowledge, we bring into the problem is that,

the solution is sparse, then sparse promoting norms need to be used.

0:42

By visualizing the regularized solution of the problem Ax equals b, we see that

the L0, L1, and lp norms, where p is less than one, are promoting sparsity.

1:41

Let us begin by considering a

Linear Inverse Problem encountered in many applications.

We're given the system A x equals b, where

A and b are known and we try to solve for x, so it is indeed an inverse problem.

I use a color coded representation here where green represents known quantities,

so this is a known matrix, this is a known vector, while

red represents, an unknown quantity at the unknown vector x in this particular case.

2:51

Let us consider the case now, when we have an over determined, system of equations.

So the number of equations is greater than the number of

unknowns and therefore, the matrix a here is a tall matrix.

Let's say it's m by n where m is, greater than n.

3:10

In this case, one solution we can obtain, is the Least Square solution.

It is obtained by minimizing the norm of the arrow, Ax minus b.

So this is the L2 norm squared and the minimization is with respect to x.

This is actually a minimization, we carried out

back in week six when we talked about restoration.

4:08

brackets, we should obtain x transpose, A transpose Ax minus 2, x

transpose A transpose

b, plus b transpose b, equal to 0.

And if we obtain the gradient, this term will give us

2, A transpose Ax, and this term will give us minus 2.

A transpose b is clearly independent of x, so this is equal to zero.

6:17

According to the regularization principle, we bring prior

knowledge into the solution process, by defining a function on j of x.

So we minimize this with respect to x, subject

to the constraint, that the solution x satisfies the data.

This is the fidelity to the data constraint.

6:41

There's a problem that people have looked at for many years

in Linear Algebra and Optimization, and what's new here is that.

One type of prime, or knowledge, that we can bring

into the problem, is that the solution x is sparse.

[BLANK_AUDIO]

In this case, powerful results have been shown in the literature.

And the question is, what type of functionals here

j of x, can we use that promotes sparsity?

So this is what we'll discuss next,

we'll look at norms that indeed promote sparsity.

7:19

We're interested in looking at various forms of

the regularized problem, I described in the previous slide.

We'll, we'll introduce functional j of x which depend

on values norms and we'll start with the L2 norm.

So, the L2 norm of a vector is defined here, it's also called the Euclidean norm.

I'm sure that you've encountered this many times, in the past.

So for example, if we are given a vector with elements 3,

4, 0, a three by one vector, then its L2 norm is simply

equal to the square root of 3 squared plus 4 squared plus 0

squared, so the square root of 25, which is equal to 5.

[BLANK_AUDIO]

Let's look at how the unit bowl, looks, with respect to the L2 norm.

So, let's look in two dimensions, it is easier to visualize.

So, let's look at, the gaze where the, L2 norm squared of the vector is equal to 1.

And in two dimensions, this means that x1 squared plus x two squared, equals to 1.

So this is clearly the equation, of a circle in two

D, or a hydrosphere if we are in the undimensional space.

So with, our axis here, x1 and x2.

This is the, unit circle, according to the

L2 norm, so the radius here is equal to 1.

We are interested in solving this Constraint Optimization problem.

We want to minimize the L2 norm of the vector, subject

to the constraint of the vector, the solution satisfies, the data.

9:19

If we, visualize this problem in two D, then this equation here,

defines a hyper plane in the undimensional space or a line in two dimensions.

So this is my constraint, and then to obtain the solution, we

take here the unit ball and, enlarge it

until it intersects, it touches the line which means that I have

a solution with the smallest x that satisfies, the constraint.

So, we use a larger circle here, and yet

a larger one and at this point, clearly they touch.

So, this point here is the solution to my original optimization problem.

10:09

The main observation here is, that, in

two-dimensions, the circle and the, line are

going to touch most probably at the point, but will give us a non-sparse solution.

So this solution here is an X1 component, which is this one.

10:28

And an x2 component, which is equal to this one.

A sparse solution would be, if we were to select

this point or this point, or these, any of these four.

For this particular problem actually, we can find the, solution in closed form.

It's given by this expression which is an expression I'm going to derive next.

10:57

One of the advantages of the use of the L2 norm is

that, very often closed form expressions

result from, the optimization problems like

the one we are discussing, so, let's look here how we can

indeed obtain in a, very few easy steps the solution in this case.

So, we're doing to get the problem of minimizing with respect to

x the L2 norm of x, subject to the constraint that Ax equals b.

[BLANK_AUDIO]

The first on the step, is to

turn the constraint problem into an unconstrained one.

So, we minimize, with respect to x,

12:04

to x of L of x.

The so called KKT conditions, for obtaining an optimum are that the gradient

with respect to x of L of x, should be equal to 0 and the gradient of L of x

with respect to Lambda should be equal to 0.

From the first KKT condition, we obtain

here in a rather straightforward way, following

steps that we first discussed during week six and seven, when we covered recovery.

We see that the gradient of the L2 of x, is equal to x,

12:52

transpose A transpose lambda, so the gradient of this is, A transpose Lambda.

This is equal to 0 therefore, x is equal to minus A transpose Lambda.

From the say, the second KKT condition was

simply obtained that Ax minus b is equal to 0.

So if I say, substitute x in here, I obtain minus A, A transpose

Lambda equals b.

And assuming that A A transposes invertible,

Lambda is equal to minus A A transpose minus 1 b.

And then substituting this, in the first

equation, we'll plane that x equals the minuses,

cancel out A transpose AA transpose minus 1 b.

14:04

So, this is the, close formed solutions.

We can call it, x star here, of

the original optimization problem when the L2 norm is used.

So, the advantage of the L2, is that we have a closed form expression.

The disadvantage with respect to our purposes, is

that we do not obtain a sparse solution.

14:41

It is defined as shown here, it is a counting norm.

It counts the number of non zero entries in x.

You may or may have not seen this norm before.

It's actually not a, a norm, in the

sense that it does not satisfy the scaling property.

14:59

Alpha x L0 norm is different than alpha the L0 norm of x.

So, if we use the example I used earlier, and I have the vector 3, 4, 0.

Then the, L0 norm of this vector, is simply equal to 2.

Since there are only two non-zero elements.

15:24

How does the union ball look like when the L0 norm is used, the L0 ball.

If I look again at two dimensions, my two axes, x

1, x 2.

All the elements on the axis except from the origin,

belong to the unit ball utilizing the L0 norm,

because if I look at an element here, it has norm 1.

Another element here has also norm1, an element here has norm 1.

An element here has norm 1 and so on.

It's, I assume, quite clear to see indeed that, the unit

ball are the two axis with the exception of the origin.

16:34

The point of intersection of a straight line

with the L0 ball here, are solutions to the

problem and we see indeed that, in this

particular example, in two dimensions, I have two solutions.

16:50

Both solutions are, indeed sparse, right?

If we look at this solution, let's assume that this value is here minus 5.

So this is minus 5 0, so it's a sparse solution,

Iif we look at this one, assume this is a 2.

So it's a 0, two base solution.

17:14

So, with the use of the L0 norm, we do obtain sparse solutions.

The solution, however, is not unique as

was visualized here with this simple example.

17:30

Let us now look at the L1 norm,

which you will be utilizing in our optimization problem.

It is defined as shown here, so it's the sum

of the absolute value of the element of the vector.

So for the vector used before as an example, 3 4 0, Clearly, the

L1 norm of this vector is 3 plus 4 equals 7.

By the way, if you recall the L2 norm of this,

is equal to 5 and the L0 norm is equal to 2.

18:54

So, let us look at the solution, of the optimization problem shown here.

So we use the L1 norm now, and we want to.

Constrained x by L1, we can find, we want to find the smallest

L1 normal vex, subject to this constraint that the solution satisfies the data.

19:27

In visualizing the solution, we show Ax equals b

line into b and then we take the L1 ball and we keep

increasing its size or nflate it until it hits this line.

So here is, the L1 ball and here its larger size that is going

20:20

So, through this visualization, it should be clear that when the L1 norm is used in

an optimization problem like this, then the resulting solution is sparse.

We're going to have an intersection of a plane or

hyper plane in general with these rumbles in, and dimensions.

And of course there is a small chance that, the straight line here is along

the lines of the side of the, rhombus, but this is a very low probability event.

21:03

Let's look at the use of the lp norms now, where p

is less than one in solving

the regularized problem that we are considering.

Here's a definition of the lp norm.

21:57

I'm interested in using the lp norm to solve this optimization

problem, so the function of j x that I defined a

couple slides earlier now is just the Lp norm of x,

we want to get the smallest possible one subject to this constraint.

In visualizing this solution, as we did before, we showed the hyper

plane Ax equals b or the straight line into b, that's the constraint.

And then we consider the lp bowl.

We keep inflating it, until it touches the constraint.

And, this is then the point that provides

a solution to the optimization problem we're interested in.

22:51

Therefore solving problems that do not include either convex functions or convex

sets is not as straightforward as when when convexity is present.

So we are going to say a few things about convexity.

Regarding convexity this is how when one dimension of convex function looks like.

If I take any two points of the graph of the function

23:21

Here's a picture of a non convex function, which

clearly does not satisfy the property I just mentioned.

I can consider these two points and here is

the line that does not lie above the graph.

23:44

On to the set and this is a non-convex set that does not have this

property,by large, we are interested in solving convex optimization

problems which means that both the function, we minimize is

convex and the set of points that define this function is also a convex set.

24:08

For such optimization problems, we can obtain a global, minimum, such as this

one, or global minima, if we have an f of x, that looks like this.

24:36

We show a synthetic experiment here which compares the effectiveness of

the L2 norm and the L1 norm towards providing sparse results.

It provides actually the, experimental evidence that the L2 norm enforces

smallness of the solution while the L1 norm enforces, promotes sparse.

25:30

So, given b and given a, we solve the

problem of minimizing, the normal vex,

subject to the constraint that Ax equals b.

So now, if the L2 norm is used, that is the solution we obtain.

25:54

So this is the estimated x here when the L2 norm is utilized.

As argued, for the values of x are

rather small, however, the solution is not sparse.

26:10

If the L1 norm is utilized, over here, and

we solve again this problem, the, optimal solution we obtain, is this one.

It is actually identical to the original signal x.

[BLANK_AUDIO]

We show here in one graph, all the

unit balls utilizing the values, norms that we introduced.

So, the circle is the L2 ball, by the way

all these points are equal to1 minus 1 minus 1.

26:46

The two axes with the exception of the origin, are,

represent the L0 unit ball, the red curve represent the LP.

The LP is between 0 and 1.

Ball and the diamond represents the L1 norm, as

we discussed, the, the use of the L2 norm results

in a convex optimization problem and more than that,

in many instances, we can obtain the closed form solution.

27:20

So, these are the advantages of the use of the L2 norm.

However, we do not obtain sparse solutions,

but instead, small values for the solution.

27:31

What we're really after in obtaining sparse

solutions, is the use of the L0 norm.

However in this case, we end up with an NP, Non-Polynomial hard problem.

The use of the lp norm, results in a non-convex

optimization problem, although the solutions are indeed sparse.

27:57

While the use of the L1 norm, results in a Convex Optimization problem, although

the function is not smooth, is not,

differential [UNKNOWN] points and the solution is sparse.

So, we seeing some sense here all the possible engineering

trade-offs, we have kind of lazy talk-time solutions, but is non-sparse, the

hard solution to obtain, also hard since it is non-convex and therefore in some

sense the use of the L1 norm represents the available choice.

And indeed it has been utilized in solving sparse problems in many

different formulations, as we will see later in the class.

And over here in summary, a comparison of the

L0 and L1 norms in solving these constrained optimization problems.

Minimize x with respect to L0, subject to this constraint or minimizing x, utilizing

the L1 norm subject to the same constraint and this is the basis pursuit problem.

29:15

However, the L0 norm models sparsity directly.

Just counts the number of non zero elements, while the L1 does so indirectly.

L0 is non-convex, The L1

is, and is actually the tightest convex surrogate to the L0 norm.

29:35

The overall problem is NP hard with the

L0 norm, while here it's convex but non smooth.

In solving the L0 problem, there are greedy approaches such as the matching

pursuit that we're going to discuss in

this lecture, that provide an approximate solution.

While solving the Basis Pursuit problem, that are

Convex Optimization algorithms we're also going to discuss.