0:17

During this week we'll cover some of the more recent results in image and

video processing.

They are based on the simple notion of sparsity.

A signal can be sparse in its native domain, for example,

an image of the stars in the night sky, or it can be sparse or

compressible in some transform domain, such as the DCT domain.

0:41

Our first reaction might be, what's the big deal?

We know that an image is sparse in the DCT domain from week nine already.

The big deal is that if we know that a solution to a problem is sparse,

we can then solve problems for which we did not have meaningful solutions before.

One such problem is the solution of an under-determined system of equations.

Such a system appears in numerous applications.

1:11

In the first segment of this week, we will look at applications not just from

the image and video processing domains, but also from machine learning,

statistics, genetics, econometrics, neuroscience, and other domains.

2:16

During this week,

we will first talk about applications where sparsity plays a key role.

And these applications will be from signal and image and video processing.

But also from machine learning, statistics, economics and so on.

2:45

We will then discuss specific solution approaches to optimization problems where

the l0 and l1 norms are involved.

So when the l0 norm is involved, we'll talk about the matching pursuit.

This is a greedy iterative algorithm for finding a solution.
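As a preview, matching pursuit can be sketched in a few lines of numpy; the toy dictionary A and signal b below are invented for illustration, not taken from the lecture:

```python
import numpy as np

def matching_pursuit(A, b, n_iters=200):
    """Greedy matching pursuit sketch: at each iteration, pick the
    dictionary column (atom) most correlated with the current residual."""
    A = A / np.linalg.norm(A, axis=0)   # work with unit-norm atoms
    x = np.zeros(A.shape[1])
    r = b.astype(float).copy()          # residual starts as the signal itself
    for _ in range(n_iters):
        corr = A.T @ r                  # correlation of every atom with residual
        k = int(np.argmax(np.abs(corr)))
        x[k] += corr[k]                 # update the selected coefficient
        r -= corr[k] * A[:, k]          # remove that atom's contribution
    return x, r

# Toy problem: b is built from 2 of the 8 atoms, so the recovered x is sparse
# and the residual shrinks toward zero.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))
b = 3.0 * A[:, 1] - 2.0 * A[:, 5]
x, r = matching_pursuit(A, b)
```

Each iteration costs only one matrix-vector product, which is why greedy pursuits are attractive for the L0-flavored problems.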

When the l1 norm is involved, we'll talk about

smooth reformulations of both the basis pursuit and the LASSO problems.

3:10

And finally, we'll go full circle and revisit some of the applications.

We'll show specifically the formulation of the problem, and

the solution approach that is followed in obtaining a solution.

3:24

What is sparsity?

As the name implies, a vector is sparse

if it only has a few non-zero components while the rest are zero.

Now this vector can represent a signal such as an image or a video,

in which case the signal can be sparse in its native domain.

If for example we have an image of the sky at night, so

there are some bright stars only and the sky is dark.

Or it can be made sparse in another domain.

So natural images, for example, are sparse in the discrete cosine transform domain.
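To make "sparse in a transform domain" concrete, here is a small numpy sketch; the orthonormal DCT-II matrix is built directly from its definition, and the smooth test signal is invented:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, built from its definition with numpy only."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

n = 64
C = dct_matrix(n)
# A smooth signal, standing in for a row of a natural image.
t = np.arange(n)
signal = np.cos(2 * np.pi * t / n) + 0.5 * np.cos(6 * np.pi * t / n)
coeffs = C @ signal                      # DCT coefficients
mags = np.sort(np.abs(coeffs))[::-1]
top4 = np.sum(mags[:4] ** 2) / np.sum(mags ** 2)
# top4 is the fraction of signal energy captured by just 4 coefficients;
# for smooth signals it is close to 1, i.e. the signal is compressible.
```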

4:08

There's a large number of applications in which sparsity plays a key role.

Of course for the purpose of this course, image and

video processing applications are the ones we primarily are interested in.

But in the beginning we'll also talk about applications from

machine learning in general, statistics, genetics,

econometrics, neuroscience and so on and so forth.

4:50

However, the genetic basis of many leading causes of death in the world, such

as heart disease and diabetes, is so far not clearly understood,

and this limits the use of gene therapy in these situations.

5:07

Genome-wide association, that is, the association between

billions of genetic loci from across the human genome

with quantitative traits, such as cholesterol and glucose levels.

6:38

So this matrix A is a fat matrix, with many more columns than rows.

So for example, it can have 1,000

rows and 60,000 columns.

In solving this problem, it is desired that the solution, here,

x is a long vector, is sparse.
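A scaled-down stand-in for this setup (10 equations and 60 unknowns instead of 1,000 and 60,000; all numbers invented) shows why sparsity matters: the classical minimum-norm solution of an under-determined system satisfies the equations but is dense:

```python
import numpy as np

# Under-determined system: far more columns (unknowns) than rows (observations).
rng = np.random.default_rng(1)
m, n = 10, 60
A = rng.standard_normal((m, n))

x_true = np.zeros(n)
x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]   # only 3 active unknowns ("loci")
b = A @ x_true

# np.linalg.lstsq returns the minimum-L2-norm solution; it also satisfies
# Ax = b, but it spreads energy over (typically) every entry of x.
x_l2 = np.linalg.lstsq(A, b, rcond=None)[0]
n_nonzero = np.count_nonzero(np.abs(x_l2) > 1e-8)   # typically all 60
```

Among the infinitely many solutions of Ax = b, only the sparsity prior singles out the 3-nonzero one; providing that is exactly the job of the L0/L1 machinery of this week.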

7:24

So this is a specific example where we're

asked to solve an a x equals b problem, and

the solution x here is desired to be sparse.

Functional magnetic resonance imaging, fMRI of the brain is becoming more and

more popular in diagnosing disorders such as ADHD,

Attention Deficit Hyperactivity Disorder.

In this application, three dimensional scans of the brain are acquired

while the subject is being asked to perform certain cognitive tasks.

Now only certain parts of the brain light up or fire up,

the ones responsible for the specific task being performed.

8:28

So you see in these examples here,

which parts of the brain fire up while the subject,

the person, is performing a rhyming task.

8:45

Now in processing such volumes,

we typically divide them into small cubes which we refer to as voxels.

So these are volume pixels or three-dimensional pixels.

So scanning the volume we can stack these voxels into vectors.

And again, here is another application in which the vectors we deal with are sparse.

So the problems we might be interested in solving are, for example,

the source localization problem that I just mentioned.

Or we might be interested in solving a classification problem,

if the person is classified as having ADHD, for example, or

not, in which case sparse logistic regression can be used.

9:48

Econometrics is allowing economists to sift through huge amounts of data and

extract simple and useful relationships.

So here we see a map of the world.

And for each country we see the GDP, gross domestic product per capita.

10:38

Then we are interested in finding an x so that Ax=b.

So we want to solve this linear system of equations

under the constraint that the solution we obtain, x here, is sparse.

Similarly to the genetics problem,

since we expect a small number of factors to have strong correlation, positive or

negative, to the GDP, while the rest of them being irrelevant or equal to zero.

Actually, from studies, the factors shown inside these

red rectangles show strong correlation, negative or positive, to the GDP.

So again, we need to know how to solve an Ax=b system of linear equations

under the constraint that x is a sparse solution.

11:31

Linear regression is a common problem in many application areas.

Given a set of points, as shown here, we want to

find a line that best describes or best fits these points.

So if we call the axes here a and b, we are given

pairs of points, a of i, b of i, such as this one.

11:58

And the model we use, the model of the line,

is that a of i x is approximately equal to b of i.

Our objective, therefore,

is to find the best possible value of x to describe this data.

A very common approach we follow is the least squares approach according to which

we minimize with respect to x the sum of the squared errors.
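In numpy, this least squares fit is a single call; the line b = 2a + 1 and the noise level below are invented toy data:

```python
import numpy as np

# Noisy points roughly on the line b = 2a + 1.
rng = np.random.default_rng(2)
a = rng.uniform(0.0, 10.0, 50)
b = 2.0 * a + 1.0 + 0.1 * rng.standard_normal(50)

# Each row a_i = [a, 1], so x = [slope, intercept]; minimize ||Ax - b||^2.
A = np.column_stack([a, np.ones_like(a)])
x_star = np.linalg.lstsq(A, b, rcond=None)[0]

# With x* in hand, a new point a_new yields a predicted b_new, as in the lecture.
a_new = 4.2
b_new = np.array([a_new, 1.0]) @ x_star
```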

12:27

This minimization will result in an optimal value of x,

based on which, if a new point a is given to us,

we can form a x* which will give us the new value of b.

So if here is a new point a,

we have found the equation of this line after we find the optimal x, and

therefore we can use it to find the value b new.

We can write these equations in matrix vector form.

Our model takes the form Ax=b,

where in vector b we have stacked all the values b of i.

And in matrix A, we stack the values of

a of i, which in general is not a scalar but is a row vector.

Then the least squares objective here is to minimize with respect to x,

this norm squared, and

this is referred to as an L2 norm, as we will see later in the class.

In a number of applications,

there are sparse outliers in the data, like the ones shown here.

So these are clearly outliers, they don't look like the rest of the data, but

they're sparse.

There are just a few of them.

Otherwise this would be the data.

The question is now how well can a line describe this data

if we follow the least squares approach that we described in the previous slide?

Following a least squares approach in solving the linear regression problem, we

obtained the blue line here as the optimal line for fitting the available data.

14:19

Clearly a least squares approach towards linear regression

tends to fit through these outliers and therefore produces poor results.

We can make linear regression robust by introducing a sparse error

term that complements the linear model.

That means we are using now the model Ax + e approximately equal to b.

So this is the sparse error term.

14:47

And the objective is to find the optimal x* here to describe the data.

So we need ways now to be able to solve this problem

by taking into account the sparsity of e.

And as you'll see, we have means through the use of L0 and

L1 and the Lp norms to accomplish this.

The resulting method is referred to as

the least absolute deviations approach.

15:22

In some other applications,

we might be interested not just in robustifying the least squares problem,

as we described here, but also in finding the sparse term e as well.

15:35

So following the least absolute deviations approach with this toy example here,

we obtain the red curve as the best curve to fit the data.

So this is LAD, least absolute deviations, and this is least squares.

So clearly we are doing a much better job now in describing the data and

not fitting to the outliers.
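One simple way to compute such an L1 fit is iteratively reweighted least squares (IRLS); this is a sketch on invented toy data with three planted outliers, not necessarily the solver used for the lecture's figures:

```python
import numpy as np

def lad_fit(A, b, n_iters=100, eps=1e-6):
    """Least absolute deviations via IRLS: repeatedly solve a weighted
    least squares problem whose weights 1/|r_i| downweight large residuals."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # least squares start
    for _ in range(n_iters):
        r = np.abs(b - A @ x)
        sw = 1.0 / np.sqrt(np.maximum(r, eps))      # sqrt of IRLS weights
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x

# Line data with a few sparse outliers, as in the toy example above.
rng = np.random.default_rng(3)
a = np.linspace(0.0, 10.0, 40)
b = 2.0 * a + 1.0 + 0.05 * rng.standard_normal(40)
b[[5, 20, 33]] += 25.0                              # the sparse outliers

A = np.column_stack([a, np.ones_like(a)])
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]         # pulled toward outliers
x_lad = lad_fit(A, b)                               # robust fit
```

The least squares fit is dragged toward the three outliers, while the IRLS/LAD fit stays close to the true line.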

Matching consumers with the most appropriate products is key to enhancing

user satisfaction and loyalty.

16:22

The version of the problem addressed by a recommender system is described here.

It relates to the so called Netflix problem,

since it's posed by the provider of streaming movies here in the US, Netflix.

17:15

So we don't know the preference of this user to this movie, and

the retailer would like to predict this preference.

So that the retailer can decide whether to recommend or

not recommend this movie to this user.

So the retailer has an incomplete matrix.

Some of its entries are known, but very many others are not known, and

the retailer would like to complete this matrix.

17:47

By the way, there are thousands of movies but millions of users, so

this is really a very large matrix.

The matrix completion problem appears in various other applications.

Clearly, if there is no additional information or

additional constraints, we cannot provide a meaningful solution to this problem.

The additional information that we have with the problem at hand

is that there are typically a few types of people, or

to put it differently, similar people like similar movies.

So if these two users have similar

preferences in movies, then this column of the matrix

and this column will be very similar, or they may be identical.

A matrix with many similar columns or groups of similar columns is of low rank.

The Matrix Completion Program therefore becomes a Rank Minimization Problem.

A low rank matrix has a number of zero singular values.

Therefore, the vector of singular values of the low-rank matrix is sparse.

It is the nuclear norm of the matrix, which is equal to the sum of

the singular values of the matrix, that is typically minimized,

subject to the requirement that the fidelity to the data is preserved.
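A minimal sketch of this idea uses singular value thresholding, the proximal operator of the nuclear norm, alternated with a data-fidelity step; the ratings matrix, sampling rate, threshold, and iteration count are all invented for illustration:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink the singular values toward zero,
    which promotes low rank (the proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# A rank-2 "ratings" matrix: two underlying user types.
rng = np.random.default_rng(4)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))

mask = rng.random(M.shape) < 0.6       # only 60% of the entries are observed
X = np.where(mask, M, 0.0)
for _ in range(200):
    X = svt(X, tau=0.2)                # pull toward low rank
    X[mask] = M[mask]                  # keep fidelity to the observed data

# Relative error on the entries we never observed:
err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
```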

19:32

Here is the denoising or noise smoothing problem again.

We addressed this problem at various points during the course.

For example, we talked about enhancement during week five and

also when we talked about recovery during week six and seven.

19:53

We know from the discussion from week nine when we talked about compression that we

can transform this image to a different domain and in that different domain

only a few transform coefficients can do a good job in representing the image.

Such is, for example, the DCT transform.

So if we consider a patch here of this image.

Let's call it Yi.

20:17

Then we can write that y of i = A x of i.

A here represents either the basis of the DCT transform, so it's a fixed dictionary.

Or it can be designed specifically for classes of images,

in which case it can be an overcomplete dictionary, a fat matrix.

21:01

So by and large, we're interested in solving problems of that nature,

so y of i is the observed noisy image, A is the dictionary either

fixed or designed from the data, and we are interested in solving for x of i.

21:36

The L1 norm of this vector of coefficients.

As we will see, the L1 norm promotes sparsity.

Since noise is not sparse in the transform domain,

by solving this problem we are going to reconstruct the smooth patch, and

of course we can solve this problem for each and every patch of the image, or

we can combine all the patches and rewrite this equation in terms of matrices.
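For an orthonormal dictionary such as the DCT, the L1-regularized patch problem has a closed-form solution: soft-threshold the transform coefficients. Here is a toy 8x8 sketch; the patch, noise level, and threshold are invented:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix built from its definition."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Smooth 8x8 patch plus noise; the noise is NOT sparse in the DCT domain,
# so thresholding the DCT coefficients suppresses it while keeping the patch.
n = 8
C = dct_matrix(n)
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
patch = np.cos(np.pi * ii / n) * np.cos(np.pi * jj / n)
rng = np.random.default_rng(5)
noisy = patch + 0.1 * rng.standard_normal((n, n))

coeffs = C @ noisy @ C.T                # separable 2-D DCT analysis
denoised = C.T @ soft(coeffs, 0.2) @ C  # threshold, then synthesize
# Soft-thresholding at 0.2 equals solving min ||y - Ax||^2 + 0.4 ||x||_1
# when the dictionary A is orthonormal.
```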

22:41

We're going to recover this lost information, that is,

the pixel intensities, which are represented again here as black pixels.

While there are a number of techniques published in the literature providing

solutions to this problem, we want to also obtain solutions

utilizing the sparsity machinery we are discussing this week.

23:01

We will use similar ideas to the ones we used for denoising.

So if we look at the patch here of this image, it will be denoted by y of i,

and we'll use this model to represent y of i:

y of i is approximately R A x of i. So A is the dictionary again,

which is either fixed or designed utilizing the data, x of i is

the sparse vector that provides this approximation, and

R is the mask which allows us to use this representation for

the pixels that we know, but not for the pixels which we do not

know, the black pixels in other words, inside this patch.

We're going again to obtain solutions for x of i which are sparse.

We can accomplish that by computing x of i star here as the argument

that minimizes with respect to x of i

the L2 norm of this error squared plus lambda times

the L1 norm of this representation vector,

which again promotes sparsity.

We can utilize x of i star to obtain the inpainted image as A x of i star.

So putting this machinery to work, we obtain these results here.
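This masked objective can be minimized with the iterative shrinkage-thresholding algorithm (ISTA); the 1-D signal, mask density, and lambda below are illustrative assumptions, with an orthonormal DCT as the fixed dictionary:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Orthonormal DCT dictionary A: columns are atoms, signal = A @ coefficients.
n = 64
k = np.arange(n)[:, None]
i = np.arange(n)[None, :]
C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
C[0] *= np.sqrt(1.0 / n)
C[1:] *= np.sqrt(2.0 / n)
A = C.T

# Ground truth: a 3-sparse coefficient vector, i.e. a smooth "patch".
x_true = np.zeros(n)
x_true[[2, 7, 15]] = [2.0, -1.0, 0.5]
signal = A @ x_true

# Mask R keeps ~75% of the samples; the rest are the missing "black" pixels.
rng = np.random.default_rng(6)
known = rng.random(n) < 0.75
R = np.diag(known.astype(float))
y = R @ signal

# ISTA for x* = argmin_x ||y - R A x||_2^2 + lam ||x||_1.
M = R @ A
L = 2.0 * np.linalg.norm(M, 2) ** 2     # Lipschitz constant of the gradient
lam = 0.02
x = np.zeros(n)
for _ in range(2000):
    grad = 2.0 * M.T @ (M @ x - y)      # gradient of the data-fidelity term
    x = soft(x - grad / L, lam / L)     # gradient step, then shrinkage

inpainted = A @ x                        # A x* fills in the missing samples
```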

24:45

We also mention super-resolution during week six and

seven, when we talked about recovery.

It is indeed a recovery problem.

Information is lost during the down sampling process.

We are observing an image like this one, and we want for

example, to increase its resolution by a factor of four in each dimension.

There are many techniques in the literature providing solutions to this

SR problem.

We can also provide solutions utilizing dictionaries and

the sparsity machinery we are discussing this week.

We assume we have a set of low-resolution and corresponding high-resolution images.

Utilizing this data, we train low resolution and

high resolution dictionaries simultaneously,

relating these low-resolution and high-resolution patches.

The same sparse coefficient vector provides the representation of both

the low resolution and its corresponding high-resolution patch.

So if we take this patch here and denote it by y low resolution,

then given this patch and the whole image, we can find an x

star that is sparse, as we already mentioned,

to represent y low resolution as

A low resolution times x star sparse.

26:09

But also y high resolution is equal to A high resolution, the high-resolution dictionary,

times the same sparse representation coefficient vector.

So, utilizing this approach, we observe this, and we can reconstruct this.
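The coupled-dictionary idea can be sketched numerically as follows; the random dictionaries stand in for trained ones, 4x average-pooling plays the role of the down-sampling, and orthogonal matching pursuit is used as a simple sparse coder (all assumptions for illustration):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit sketch: greedily grow the support,
    refitting the coefficients by least squares at every step."""
    An = A / np.linalg.norm(A, axis=0)
    support, r = [], y.astype(float)
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(An.T @ r)))
        if j not in support:
            support.append(j)
        coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        r = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

# Coupled dictionaries: high-res atoms A_h, and A_l = the same atoms
# down-sampled 4x, so ONE sparse code represents both patch resolutions.
rng = np.random.default_rng(7)
A_h = rng.standard_normal((256, 32))           # hypothetical trained dictionary
A_l = A_h.reshape(64, 4, 32).mean(axis=1)      # 4x average-pool of each atom

x_true = np.zeros(32)
x_true[[2, 11, 30]] = [1.5, -1.0, 0.8]
y_l = A_l @ x_true                             # the observed low-res patch
y_h = A_h @ x_true                             # unseen high-res ground truth

x_star = omp(A_l, y_l, k=3)                    # sparse code from low-res only
sr = A_h @ x_star                              # super-resolved patch
```

Because the same sparse code x star represents both patches, the super-resolved patch matches the high-resolution ground truth even though only the low-resolution patch was observed.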

26:38

If we are to blow up this patch here in yellow by using pixel duplication

we get this image, while if we blow up this part here we obtain this image.

So clearly there's a lot of additional resolution that has been recovered due

to this method we discussed here.

27:21

Using this sparsity machinery we're discussing this week,

we can also obtain a solution to this problem.

We put all the observed data into one matrix and

then decompose this matrix into a low-rank matrix,

as was the case in matrix completion,

and a sparse matrix.

28:05

So, we see that we can obtain a very good separation of the foreground and

background, using the sparsity machinery that we will cover this week.

We will provide the details of the solution to this problem in later lectures.

28:58

Indicating which entry in the database this b corresponds to.

So x can have just one entry equal to one if there's a perfect match of

this query image into the database.

Or the query image can be a combination of images in the database where

x will have more than one entry, but nevertheless it is sparse.

29:23

In both these cases we have abnormalities such as a scratch or occlusion.

Therefore the model we end up working with is that B is approximately equal

to Ax plus e, so this is an error term that takes the scratches or

the occlusion into account where also this vector e is sparse.

29:48

Therefore we need to be able to solve problems of this nature, whereby

we constrain again both x and

e to be sparse when we try to solve for

x, and thereby we try to identify a person that exists in the database.

30:26

We talked about Nyquist's theorem earlier in the course, according to which we need to

sample with a sampling frequency

at least twice as high as the highest frequency in the signal.

In images, we are talking about spatial frequencies in the two dimensions.

According to compressive sampling,

we can reconstruct the original image with very little or

no error if we acquire fewer samples than those dictated by Nyquist's theorem.

The sampling, however, now is not done with deltas, where we go and

pick out the value of each and every pixel.

But by projecting the image onto random bases.

Projection means that we form the inner product of the image with a random matrix,

and this inner product will give me a scalar which represents the sample.

Then, we end up with an under-determined system of equations,

which we can solve efficiently,

as we'll see later in the class, while enforcing sparsity.

And the image is sparse either in its original domain, or in a transform domain.
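Forming compressive samples is just a handful of inner products; the tiny image and the 25% measurement rate below are invented, and recovering the image from these samples would use the sparse solvers covered this week:

```python
import numpy as np

# Each compressive "sample" is the inner product of the image with a random
# pattern, rather than a pixel value picked out by a delta.
rng = np.random.default_rng(8)
n = 16                                         # a tiny n x n "image"
img = np.outer(np.hanning(n), np.hanning(n))   # smooth, hence compressible

m = n * n // 4                                 # 25% of the N^2 Nyquist samples
Phi = rng.standard_normal((m, n * n))          # random measurement patterns
samples = Phi @ img.ravel()                    # m inner products, m scalars

# The recovery problem samples = Phi @ x is under-determined (64 equations,
# 256 unknowns) and is solved by exploiting the sparsity of the image in a
# transform domain.
```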

31:50

It is almost indistinguishable from the original image.

Utilizing only 25% of the samples,

so N squared over 4, we see this reconstruction.

And if we go down all the way to 10%,

N squared over 10, we see the reconstruction shown here.

So we're going to talk about this application later in

this course with considerably more detail.