Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

Loading...

来自 École Polytechnique Fédérale de Lausanne 的课程

数字信号处理

377 个评分

At Coursera, you will find the best lectures in the world. Here are some of our personalized recommendations for you

Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

从本节课中

Module 6: Digital Communication Systems - Module 7: Image Processing

- Paolo PrandoniLecturer

School of Computer and Communication Science - Martin VetterliProfessor

School of Computer and Communication Sciences

We will first define the DFT for two dimensional signals.

And then we will look at the amount of information that we can extract from

the magnitude and the phase of the DFT of an image.

Fourier analysis of two dimensional signals can be developed exactly

as we did for the one dimensional case.

Since here we're concerned mostly with digital images,

namely finite support two dimensional images.

We will only review the definition of the two dimensional DFT.

So let's consider a two dimensional finite support

signal of support big N1 times big N2.

The DFT is defined as the double sum for

the first index that goes from 0 to big N1 minus 1.

And the second index that goes from 0 to big N2 minus 1,

of the values of the dimensional signal over the support.

Times the product of two complex exponentials.

Whose frequencies are 2 pi over N1 and

1k1, and 2 pi over N2 and 2K2.

The products of these two complex exponentials represents a basis

function for the space of images of size N1 times N2.

The DFT can be easily inverted just like we did in the one dimensional case.

We take basically complex exponentials with the sign reversed and

we repeat the sum.

Taking now the DFT coefficients in to the sum.

Normalization is by convention applied to the inverse formula.

And in this case we have to divide the sum by the product of N1 times N2.

It is certainly instructive to look in more detail at the basis functions for

the space of N1 times N2 images.

These have the form that we have shown before in the DFT sum.

And we can easily prove like in the one dimensional case,

that they are orthogonal.

There are N1 times N2 basis functions for an image of that size.

And so it would be very hard to look at each one of them even for

images of moderate size.

But we can try and

plot some key representative basic functions to give you an idea.

Of what the building blocks are for an image in Fourier space.

We will show these basis functions by plotting the real part only as

a grayscale image.

Where the value of zero is indicated by black and

the value of one is indicated by white.

So here we have the space of 256 by 256 pixel images,

and this is one of the simplest basis functions that we can plot.

We keep the vertical frequency at zero, and

the horizontal frequency spans one period, between zero and 255.

So if we were to look at this in a three dimensional space, we could plot.

This is the image plane and these are the values of the basis function.

And this is a wave, a solid wave, that goes like this.

So, it goes down and then it goes up again.

And here we have the white part and here we have the black part.

We can invert the roles of the vertical and horizontal frequency.

And we get an image which is simply a 90 degree rotation of the previous one.

We can increase the horizontal frequency at this point for k1 equal to 2.

We will have that the basis function spans two periods,

Over the support of the image.

This would be three periods.

And if we swap the roles of the frequency, we obtain, again,

an image rotated by 90 degrees.

We can increase the frequency even more, and

the density of these bands will increase correspondingly.

Whenever the vertical frequency and

the horizontal frequency are the same, the bands will be angled at 45 degrees.

And by varying the frequencies,

we can obtain a wide range of different angles for the bands.

The good news is that the two dimensional DFT basis functions are separable.

And so the DFT can be computed in a separable way.

In particular, we first compute a 1D-DFT along the columns.

So if this is our image of size N1

times N2 we first compute N1 DFTs

of size N2 along the columns.

And once we're done with that, we compute 1D-DFTs along the rows.

So computationally speaking we first need to compute N1 DFTs of size N2.

We know that we can use the FFT algorithms.

So the cost will be N2 log in base 2 of N2.

And then we need to compute N2 FFTs of size N1.

So that would be N1 log in base 2 of N1.

Which is much less than N1 N2 squared,

the cost of implementing a two-dimensional DFT directly from the equation.

We can also express the 2D DFT in matrix form.

For that we need to express first of all the signal,

the two dimensional signal as a matrix.

This is very straightforward because an N1 times N2 image is

simply an N1 times N2 matrix.

There is only the technicality that the orientation of the rows is inverted with

respect to the Cartesian notations.

So for instance, in the Cartesian plane

N2 would go from, say, zero upwards.

And this is our image.

But if we express this in matrix notation,

this element of the matrix normally is 0, 0.

So there's a flipping of the vertical axis but this is just a technicality.

You will also recall the N x N DFT matrix that we saw in Module 4.2.

This is a standard DFT matrix of size N where W,

recall, is simply e to the minus j 2 pi over capital N.

With this notation in place let's look at the DFT formula once again.

The inner summation is simply

the product of the DFT matrix of size

big N2 times the signal matrix.

We call this intermediary matrix capital V and

capital V belongs to the space of N2 times N1 matrices.

Then the outer sum can be expressed as the right product

of the matrix V we just defined times the DFT matrix of size capital N1.

And so the resultant signal is a matrix that collects the DFT values for

the image.

In compact form, we can express the two-dimensional DFT as

the product of a DFT matrix of size N2.

Times the image times a DFT matrix of size N1.

So now we know how to compute a two-dimensional DFT.

Well, can we look at one?

We could try and

plot the magnitude of the DFT since the DFT is a two dimensional signal.

And therefore, it can be interpreted as an image.

But, this wouldn't work.

Because the dynamic range of the DFT is way too big for

either a monitor screen or a piece of paper to represent.

We wouldn't have enough grayscale level to show the details.

So we could try and

normalize the values by dividing the DFT magnitude by its maximum value.

And yet that wouldn't work either.

Because for images on average the distribution of the magnitude

of Fourier coefficients follows a curve like this one.

So what we have here is a few outliers here that really go above the average

values of the coefficients.

And some tail outliers here that drive the value down.

What we're interested in is this band between the extrema.

So we take a two-step approach, first we remove the flagrant outliers.

For instance the value of the DFT in 0, 0 is simply the sum of all pixel values.

Now for grayscale images where all pixels are positive values between zero and

one, say.

This will be definitely a large value with respect to any other coefficient.

And then to remove the tail, we use nonlinear mapping.

For instance, we use a curve like x to the power of 1/3

after normalizing all values between zero and one.

So for instance, if this is the range between zero and

one the nonlinearity will map this range like so.

Which means that the tail outliers that we've seen before will

be squished in a band that is very close to 0.

And so they will appear as black pixels.

And the rest of the values, those that we're interested in,

will occupy the bulk of the dynamic range of the median.

If we do that, we obtain something like this.

So our little dog has been transformed in this cryptic image here.

Where if we look very closely, we can identify for

instance some lines here and here.

That correspond to clear lines in the image captured by some

of the basis functions that have these orientations.

But by and large it's very, very difficult to understand what's going on.

We get a confirmation of our suspicion if we try

to invert the Fourier transform starting from the magnitude only.

So we take an image, we take the Fourier transform,

we discard the phase information.

And then we do an inverse Fourier transform from the magnitude.

What we get if we start with the dog picture is this picture here.

Where we are absolutely unable to guess the original picture.

What is more interesting is that if we do the same thing but

keep in just the phase, in other words we throw away the magnitude.

We set the magnitude uniformly at one but we keep the phase information and

then we do an inverse Fourier transform.

We obtain the following image.

So, with images the phase information carries most of the relevant information.

So, why is that?

The idea is that most of the semantic information in an image is contained

into its edges.

Edges are points of discontinuity in the gray level signal.

In 1D you would represent an edge as a transition from,

say a white level to a dark level.

This transition is what the eye perceives as an edge.

Now, edges therefore are very localized in space.

And they're not captured by the DFTs magnitude, which represents

the distribution in frequency of the energy of the whole image.

In order to obtain an edge from a combination of Fourier basis functions on

the other hand.

Their phase have to be aligned in a very precise manner.

And that's why the phase information captures the intelligibility

part of an image.

A consequence of this is that the global DFT of an image

is only of limited use in image processing.