Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

Loading...

来自 École Polytechnique Fédérale de Lausanne 的课程

数字信号处理

284 个评分

Learn the fundamentals of digital signal processing theory and discover the myriad ways DSP makes everyday life more productive and fun.

从本节课中

Module 6: Digital Communication Systems - Module 7: Image Processing

- Paolo PrandoniLecturer

School of Computer and Communication Science - Martin VetterliProfessor

School of Computer and Communication Sciences

We will talk about image manipulations and

in particular we will consider a class of transformations called affine transforms.

And we will see how to implement affine transforms for

the class of digital images using the bilinear interpolation method.

An affine transform is a mapping that reshapes a coordinate system.

In our case we're talking about images, so

we're interested in the mapping from R2 to R2.

An affine transform is defined in terms of a two by two matrix and

a translation vector.

And the new coordinate pair is given by the multiplication of the matrix times

the original coordinate pair minus the translation vector.

In compact form, we will express it like so.

Let's look at some examples of affine transform,

a very simple one is translation.

The matrix here is the identity matrix and

the translation vector specifies the new origin of the plane.

So if we have this simple image here and

we apply a translation, we would get this simple result.

Scaling is an affine transform where the matrix is a diagonal matrix,

and the diagonal entries are the amount of stretching or

contraction of each axis independently.

In this example, for instance, we have a scaling where clearly a1 is equal to a2,

and both are less than 1, because the resulting image is larger.

Rotation is another affine transform, we have seen this before.

The structure of the matrix is like so.

We have cosine of theta on the diagonal and

sine of theta and minus sine of theta on the anti-diagonal.

A pure rotation leaves the origin in place and, as an example,

here is a rotation by an angle theta that is clearly larger than pi over two.

And here, for instance, you have a rotation by pi.

Flips can be both horizontal or vertical and

they simply swap the direction of an axis.

In both cases, the matrix is diagonal.

And in the case of a horizontal flip,

the first element of the diagonal would be -1 and the second one is 1.

Vice versa for a vertical flip, the second access gets flipped, and so

the second element of the diagonal would be -1.

Here in the image we see horizontal flip.

Shear is a stretching of the plane in either the horizontal or

vertical direction.

For this case of affine transform, the matrix is either upper triangular or

lower triangular.

You have 1s on the main diagonal, and

according to whether the shear is horizontal or vertical,

you have a parameter s on the upper corner or on the lower corner.

The effect of shearing is like so.

In this case you have a horizontal shearing,

where the horizontal axis is stretched.

If we now try to apply an affine transform to a digital image,

we run into a fundamental problem.

Remember, a digital image lives in discreet space.

In other words, pixel coordinates are pairs of integers, they live in Z2.

The affine transform on the other hand,

maps this original coordinates onto a point in R2, which

means that the new pair of coordinates can lie anywhere on the plane and,

in particular, it will most likely lie in between individual pixel coordinates.

To understand the situation better, if this is the regular grid of pixels in Z2,

and this is the result of the affine transformation, we would

like that each point here on this grid be mapped to a point on this grid here.

But unfortunately the affine transform will probably map this point

here in between.

So how do we solve this problem?

The first step is to consider the inverse transform.

So instead of going from the original image to the destination image,

here we draw it again, we start from the destination image, for

which we know we want points to lie on the grid.

And we compute the originating point from the original images.

So if this is the original image,

this is a transformed image, we take each point on the grid and

we find the point it would come from with the inverse transform.

Now just like before, this point will not lie on the original grid.

But here's the second step.

This pair of coordinates that we found with the inverse transform,

we will express as two integer coordinate, eta1 and

eta2, plus two correction factors, tau1 and tau2,

where tau1 and tau2 are strictly less than 1.

So it is a sub-pixel correction factor.

So again if this is a magnified view of the original grid,

our transform gives us a point here.

We will find that this point t1, t2 is equal to a point on the grid eta1,

eta2, plus a correctional factor tau1 here, and a correction factor tau2.

So the question now is how to find

this sub-pixel value here in the middle of the grid.

An the idea is to interpolate from the four neighboring pixels.

When the interpolation scheme between neighboring pixel is the linear

interpolator, this strategy goes under the name of bilinear interpolation.

So let's see how this works.

We have our reference point x(eta1, eta2) and

we also have the three neighboring points, like so.

Remember, our sub-pixel value, as computed before, falls here in the middle,

and it is expressed as a pixel at horizontal distance of tau1,

and a vertical distance of tau2 from the reference coordinate eta1 and eta2.

So to compute its value, we first interpolate in the horizontal direction,

and we find two sub-pixels values at vertical coordinate eta2 and

eta2 + 1, both at a distance of tau1 from eta1.

And then once we have these two values, we compute a vertical interpolation

at a distance of tau2, like so, and this gives us our desired value.