0:02

We describe in this segment motion

estimation techniques which rely on the computation

of both spatial gradients within a

video frame, and temporal gradients across frames.

0:14

They represent, in some sense, confirmation that in order to estimate the motion of an object across frames we need both strong spatial edges, which define the object, as well as temporal edges, which indicate that there is indeed motion in the scene.

0:44

We derive the optical flow equation and discuss various forms for solving it.

We will see that we need the assumption that the pixels in a neighborhood are undergoing the same motion, an important assumption that we also made when we discussed block- or region-matching algorithms.

1:03

We investigate two forms of the algorithm: the non-recursive and the recursive version.

With the second approach, in principle no motion vectors need to be transmitted from the encoder to the decoder in a video compression scenario; instead, the decoder would regenerate the motion vectors that were estimated by the encoder.

1:42

Although all the methods we have discussed so far are based on the estimation of the optical flow, we look here at another approach to estimating it, one that is traditionally referred to as the optical flow equation.

We start with the same assumption as before: that the brightness of a pixel or an object remains the same in going from one frame to another.

So if I observe a certain intensity of a pixel at the current frame, at time tau, I expect to find the same intensity at the reference frame, at time zero, translated, however, by u in one dimension and v in the other.

2:57

The objective clearly is to estimate the translational

components, u and v.

So the steps I take are to expand the intensity at the current frame around the point (x, y) in the reference frame.

Here I show the Taylor series expansion, using only the first-order terms while omitting the higher-order terms.

3:31

Since these two terms are equal, due to the constant brightness constraint, they cancel out, and therefore what I am left with is this equation, where I have substituted the partial derivative in the x direction by Ix, the partial in the y direction by Iy, and the temporal partial by It.

If I divide everything by tau, I end up with this form of the optical flow equation, where now Vx and Vy denote velocities.
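The steps just described can be summarized symbolically, as a sketch using the symbols from this segment (the signs depend on which frame is taken as the reference):

```latex
% constant brightness: the pixel at (x, y) in the current frame (time \tau)
% comes from (x - u, y - v) in the reference frame (time 0)
I(x, y, \tau) = I(x - u,\, y - v,\, 0)

% first-order Taylor expansion; the zeroth-order terms cancel, leaving
I_x u + I_y v + \tau I_t \approx 0

% dividing by \tau gives the optical flow equation in terms of velocities
I_x V_x + I_y V_y + I_t = 0, \qquad V_x = u / \tau, \quad V_y = v / \tau
```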

4:24

And the objective, again, is to solve for Vx and Vy.

I have two unknowns and one equation, and therefore, in general, infinitely many solutions.

So I need to take an additional step to be able to find a meaningful solution.

4:42

In order to avoid the problem I just mentioned, which is having two unknowns and only one equation, I do a similar thing to what we did when we talked about block matching and region matching.

That is, we assume that there is a neighborhood of pixels that all undergo the same motion.

The pixel locations are indicated here by q1, q2, up to qn, so there are n pixels in this neighborhood.

Then for each pixel location I write the optical flow equation.

So this is the spatial derivative of the intensity in the x direction at pixel location q1, this is the spatial derivative of the intensity in the y direction, again at q1, and this is the temporal derivative at pixel location q1.

I do the same thing for the remaining locations: these are the optical flow equations for location q2, all the way to location qn.

If n equals 2, I have two equations in two unknowns; if n is greater than 2, I have an overdetermined system of equations.

Typically I choose n greater than 2 so that I accommodate noise and obtain a more robust solution.

I can write this system of equations in matrix-vector form; that is, I can stack the spatial derivatives in the x direction at q1, q2, up to qn, and the spatial derivatives in the y direction at q1, q2, up to qn, to form the matrix A.

It is an n-by-2 matrix that multiplies the vector of unknowns, Vx and Vy, and the product is set equal to the vector of (negated) temporal derivatives at q1, q2, all the way to qn.

8:00

So, for example, one solution to this problem is the minimum norm least squares solution, also obtained from the normal equations: if A transpose A is an invertible matrix, then I can find my solution, the displacement vector field, as x equals (A transpose A) inverse times A transpose b.

If A transpose A is not invertible, I can then use the generalized inverse.
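As a concrete sketch of this step, the stacked equations can be solved with NumPy as follows (the function name `flow_from_gradients` and the conditioning threshold are illustrative choices, not from the lecture):

```python
import numpy as np

def flow_from_gradients(Ix, Iy, It):
    """Least-squares solution of the stacked optical flow equations.

    Ix, Iy, It hold the spatial and temporal derivatives at the n
    pixels q1..qn of the neighborhood; we solve A v = b with
    A = [Ix Iy] (an n x 2 matrix) and b = -It.
    """
    A = np.column_stack([Ix, Iy])
    b = -np.asarray(It, dtype=float)
    AtA = A.T @ A
    if np.linalg.cond(AtA) < 1e8:       # A^T A invertible: normal equations
        return np.linalg.solve(AtA, A.T @ b)
    return np.linalg.pinv(A) @ b        # otherwise: generalized inverse

# synthetic check: derivatives generated to be consistent with a
# known velocity (Vx, Vy) = (1.5, -0.5)
rng = np.random.default_rng(0)
Ix = rng.normal(size=9)
Iy = rng.normal(size=9)
It = -(1.5 * Ix - 0.5 * Iy)
v = flow_from_gradients(Ix, Iy, It)     # recovers (1.5, -0.5)
```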

8:32

If A transpose A is not invertible, we add this term here, lambda C transpose C; this is referred to as a regularized solution.

The net result is that this whole matrix is now invertible, and therefore a solution exists.

Another way to look at it is that through this regularization we impose a smoothness constraint on the vector field, the solution we are after.

If this is not clear at this point, don't worry; we are going to spend quite some time analyzing this constrained least squares solution when we talk about image restoration.
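A minimal sketch of the regularized solution, taking C as the 2x2 identity (both the choice C = I and the value of lambda are illustrative assumptions):

```python
import numpy as np

def regularized_flow(Ix, Iy, It, lam=0.1):
    """Regularized solution x = (A^T A + lambda C^T C)^{-1} A^T b.

    With C = I, the added term makes the matrix invertible even when
    A^T A is singular, e.g. in flat image regions.
    """
    A = np.column_stack([Ix, Iy])
    b = -np.asarray(It, dtype=float)
    C = np.eye(2)
    return np.linalg.solve(A.T @ A + lam * (C.T @ C), A.T @ b)

# in a perfectly flat region all derivatives vanish and A^T A is
# singular, but the regularized system still yields the zero vector
z = np.zeros(5)
v = regularized_flow(z, z, z)           # returns (0.0, 0.0)
```

This also illustrates the flat-region point made below: with no gradients, the data give no constraint, and the regularizer alone determines the (zero) answer.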

9:19

I'd also like to mention here that if we are in a flat region and try to estimate the motion, then the spatial derivatives are going to be zero, as well as the temporal derivatives.

So in flat regions the matrix A and the vector b are zero, or very close to zero, and therefore it is hard to find a solution: any solution would satisfy the equation A x = b.

It is therefore very important for motion estimation that edges be present, since at edges the spatial derivatives are definitely not going to be zero.

The same comment can actually be made about block matching: two perfectly flat regions can be matched anywhere, so the match is ambiguous.

10:23

Pel-recursive algorithms also use the optical flow equation, and therefore spatial and temporal derivatives are computed, as you will see right away.

The notation we use on this slide is slightly different from the one I used before.

The main characteristic of these algorithms is that they have a prediction part and a correction part for the displacement vector field that I am after.

10:47

So we start again with the constant brightness constraint; that is, the ambient lighting has not changed, and therefore a pixel of a given intensity can be found in both frames under consideration.

So at time t, I am considering a pixel here at this location r; the intensity of this pixel is I(r, t).

I try to find the location of this pixel in the previous frame, the frame at time t minus 1, and the location of this pixel is going to be at r minus d, as shown here.

11:24

So the pixel denoted by x at frame t can be found at frame t minus 1, where it is also denoted by x.

It is this vector d that I am interested in finding, because it will tell me exactly where this pixel can be found in the previous frame.

12:06

We assume that we have an initial estimate of the displacement field, which we will denote by d of i, and this is shown here.

So somebody gave us this initial estimate, and of course we will discuss how we can find it.

Then, since I have d of i, the objective is to find the correction: the difference between the true d and my initial estimate, which I denote by u here.

Because if I know u, it is this correction term that will allow me to go from my initial estimate to the true solution d.


So we have converted the initial problem of solving for d into the problem of solving for the correction term u, which will correct my initial estimate d of i and bring me to the true displacement, which we denote here by d.

13:09

We now form the displaced frame difference, delta, which is a function of r, the point at which we are interested in finding the motion vector, and of u, the correction term, given the initial estimate d of i of the motion.

So the displaced frame difference is the temporal difference between the intensity at this point,

13:32

and the intensity at the point over here, which is denoted like this, and which is based on the initial estimate d of i of the motion vector.

So the displaced frame difference is, again, the temporal derivative along the motion trajectory, based on the initial estimate of the motion.
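A minimal numerical sketch of the displaced frame difference for an integer-valued initial estimate (the helper name `dfd` is hypothetical; a subpixel d of i would require interpolating the previous frame):

```python
import numpy as np

def dfd(cur, prev, r, d_i):
    """Displaced frame difference at pixel r = (row, col):
    the intensity at r in the current frame minus the intensity at
    r - d_i in the previous frame (integer displacements only)."""
    y, x = r
    dy, dx = d_i
    return float(cur[y, x]) - float(prev[y - dy, x - dx])

prev = np.zeros((4, 4)); prev[1, 1] = 100.0   # bright pixel at (1, 1)
cur = np.zeros((4, 4));  cur[2, 2] = 100.0    # same pixel, moved by (1, 1)
delta = dfd(cur, prev, r=(2, 2), d_i=(1, 1))  # correct estimate -> DFD = 0.0
```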

We now perform a Taylor series expansion of the intensity at the point we just indicated, around this point, which is indicated like this.

14:00

Here are the first-order terms, and the higher-order terms are modeled as an error term.

Now, since u is smaller than d when the initial estimate of the motion vector is good, by keeping just the first-order terms here I obtain a better approximation to this intensity than in the case we covered when we discussed the optical flow, where the Taylor series expansion was done with respect to d.

14:55

We therefore end up with this form of the optical flow equation.

Here, again, is the temporal derivative along the predicted motion trajectory, these are the spatial derivatives, and this is again the error term.

So, as discussed previously when deriving the optical flow equation, I have here one equation and two unknowns.
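In symbols, under one common convention (the exact signs depend on how the displaced frame difference is defined), the linearized equation reads:

```latex
% displaced frame difference based on the initial estimate d^{(i)}
\Delta(\mathbf{r}, \mathbf{d}^{(i)}) = I(\mathbf{r}, t) - I(\mathbf{r} - \mathbf{d}^{(i)},\, t-1)

% linearizing about d^{(i)}, with d = d^{(i)} + u, gives one equation
% in the two unknown components of u (e is the error term)
\Delta(\mathbf{r}, \mathbf{d}^{(i)}) + \nabla I(\mathbf{r} - \mathbf{d}^{(i)},\, t-1)^{T}\,\mathbf{u} + e = 0
```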

15:20

Therefore I need additional equations in order to be able to estimate the unknown vector field.

As done previously, the way to address this is to assume that the pixels in the neighborhood of the pixel under consideration undergo exactly the same motion.

So I can write this optical flow equation for all the pixels in this neighborhood.

In this case, however, the neighborhood I am considering contains only past pixels: pixels for which I have already estimated the motion, based on a particular scanning order of the image.

So, if I assume a raster scan of the image, I go along one row, then return, cover the second row, and so on.

An example of the neighborhood I use in that case is shown here.

This is referred to as a non-symmetric half plane.

Clearly, again assuming a raster scan of the image, for the pixels shown here as solid dots I have already estimated the motion vectors.

So we can assume that this is what the motion vectors estimated for these pixels look like.

Using these already estimated motion vectors, I can provide an accurate initial estimate of the motion vector for the pixel under consideration at location r.

For example, I can use the average of the motion vectors in this neighborhood to find d of i.

Or I can use a weighted average, or a model-based approach, such as an auto-regressive model that models these vectors and provides a good initial estimate of the motion vector at location r.
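A minimal sketch of such a prediction step, assuming a raster scan and a three-pixel causal neighborhood (the specific neighborhood and the function name are illustrative assumptions):

```python
import numpy as np

def predict_vector(motion, y, x):
    """Predict the initial estimate d_i at (y, x) as the average of the
    already-estimated vectors at causal (previously scanned) neighbors."""
    candidates = [(y, x - 1), (y - 1, x - 1), (y - 1, x)]
    past = [motion[p] for p in candidates if p[0] >= 0 and p[1] >= 0]
    return np.mean(past, axis=0)

# a 3x3 field where every previously scanned pixel moved by (2, 0)
motion = np.zeros((3, 3, 2))
motion[0, :] = [2.0, 0.0]
motion[1, 0] = [2.0, 0.0]
d_i = predict_vector(motion, 1, 1)      # predicted estimate: (2.0, 0.0)
```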

17:15

So there are two important differences in deriving this form of the optical flow equation compared to the one we derived a few slides earlier.

The first is that the neighborhood modeled as undergoing the same motion now has a shape that allows the recursive computation of the motion vectors.

An example is shown here through what we call the nonsymmetric half-plane.

And this allows me to obtain a good initial estimate d of i of the vector field, which then

18:19

needs only to be corrected by the term u equals d minus d of i.

Such a pel-recursive algorithm is useful in cases such as video compression, where the motion field needs to be encoded and transmitted from the encoder to the decoder.

Clearly, if a pel-recursive algorithm is used, the decoder can perform the same operations in estimating the motion as the encoder does.

And therefore the motion vectors do not need to be encoded and transmitted from the encoder to the decoder.


We

18:59

show here, on the left, the frame difference for this sequence, which is in fact the Trevor White sequence.

And to the right we show the displaced frame difference obtained with the pel-recursive algorithm we just described.

Again, since the frame difference and the displaced frame difference have double the dynamic range of the original image, black represents minus 255, gray represents 0, and white represents plus 255.

So we see that in the displaced frame difference most values are gray, and therefore equal to zero, except for the boundaries of the moving objects,

19:46

which represent the difficult part in any motion estimation; that is where the assumption that all the pixels in the neighborhood move the same way is violated.

This is, however, a very good result, because again this DFD is very close to zero.

20:08

We show here two different versions of the dense motion field that was obtained using the pel-recursive algorithm; this motion field was used to generate the displaced frame difference shown in the previous slide.

The field is dense, which means there is one vector per pixel.

The general observation here is that the motion vectors on the right are, by and large, smoother than the motion vectors shown in the image on the left.

20:41

We also see that these motion vectors capture the motion well.

Again, this is a zoomed version: here is part of the head of the person, and here are the shoulders.

We see that the motion vectors clearly delineate the motion boundaries: the background is stationary, so the vectors out here should ideally be exactly equal to zero, while the torso and the head of the person are moving, and therefore we have non-zero motion vectors.

The strategy with feature-based methods is to concentrate the computation on areas of the image where it is possible to get good correspondences.

From these, an initial estimate of the camera geometry is obtained.

This geometry is then used to guide correspondence in

regions of the image where there is less information.

So here's an example of two images.

The motion between the views is a rotation about the camera center.

So features are extracted, shown here in yellow.

And they're used to compute the image matching relations.

These relations are due to the camera motion

alone, not due to motion in the scene.

So about 500 feature points are detected in each of these two images.

22:05

There is considerable work in the literature on extracting good feature points from an image.

We want feature points that are of low dimensionality, informative, and robust to transformations, such as translation and rotation, and to degradations.

Harris corners are one well-known example of such feature points.
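To make this concrete, here is a minimal sketch of the Harris corner response (the gradient operator, the 3x3 box window, and k = 0.05 are typical textbook choices, not specifics from this lecture):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2, where M is the
    structure tensor of the image gradients, smoothed by a 3x3 box."""
    Iy, Ix = np.gradient(img.astype(float))

    def box3(a):                        # 3x3 box filter via edge padding
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[i:i + h, j:j + w]
                   for i in range(3) for j in range(3)) / 9.0

    Ixx, Iyy, Ixy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    det = Ixx * Iyy - Ixy ** 2
    tr = Ixx + Iyy
    return det - k * tr ** 2

img = np.zeros((9, 9))
img[4:, 4:] = 1.0                       # a single step corner at (4, 4)
R = harris_response(img)
peak = np.unravel_index(np.argmax(R), R.shape)   # strongest response: (4, 4)
```

Flat regions score zero and straight edges score negative, so only the corner produces a strong positive response, which is exactly why such points give good correspondences.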

22:38

Features are also used for image query-based retrieval, where an image is presented to a database and the problem becomes either determining whether the image is in the database or retrieving similar images.