The convolution operation is one of

the fundamental building blocks of a convolutional neural network.

Using edge detection as the motivating example in this video,

you will see how the convolution operation works.

In previous videos, I have talked about

how the early layers of the neural network might detect edges and then

some later layers might detect parts of objects, and then

even later layers may detect parts of complete objects, like people's faces in this case.

In this video, you will see how you can detect edges in an image.

Let's take an example.

Given a picture like that, for a computer

to figure out what the objects in this picture are,

the first thing you might do is maybe detect vertical edges in this image.

For example, this image has all those vertical lines,

where the buildings are,

as well as the roughly vertical lines that are the outlines of these pedestrians, and

so those get detected in this vertical edge detector output.

And you might also want to detect horizontal edges. For example,

there is a very strong horizontal line where

this railing is and that also gets detected sort of roughly here.

How do you detect edges in an image like this?

Let us look at an example.

Here is a 6 by 6 grayscale image and because this is a grayscale image,

this is just a 6 by 6 by 1 matrix rather

than 6 by 6 by 3, because there are no separate RGB channels.

In order to detect edges, or let's say vertical edges, in this image,

what you can do is construct a 3 by 3 matrix,

and in the parlance, the terminology, of convolutional neural networks,

this is going to be called a filter.

And I am going to construct a 3 by 3 filter, or 3 by 3 matrix, that looks like this, reading down the columns: 1,

1, 1, 0, 0, 0, -1, -1, -1.

Sometimes research papers will call this a kernel instead of

a filter but I am going to use the filter terminology in these videos.

What you are going to do is take the 6 by 6 image and convolve it

(the convolution operation is denoted by this asterisk)

with the 3 by 3 filter.

One slightly unfortunate thing about the notation is that in mathematics,

the asterisk is the standard symbol for convolution but in Python,

this is also used to denote multiplication, or maybe element-wise multiplication.

So this asterisk has dual purposes; it is overloaded notation,

but I will try to be clear in these videos when this asterisk refers to convolution.

The output of this convolution operator will be a 4 by 4 matrix,

which you can interpret, or think of, as a 4 by 4 image.

The way you compute this 4 by 4 output is as follows.

To compute the first element,

the upper left element of this 4 by 4 matrix,

what you are going to do is take the 3 by 3 filter and paste it on

top of the 3 by 3 region of your original input image.

I have written here 1, 1, 1,

0, 0, 0, -1, -1, -1.

What you should do is take the element-wise product. The first one would be

three times one, and then the second one, going down here, would be

one times one, and then plus two times one,

and then you add up all of the resulting nine numbers.

So then the middle column gives you zero times zero,

plus five times zero,

plus seven times zero, and then the rightmost column gives one times -1,

plus eight times -1, plus two times -1.

Adding up these nine numbers will give you negative

5 and so I'm going to fill in negative 5 over here.

You can add up these nine numbers in any order of course.

It is just that I went down the first column,

then second column, then the third.
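
The arithmetic for this first output element can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the course: the names `patch`, `vertical_filter`, and `first_element` are made up here, and `patch` holds just the nine pixel values read out above.

```python
# Upper-left 3 by 3 region of the input image, using the values read out
# above (columns: 3, 1, 2 / 0, 5, 7 / 1, 8, 2).
patch = [
    [3, 0, 1],
    [1, 5, 8],
    [2, 7, 2],
]

# The 3 by 3 vertical edge filter: 1s on the left, 0s in the middle,
# -1s on the right.
vertical_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

# Element-wise product of the two matrices ...
products = [
    [patch[r][c] * vertical_filter[r][c] for c in range(3)]
    for r in range(3)
]

# ... followed by the sum of all nine products.
first_element = sum(sum(row) for row in products)
print(first_element)  # -5
```

The order of the additions does not matter, so summing row by row here gives the same result as going down the columns.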

Next, to figure out what this second element is,

you are going to take the blue square and shift it one step to the right like so.

Let me get rid of the green marks here.

You are going to do the same element-wise product and then addition.

You have zero times one,

plus five times one,

plus seven times one,

plus one times zero, plus eight times zero,

plus two times zero,

plus two times negative 1, plus nine times negative one,

plus five times negative one and if you add up those nine numbers,

you end up with negative four and so on.
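
Shifting the 3 by 3 window one step to the right can be sketched as a small helper that takes the window's top-left corner. The helper name `apply_filter_at` is hypothetical, and `strip` contains only the top three rows and first four columns of the image, since those are the values read out so far.

```python
# Top three rows, first four columns of the input image, as read out above.
strip = [
    [3, 0, 1, 2],
    [1, 5, 8, 9],
    [2, 7, 2, 5],
]

vertical_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def apply_filter_at(image, filt, top, left):
    """Sum of element-wise products between filt and the window of image
    whose upper-left corner is at (top, left)."""
    size = len(filt)
    total = 0
    for r in range(size):
        for c in range(size):
            total += image[top + r][left + c] * filt[r][c]
    return total


first = apply_filter_at(strip, vertical_filter, top=0, left=0)
second = apply_filter_at(strip, vertical_filter, top=0, left=1)
print(first, second)  # -5 -4
```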

If you shift this to the right, do the nine products and add them up,

you get zero and then over here you should get 8.

Just to verify, you have 2 plus 9 plus 5 that's 16.

Then the middle column gives you zero and

then the rightmost column is 4 plus 1 plus 3, times negative 1,

that's -8. So that is 16 from the left column, minus 8,

and that gives you 8, like we have over here.

Next, in order to get this element in the next row,

what you do is take the blue square and now shift it

one down, so you now have it in that position,

and again repeat the element-wise product and addition exercise.

If you do that,

you should get negative 10 here.

If you shift it one to the right,

you should get negative 2 and then 2 and then 3 and so on.

Then fill in all the rest of the elements of the matrix.

To be clear, this -16 would be obtained from this lower-right 3 by 3 region.

A 6 by 6 matrix convolved with the 3 by 3 matrix gives you a 4 by 4 matrix.
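
One way to see where the 4 by 4 comes from is to list every position at which the 3 by 3 filter fits inside the 6 by 6 image. This sketch assumes, as in the walkthrough above, that the filter moves one pixel at a time and always stays entirely inside the image; the function name `valid_positions` is made up for illustration.

```python
def valid_positions(image_size, filter_size):
    """Top-left corners at which a square filter fits entirely inside a
    square image, moving one pixel at a time."""
    steps = image_size - filter_size + 1
    return [(top, left) for top in range(steps) for left in range(steps)]


positions = valid_positions(6, 3)
print(len(positions))               # 16, i.e. a 4 by 4 grid of outputs
print(positions[0], positions[-1])  # (0, 0) (3, 3)
```

The last position, (3, 3), is the lower-right 3 by 3 region that produces the final output element.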

And these are images and filters.

These are really just matrices of various dimensions.

But the matrix on the left is convenient to interpret as an image,

and the one in the middle we interpret as a filter and the one on the right,

you can interpret that as maybe another image.

And this turns out to be a vertical edge detector,

and you will see why on the next slide.

Before going on though,

just one other comment,

which is that if you implement this in a programming language,

then in practice, most programming languages will have

some different function, rather than an asterisk, to denote convolution.

For example, in the programming exercise,

you will implement a function called conv_forward.

If you do this in TensorFlow,

there is a function tf.nn.conv2d.

In other deep learning programming frameworks, such as the Keras programming framework,

which we will see later in this course,

there is a function called Conv2D that implements convolution, and so on.

But all the deep learning frameworks that have a good support

for computer vision will have some functions for implementing this convolution operator.

Why is this doing vertical edge detection?

Let's look at another example.

To illustrate this, we are going to use a simplified image.

Here is a simple 6 by

6 image where the left half of the image is 10 and the right half is zero.

If you plot this as a picture,

it might look like this,

where the left half, the 10s,

give you brighter pixel

intensity values and the right half gives you darker pixel intensity values.

I am using that shade of gray to denote zeros,

although maybe it could also be drawn as black.

But in this image,

there is clearly a very strong vertical edge right down the middle of

this image, as it transitions from white to black, or white to a darker color.

You convolve this with the 3 by

3 filter. This 3 by 3 filter can be visualized as follows,

with lighter, brighter pixels on

the left, then these mid-tone zeros in the middle, and then darker pixels on the right.

What you get is this matrix on the right.

Just to verify this math if you want,

this zero for example,

is obtained by taking

the element-wise product with this 3 by 3 block and then adding,

so you get from

the left column 10 plus 10 plus 10, then zeros in the middle, and then -10,

-10, -10, which is why you end up with zero over here.

In contrast, that 30 is obtained from this block,

which gives you 10 plus 10 plus 10, and then minus zero,

minus zero, minus zero, which is why you end up with a 30 over there.
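
To check the whole output matrix rather than just two entries, the sliding-window procedure can be written as a short plain-Python function. This is a minimal sketch under the same assumptions as the walkthrough (the filter moves one pixel at a time, stays inside the image, and is applied exactly as described, as an element-wise product followed by a sum); the name `convolve` is chosen here for illustration and is not the function from any framework.

```python
def convolve(image, filt):
    """Slide filt over image one pixel at a time; at each position take the
    element-wise product with the covered window and add up the results."""
    f = len(filt)
    out_rows = len(image) - f + 1
    out_cols = len(image[0]) - f + 1
    output = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            total = 0
            for r in range(f):
                for c in range(f):
                    total += image[top + r][left + c] * filt[r][c]
            row.append(total)
        output.append(row)
    return output


# Simplified 6 by 6 image: left half is 10 (bright), right half is 0 (dark).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Vertical edge filter: 1s on the left, 0s in the middle, -1s on the right.
vertical_filter = [[1, 0, -1] for _ in range(3)]

result = convolve(image, vertical_filter)
for row in result:
    print(row)  # each of the four rows prints [0, 30, 30, 0]
```

The two middle columns of 30s are the windows that straddle the boundary between the 10s and the 0s.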

Now, if you plot this rightmost matrix as an image, it will look

like that where there is this lighter region right in

the middle and that corresponds to this having

detected this vertical edge down the middle of your 6 by 6 image.

In case the dimensions here seem a

little bit wrong, in that the detected edge seems really thick,

that's only because we are working with very small images in this example.

If you are using, say, a 1000 by 1000 image rather than a 6 by 6 image, then

you will find that this does a pretty good job

of detecting the vertical edges in your image.

In this example, this bright region in the middle is

just the output image's way of saying that it looks like there is

a strong vertical edge right down the middle of the image.

Maybe one intuition to take away from vertical edge detection is that a vertical edge is

a 3 by 3 region (since we are using a 3 by 3 filter)

where there are bright pixels on the left,

dark pixels on the right, and you do not care that much what is in the middle.

The middle in this 6 by 6 image is really where there could be

bright pixels on the left and darker pixels on the right and

that is why it thinks it's a vertical edge over there.

The convolution operation gives you a convenient way to

specify how to find these vertical edges in an image.

You have now seen how the convolution operator works.

In the next video, you will see how to take this and use it

as one of the basic building blocks of a convolutional neural network.