The convolution operation is one of

the fundamental building blocks of a convolutional neural network.

Using edge detection as the motivating example in this video,

you will see how the convolution operation works.

In previous videos, I have talked about

how the early layers of a neural network might detect edges, then

some later layers might detect parts of objects, and then

even later layers may detect parts of complete objects, like people's faces in this case.

In this video, you will see how you can detect edges in an image.

Let's take an example.

Given a picture like that, for a computer

to figure out what the objects in this picture are,

the first thing you might do is detect vertical edges in this image.

For example, this image has all those vertical lines

where the buildings are,

as well as the roughly vertical lines that are the outlines of these pedestrians,

and so those get detected in this vertical edge detector output.

You might also want to detect horizontal edges. For example,

there is a very strong horizontal line where

this railing is, and that also gets detected, roughly here.

How do you detect edges in an image like this?

Let us look at an example.

Here is a 6 by 6 grayscale image. Because this is a grayscale image,

it is just a 6 by 6 by 1 matrix rather

than 6 by 6 by 3, because there are no separate RGB channels.

In order to detect edges, or let's say vertical edges, in this image,

what you can do is construct a 3 by 3 matrix,

and in the parlance, or terminology, of convolutional neural networks,

this is going to be called a filter.

I am going to construct a 3 by 3 filter, or 3 by 3 matrix, that looks like this: 1,

1, 1 down the left column, 0, 0, 0 down the middle, and -1, -1, -1 down the right.

Sometimes research papers will call this a kernel instead of

a filter, but I am going to use the filter terminology in these videos.

What you are going to do is take the 6 by 6 image and convolve it

(the convolution operation is denoted by this asterisk)

with the 3 by 3 filter.

One slightly unfortunate thing about the notation is that in mathematics

the asterisk is the standard symbol for convolution, but in Python

it is also used to denote multiplication, or maybe element-wise multiplication.

So this asterisk has dual purposes; it is overloaded notation,

but I will try to be clear in these videos when the asterisk refers to convolution.

The output of this convolution operator will be a 4 by 4 matrix,

which you can interpret, or think of, as a 4 by 4 image.
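
As a side note on where the 4 comes from, here is a small worked equation. It assumes the filter is only placed where it fits entirely inside the image and is moved one step at a time, which is how the walkthrough in this video proceeds. The symbols n and f are introduced here purely for illustration and are not used in the video.

```latex
% n: side length of the image, f: side length of the filter (illustrative symbols)
\[
  \text{positions per row (or per column)} = n - f + 1 = 6 - 3 + 1 = 4
\]
```

Four positions across and four positions down give the 4 by 4 output.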

The way you compute this 4 by 4 output is as follows.

To compute the first element,

the upper left element of this 4 by 4 matrix,

what you are going to do is take the 3 by 3 filter and paste it on

top of the upper left 3 by 3 region of your original input image.

I have written the filter here: 1, 1, 1,

0, 0, 0, -1, -1, -1.

What you should do is take the element-wise product. The first one would be

three times one, and then the second one, going down here, would be

one times one, and then plus two times one,

and then you add up all of the resulting nine numbers.

The middle column gives you zero times zero,

plus five times zero,

plus seven times zero, and then the rightmost column gives one times -1,

plus eight times -1, plus two times -1.

Adding up these nine numbers will give you negative

5, and so I am going to fill in negative 5 over here.

You can add up these nine numbers in any order, of course.

It is just that I went down the first column,

then the second column, then the third.
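
To make the arithmetic for this first element concrete, here is a minimal sketch in plain Python. The pixel values are the ones read out above; the names `patch`, `vertical_filter`, and `multiply_and_sum` are made up for this illustration and do not come from the video.

```python
# Upper left 3 by 3 region of the image, using the values read out above
# (left column 3, 1, 2; middle column 0, 5, 7; right column 1, 8, 2).
patch = [
    [3, 0, 1],
    [1, 5, 8],
    [2, 7, 2],
]

# The vertical edge filter: 1s on the left, 0s in the middle, -1s on the right.
vertical_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def multiply_and_sum(region, filt):
    """Element-wise product of two equally sized grids, then the sum of all products."""
    return sum(
        region[r][c] * filt[r][c]
        for r in range(len(filt))
        for c in range(len(filt[0]))
    )


first_element = multiply_and_sum(patch, vertical_filter)
print(first_element)  # -5
```

The left column contributes 3 + 1 + 2 = 6, the middle column contributes 0, and the right column contributes -(1 + 8 + 2) = -11, so the total is -5.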

Next, to figure out what the second element is,

you are going to take the blue square and shift it one step to the right, like so.

Let me get rid of the green marks here.

You are going to do the same element-wise product and then addition.

You have zero times one,

plus five times one,

plus seven times one,

plus one times zero, plus eight times zero,

plus two times zero,

plus two times negative one, plus nine times negative one,

plus five times negative one, and if you add up those nine numbers,

you end up with negative four, and so on.

If you shift this to the right again, do the nine products and add them up,

you get zero, and then over here you should get 8.

Just to verify, you have 2 plus 9 plus 5, that's 16.

Then the middle column gives you zero, and

then the rightmost column is 4 plus 1 plus 3, times negative 1,

that's -8. So that is 16 from the left column, minus 8,

and that gives you 8, like we have over here.

Next, in order to get this element in the next row,

what you do is take the blue square and now shift it

one down, so you now have it in that position,

and again repeat the element-wise product and addition exercise.

If you do that,

you should get negative 10 here.

If you shift it one to the right,

you should get negative 2, and then 2, and then 3, and so on.

Then you fill in all the rest of the elements of the matrix.

To be clear, this -16 would be obtained from this lower right 3 by 3 region.

A 6 by 6 matrix convolved with the 3 by 3 matrix gives you a 4 by 4 matrix.
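
The whole sliding procedure can be sketched in a few lines of plain Python. This is only an illustration, not the implementation used in the course: the function name `convolve_valid` is hypothetical, and the full 6 by 6 image is an assumption, since the transcript reads out only some of the pixel values. The values below were chosen to be consistent with every number mentioned above (-5, -4, 0, 8, -10, -2, 2, 3, and -16).

```python
def convolve_valid(image, filt):
    """Slide filt over image one step at a time. At each position, take the
    element-wise product with the covered region and sum the nine results.
    The filter is not flipped, matching the steps described in the video."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(filt), len(filt[0])
    output = []
    for i in range(n_rows - f_rows + 1):        # vertical positions of the window
        row = []
        for j in range(n_cols - f_cols + 1):    # horizontal positions of the window
            total = 0
            for a in range(f_rows):
                for b in range(f_cols):
                    total += image[i + a][j + b] * filt[a][b]
            row.append(total)
        output.append(row)
    return output


# ASSUMED pixel values: only part of this image is read out in the transcript.
image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]

vertical_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = convolve_valid(image, vertical_filter)
for row in result:
    print(row)
```

With these assumed values, the first two output rows come out as -5, -4, 0, 8 and -10, -2, 2, 3, and the lower right element is -16, which matches the numbers given above.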

Because these are images and filters,

they are really just matrices of various dimensions.

But the matrix on the left is convenient to interpret as an image,

the one in the middle we interpret as a filter, and the one on the right

you can interpret as maybe another image.

This turns out to be a vertical edge detector,

and you will see why on the next slide.

Before going on though,

just one other comment,

which is that if you implement this in a programming language,

then in practice, most programming languages will have

some different function, rather than an asterisk, to denote convolution.

For example, in the previous exercise,

you use or implement a function called conv_forward.

If you do this in TensorFlow,

there is a function tf.nn.conv2d.

And in other deep learning programming frameworks, such as the Keras programming framework

that we shall see later in this course,

there is a function called Conv2D that implements convolution, and so on.

All the deep learning frameworks that have good support

for computer vision will have some function for implementing this convolution operator.

Why is this doing vertical edge detection?

Let's look at another example.

To illustrate this, we are going to use a simplified image.

Here is a simple 6 by

6 image where the left half of the image is 10 and the right half is zero.

If you plot this as a picture,

it might look like this,

where the left half, the 10s,

gives you brighter pixel

intensity values and the right half gives you darker pixel intensity values.

I am using that shade of gray to denote zeros,

although maybe it could also be drawn as black.

But in this image,

there is clearly a very strong vertical edge right down the middle of

the image, as it transitions from white to black, or white to a darker color.

When you convolve this with the 3 by

3 filter (and this 3 by 3 filter can be visualized as follows,

where there are lighter, brighter pixels on

the left, then these mid-tone zeros in the middle, and then darker pixels on the right),

what you get is this matrix on the right.

Just to verify this math if you want,

this zero, for example,

is obtained by taking

the element-wise product with this 3 by 3 block and adding up the results,

so you get from

the left column 10 plus 10 plus 10, then zeros in the middle, and then -10,

-10, -10 from the right column, which is why you end up with zero over here.

In contrast, that 30 is obtained from this block,

where you get 10 plus 10 plus 10 and then minus zero,

minus zero, minus zero, which is why you end up with a 30 over there.
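
The simplified example is easy to check in plain Python. This is a sketch only: the helper name `convolve_valid` is hypothetical, and the image is built directly from the description above (left half 10, right half 0).

```python
def convolve_valid(image, filt):
    """At every position where filt fits inside image, sum the element-wise
    products of filt and the region it covers (no flipping of the filter)."""
    f_rows, f_cols = len(filt), len(filt[0])
    out_rows = len(image) - f_rows + 1
    out_cols = len(image[0]) - f_cols + 1
    return [
        [
            sum(
                image[i + a][j + b] * filt[a][b]
                for a in range(f_rows)
                for b in range(f_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# 6 by 6 image: left half is 10 (bright), right half is 0 (dark).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Vertical edge filter: 1s on the left, 0s in the middle, -1s on the right.
vertical_filter = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = convolve_valid(image, vertical_filter)
for row in result:
    print(row)  # each of the four rows is [0, 30, 30, 0]
```

The two middle columns of 30s are the positions where the window has 10s under the filter's left column and 0s under its right column. At the outer positions, both of those columns see the same values, so their contributions cancel.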

Now, if you plot this rightmost matrix as an image, it will look

like that, where there is this lighter region right in

the middle, and that corresponds to having

detected this vertical edge down the middle of your 6 by 6 image.

In case the dimensions here seem a

little bit wrong, in that the detected edge seems really thick,

that is only because we are working with very small images in this example.

If you are using, say, a 1000 by 1000 image rather than a 6 by 6 image,

you will find that this does a pretty good job of

detecting the vertical edges in your image.

In this example, this bright region in the middle is

just the output image's way of saying that it looks like there is

a strong vertical edge right down the middle of the image.

Maybe one intuition to take away from vertical edge detection is that a vertical edge is

a 3 by 3 region (since we are using a 3 by 3 filter)

where there are bright pixels on the left,

you do not care that much what is in the middle, and there are dark pixels on the right.

The middle of this 6 by 6 image is really where there are

bright pixels on the left and darker pixels on the right, and

that is why it thinks there is a vertical edge over there.

The convolution operation gives you a convenient way to

specify how to find these vertical edges in an image.

You have now seen how the convolution operator works.

In the next video, you will see how to take this and use it

as one of the basic building blocks of a convolutional neural network.
