
You've seen how convolutions over 2D images work. Now, let's see how you can implement convolutions over not just 2D images, but over three-dimensional volumes.

Let's start with an example. Let's say you want to detect features not just in a grayscale image, but in an RGB image. So, an RGB image, instead of being a six by six image, could be six by six by three, where the three here corresponds to the three color channels. So, you can think of this as a stack of three six by six images.

In order to detect edges or some other feature in this image, you convolve it not with a three by three filter, as you had previously, but now with a 3D filter that's going to be three by three by three. So, the filter itself will also have three layers corresponding to the red, green, and blue channels.

So, to give these things some names, this first six here is the height of the image, the second six is the width, and this three is the number of channels. Your filter similarly has a height, a width, and a number of channels. The number of channels in your image must match the number of channels in your filter, so these two numbers have to be equal.

We'll see on the next slide how this convolution operation actually works, but the output of this will be a four by four image. Notice that this is four by four by one; there's no longer a three at the end.
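A minimal Python sketch of the shape rule just described, assuming shapes are written as (height, width, channels), a stride of one, and no padding. The function name `conv_output_shape` is hypothetical, chosen only for illustration.

```python
def conv_output_shape(image_shape, filter_shape):
    """Output shape of convolving one 3D filter over a 3D volume.

    Shapes are (height, width, channels); assumes stride 1 and no padding.
    """
    n_h, n_w, n_c = image_shape
    f_h, f_w, f_c = filter_shape
    # The filter's channel count must match the image's channel count.
    if f_c != n_c:
        raise ValueError(f"channel mismatch: image has {n_c}, filter has {f_c}")
    # A single filter collapses the channel dimension, so the output is 2D.
    return (n_h - f_h + 1, n_w - f_w + 1)


print(conv_output_shape((6, 6, 3), (3, 3, 3)))  # (4, 4)
```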

Let's go through in detail how this works, but let's use a more nicely drawn image. So here is the six by six by three image, and here's the three by three by three filter, and this last number, the number of channels, matches between the image and the filter. To simplify the drawing of this three by three by three filter, instead of drawing it as a stack of three matrices, I'm sometimes going to just draw it as a three-dimensional cube, like that.

So, to compute the output of this convolution operation, what you would do is take the three by three by three filter and first place it in that upper leftmost position. Notice that this three by three by three filter has 27 numbers, or 27 parameters, that's three cubed. What you do is take each of these 27 numbers and multiply them with the corresponding numbers from the red, green, and blue channels of your image. So, take the first nine numbers from the red channel, then the nine beneath it for the green channel, and the nine beneath it for the blue channel, and multiply them with the corresponding 27 numbers covered by this yellow cube shown on the left. Then add up all those numbers, and this gives you the first number in the output.
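To make the 27-multiplication step concrete, here is a minimal NumPy sketch that computes just the first output value. It assumes the image is stored as a (height, width, channels) array and uses random integers as stand-in values for both the image and the filter.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(6, 6, 3))   # height x width x channels (RGB)
filt = rng.integers(-1, 2, size=(3, 3, 3))    # 3x3x3 filter: 27 parameters

# Upper-left 3x3x3 cube of the image: rows 0-2, columns 0-2, all 3 channels.
patch = image[0:3, 0:3, :]

# 27 element-wise multiplications, then add them all up.
first_output = np.sum(patch * filt)

# Same result computed channel by channel: red, then green, then blue slice.
per_channel = sum(np.sum(patch[:, :, c] * filt[:, :, c]) for c in range(3))

print(first_output, per_channel)  # the two numbers are equal
```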

To compute the next output, you take this cube and slide it over by one, again do the 27 multiplications, and add up the 27 numbers; that gives you the next output. Do it for the next position over, and that gives you the third output, and so on. That gives you the fourth, and then you move one row down, and then the next one, the next one, the next one, and so on. You get the idea, until at the very end, that's the position you'll have for that final output.
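Sliding the cube over every position can be sketched as a double loop. This is a minimal, unoptimized NumPy version assuming a stride of one, no padding, and (height, width, channels) arrays; `conv_volume` is a hypothetical helper name. Like the operation described here, it does not flip the filter.

```python
import numpy as np


def conv_volume(image, filt):
    """Convolve an (n_h, n_w, n_c) volume with one (f_h, f_w, n_c) filter.

    Stride 1, no padding. Returns a 2D (n_h - f_h + 1, n_w - f_w + 1) array.
    """
    n_h, n_w, n_c = image.shape
    f_h, f_w, f_c = filt.shape
    assert f_c == n_c, "filter and image must have the same number of channels"
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):          # move down one row at a time
        for j in range(out.shape[1]):      # slide right one position at a time
            patch = image[i:i + f_h, j:j + f_w, :]
            out[i, j] = np.sum(patch * filt)   # multiply all pairs, then add
    return out


image = np.arange(6 * 6 * 3).reshape(6, 6, 3)
filt = np.ones((3, 3, 3))
result = conv_volume(image, filt)
print(result.shape)  # (4, 4)
```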

So, what does this allow you to do? Well, here's an example. This filter is three by three by three. If you want to detect edges in the red channel of the image, then you could have the first filter, the red one, be one, one, one down the left column and minus one, minus one, minus one down the right column, with zeros in between, as usual, and have the green and blue filters be all zeros. If you stack these three together to form your three by three by three filter, then this would be a filter that detects edges, vertical edges, but only in the red channel.
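Here is one way to build that red-only vertical edge detector in NumPy and check that it ignores an edge in the green channel. The channel order (red, green, blue along the last axis), the test images, and the compact `conv_volume` helper are all assumptions made for this sketch.

```python
import numpy as np


def conv_volume(image, filt):
    """Stride-1, no-padding convolution of a square (n, n, n_c) image with one (f, f, n_c) filter."""
    n, f = image.shape[0], filt.shape[0]
    return np.array([[np.sum(image[i:i + f, j:j + f, :] * filt)
                      for j in range(n - f + 1)]
                     for i in range(n - f + 1)])


# Vertical edge detector from the 2D case: ones on the left, minus ones on the right.
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])
zeros = np.zeros((3, 3))

# Channel 0 = red gets the detector; green and blue are all zeros.
red_edge_filter = np.stack([vertical, zeros, zeros], axis=-1)   # shape (3, 3, 3)

edge = np.zeros((6, 6))
edge[:, :3] = 10            # bright left half, dark right half
blank = np.zeros((6, 6))

red_edge_image = np.stack([edge, blank, blank], axis=-1)
green_edge_image = np.stack([blank, edge, blank], axis=-1)

red_response = conv_volume(red_edge_image, red_edge_filter)
green_response = conv_volume(green_edge_image, red_edge_filter)
print(red_response[0])        # values 0, 30, 30, 0: edge found in the middle
print(green_response.any())   # False: an edge in the green channel is ignored
```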

Alternatively, if you don't care what color the vertical edge is, then you might have a filter like this one, with one, one, one and minus one, minus one, minus one in all three channels. By setting the parameters this second way, you then have an edge detector, a three by three by three edge detector, that detects edges in any color.
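A similar sketch for the any-color version: the same vertical edge kernel is copied into all three channels, so an edge placed in any single channel produces the same response. As before, the `conv_volume` helper, the channel order, and the test images are illustrative assumptions.

```python
import numpy as np


def conv_volume(image, filt):
    """Stride-1, no-padding convolution of a square (n, n, n_c) image with one (f, f, n_c) filter."""
    n, f = image.shape[0], filt.shape[0]
    return np.array([[np.sum(image[i:i + f, j:j + f, :] * filt)
                      for j in range(n - f + 1)]
                     for i in range(n - f + 1)])


vertical = np.array([[1, 0, -1]] * 3)
# The same vertical edge detector in the red, green, and blue layers.
any_color_filter = np.stack([vertical] * 3, axis=-1)

edge = np.zeros((6, 6))
edge[:, :3] = 10            # bright left half, dark right half
blank = np.zeros((6, 6))

responses = {}
for c, name in enumerate(["red", "green", "blue"]):
    channels = [blank, blank, blank]
    channels[c] = edge                      # put the edge in just one channel
    image = np.stack(channels, axis=-1)
    responses[name] = conv_volume(image, any_color_filter)
    print(name, responses[name][0])         # values 0, 30, 30, 0 for every color
```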

And with different choices of these parameters, you can get different feature detectors, all of them three by three by three filters. By convention in computer vision, when you have an input with a certain height, a certain width, and a certain number of channels, your filter can have a different height and a different width, but it has the same number of channels. In theory, it's possible to have a filter that only looks at the red channel, or a filter that only looks at the green and blue channels.

And once again, notice that convolving a volume, a six by six by three convolved with a three by three by three, gives a four by four as the output.

Now that you know how to convolve on volumes, there's one last idea that'll be crucial for building convolutional neural networks: what if we don't just want to detect vertical edges? What if we want to detect vertical edges and horizontal edges, and maybe 45-degree edges, and maybe 70-degree edges as well? In other words, what if you want to use multiple filters at the same time?

So, here's the picture we had from the previous slide: six by six by three convolved with three by three by three gives four by four, and maybe this is a vertical edge detector, or maybe it's learning to detect some other feature. Now, maybe a second filter, denoted by this orangish color, could be a horizontal edge detector. Convolving the image with the first filter gives you this first four by four output, and convolving it with the second filter gives you a different four by four output. We can then take these two four by four outputs, put the first one in front, and put the second filter's output at the back, as follows, so that by stacking these two together, you end up with a four by four by two output volume. If you redraw this volume as a box, it would maybe look like this.

So, this is a four by four by two output volume, which is the result of taking your six by six by three image and convolving it with two different three by three by three filters, resulting in two four by four outputs that are stacked up to form a four by four by two volume. The two here comes from the fact that we used two different filters.
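Stacking the outputs of several filters can be sketched like this, again assuming (height, width, channels) arrays and a hypothetical `conv_volume` helper. The horizontal detector is taken to be the transpose of the vertical one, and the image is random data.

```python
import numpy as np


def conv_volume(image, filt):
    """Stride-1, no-padding convolution of a square (n, n, n_c) image with one (f, f, n_c) filter."""
    n, f = image.shape[0], filt.shape[0]
    return np.array([[np.sum(image[i:i + f, j:j + f, :] * filt)
                      for j in range(n - f + 1)]
                     for i in range(n - f + 1)])


vertical = np.array([[1, 0, -1]] * 3)   # ones on the left, minus ones on the right
horizontal = vertical.T                 # ones on top, minus ones on the bottom

# Two 3x3x3 filters, each using its kernel in all three channels.
filters = [np.stack([vertical] * 3, axis=-1),
           np.stack([horizontal] * 3, axis=-1)]

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))

# One 4x4 output per filter, stacked along a new last axis: the channel axis.
output = np.stack([conv_volume(image, filt) for filt in filters], axis=-1)
print(output.shape)  # (4, 4, 2)
```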

So, let's summarize the dimensions. If you have an n by n by number of channels input image, where n_c (n subscript c) is the number of channels, so in the example that's six by six by three, you convolve that with an f by f by n_c filter, with the same n_c again, so this was three by three by three. By convention, these two n_c values have to be the same number. Then what you get is an n - f + 1 by n - f + 1 by n_c' output, where n_c' is really n_c of the next layer, and it's the number of filters that you use:

$$(n \times n \times n_c) * (f \times f \times n_c) \rightarrow (n - f + 1) \times (n - f + 1) \times n_c'$$

So, in our example this would be four by four by two. I wrote this assuming that you use a stride of one and no padding, but if you use a different stride or padding, then the n - f + 1 will be affected in the usual way, as we saw in the previous videos.
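The dimension bookkeeping, including the usual adjustment for stride s and padding p, floor((n + 2p - f) / s) + 1, can be written as a small Python function. The function names are hypothetical. Note that the input channel count never appears in the output shape; it only has to match the filter's.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size for an n x n input and an f x f filter."""
    return (n + 2 * padding - f) // stride + 1


def conv_volume_output_shape(n, f, num_filters, stride=1, padding=0):
    """Output shape when an n x n x n_c input is convolved with num_filters
    filters of shape f x f x n_c; the last dimension n_c' = num_filters."""
    size = conv_output_size(n, f, stride, padding)
    return (size, size, num_filters)


print(conv_volume_output_shape(6, 3, 2))              # (4, 4, 2)
print(conv_volume_output_shape(6, 3, 2, padding=1))   # (6, 6, 2)
print(conv_volume_output_shape(7, 3, 2, stride=2))    # (3, 3, 2)
```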

So, this idea of convolution on volumes turns out to be really powerful. One small part of it is that you can now operate directly on RGB images with three channels. But even more important is that you can now detect two features, like vertical and horizontal edges, or 10, or maybe 128, or maybe several hundred different features, and the output will then have a number of channels equal to the number of features you are detecting.
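With many filters, looping over them one at a time gets slow. A vectorized sketch, assuming NumPy 1.20 or later (for `sliding_window_view`) and a filter bank stored as an (f, f, n_c, n_c') array, a layout chosen here for convenience, shows the output getting one channel per filter. Ten random filters stand in for ten learned feature detectors, and `conv_filter_bank` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_filter_bank(image, filters):
    """Stride-1, no-padding convolution of an (n, n, n_c) image with a bank of
    filters shaped (f, f, n_c, n_c_prime). Returns (n-f+1, n-f+1, n_c_prime)."""
    f = filters.shape[0]
    assert filters.shape[2] == image.shape[2], "channel counts must match"
    # windows[i, j, c, a, b] == image[i + a, j + b, c]
    windows = sliding_window_view(image, (f, f), axis=(0, 1))
    # For each position (i, j) and filter k: multiply f*f*n_c pairs and sum.
    return np.einsum("ijcab,abck->ijk", windows, filters)


rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))
filters = rng.random((3, 3, 3, 10))     # 10 stand-in feature detectors
out = conv_filter_bank(image, filters)
print(out.shape)  # (4, 4, 10): one output channel per filter
```

The `einsum` call performs the same multiply-and-add over each window and all channels as the loop versions, just for every position and every filter at once.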

As a note on notation, I've been using number of channels to denote this last dimension. In the literature, people will also often call this the depth of the 3D volume. Both notations, channels or depth, are commonly used in the literature, but I find depth more confusing, because you usually also talk about the depth of the neural network. So, I'm going to use the term channels in these videos to refer to the size of this third dimension of these filters.

So, now that you know how to implement convolutions over volumes, you're ready to implement one layer of a convolutional neural network. Let's see how to do that in the next video.
