0:00

Strided convolution is another piece of the basic building block of convolutions as used in convolutional neural networks. Let me show you an example. Let's say you want to convolve this seven by seven image with this three by three filter, except that instead of doing it the usual way, we are going to do it with a stride of two.

What that means is you take the element-wise product as usual in this upper left three by three region, then multiply and add, and that gives you 91. But then instead of stepping the blue box over by one step, we are going to step it over by two steps, so we make it hop over two steps like so.

Notice how the upper left-hand corner has gone from this spot to this spot, jumping over one position. Then you do the usual element-wise product and summing, and it turns out to be 100. And now we are going to do that again, and make the blue box jump over by two steps. You end up there, and that gives you 83.

Now, when you go to the next row, you again take two steps instead of one step, so we are going to move the blue box over there. Notice how we are stepping over one of the positions; this gives you 69. You again step over two steps, which gives you 91, and then 127. And then for the final row: 44, 72, and 74.

In this example, we convolved this seven by seven matrix with this three by three filter and got a three by three output.
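The stride-two walkthrough above can be sketched in plain Python. The lecture's actual 7x7 image and 3x3 filter values are not shown in this transcript, so the inputs below are hypothetical stand-ins and will not reproduce 91, 100, 83, and so on; the mechanics of the hopping blue box are the same.

```python
def conv2d_strided(image, kernel, stride=2):
    """Cross-correlate a square image with a square kernel at the given stride."""
    n, f = len(image), len(kernel)
    out_dim = (n - f) // stride + 1          # no padding here (P = 0)
    out = [[0] * out_dim for _ in range(out_dim)]
    for i in range(out_dim):
        for j in range(out_dim):
            # Top-left corner of the "blue box" hops by `stride` each step.
            r, c = i * stride, j * stride
            out[i][j] = sum(
                image[r + a][c + b] * kernel[a][b]
                for a in range(f) for b in range(f)
            )
    return out

# A 7x7 image of the numbers 0..48 and an all-ones 3x3 filter (illustrative values):
image = [[7 * r + c for c in range(7)] for r in range(7)]
kernel = [[1] * 3 for _ in range(3)]
result = conv2d_strided(image, kernel, stride=2)  # 3x3 output
```

With stride two and no padding, the 7x7 input yields a 3x3 output, matching the example.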

The input and output dimensions turn out to be governed by the following formula: if you have an N by N image, and you convolve it with an F by F filter, using padding P and stride S (in this example, S is equal to two), then you end up with an output that is N plus 2P minus F, divided by S, plus one, along each dimension. Because you are now stepping S steps at a time instead of just one step at a time, you divide by S and then add one, and the same applies to the other dimension.

In our example, we have seven plus zero, minus three, divided by two (the stride), plus one. Let's see, that's four over two, plus one, which equals three, which is why we wound up with this three by three output.
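The output-dimension formula can be written as a one-line helper; the function name here is just illustrative.

```python
from math import floor

def output_dim(n, f, p, s):
    """Output size of an n-by-n image convolved with an f-by-f filter,
    with padding p and stride s: floor((n + 2p - f) / s) + 1."""
    return floor((n + 2 * p - f) / s) + 1

# The lecture's example: 7x7 image, 3x3 filter, no padding, stride 2:
size = output_dim(7, 3, 0, 2)  # -> 3, hence the 3x3 output
```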

Now, just one last detail: what if this fraction is not an integer? In that case, we are going to round it down, so this notation denotes the floor of something. This is also called the floor of Z; it means taking Z and rounding it down to the nearest integer.

The way this is implemented is that you carry out this type of blue box multiplication only if the blue box is fully contained within the image, or the image plus the padding. If any part of the blue box hangs outside, then you just do not do that computation.

It turns out that if that is the convention, that your three by three filter must lie entirely within your image (or the image plus the padding region) before a corresponding output is generated, then the right thing to do to compute the output dimension is to round down in case N plus 2P minus F over S is not an integer.
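This convention can be checked directly by counting, along one axis, how many filter placements fit fully inside the (padded) image; the helper name is hypothetical. With stride 3, the fraction (7 + 0 - 3) / 3 is not an integer, and the floor gives the same count as the enumeration.

```python
def valid_positions(n, f, p, s):
    """Count placements where an f-wide filter fits fully inside the padded image,
    along one axis, stepping s at a time."""
    padded = n + 2 * p
    return len(range(0, padded - f + 1, s))

# 7x7 image, 3x3 filter, no padding, stride 3:
# floor((7 + 0 - 3) / 3) + 1 = floor(4/3) + 1 = 2, matching the count.
count = valid_positions(7, 3, 0, 3)  # -> 2
```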

Just to summarize the dimensions: if you have an N by N matrix or N by N image that you convolve with an F by F matrix or F by F filter, with padding P and stride S, then the output size will have this dimension.

It is nice if you can choose all of these numbers so that the result is an integer, although sometimes you don't have to do that, and rounding down is just fine as well. But please feel free to work through a few examples of values of N, F, P, and S yourself, to convince yourself, if you want, that this formula for the output size is correct.

Now, before moving on, there is a technical comment I want to make about cross-correlation versus convolution, and about what this means for what you have to do to implement convolutional neural networks.

If you read a different math textbook or a signal processing textbook, there is one other possible inconsistency in the notation, which is that if you look at a typical math textbook, the way convolution is defined, before doing the element-wise product and summing, there is actually one other step that you would first take in order to convolve this six by six matrix with this three by three filter.

You would first take the three by three filter and flip it on the horizontal as well as the vertical axis. So this filter, 3, 4, 5; 1, 0, 2; -1, 9, 7, gets mirrored: the 3 goes here, the 4 goes there, the 5 goes there, and the other rows are flipped in the same way. This is really taking the three by three filter and mirroring it on both the vertical and horizontal axes. And it is this flipped matrix that you would then copy over here.

To compute the output, you would take two times seven, plus three times two, plus seven times five, and so on; you multiply out the elements of this flipped matrix in order to compute the upper left-hand element of the four by four output. Then you take those nine numbers, shift them over by one, shift them over by one again, and so on.
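The double mirroring step can be sketched as follows, using the 3, 4, 5; 1, 0, 2; -1, 9, 7 filter from the example (the function name is just illustrative). Textbook convolution is then cross-correlation applied with this flipped filter.

```python
def flip2d(kernel):
    """Mirror a filter on both the vertical and horizontal axes."""
    return [row[::-1] for row in kernel[::-1]]

kernel = [[ 3, 4, 5],
          [ 1, 0, 2],
          [-1, 9, 7]]
flipped = flip2d(kernel)
# flipped == [[7, 9, -1], [2, 0, 1], [5, 4, 3]]
```

Note that flipping twice gets the original filter back, which is why the operation is sometimes described as a double mirroring.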

The way we have defined the convolution operation in this video is that we have skipped this mirroring operation. Technically, the operation we have actually been using for the last few videos is sometimes called cross-correlation instead of convolution. But in the deep learning literature, by convention, we just call this the convolution operation.

Just to summarize: by convention in machine learning, we usually do not bother with this flipping operation. Technically, this operation is maybe better called cross-correlation, but most of the deep learning literature just calls it the convolution operator, and I am going to use that convention in these videos as well. If you read a lot of the machine learning literature, you will find most people just call this the convolution operator without bothering to use these flips.

It turns out that in signal processing, and in certain branches of mathematics, doing the flipping in the definition of convolution causes the convolution operator to enjoy the property that A convolved with B, then convolved with C, is equal to A convolved with the result of B convolved with C. This is called associativity in mathematics.
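Associativity can be checked numerically; a one-dimensional sketch keeps it short (the two-dimensional case behaves the same way), and the helper below is an illustrative textbook-style convolution, not code from the course.

```python
def conv1d(a, b):
    """Full 1-D textbook convolution: the flip is built into the index sum,
    since out[i + j] pairs a[i] with b[j] in reversed relative order."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

a, b, c = [1, 2], [3, 4, 5], [6, 7]
left = conv1d(conv1d(a, b), c)   # (A * B) * C
right = conv1d(a, conv1d(b, c))  # A * (B * C)
# left == right: true convolution is associative
```

Cross-correlation, which skips the flip, does not enjoy this property in general, which is the mathematicians' reason for the extra step.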

This is nice for some signal processing applications, but for deep neural networks it really does not matter, so omitting this double mirroring operation just simplifies the code and makes the neural networks work just as well. And by convention, most of us just call this convolution, even though mathematicians sometimes prefer to call it cross-correlation.

But this should not affect anything you have to implement in the programming exercises, and it should not affect your ability to read and understand the deep learning literature.

You've now seen how to carry out convolutions, and you've seen how to use padding as well as strides in convolutions. But so far, all we've been using are convolutions over matrices, like a six by six matrix. In the next video, you'll see how to carry out convolutions over volumes, and this will make what you can do with convolutions much more powerful. Let's go on to the next video.
