In order to build deep neural networks one modification to

the basic convolutional operation that you need to really use is padding.

Let's see how it works.

What we saw in earlier videos is that if you take

a six by six image and convolve it with a three by three filter,

you end up with a four by four output, a four by four matrix,

and that's because there are only

four by four possible positions

for the three by three filter to fit in your six by six matrix.

And the math of this turns out to be that if you have

an n by n image and convolve that with an f by f filter,

then the dimension of the output will be

n minus f plus one by n minus f plus one.

And in this example,

six minus three plus one is equal to four,

which is why you wound up with a four by four output.
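As a quick sanity check on that formula, here's a minimal NumPy sketch of a convolution with no padding; the function name `conv2d_valid` is my own for illustration, not something from the lecture:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over every position where it fits fully inside the image."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # element-wise product of the f x f window with the filter, then sum
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # a 6 x 6 image
kernel = np.ones((3, 3))                          # a 3 x 3 filter
print(conv2d_valid(image, kernel).shape)          # (4, 4) = (6-3+1, 6-3+1)
```

The output shape is (4, 4), matching n minus f plus one on each side.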

So there are two downsides to this. One is that,

if every time you apply a convolutional operator, your image shrinks,

so you come from six by six down to four by four then,

you can only do this a few times before your image starts getting really small,

maybe it shrinks down to one by one or something,

so maybe you don't want your image to shrink

every time you detect edges or other features on it,

so that's one downside,

and the second downside is that,

if you look at a pixel at the corner or the edge,

that little pixel is touched, or used, in only one of the outputs,

because it only touches that one three by three region.

Whereas, if you take a pixel in the middle, say this pixel,

then there are a lot of three by three regions that overlap that pixel and so,

it's as if pixels on the corners or on the edges are used much less in the output.

So you're throwing away a lot of the information near the edge of the image.
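You can make this concrete by counting how many three by three windows touch each pixel of a six by six image; this is my own illustrative sketch, not something from the lecture:

```python
import numpy as np

n, f = 6, 3
usage = np.zeros((n, n))
for i in range(n - f + 1):
    for j in range(n - f + 1):
        usage[i:i+f, j:j+f] += 1   # each window's pixels feed one output cell

print(usage[0, 0])  # 1.0 -- a corner pixel feeds only one output
print(usage[3, 3])  # 9.0 -- a central pixel feeds nine outputs
```

So a corner pixel contributes to one output value while a central pixel contributes to nine, which is exactly the asymmetry being described.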

So, to solve both of these problems,

both the shrinking output,

and when you build really deep neural networks,

you see why you don't want the image to shrink on every step because if you have,

maybe a hundred-layer deep net,

then it shrinks a bit on every layer,

and after a hundred layers you end up with a very small image.

So that was one problem,

the other is throwing away a lot of the information from the edges of the image.

So in order to fix both of these problems,

what you can do is, before applying the convolutional operation,

pad the image.

So in this case, let's say you pad the image with an additional border,

an extra border of one pixel all around the edges.

So, if you do that,

then instead of a six by six image,

you've now padded this to an eight by eight image, and if you

convolve an eight by eight image with a three by three filter, you now get

not a four by four but a six by six output,

so you managed to preserve the original input size of six by six.

So by convention when you pad,

you pad with zeros, and p is the padding amount.

So in this case,

p is equal to one,

because we're padding all around with an extra border of one pixel,

then the output becomes

n plus 2p minus f plus one by n plus 2p minus f plus one.

So, this becomes six plus two times one minus three plus one, by the same thing.

So, six plus two minus three plus one, that's equal to six.

So you end up with a six by six image that preserves the size of the original image.
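A small sketch of that, using NumPy's `np.pad` (which pads with zeros by default):

```python
import numpy as np

n, f, p = 6, 3, 1
image = np.arange(n * n, dtype=float).reshape(n, n)
padded = np.pad(image, p)        # border of p zeros all around: 6x6 -> 8x8
print(padded.shape)              # (8, 8)
print(padded.shape[0] - f + 1)   # 6, matching n + 2p - f + 1 = 6 + 2 - 3 + 1
```

Convolving the padded eight by eight image with a three by three filter then gives back a six by six output.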

So this corner pixel now actually influences more of

these cells of the output, and so the effect of

throwing away, or at least counting less,

the information from the corners or the edges of the image is reduced.

And I've shown here,

the effect of padding the border with just one pixel.

If you want, you can also pad the border with two pixels, in which case I guess,

you add on another border

here, and you can pad it with even more pixels if you choose.

So, I guess what I'm drawing here,

this would be padding p equals two.

In terms of how much to pad,

it turns out there are two common choices, called

valid convolutions and same convolutions.

These aren't really great names, but a valid convolution

basically means no padding.

And so in this case you might have an n by n image convolved with an f by

f filter and this would give you an n minus

f plus one by n minus f plus one dimensional output.

So this is like the example we had previously in the previous videos, where we had

a six by six image convolved with

a three by three filter, and that gave you a four by four output.

The other most common choice of padding is called

a same convolution, and that means you pad

so that the output size is the same as the input size.

So if we actually look at this formula,

when you pad by p pixels, then

it's as if n goes to n plus 2p, and then you have the rest of this, right?

Minus f plus one.

So if we have an n by n image and pad a border of p pixels all around,

then the size of this output dimension is n plus 2p minus f plus one.

And so, if you want n plus 2p minus f plus one to be equal to n,

so the output size is the same as the input size,

if you take this equation and solve it,

n cancels out on both sides, and if you solve for p,

this implies that p is equal to f minus one over two.

So when f is odd,

by choosing the padding size to be as follows,

you can make sure that the output size is the same as

the input size and that's why, for example,

when the filter was three by three, as happened in the previous slide,

the padding that would make the output size the same as the input size was three minus

one over two, which is one.

And as another example,

if your filter was five by five,

so if f is equal to five, then,

if you plug that into the equation, you find that a padding of two is required to keep

the output size the same as the input size when the filter is five by five.
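The p equals f minus one over two rule can be checked for a few odd filter sizes; a small sketch:

```python
n = 6                               # any input size works; it cancels out
for f in (3, 5, 7):
    p = (f - 1) // 2                # "same" padding for an odd filter size
    assert n + 2 * p - f + 1 == n   # output size equals input size
    print(f"f={f} -> p={p}")        # f=3 -> p=1, f=5 -> p=2, f=7 -> p=3
```

This reproduces both examples above: p equals one for a three by three filter and p equals two for a five by five filter.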

And by convention in computer vision,

f is usually odd.

It's actually almost always odd, and you rarely see

even-numbered filters used in computer vision.

And I think there are two reasons for that;

one is that if f was even,

then you'd need some asymmetric padding.

It's only if f is odd that this type of same convolution gives a natural padding region,

with the same dimension all around, rather than

padding more on the left and less on the right,

or something asymmetric like that.

And then second, when you have an odd dimension filter,

such as three by three or five by five,

then it has a central position, and sometimes in

computer vision it's nice to have a distinguished pixel,

one you can call the central pixel,

so you can talk about the position of the filter.

Right, maybe none of this is a great reason for f being pretty much always

odd, but if you look at the convolutional literature, you'll

see that three by three filters are very common.

You see some five by fives and seven by sevens.

And actually sometimes, later we'll also

talk about one by one filters and why those make sense.

But just by convention,

I recommend you just use odd number filters as well.

I think that you can probably get

just fine performance even if you use an even-numbered value for f,

but if you stick to the common computer vision convention,

you'll usually just use odd-numbered f. So you've now seen how to use padded convolutions.

To specify the padding for your convolution operation,

you can either specify the value for

p or you can just say that this is a valid convolution,

which means p equals zero, or you can say this is a same convolution,

which means you pad as much as you need to make sure

the output has the same dimension as the input.
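If you want to experiment with these two conventions, SciPy exposes them directly; note that `scipy.signal.correlate2d` computes cross-correlation, which is what deep learning calls convolution. A minimal sketch, assuming SciPy is available:

```python
import numpy as np
from scipy.signal import correlate2d  # cross-correlation, i.e. deep learning's "convolution"

image = np.random.rand(6, 6)
filt = np.random.rand(3, 3)
print(correlate2d(image, filt, mode='valid').shape)  # (4, 4) -- no padding
print(correlate2d(image, filt, mode='same').shape)   # (6, 6) -- output matches input
```

The `mode` argument here corresponds exactly to the valid and same conventions described above.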

So that's it for padding.

In the next video, let's talk about how you can implement strided convolutions.