In terms of designing ConvNet architectures,

one of the ideas that really helps is using a one by one convolution.

Now, you might be wondering,

what does a one by one convolution do?

Isn't that just multiplying by numbers?

That seems like a funny thing to do.

Turns out it's not quite like that.

Let's take a look.

So here's a one by one filter,

we'll put the number two in there,

and if you take the six by six image,

six by six by one and convolve it with this one by one by one filter,

you end up just taking the image and multiplying it by two.

So, one, two, three ends up being two,

four, six, and so on.

And so, a convolution by a one by one filter,

doesn't seem particularly useful.

You just multiply it by some number.
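As a quick check of the single-channel case, here is a minimal NumPy sketch. The pixel values (1 through 36) and the variable names are assumptions for illustration; only the filter value of two comes from the example above.

```python
import numpy as np

# Hypothetical 6x6x1 image filled with 1..36 (illustrative values only).
image = np.arange(1, 37, dtype=float).reshape(6, 6, 1)

# A single 1x1x1 filter holding the number 2.
filt = np.full((1, 1, 1), 2.0)

# With one input channel, the 1x1 convolution reduces to scaling every pixel.
output = image * filt[0, 0, 0]

# The pixels 1, 2, 3 in the first row after the convolution.
first_three = output[0, :3, 0]
print(first_three)
```

The output has the same six by six by one shape as the input, and every entry is simply doubled.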

But that's the case of six by six by one channel images.

If you have a 6 by 6 by 32 instead of by 1,

then a convolution with a 1 by 1 filter can do something that makes much more sense.

And in particular, what a one by one convolution will do is it will

look at each of the 36 different positions here,

and it will take the element-wise product between

the 32 numbers on the left and the 32 numbers in the filter, and sum them up.

And then apply a ReLU non-linearity to it after that.

So, to look at one of the 36 positions,

maybe one slice through this volume,

you take these 32 numbers, multiply them by one slice through the volume like that,

and you end up with

a single real number which then gets plotted in one of the outputs like that.

And in fact, one way to think about

the 32 numbers you have in this 1 by 1 by 32 filter is that,

it's as if you have a neuron that is taking as input

32 numbers, taking each of these 32 numbers in

one slice at the same position, height and width, but across these 32 different channels,

multiplying them by 32 weights and then applying

a ReLU non-linearity to it and then outputting the corresponding thing over there.
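The single-neuron view can be sketched in NumPy as follows. This is a minimal illustration, assuming random input values and no bias term (the lecture does not mention one here); the names `volume`, `weights`, and `relu` are hypothetical.

```python
import numpy as np

def relu(x):
    # ReLU non-linearity: negative values become zero.
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 32))  # hypothetical 6x6x32 input volume
weights = rng.standard_normal(32)         # one 1x1x32 filter, i.e. 32 weights

output = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        # The 32 numbers at one (height, width) position, across all channels.
        channel_slice = volume[i, j, :]
        # Weighted sum over channels, then ReLU: one real number per position.
        output[i, j] = relu(np.dot(channel_slice, weights))
```

The explicit loop visits each of the 36 positions in turn; the same result could be computed in one step as `relu(volume @ weights)`.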

And more generally, if you have not just one filter,

but if you have multiple filters,

then it's as if you have not just one unit, but multiple units,

taking as input all the numbers in one slice,

and then building them up into an output of six by six by number of filters.

So one way to think about the one by one convolution is that,

it is basically having a fully connected neural network

that applies to each of the 36 different positions.

And what this fully connected neural network does

is it inputs 32 numbers and outputs number-of-filters outputs.

So I guess as a point of notation,

this is really nc[l+1],

if that's the next layer.

And by doing this at each of the 36 positions,

each of the six by six positions,

you end up with an output that is six by six by the number of filters.
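To make the fully connected interpretation concrete, here is a minimal NumPy sketch under a few assumptions: the filter count of 8 is arbitrary, the bias is omitted, and the helper name `conv1x1_relu` is hypothetical. The filters are stored as a matrix of shape (input channels, number of filters).

```python
import numpy as np

def conv1x1_relu(volume, filters):
    """1x1 convolution followed by ReLU.

    volume:  array of shape (H, W, C_in)
    filters: array of shape (C_in, n_filters), one column per 1x1xC_in filter
    """
    return np.maximum(volume @ filters, 0.0)

rng = np.random.default_rng(0)
volume = rng.standard_normal((6, 6, 32))
n_filters = 8  # hypothetical choice, not from the lecture
filters = rng.standard_normal((32, n_filters))

output = conv1x1_relu(volume, filters)

# The same computation written as a fully connected layer applied
# independently to 36 separate 32-dimensional inputs.
flat = volume.reshape(36, 32)
fc_output = np.maximum(flat @ filters, 0.0).reshape(6, 6, n_filters)

print(output.shape)
```

Both routes give the same six by six by number-of-filters volume, which is the sense in which the 1 by 1 convolution is a small fully connected network shared across positions.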

And this can carry out a pretty non-trivial computation on your input volume.

And this idea is often called a one by one convolution

but it's sometimes also called Network in Network,

and is described in this paper,

by Min Lin, Qiang Chen, and Shuicheng Yan.

And even though the details of the architecture in this paper aren't used widely,

this idea of a one by one convolution, or what is

sometimes called the Network in Network idea, has been very influential,

and has influenced many other neural network architectures

including the inception network which we'll see in the next video.

But to give you an example of where one by one convolution is useful,

here's something you could do with it.

Let's say you have a 28 by 28 by 192 volume.

If you want to shrink the height and width,

you can use a pooling layer.

So we know how to do that.

But what if the number of channels has gotten too big and you want to shrink that?

How do you shrink it to a 28 by 28 by 32 dimensional volume?

Well, what you can do is use 32 filters that are one by one.

And technically, each filter would be of dimension 1 by 1 by 192,

because the number of channels in

your filter has to match the number of channels in your input volume,

but you use 32 filters and the output of this process will be a 28 by 28 by 32 volume.

So this is a way to let you shrink nc as well,

whereas pooling layers are used just to shrink nH and nW,

the height and width of these volumes.
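The contrast between the two operations can be sketched in NumPy. This is an illustration only: the input is random, the bias is omitted, and the 2 by 2 max pooling with stride 2 is an assumed setting, since the lecture does not specify pooling parameters here.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.standard_normal((28, 28, 192))

# 32 filters, each 1x1x192, stored as a (192, 32) matrix.
filters = rng.standard_normal((192, 32))

# 1x1 convolution + ReLU: height and width unchanged, channels 192 -> 32.
reduced_channels = np.maximum(volume @ filters, 0.0)

# 2x2 max pooling with stride 2 (assumed): height and width halve,
# channels unchanged. Each 2x2 block is gathered into axes 1 and 3.
pooled = volume.reshape(14, 2, 14, 2, 192).max(axis=(1, 3))

print(reduced_channels.shape, pooled.shape)
```

The 1 by 1 convolution here uses 192 times 32, or 6144, weights in total, since each of the 32 filters has one weight per input channel.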

And we'll see later how this idea of one by one

convolutions allows you to shrink the number of channels and therefore,

save on computation in some networks.

But of course, if you want to keep the number of channels at 192, that's fine too.

And the effect of the one by one convolution is it just adds non-linearity.

It allows you to learn a more complex function with your network by adding

another layer that inputs 28 by 28 by 192 and outputs 28 by 28 by 192.
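One way to see that a 192-to-192 layer is more than a reshuffling of channels is to check that it is not a linear map. The sketch below assumes random inputs and weights and no bias; `conv1x1_relu` is a hypothetical helper name.

```python
import numpy as np

def conv1x1_relu(volume, filters):
    # 1x1 convolution (a matrix product over the channel axis) followed by ReLU.
    return np.maximum(volume @ filters, 0.0)

rng = np.random.default_rng(0)
a = rng.standard_normal((28, 28, 192))
b = rng.standard_normal((28, 28, 192))
filters = rng.standard_normal((192, 192))  # 192 filters, each 1x1x192

out = conv1x1_relu(a, filters)

# A linear layer would satisfy f(a + b) == f(a) + f(b); the ReLU breaks this.
is_additive = np.allclose(
    conv1x1_relu(a + b, filters),
    conv1x1_relu(a, filters) + conv1x1_relu(b, filters),
)
print(out.shape, is_additive)
```

The output keeps the 28 by 28 by 192 shape, and the additivity check fails, which is the non-linearity the layer contributes.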

So, that's how a one by

one convolutional layer is actually doing something pretty non-trivial

and adds non-linearity to your neural network and allows

you to decrease or keep the same or, if you want,

increase the number of channels in your volumes.

Next, you'll see that this is actually very useful for building the inception network.

Let's go on to that in the next video.

So, you've now seen how a one by one convolution operation is actually doing

a pretty non-trivial operation and it allows you to shrink

the number of channels in your volumes or

keep it the same or even increase it if you want.

In the next video,

you'll see that this can be used to help build

up to the inception network. Let's go into the-