
For this final video for this week,

let's talk a bit about why convolutions are so

useful when you include them in your neural networks.

And then finally, let's briefly talk about how to put this all together and how

you could train a convolutional neural network when you have a labeled training set.

I think there are two main advantages of

convolutional layers over just using fully connected layers.

And the advantages are parameter sharing and sparsity of connections.

Let me illustrate with an example.

Let's say you have a 32 by 32 by 3 dimensional image,

and this actually comes from the example from the previous video,

but let's say you use a five by five filter with six filters.

And so, this gives you a 28 by 28 by 6 dimensional output.

So, 32 by 32 by 3 is 3,072,

and 28 by 28 by 6 if you multiply all those numbers is 4,704.

And so, if you were to create a neural network with 3,072 units in one layer,

and with 4,704 units in the next layer,

and if you were to connect every one of these neurons,

then the weight matrix,

the number of parameters in a weight matrix would be 3,072

times 4,704 which is about 14 million.

So, that's just a lot of parameters to train.

And today you can train neural networks with even more parameters than 14 million,

but considering that this is just a pretty small image,

this is a lot of parameters to train.
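The arithmetic here can be checked with a quick sketch in plain Python, just reproducing the numbers from the lecture:

```python
# Size of the weight matrix if the 32x32x3 -> 28x28x6 mapping
# were a fully connected layer, as described above.
n_in = 32 * 32 * 3        # 3,072 input units
n_out = 28 * 28 * 6       # 4,704 output units
fc_params = n_in * n_out  # weight matrix alone, ignoring biases
# fc_params is 14,450,688, i.e. about 14 million parameters
```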

And of course, if this were to be a 1,000 by 1,000 image,

then your weight matrix would just become infeasibly large.

But if you look at the number of parameters in this convolutional layer,

each filter is five by five.

So, each filter has 25 parameters,

plus a bias parameter, so 26 parameters per filter,

and you have six filters, so,

the total number of parameters is 26 times 6,

which is equal to 156 parameters.

And so, the number of parameters in this conv layer remains quite small.
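The same tally for the conv layer, following the lecture's count of 25 weights plus one bias per filter (the input depth is left out here, as in the video):

```python
# Parameters in the conv layer: each 5x5 filter has 25 weights
# plus 1 bias, and there are 6 filters.
filter_size = 5
n_filters = 6
params_per_filter = filter_size * filter_size + 1  # 25 + 1 = 26
conv_params = params_per_filter * n_filters        # 156 in total
```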

And there are really two reasons that a ConvNet ends up with so few parameters.

One is parameter sharing.

And parameter sharing is motivated by the observation

that feature detector such as vertical edge detector,

that's useful in one part of the image is probably useful in another part of the image.

And what that means is that,

if you've figured out say a three by three filter for detecting vertical edges,

you can then apply the same three by three filter over here,

and then the next position over,

and the next position over, and so on.

And so, each of these feature detectors,

each of these outputs, can use the same parameters in lots of

different positions in your input image in order to

detect say a vertical edge or some other feature.

And I think this is true for low-level features like edges,

as well as for higher level features, like maybe,

detecting an eye that indicates a face or a cat or something there.

But being able to share, in this case,

the same nine parameters to compute all 16 of these outputs,

is one of the ways the number of parameters is reduced.

And it also just seems intuitive that if a feature detector,

like a vertical edge detector, is useful for the upper left-hand corner of the image,

then the same feature will probably be useful,

has a good chance of being useful, for the lower right-hand corner of the image.

So, maybe you don't need to learn

separate feature detectors for

the upper left and the lower right-hand corners of the image.

And maybe you do have a dataset where

the upper left-hand corner and lower right-hand corner have somewhat different distributions,

so they may look a little bit different, but they might be similar enough

that sharing feature detectors all across the image works just fine.

The second way that ConvNets get away with

having relatively few parameters is by having sparse connections.

So, here's what I mean,

if you look at this zero,

it is computed via a three by three convolution.

And so, it depends only on this three by three grid of input cells.

So, it is as if this output unit on the right is connected only

to nine out of these six by six, 36 input features.

And in particular, the rest of these pixel values,

all of these other pixel values, do not have any effect on this output.

So, that's what I mean by sparsity of connections.

As another example, this output depends only on these nine input features.

And so, it's as if only those nine input features are connected to this output,

and the other pixels just don't affect this output at all.
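As a small sketch of this sparsity, here's a naive NumPy "valid" convolution on a hypothetical six by six input (the specific input and filter values are arbitrary): changing a pixel far away from an output's three by three window leaves that output unchanged.

```python
import numpy as np

def conv2d_valid(x, f):
    """Naive 'valid' cross-correlation, as used in these videos."""
    h = x.shape[0] - f.shape[0] + 1
    w = x.shape[1] - f.shape[1] + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + f.shape[0], j:j + f.shape[1]] * f)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)  # a hypothetical 6x6 input
f = np.ones((3, 3))                           # an arbitrary 3x3 filter
out = conv2d_valid(x, f)                      # 4x4 output

# Sparsity of connections: out[0, 0] depends only on x[0:3, 0:3].
x2 = x.copy()
x2[5, 5] = 999.0                              # change a far-away pixel
out2 = conv2d_valid(x2, f)
assert out[0, 0] == out2[0, 0]                # top-left output is unaffected
assert out[3, 3] != out2[3, 3]                # bottom-right output does change
```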

And so, through these two mechanisms,

a neural network has a lot fewer parameters, which allows it

to be trained with smaller training sets and makes it less prone to overfitting.

And so, sometimes you also hear about

convolutional neural networks being very good at capturing translation invariance.

And that's the observation that

a picture of a cat shifted a couple of pixels to the right,

is still pretty clearly a cat.

And convolutional structure helps the neural network encode the fact that an image

shifted a few pixels should result in pretty similar features and

should probably be assigned the same output label.

And the fact that you are applying the same filter

across all the positions of the image,

both in the early layers and in the later layers, that

helps a neural network automatically learn to be more

robust or to better capture the desirable property of translation invariance.

So, these are maybe a couple of the reasons why

convolutions or convolutional neural networks work so well in computer vision.

Finally, let's put it all together and see how you can train one of these networks.

Let's say you want to build a cat detector and you

have a labeled training set as follows,

where now, x is an image.

And the y's can be binary labels,

or one of K classes.

And let's say you've chosen a convolutional neural network structure,

maybe inputting the image and then having some convolutional and pooling layers

and then some fully connected layers

followed by a softmax output that then outputs y hat.

The conv layers and the fully connected layers will have various parameters,

W, as well as biases, b.

And so, any setting of the parameters, therefore,

lets you define a cost function similar to what we have seen in the previous courses,

where we've randomly initialized parameters W and B.

You can compute the cost J,

as the sum of losses of the neural network's predictions on your entire training set,

maybe divided by m. So,

to train this neural network,

all you need to do is then use gradient descent or some other

algorithm like gradient descent with momentum,

or RMSProp, or Adam, or something else,

in order to optimize all the parameters of

the neural network to try to reduce the cost function J.

And you find that if you do this,

you can build a very effective cat detector or some other detector.
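As a minimal, hedged sketch of this training setup: below, a single softmax layer over random features stands in for the full ConvNet (in the real network, W and b would span all the conv and fully connected layers), but the cost J and the gradient descent updates have the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in "network": one softmax layer over flattened inputs.
# In the real ConvNet, W and b would span all conv and FC layers.
m, n_features, n_classes = 20, 12, 3
X = rng.normal(size=(m, n_features))    # stand-in for the images
y = rng.integers(0, n_classes, size=m)  # labels, one of K classes

W = rng.normal(scale=0.01, size=(n_features, n_classes))
b = np.zeros(n_classes)

def cost(W, b):
    # J = (1/m) * sum of cross-entropy losses over the training set
    z = X @ W + b
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(m), y]).mean()

J0 = cost(W, b)
lr = 0.1
for _ in range(200):                      # plain gradient descent
    z = X @ W + b
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p[np.arange(m), y] -= 1.0             # dJ/dz for softmax + cross-entropy
    W -= lr * (X.T @ p) / m
    b -= lr * p.mean(axis=0)
J_final = cost(W, b)                      # gradient descent has reduced J
```

In practice you would of course use a deep learning framework's optimizer rather than hand-written updates, but the loop above is the whole idea: compute J over the training set, then step the parameters downhill.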

So, congratulations on finishing this week's videos.

You've now seen all the basic building blocks of a convolutional neural network,

and how to put them together into an effective image recognition system.

In this week's programming exercises,

I think all of these things will become more concrete,

and you'll get the chance to practice implementing

these things yourself and seeing it work for yourself.

Next week, we'll continue to go deeper into convolutional neural networks.

I mentioned earlier that there are just a lot of

hyperparameters in convolutional neural networks.

So, what I want to do next week,

is show you a few concrete examples of some of

the most effective convolutional neural networks,

so you can start to recognize the patterns

of what types of network architectures are effective.

And one thing that people often do is just take the architecture that

someone else has found and published in

a research paper and just use that for your application.

And so, by seeing some more concrete examples next week,

you also learn how to do that better.

And beyond that, next week,

we'll also just gain some intuitions about what makes ConvNets work well,

and then in the rest of the course,

we'll also see a variety of other computer vision applications such as,

object detection, and neural style transfer,

where you'll see how to create new forms of artwork using these sets of algorithms.

So, that's it for this week,

best of luck with the homework,

and I look forward to seeing you next week.