0:00

if the basic technical ideas behind deep learning, behind neural networks, have been around for decades, why are they only just now taking off? In this video, let's go over some of the main drivers behind the rise of deep learning, because I think this will help you spot the best opportunities within your own organization to apply these to.

Over the last few years a lot of people have asked me, "Andrew, why is deep learning suddenly working so well?" And when I'm asked that question, this is usually the picture I draw for them. Let's say we plot a figure where on the horizontal axis we plot the amount of data we have for a task, and on the vertical axis we plot the performance of our learning algorithm, such as the accuracy of our spam classifier or our ad click predictor, or the accuracy of our neural net for figuring out the position of other cars for our self-driving car. It turns out that if you plot the performance of a traditional learning algorithm, like a support vector machine or logistic regression, as a function of the amount of data you have, you might get a curve that looks like this, where the performance improves for a while as you add more data, but after a while the performance pretty much plateaus, as if it hits a horizontal line. It's as if the traditional algorithms didn't know what to do with huge amounts of data.

What happened in our society over the last ten years or so is that for a lot of problems we went from having a relatively small amount of data to having often a fairly large amount of data. All of this was thanks to the digitization of society, where so much human activity is now in the digital realm: we spend so much time on computers, on websites, on mobile apps, and activity on digital devices creates data. And thanks to the rise of inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have also just been collecting more and more data. So over the last twenty years, for a lot of applications we have accumulated a lot more data, more than traditional learning algorithms were able to effectively take advantage of. What neural networks showed is that if you train a small neural net, the performance may look like that.

If you train a somewhat larger neural network, call it a medium-sized neural net, performance is often a little bit better. And if you train a very large neural net, the performance often just keeps getting better and better. So a couple of observations. One is that if you want to hit this very high level of performance, you need two things: first, you often need to be able to train a big enough neural network to take advantage of the huge amount of data, and second, you need to be out here on the x-axis, so you do need a lot of data. We therefore often say that scale has been driving deep learning progress, and by scale I mean both the size of the neural network, meaning a network with a lot of hidden units, a lot of parameters, a lot of connections, as well as the scale of the data. In fact, today one of the most reliable ways to get better performance from a neural network is often to either train a bigger network or throw more data at it. That only works up to a point, because eventually you run out of data, or eventually your network is so big that it takes too long to train. But just improving scale has actually taken us a long way in the world of deep learning.
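The picture I've been describing can be sketched with a few hypothetical saturating curves. To be clear, the shapes, ceilings, and constants below are invented purely for illustration, assuming NumPy; they are not measured results:

```python
import numpy as np

# Amount of labeled data, in arbitrary units.
m = np.linspace(0, 100, 200)

# Hypothetical performance curves: each algorithm saturates at a
# different ceiling, and bigger models keep improving for longer.
# (All constants are made up for illustration.)
def performance(m, ceiling, saturation):
    return ceiling * (1.0 - np.exp(-m / saturation))

traditional = performance(m, ceiling=0.70, saturation=5)
small_nn    = performance(m, ceiling=0.80, saturation=10)
medium_nn   = performance(m, ceiling=0.88, saturation=15)
large_nn    = performance(m, ceiling=0.95, saturation=20)

# In the big-data regime, bigger networks come out ahead...
assert large_nn[-1] > medium_nn[-1] > small_nn[-1] > traditional[-1]
# ...while with very little data the ordering is not well defined:
assert traditional[4] > large_nn[4]
```

Plotting these four curves against m reproduces roughly the figure described here: the traditional algorithm plateaus early, while larger networks keep climbing as data grows.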

In order to make this diagram a bit more technically precise, let me add a few more things. I wrote the amount of data on the x-axis; technically this is the amount of labeled data, where by labeled data I mean training examples for which we have both the input x and the label y. And let me introduce a little bit of notation that we'll use later in this course: we're going to use the lowercase letter m to denote the size of the training set, the number of training examples, this lowercase m. So that's the horizontal axis. A couple of other details about this figure.
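The notation can be made concrete with a tiny sketch: a labeled training set is just m pairs (x, y). The toy spam-style examples below are invented purely for illustration:

```python
# A labeled training set: each example pairs an input x with a label y.
# (Toy spam-classifier-style data, invented for illustration.)
training_set = [
    ("win a free prize now",   1),  # y = 1: spam
    ("meeting moved to 3pm",   0),  # y = 0: not spam
    ("cheap loans click here", 1),
    ("lunch tomorrow?",        0),
]

# Lowercase m denotes the number of training examples.
m = len(training_set)
print(m)  # 4
```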

In this regime of smaller training sets, the relative ordering of the algorithms is actually not very well defined. If you don't have a lot of training data, it's often your skill at hand-engineering features that determines performance. So it's quite possible that if someone training an SVM is more motivated to hand-engineer features than someone training an even larger neural net, then in this small-training-set regime the SVM could do better.

So in this region to the left of the figure, the relative ordering between the algorithms is not that well defined, and performance depends much more on your skill at engineering features and on other minor details of the algorithms. It's only in this big data regime, the very large training set, very large m regime on the right, that we more consistently see large neural nets dominating the other approaches. And so if any of your friends ask you why neural nets are taking off, I would encourage you to draw this picture for them as well.

I will say that in the early days of the modern rise of deep learning, it was scale of data and scale of computation, just our ability to train very large neural networks either on a CPU or a GPU, that enabled us to make a lot of progress. But increasingly, especially over the last several years, we've seen tremendous algorithmic innovation as well, so I also don't want to understate that.

Interestingly, many of the algorithmic innovations have been about trying to make neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from a sigmoid function, which looks like this, to a ReLU function, which we talked about briefly in an earlier video and which looks like this. If you don't understand the details of what I'm about to say, don't worry about it. But it turns out that one of the problems of using sigmoid functions in machine learning is that there are these regions where the slope of the function, the gradient, is nearly zero, and so learning becomes really slow: when you implement gradient descent and the gradient is nearly zero, the parameters change very slowly, and so learning is very slow.
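This point can be checked numerically with a minimal sketch, assuming NumPy; the specific input values are just illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid

def relu_grad(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise

# Deep in the sigmoid's flat tails the gradient nearly vanishes...
print(sigmoid_grad(10.0))  # ~4.5e-05
# ...so a gradient descent step barely moves the parameter:
w, lr = 10.0, 0.1
w_new = w - lr * sigmoid_grad(w)  # w changes by only ~4.5e-06

# The ReLU's gradient stays at 1 for every positive input:
print(relu_grad(np.array([0.5, 10.0, -3.0])))  # [1. 1. 0.]
```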

Whereas by changing what's called the activation function of the neural network to use this function, called the ReLU function, or the rectified linear unit, the gradient is equal to one for all positive values of the input, and so the gradient is much less likely to gradually shrink to zero. The gradient here, the slope of this line, is zero on the left, but it turns out that just switching from the sigmoid function to the ReLU function has made an algorithm called gradient descent work much faster. So this is an example of a maybe relatively simple algorithmic innovation whose ultimate impact was really to help computation. There are actually quite a lot of examples like this where we changed the algorithm because it allows the code to run much faster, and this allows us to train bigger neural networks, or to train in a reasonable amount of time even when we have a large network and a lot of data.

The other reason that fast computation is important is that it turns out the process of training a neural network is very iterative. Often you have an idea for a neural network architecture, so you implement your idea in code. Implementing your idea then lets you run an experiment, which tells you how well your neural network does, and then by looking at the result you go back to change the details of your neural network, and you go around this circle over and over. When your neural network takes a long time to train, it just takes a long time to go around this cycle. And there's a huge difference in your productivity building effective neural networks when you can have an idea, try it, and see whether it works in ten minutes, or maybe at most a day, versus if you have to train your neural network for a month, which sometimes does happen.

Because when you get a result back in ten minutes, or maybe in a day, you can just try a lot more ideas and be much more likely to discover a neural network that works well for your application. So faster computation has really helped in terms of speeding up the rate at which you can get an experimental result back, and this has helped both practitioners of neural networks, as well as researchers working in deep learning, iterate much faster and improve their ideas much faster. All this has also been a huge boon to the entire deep learning research community, which has been incredible at just inventing new algorithms and making nonstop progress on that front.

So these are some of the forces powering the rise of deep learning, and the good news is that these forces are still working powerfully to make deep learning even better. Take data: society is still throwing off more and more digital data. Or take computation: with the rise of specialized hardware like GPUs, faster networking, and many other types of hardware, I'm actually quite confident that our ability to train very large neural networks, from a computation point of view, will keep on getting better. And take algorithms: the deep learning research community is continuously phenomenal at innovating on the algorithms front. So because of this, I think we can be optimistic, and I am optimistic, that deep learning will keep on getting better for many years to come.

So with that, let's go on to the last video of this section, where we'll talk a little bit more about what you can learn from this course.
