0:00

>> Welcome back. Vectorization is basically the art of getting rid of explicit for loops in your code. In the deep learning era, in practice, you often find yourself training on relatively large data sets, because that's when deep learning algorithms tend to shine. And so, it's important that your code run very quickly, because otherwise, if it's running on a big data set, your code might take a long time to run, and you just find yourself waiting a very long time to get the result. So in the deep learning era, I think the ability to perform vectorization has become a key skill.

Let's start with an example. So, what is vectorization? In logistic regression you need to compute z = w transpose x + b, where w is a column vector and x is also a vector. These may be very large vectors if you have a lot of features. So, w and x are both n_x-dimensional vectors.

To compute w transpose x with a non-vectorized implementation, you would do something like z = 0, and then for i in range(n_x), that is, for i = 1 up to n_x, z += w[i] * x[i]. And then maybe you add b at the end. So, that's a non-vectorized implementation, and you'll find that it's going to be really slow.

In contrast, a vectorized implementation would just compute w transpose x directly. In Python with numpy, the command you use for that is z = np.dot(w, x), which computes w transpose x. And you can also just add b to that directly. And you find that this is much faster.
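The two implementations just described can be sketched as follows (the small n_x, the random values, and the bias of 0.5 are just for illustration):

```python
import numpy as np

n_x = 5                   # illustrative feature count
w = np.random.rand(n_x)   # parameter vector
x = np.random.rand(n_x)   # feature vector
b = 0.5                   # bias term

# Non-vectorized: explicit for loop over the features
z_loop = 0.0
for i in range(n_x):
    z_loop += w[i] * x[i]
z_loop += b

# Vectorized: single call to np.dot, then add b
z_vec = np.dot(w, x) + b

print(z_loop, z_vec)  # both give the same value of z
```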

Let's actually illustrate this with a little demo. So, here's my Jupyter notebook, in which I'm going to write some Python code. First, let me import the numpy library: import numpy as np. And so, for example, I can create a as an array as follows, and then say print(a). Now, having written this chunk of code, if I hit Shift+Enter, then it executes the code. So, it created the array a and it prints it out.

Now, let's do the vectorization demo. I'm going to import the time library, since we'll use that in order to time how long different operations take. Then I create an array a = np.random.rand(1000000). This creates a million-dimensional array with random values. Then b = np.random.rand(1000000), another million-dimensional array. And now, tic = time.time(), so this measures the current time; c = np.dot(a, b); toc = time.time(). And then print "Vectorized version:", and let's print out the elapsed time, which is str(1000*(toc-tic)) + "ms", so that we can express this in milliseconds. So, ms is milliseconds.

I'm going to hit Shift+Enter. So, that code took about three milliseconds, or this time 1.5 milliseconds. It varies a little bit as I run it, but it looks like maybe on average it's taking like 1.5 milliseconds, maybe two milliseconds, as I run this.

All right. Let's keep adding to this block of code. Now let's implement the non-vectorized version. Let's see, c = 0, then tic = time.time(). Now, let's implement a for loop: for i in range(1000000), c += a[i] * b[i], and then toc = time.time(). Finally, print "For loop:" and the time it takes, which is str(1000*(toc-tic)) + "ms", to note that we're doing this in milliseconds.

Let's do one more thing. Let's just print out the value of c we compute in both cases, to make sure that it's the same value. I'm going to hit Shift+Enter to run this and check that out.

In both cases, the vectorized version and the non-vectorized version computed the same value, you know, 250286.99 and so on. The vectorized version took 1.5 milliseconds. The explicit for loop, the non-vectorized version, took about 400, almost 500 milliseconds. The non-vectorized version took something like 300 times longer than the vectorized version. With this example you see that if you just remember to vectorize your code, your code actually runs over 300 times faster. Let's just run it again. Yeah. Vectorized version, 1.5 milliseconds, and the for loop, 481 milliseconds, so again, about 300 times slower with the explicit for loop.
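Putting the whole notebook cell together, the timing demo above looks like this (the exact timings will vary from run to run and machine to machine):

```python
import time
import numpy as np

a = np.random.rand(1000000)  # million-dimensional random array
b = np.random.rand(1000000)

# Vectorized version: one np.dot call
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print("Vectorized version: " + str(1000 * (toc - tic)) + "ms")

# Non-vectorized version: explicit for loop
c2 = 0
tic = time.time()
for i in range(1000000):
    c2 += a[i] * b[i]
toc = time.time()
print("For loop: " + str(1000 * (toc - tic)) + "ms")

print(c, c2)  # both versions compute the same dot product
```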

When you have a 300x slowdown, it's the difference between your code taking maybe one minute to run versus taking, say, five hours to run. And when you are implementing deep learning algorithms, you can really get a result back faster. It will be much faster if you vectorize your code.

Some of you might have heard that a lot of scalable deep learning implementations are done on a GPU, or graphics processing unit. But all the demos I did just now in the Jupyter notebook were actually on the CPU. And it turns out that both GPUs and CPUs have parallelization instructions. They're sometimes called SIMD instructions. This stands for single instruction, multiple data. But what this basically means is that if you use built-in functions, such as this np.dot function or other functions that don't require you to explicitly implement a for loop, it enables Python numpy to take much better advantage of parallelism to do your computations much faster. And this is true both for computations on CPUs and for computations on GPUs. It's just that GPUs are remarkably good at these SIMD calculations, but CPUs are actually also not too bad at that, maybe just not as good as GPUs.
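As one more small illustration of this rule of thumb, here is a sketch of replacing an element-by-element loop with a single built-in call (np.exp is just one example of such a built-in function; np.dot, as used above, is another):

```python
import numpy as np

v = np.array([0.0, 1.0, 2.0])

# Non-vectorized: explicit for loop over the elements
u_loop = np.zeros(v.shape)
for i in range(len(v)):
    u_loop[i] = np.exp(v[i])

# Vectorized: one built-in call applied to the whole array,
# letting numpy exploit parallelism under the hood
u_vec = np.exp(v)

print(u_vec)  # same values as the loop version
```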

You've now seen how vectorization can significantly speed up your code. The rule of thumb to remember is: whenever possible, avoid using explicit for loops. Let's go on to the next video to see some more examples of vectorization, and also start to vectorize logistic regression.
