0:20

The main reason that we're introducing this,

besides the fact that it allows you to perform numerical operations quickly,

is that it also forms the basis for the Pandas DataFrame and Series data structures.

Thus, to understand how Pandas is actually operating, it's useful to know

about NumPy, because Pandas is typically implemented on top of NumPy.

0:50

We've seen previously that the Python programming

language provides a rich set of data structures.

These include the list, the tuple, the dictionary and the string.

And you've seen, by now, how these compound or

container data structures can make tasks that might be difficult much easier.

1:10

Now all of these but the string are heterogeneous, which means they can hold

data of different types, so you can combine character data

with numerical data with other containers, all in another container.

This flexibility is powerful but it comes at a cost,

because it's more expensive both in computational time and storage to maintain

this arbitrary collection of data than it is to hold a predefined set of data.

And this is where NumPy differs from the standard Python container data structures.

NumPy will hold an array of data that's all the same type.

And so, it can make certain assumptions that will allow the computer program to

operate more efficiently and to operate faster.
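
As a minimal sketch of the difference just described, the following compares a mixed-type list with a single-type NumPy array. The variable names and values are made up for illustration and are not taken from the notebook.

```python
import numpy as np

# A Python list can mix types; each element is a separate Python object.
mixed = [1, "two", 3.0, [4, 5]]
print([type(item).__name__ for item in mixed])

# A NumPy array stores one data type for every element.
values = np.array([1, 2, 3, 4])
print(values.dtype)     # a platform-dependent integer type
print(values.itemsize)  # bytes used by each element

# Mixing ints and floats does not give a heterogeneous array:
# NumPy converts everything to one common type instead.
upcast = np.array([1, 2.5, 3])
print(upcast.dtype)     # float64
```

Because every element has the same size and type, NumPy can store the data compactly and process it without checking the type of each item.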

1:53

So that's what this notebook does.

First, it introduces NumPy and the idea of the n-dimensional array.

In this particular lesson we're going to focus on one-dimensional.

A later lesson will focus on two-dimensional arrays.

But basically we start off talking about what NumPy is and why it's used so much:

it's very fast.

It can be very easy, especially if you already know how to use a list.

And it underlies many of the other common libraries in the standard data

science Python distribution.

2:33

So the first part of this notebook walks through an introduction to these.

And then we use the %timeit magic to actually create,

in this case a list and apply a function to every element in the list.

And we see how fast that operates.

And then we do the same thing but now with a NumPy array.

And you can see that it's actually quite a bit faster,

in this case five times faster, simply by doing it in NumPy.

We could do other examples and see how the speed goes as well.
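
One way to try a comparison like this outside a notebook is with the standard-library timeit module instead of the %timeit magic. This is only a sketch: the array size, the squaring operation, and the function names are assumptions, and the speedup you measure will depend on your machine and the operation.

```python
import timeit
import numpy as np

n = 100_000
data_list = list(range(n))
data_array = np.arange(n, dtype=np.int64)

def square_list():
    # Apply the operation to each element with a Python-level loop.
    return [x * x for x in data_list]

def square_array():
    # Apply the same operation to the whole array at once.
    return data_array * data_array

list_time = timeit.timeit(square_list, number=20)
array_time = timeit.timeit(square_array, number=20)
print(f"list:  {list_time:.4f} s for 20 runs")
print(f"array: {array_time:.4f} s for 20 runs")
```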

3:02

Once we've hopefully convinced you that NumPy is fast and

it's worthwhile learning, we actually need to start doing things with NumPy.

So the first thing is how do you create an array?

And there's a number of different methods that allow you to do this,

there's one that creates an empty array.

There's a function that creates an array

where all the elements are initialized to zero.

Or another one that initializes them to one.

And you could read and see how these all work.

And you should play with this to make sure you're familiar with them.
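
The three creation functions mentioned above can be sketched as follows. The length of 5 is an arbitrary choice for illustration.

```python
import numpy as np

# empty allocates memory without initializing it,
# so the values it contains are arbitrary.
a = np.empty(5)

# zeros and ones fill every element with 0.0 or 1.0.
b = np.zeros(5)
c = np.ones(5)

print(b)
print(c)
print(a.shape, b.dtype, c.dtype)
```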

3:29

We also can slice elements, and we'll see that in just a minute.

But that same start, end, stride notation can be used with the arange function,

which will create an array of data following a specific pattern.

So if we're going to start with zero and end at ten, and stride is one,

we'll go 0 1 2 3 4 5 6 7 8 9.

If we have a stride of two, such as this example shows,

you can see that we go 3 5 7 and 9.

Just like before, we don't actually include the end parameter.
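
The two patterns just described can be reproduced with a short sketch. The start of 3 and end of 10 in the second call are assumptions chosen to match the values read out above.

```python
import numpy as np

# Start at 0, stop before 10, stride of 1.
first = np.arange(0, 10, 1)
print(first)   # [0 1 2 3 4 5 6 7 8 9]

# Start at 3, stop before 10, stride of 2.
second = np.arange(3, 10, 2)
print(second)  # [3 5 7 9]
```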

There are also functions that will create arrays whose

elements are linearly spaced, which is very useful for plotting,

or if you want to sample data at a specific set of points.

So for instance, if I need 100 sample points between 0 and 1, linspace would do that.

You may want them logarithmically spaced because of the way your

analytics is operating, and so you can do the same thing.

But now it's with the logspace function, and

that spaces them uniformly on a logarithmic scale.

And this code here just demonstrates that.
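
A minimal sketch of both kinds of spacing follows. The exponent range and the number of points passed to logspace are arbitrary choices for illustration.

```python
import numpy as np

# 100 evenly spaced points from 0 to 1; both endpoints are included by default.
x = np.linspace(0, 1, 100)
print(x[0], x[-1], len(x))

# 5 points from 10**0 to 10**2, evenly spaced in the exponent.
y = np.logspace(0, 2, 5)
print(y)

# Equal spacing on a log scale: the base-10 logs differ by a constant step.
print(np.diff(np.log10(y)))
```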

Arrays have attributes that provide information about them such as how

many dimensions.

So if you have a one-dimensional array this value will be 1.

Shape gives you the shape of the array.

So if you have a matrix that holds n rows and m columns it would have shape (n, m).

Size is the total number of elements in the array, which is just the product of n times m.

Dtype is the data type, so is it an integer?

Is it a float?

And NumPy will actually allow you to specify that when you create an array,

so that you can say,

look I know my numbers are very small, say they're between 0 and 255.

So I want an 8-bit unsigned integer, and

that will minimize the memory impact of your array.

This can be very important when you start working with very large data sets and

you want to try to make sure it's fitting within your computer's memory.
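
The attributes described above, and the memory effect of choosing a small data type, can be sketched like this. The 3 by 4 shape and the sample values are assumptions for illustration.

```python
import numpy as np

m = np.zeros((3, 4))
print(m.ndim)   # 2
print(m.shape)  # (3, 4)
print(m.size)   # 12
print(m.dtype)  # float64

# Values between 0 and 255 fit in an 8-bit unsigned integer.
small = np.array([0, 128, 255], dtype=np.uint8)
wide = small.astype(np.int64)
print(small.nbytes)  # 3 bytes in total
print(wide.nbytes)   # 24 bytes in total
```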

5:25

And this is what this section here talks about,

different data types that you might use for an array.

The rest of the notebook talks about different things you might do.

So for instance this is demonstrating that you can't assign an arbitrary string to an element of a floating-point NumPy

array, because every element must be a floating-point value.
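
A small sketch of that failure, assuming a float array and a string that cannot be parsed as a number. The flag variable is only there to record what happened.

```python
import numpy as np

a = np.zeros(3)  # float64 by default
failed = False

try:
    # "hello" cannot be converted to a float, so NumPy raises ValueError.
    a[0] = "hello"
except ValueError as err:
    failed = True
    print("assignment failed:", err)

print(a)  # the array is unchanged
```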

5:43

We'll also talk about how to index them including slicing.

NumPy also provides access via a boolean mask array, which is kind of cool.

We can say look let's select all elements where the element is greater than 4.

Then we're going to change that value, so that's what we do.

We say a is 0 1 2 3 4 5 6 7 8 9.

We create a mask array which says which elements in the array are greater

than 4, and then we can change the values in the array based on that mask.

So this is a pretty powerful way of selecting data and

manipulating data based on some condition.
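
The masking steps above can be sketched as follows. The replacement value of -1 is an assumption, since the transcript does not say what the selected elements are changed to.

```python
import numpy as np

a = np.arange(10)  # 0 1 2 3 4 5 6 7 8 9

# Boolean array: True where the element is greater than 4.
mask = a > 4
print(mask)

# Assign to only the elements selected by the mask.
a[mask] = -1
print(a)  # [ 0  1  2  3  4 -1 -1 -1 -1 -1]
```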

We can also create random data and this notebook shows that.

We'll use that a lot later on when we talk about probability and statistics.
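
As a hedged sketch of creating random data, the following uses NumPy's Generator interface. The notebook may use a different interface; the seed, sizes, and distributions here are assumptions for illustration.

```python
import numpy as np

# A fixed seed makes the sequence reproducible.
rng = np.random.default_rng(seed=42)

uniform = rng.random(5)                           # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=5)   # mean 0, standard deviation 1
integers = rng.integers(low=0, high=10, size=5)   # high is exclusive

print(uniform)
print(normal)
print(integers)
```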

We can also perform basic math operations,

just like we did with the Pandas library, where we operated in a vectorized fashion,

where the operation is applied to every element in the array.

You should definitely try these out so you learn them. There are also summary functions.

There's a lot of other things that you could try, including universal functions.

NumPy includes a lot of functions that have been defined to operate on

every element in the array at once.

And this is very nice.

So for instance, this computes the sine of every element in the array.

And that's very nice because it's simple code.

We didn't have to write a loop to do it.

That's the whole benefit of a vectorized function.
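
To make the contrast with a loop concrete, here is a minimal sketch covering basic math, a universal function, and summary functions. The input values are an arbitrary choice for illustration.

```python
import math
import numpy as np

x = np.linspace(0, np.pi, 5)

# Basic math is applied to every element.
scaled = 2 * x + 1

# A universal function: the sine of every element, with no explicit loop.
sines = np.sin(x)

# The equivalent explicit loop, for comparison.
looped = np.array([math.sin(v) for v in x])
print(np.allclose(sines, looped))  # True

# Summary functions reduce the array to a single value.
print(sines.min(), sines.max(), sines.mean())
```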

7:04

One other thing I wanted to talk about though is the idea of a masked array.

This is really powerful because we can define the array.

We can then create masks and say if this is a bad value we want to mask it

such that when we do operations on it, they'll be ignored.

And this could be very useful when we want to do math on those sorts of things.

So for instance we might look at this and say this is a bad value,

and this is a bad value.

And we want to do some sort of operations on them.

Here we are dividing two arrays and then taking the square root of them, but

since they're masked arrays, it will prevent an error from occurring.

So for instance here, this is a 0, we're dividing by 0.

We can't do that.

And so, instead of giving us an error, it just gives us a warning.

So this should show you that it's very powerful to use masked arrays.
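
The divide-then-square-root operation described above can be sketched with the numpy.ma module. The values and the positions of the bad entries are assumptions, and whether a warning is printed may depend on the NumPy version.

```python
import numpy as np
import numpy.ma as ma

# Mark one bad value in each array; the masked 0.0 would otherwise
# cause a division by zero.
num = ma.array([1.0, 4.0, 9.0, 16.0], mask=[False, False, True, False])
den = ma.array([1.0, 0.0, 3.0, 4.0], mask=[False, True, False, False])

# An element is masked in the result if it was masked in either input.
result = ma.sqrt(num / den)
print(result)

# Summary functions skip the masked elements.
print(result.mean())        # 1.5
print(result.compressed())  # only the valid values
```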

The last part talks about how to read data straight into NumPy via loadtxt

and genfromtxt.

That's less important for us.

Most of the time we're going to be using Pandas data frames.
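
For reference, the two loading functions mentioned above can be tried without a file on disk by wrapping text in io.StringIO. The sample text is made up for illustration.

```python
import io
import numpy as np

# loadtxt expects every row to be complete.
text = "1.0 2.0 3.0\n4.0 5.0 6.0\n"
data = np.loadtxt(io.StringIO(text))
print(data.shape)  # (2, 3)

# genfromtxt tolerates missing entries; with the default float
# data type, a missing value is filled with nan.
csv_text = "1,2,3\n4,,6\n"
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(filled)
```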

But again, you could look through all of this, try everything out.

And as always, if you have any questions let us know in the course forums.

Good luck.