0:00


The apply function is another loop function, used to evaluate a function over the margins of an array. Usually the function is going to be an anonymous one, like we showed with lapply, or it could be a function that already exists, like mean. It's most often used to apply a function to the rows or columns of a matrix. Matrices, which are two-dimensional arrays, are going to be the most common type of array that we use in R, but you may have three-dimensional arrays and such. So you can use apply on general arrays, such as taking the average of an array of matrices.

One thing to note, and you may hear this out in the wild occasionally, is the claim that using apply is somehow better or faster than using a for loop. Generally speaking, that's not true. It was true a long time ago in older versions of the S language and R, but right now it's not true at all. The main reason you want to use a function like apply is that it involves less typing, and less typing is always better, because good programmers are always lazy. So apply is very useful, particularly on the command line, because when we're interacting with data and doing exploratory analysis, we want to do as little typing as possible; it just makes our fingers tired.

So, how does apply work?

The first argument, X, is an array. An array is a vector that has dimensions attached to it, so a matrix, for example, is a two-dimensional array.

Â 1:34

The second argument is the margin, which we'll get to in a second. This is an integer vector that indicates which margins should be retained. The last important argument is the function that you want to apply to each of the margins. And then the dot dot dot argument holds any other arguments that you want to pass to that function.
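As a quick check of the argument list just described, one way to look at it from the R prompt is to list the formal argument names of apply. This is only a sketch; the variable name arg_names is an illustrative assumption, and newer versions of R may list additional arguments beyond the four discussed here.

```r
# List the formal argument names of base R's apply().
arg_names <- names(formals(apply))
print(arg_names)

# Check that the four arguments discussed above are present.
all(c("X", "MARGIN", "FUN", "...") %in% arg_names)
```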

So here's a matrix that I'm creating. It has 20 rows and 10 columns, and the entries are just normal random variables that I've generated. What I want to do is take this matrix and calculate the mean of each column. The way I can do this is to use the apply function on x. I give it the margin 2, and I'll say what that means in a second, and I pass the function mean. What I get back is a vector of length 10 that has the mean of each of the columns of the matrix.
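The call described here might look like the following sketch. The seed and the name col_means are assumptions added for reproducibility and readability; the values themselves are random and will differ from whatever appeared on screen.

```r
set.seed(1)                       # illustrative seed, not from the lecture
x <- matrix(rnorm(200), 20, 10)   # 20 rows, 10 columns of normal draws

col_means <- apply(x, 2, mean)    # MARGIN = 2: one mean per column
length(col_means)                 # one entry for each of the 10 columns
```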

The idea is that the matrix has 20 rows and 10 columns, so dimension one has length 20 (the rows) and dimension two has length 10 (the columns). When you apply the function mean over the matrix with margin 2, you keep the second dimension, which is the columns, and you collapse the first dimension, which is the rows. So you're taking the mean across all the rows within each column, and you're essentially eliminating the rows from the array. What you get back has had one of its dimensions eliminated, and it's really the first dimension that's been eliminated. So you get this vector that has the mean of each of the columns.
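To make the "keep dimension two, collapse dimension one" idea concrete, here is a sketch that writes the same computation as an explicit for loop and compares it with apply. The names loop_means and apply_means and the seed are illustrative assumptions.

```r
set.seed(1)                       # illustrative seed
x <- matrix(rnorm(200), 20, 10)

# Explicit loop: walk over dimension 2 (columns), collapsing dimension 1 (rows).
loop_means <- numeric(ncol(x))
for (j in seq_len(ncol(x))) {
  loop_means[j] <- mean(x[, j])
}

# The same thing with apply: retain margin 2.
apply_means <- apply(x, 2, mean)

all.equal(loop_means, apply_means)
```

The loop and the apply call do the same work; the apply version is simply shorter to type.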

Similarly, you can do this for all the rows of the array. I call the apply function on x and give it the margin 1, which means preserve all the rows but collapse all the columns. Here I'm calculating the sum of each of the rows instead of the mean. I pass the 1 because I want to preserve the rows and collapse the columns. So here I get a vector of length 20, because there are 20 rows, and for each row I've calculated the sum of that row.
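A sketch of the row version, again with an assumed seed and an illustrative variable name (row_sums):

```r
set.seed(1)                       # illustrative seed
x <- matrix(rnorm(200), 20, 10)

row_sums <- apply(x, 1, sum)      # MARGIN = 1: one sum per row
length(row_sums)                  # one entry for each of the 20 rows
```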

Â 3:47

Now, for simple operations like calculating the sum or the mean of a column or a row, there are special functions that are highly optimized to do this very quickly. For calculating the row sums and row means, there are the functions rowSums and rowMeans, and similarly there are colSums and colMeans, which do the same things for the columns. These are equivalent to using the apply function, but they're much faster, because they're optimized specifically for those operations. So if you want to calculate the sum or the mean of a column or row of a matrix, use those functions instead.
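The equivalence between these shortcut functions and the corresponding apply calls can be checked with a small sketch like this one. The seed is an assumption, and all.equal is used rather than exact comparison because the two routes may differ by floating-point rounding.

```r
set.seed(1)                       # illustrative seed
x <- matrix(rnorm(200), 20, 10)

# Each shortcut matches the corresponding apply() call.
all.equal(rowSums(x),  apply(x, 1, sum))
all.equal(rowMeans(x), apply(x, 1, mean))
all.equal(colSums(x),  apply(x, 2, sum))
all.equal(colMeans(x), apply(x, 2, mean))
```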

Â 4:28

Now, you can use the apply function to apply other types of functions. For example, suppose you have a matrix. Here I've generated, again, a matrix of random normal variables that's 20 rows by 10 columns. And suppose I want to go through each row of the matrix and calculate the twenty-fifth and the seventy-fifth percentile of that row. So I call apply on x and pass the margin 1, because I want to preserve the rows, and then I pass it the quantile function. Now, the quantile function needs the quantiles that you want to calculate. Its default, seq(0, 1, 0.25), returns all five quartile points, which is more than I want here, so I pass my own value to the quantile function through the dot dot dot argument of apply. The argument for quantile is called probs, and I give it 0.25 and 0.75, meaning I want to calculate the twenty-fifth percentile and the seventy-fifth percentile.
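Here is a sketch of passing probs through the dot dot dot argument, next to the same call written with an anonymous function. The names q, q_anon and r and the seed are illustrative assumptions.

```r
set.seed(1)                       # illustrative seed
x <- matrix(rnorm(200), 20, 10)

# probs is not an argument of apply(); it travels through ... to quantile().
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))

# The same computation with an anonymous function wrapping quantile().
q_anon <- apply(x, 1, function(r) quantile(r, probs = c(0.25, 0.75)))

identical(q, q_anon)
```

Passing the extra argument through apply saves writing the wrapper function, which fits the "less typing" motivation.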

What this call does is go through each row of the matrix, and for each row it calculates the twenty-fifth and seventy-fifth percentile. So for each row, there are two numbers returned. What apply will do is create a matrix that has two rows, and the number of columns is equal to the number of rows in the original matrix, which happens to be 20. So here I get a 2 by 20 matrix, where each column of the returned matrix holds the twenty-fifth and the seventy-fifth percentile for the corresponding row. For example, in the first row, the twenty-fifth percentile is minus 0.33 and the seventy-fifth percentile is 0.92, and in the sixteenth row, the twenty-fifth percentile is minus 0.95 and the seventy-fifth percentile is 0.88. So you see how that works.
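The shape of the result can be inspected with a sketch like the following. The seed is an assumption, so the numbers will not match the ones read out above; only the 2 by 20 layout is the point.

```r
set.seed(1)                       # illustrative seed
x <- matrix(rnorm(200), 20, 10)
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))

dim(q)        # 2 rows (one per quantile) by 20 columns (one per row of x)
rownames(q)   # labels supplied by quantile()
q[, 1]        # the two quantiles for the first row of x
```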

Â 6:07

Now, suppose I had more than just a matrix. Suppose I had an array that I want to do something with. Here I'm creating an array of normal random variables. It has two rows and two columns, and the third dimension is 10. I'm not sure what you would call that dimension, but you can think of this as a bunch of 2 by 2 matrices that are stacked together.
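One way such an array might be built is sketched below. The name a and the seed are illustrative assumptions.

```r
set.seed(1)                                   # illustrative seed
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))    # ten stacked 2-by-2 matrices

dim(a)       # 2 rows, 2 columns, 10 slices
a[, , 1]     # the first 2-by-2 matrix in the stack
```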

Â 6:33

The idea is that I have a bunch of 2 by 2 matrices, and I want to take the average of those 2 by 2 matrices. The average of a bunch of 2 by 2 matrices is going to be another 2 by 2 matrix, which is the mean. So I can call apply on this array. I want to keep the first and the second dimension, but I want to collapse the third dimension. So for the margin I give it 1 and 2, which I want to preserve, and 3 is not there, which means I want to collapse the third dimension. And the function I pass it is mean. What this will do is take my array, average over the third dimension, and give me the mean matrix.
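A sketch of that call, with one entry of the result checked by hand. The names a and m and the seed are illustrative assumptions.

```r
set.seed(1)                                   # illustrative seed
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

# Keep dimensions 1 and 2, collapse dimension 3 by averaging.
m <- apply(a, c(1, 2), mean)
dim(m)                                        # a single 2-by-2 matrix

# Entry [1, 2] is the mean of the ten [1, 2] entries across the stack.
all.equal(m[1, 2], mean(a[1, 2, ]))
```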

Another way that you can do this is to use the rowMeans function. Even though this isn't a matrix, you can apply rowMeans to an array, and you pass the argument dims equal to 2.
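A sketch comparing the two routes. The names via_apply and via_rowmeans and the seed are illustrative assumptions. With dims = 2, rowMeans treats the first two dimensions as the "rows" and averages over everything after them.

```r
set.seed(1)                                   # illustrative seed
a <- array(rnorm(2 * 2 * 10), c(2, 2, 10))

via_apply    <- apply(a, c(1, 2), mean)
via_rowmeans <- rowMeans(a, dims = 2)         # average over dimension 3

all.equal(via_apply, via_rowmeans)
```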

Â