0:00

In this lecture, I just want to get everyone on board with writing functions, because functions play a critical role in any R programming, and you tend to write a lot of them when you're doing a lot of data analysis or statistical analysis. So I just want to make sure that everyone can get started writing functions, particularly those who are less familiar with programming languages in general. So this is just going to be about writing your first function. It's kind of like the hello world, so to speak, of R.

So the first thing you're going to want to do is write the function in a text file. It's possible to write functions on the command line in R, but it's usually not preferable. So usually you're going to want to put your functions in a separate file, separate from any interactive stuff that you're doing on the command line. In the future you'll want to put your functions in an R package, which is a more structured kind of environment with documentation and everything, but we won't talk about that now. Right now the first thing you're going to want to do is put your functions in a text file.
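As a rough sketch of that workflow: the function lives in a file and gets loaded into the session with source(). The temporary file created with tempfile() only stands in for a script you would save yourself, and hello is a made-up function name used purely for illustration.

```r
# Write a one-line function definition to a temporary file. In practice you
# would type this into a script in your editor and save it yourself.
script <- tempfile(fileext = ".R")
writeLines('hello <- function() "hello, world"', script)

# source() runs the file, which defines hello() in the current session.
source(script)

# The function is now available at the command line.
hello()
```

Keeping the definition in a file means you can edit it and source it again, instead of retyping it at the prompt.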

Okay, so the first thing we're going to want to do is open up RStudio. So let's do that.

Â 1:11

And so you can see here in RStudio there's some stuff going on from a previous project that I'm working on. That may happen to you, and generally you can either close it or just ignore it. I want to create a new R script here, so let's create a clean script to put our code into.

Â 1:30

So the first function I'm going to write is really simple: it's just going to take two numbers and add them together. This function obviously doesn't have a real point to it, but it shows you how to use the function syntax, how to specify the arguments, and how to return the value. So for the function that adds two values, I'm just going to call it add two.

Â 1:47

And so you use the function directive to start it off. Now it's going to add two values, so it has to take two arguments, and I'm just going to call the two arguments x and y. Then I'm going to take the two arguments and add them together with the plus operator, x plus y, and then I close out the function with the curly brace.

So you can see that I didn't have to do anything special to return the value that's the sum of the two elements, because in any R function, the function returns whatever the last expression was. Here there's really only one expression, so it's the last expression, and it equals the sum of x and y.

So here I can highlight this guy and run it in the console, and you can see now I've got my function here. I can say add two, and let's give it, say, three and five, and hopefully I get eight. Yes, that's a good sign. So it adds the two numbers together, and that's that. It's a very simple function, and you've now written your first function in R.
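Written out, the function described here might look like the sketch below. The spelling add2 for the spoken name "add two" is an assumption.

```r
# add2 takes two arguments and returns their sum.
# No explicit return() is needed: the value of the last expression
# evaluated in the body is what the function returns.
add2 <- function(x, y) {
    x + y
}

add2(3, 5)  # 8
```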

So the next function that I want to talk about is slightly more complicated. It's going to take a vector of numbers, and it's going to return the subset of the vector that's above the value of ten. So any number that's bigger than ten, it's going to return those numbers for you.

Â 3:09

I'll call it above ten, just because it gives you any number that's above ten. And it's going to take a vector here; we'll call it x. You don't have to call it x, I'm just calling it that. And I like to open and close the curly braces right away, just so you know where the beginning and the end of the function are.

Â 3:25

That helps if you happen to have a lot of code in a single file. So the first thing I want to do is construct a logical statement that figures out which elements of this vector x are greater than ten. All right? So I'm going to assign an object. I'll call it use, because these are the numbers that I'm going to use. And I'll say x greater than ten. All right? So this'll return a logical vector of trues and falses indicating which elements of x are greater than ten.

Â 3:51

And then I'm going to subset the vector x with this logical vector. So now this function returns the subset of the vector x that is bigger than ten. Of course, if there are no elements of x that are bigger than ten, then it will return an empty numeric vector.
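A minimal sketch of the function just described. The identifier above10 is an assumed spelling, and the input vectors are made up for illustration.

```r
above10 <- function(x) {
    use <- x > 10   # logical vector: TRUE where an element exceeds 10
    x[use]          # keep only the elements flagged TRUE
}

above10(c(3, 11, 25, 10))  # 11 25 (10 itself is not greater than 10)
above10(c(1, 2, 3))        # numeric(0), an empty numeric vector
```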

Â 4:07

Now of course, there's really nothing special about the number ten. I just kind of made that up, and so you may want to create a function that allows people to extract the elements of a vector that are above some other arbitrary number, right? It could be ten, it could be 12, it could be five, it could be anything. So maybe you'll want to allow the user to specify that number. So I'll create a new function here and call it above, so it doesn't have the ten encoded in it. I'll use the function directive, and I'll have a second argument called n, which can be any number really. [SOUND] So let's start it off. We'll get the curly braces in there, and now I'll create a logical statement that x is greater than n. Right? And then I'll subset the vector x based on that logical statement.

So now I can source this into R. Oops. And I can run my function here. So I'll just create a vector. Let's say x is one through 20.

Â 5:06

Oh, you see, I didn't specify the number n, so it's not going to know what to cut it off at. So I need to specify the threshold; let's do 12. And you can see it returned all the numbers that are greater than 12. So that's as we expected, and the function appears to be working well.
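Here is a sketch of this version of the function, with no default for n, together with the two calls described. Wrapping the failing call in tryCatch() is my addition so that the script keeps running; at the console you would simply see the error.

```r
above <- function(x, n) {
    use <- x > n   # TRUE where an element exceeds the threshold n
    x[use]
}

x <- 1:20

above(x, 12)  # 13 14 15 16 17 18 19 20

# Leaving out n fails, because n has no default value.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(above(x), error = function(e) conditionMessage(e))
result
```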

Now let's suppose that maybe there is something special about the number ten, and maybe it's something that people are going to be using very often; it's a very common number. So you might want to specify a default argument; you might want the default to be ten. Remember, when I ran the function before and I didn't specify the number n, it gave me an error. Maybe you don't want people to have to encounter that error, and so you'll specify a default value, n equals ten, so if people don't specify the cutoff value n, it will just automatically default to ten.

So now I can run this in R, and now if I do above with just x, you see I don't get the error anymore. It automatically gives you all the numbers that are bigger than ten. So it's kind of nice in R when you're writing functions to be able to specify default values like this that make the life of the user just a little bit easier, especially for very common cases where it's not important that the user specify an argument.
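A sketch of the same function with the default value added. The only change from the version without a default is n = 10 in the argument list.

```r
above <- function(x, n = 10) {
    use <- x > n   # n is 10 unless the caller supplies another value
    x[use]
}

x <- 1:20

above(x)      # 11 through 20, using the default cutoff
above(x, 12)  # 13 through 20, overriding the default
```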

Â 6:13

So those are some very simple functions in R that can be used to process data or do simple calculations, like adding two numbers. The next function I want to talk about is just going to take a matrix or a data frame and calculate the mean of each column. Right, so this is slightly more complicated: you have to take your argument and then you have to loop through each column to calculate the mean of each one. So this is going to involve using a for loop, and we'll talk about it here. So let's call this function column mean, because that's what it does.

Â 6:53

And so y, the argument, is going to be a data frame or a matrix, and we're going to go through the columns of this data frame or matrix and calculate the mean of each column. So the first thing I need to figure out is how many columns this thing has, and that can be easily done. I'll call it nc, for number of columns, and we can use the ncol function for that.

Â 7:10

That will calculate the number of columns, and then I need to initialize a vector that's going to store the means for each column. The length of this vector has to equal the number of columns, right? So I'll just call it means, and it'll be a numeric vector with length equal to the number of columns. So this is just an empty vector. It's going to be initialized to be all zeros, but we're going to fill it in as we go through the columns. So now we want to loop through the columns with a for loop. I'll say i in, and then I'll say one through nc. So this creates an integer vector that starts at one and ends at the number of columns. And then I'm going to loop through, and for each i, I'm just going to assign to my means vector the mean of x bracket i, right? Oh sorry, that's called y here now.

Â 8:01

And that's it. So far I haven't returned anything yet, so right now this function doesn't do anything particularly useful. What I want to do is return the vector of means, and so I'm just going to return that. And since that's the last expression in the function, that's what will get returned.
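Put together, the function might look like the sketch below. The identifier columnmean is an assumed spelling, the small matrix m is made up for illustration, and seq_len(nc) is swapped in for the spoken "one through nc" because it also behaves sensibly when there are zero columns.

```r
columnmean <- function(y) {
    nc <- ncol(y)                 # number of columns
    means <- numeric(nc)          # vector of zeros, one slot per column
    for (i in seq_len(nc)) {      # i runs over 1, 2, ..., nc
        means[i] <- mean(y[, i])  # mean of column i
    }
    means                         # last expression, so this is returned
}

m <- matrix(1:6, nrow = 2)  # columns are (1, 2), (3, 4), (5, 6)
columnmean(m)               # 1.5 3.5 5.5
```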

Â 8:34

Okay, so I think there are six columns in this dataset, so it gave me six means. Now you can see that the means for the first two columns come back as NA. And that's because if the vector has an NA in it, then you can't calculate the mean. And so the one thing you might want to do, by default, is throw out all of the missing values and just calculate the mean amongst the observed values. You'll notice that a lot of functions have a feature like that, where you can choose whether you want to remove the NAs or not. So let me just add an argument here; it's called [UNKNOWN] NA. And it will default to true, right? And then I'll pass this argument to the mean function. So mean has an na.rm argument, and I'll pass it this value.

Â 9:28

So now I get my means for those columns, because the default was to remove the NAs. I could say false here, and then my NAs will come back. So I can always choose to go back to the old behavior if I want to.
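A sketch of the version with the missing-value switch. The argument name is garbled in the transcript, so removeNA here is a hypothetical name; columnmean is an assumed spelling and the data frame d is made up for illustration. The na.rm argument of mean() is real.

```r
# removeNA is a hypothetical argument name; it is passed straight
# through to the na.rm argument of mean().
columnmean <- function(y, removeNA = TRUE) {
    nc <- ncol(y)
    means <- numeric(nc)
    for (i in seq_len(nc)) {
        means[i] <- mean(y[, i], na.rm = removeNA)
    }
    means
}

d <- data.frame(a = c(1, NA, 3), b = c(2, 4, 6))

columnmean(d)         # 2 4   (the NA in column a is dropped)
columnmean(d, FALSE)  # NA 4  (the old behavior comes back)
```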

Â 9:42

So the last thing you want to do any time you're writing a function, the most important thing of course, is to save your file. Right now this file is unsaved. If you don't save it and RStudio crashes or something happens, you'll lose all your work. So you want to go to the Save, I mean Save As, menu, and just save your file as, you know, functions or whatever you want to call it.

Â 10:09

So that should get you started writing some simple functions in R. For your programming assignment you'll have to write a few functions that go through and look at data. But I just wanted to get you started writing your first functions so that you know how the function directive works and how the arguments work, and you can play around a little bit with more complicated ideas as you work through the assignments.
