0:00

Functions represent some of the most powerful aspects of the R language.

They really mark the transition from being a user

of R to being a programmer of R.

The basic idea is that you can type at the command

line to explore some data and run some code.

But eventually you'll probably get to the point where

you need to do something a little bit more complex,

a little bit more than can be expressed

in a single line or maybe in two lines.

And if you have to do this over and over again, then you're

usually going to want to encode this kind of functionality in a function.

I'm going to talk about functions in three parts here.

First, I'll talk about the basics of how

functions are written in R.

Then I'm going to talk a little bit about lexical

scoping and the scoping rules for the R language.

And last, I'm going to end with a little example.

0:49

So, functions in R are created using the function directive,

and functions are stored as R objects just like anything else.

So you might have a vector of integers, a list of

different things, a data frame, and then you have a function.

In particular, R functions are

R objects that are of class "function", okay?

So the basic construct here is that you assign

the function directive to some object, here I call it f.

The function directive takes some

arguments, and then inside the curly braces

there is R code, which does whatever the function does.
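As a minimal sketch of that construct (the name add_two and its two arguments are made up here for illustration, they are not from the lecture):

```r
# Assign the result of the function directive to an object.
# add_two is a hypothetical name used only for illustration.
add_two <- function(x, y) {
    # The R code inside the curly braces is the body of the function.
    x + y
}

class(add_two)    # the object has class "function"
add_two(1, 2)     # call it like any built-in function
```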

One nice thing about R is that functions

are considered what are called first class objects.

So you can treat a function just like you can treat pretty much any other R object.

Importantly, this means that you can

pass functions as arguments to other functions.

This is actually

a very useful feature in statistics. Functions can also be nested.

So you can define a function inside of another function, and we'll

see what the implications of this are when we talk about lexical scoping.
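Here is a small sketch of both ideas, passing a function as an argument and defining one function inside another. The names apply_to_data and sum_of_squares are hypothetical and only serve the illustration:

```r
# apply_to_data is a hypothetical helper: it receives a function as its
# second argument and calls it on the data.
apply_to_data <- function(data, summary_fn) {
    summary_fn(data)
}

apply_to_data(c(1, 2, 3, 4), mean)    # pass mean as an argument
apply_to_data(c(1, 2, 3, 4), max)     # pass max instead

# sum_of_squares (also hypothetical) defines a helper function inside itself.
sum_of_squares <- function(v) {
    square <- function(z) {
        z^2
    }
    sum(square(v))
}

sum_of_squares(c(1, 2, 3))
```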

The return value of a function is simply the

very last R expression in the function body to be evaluated.

So there's no special expression for returning something from a function,

although there is a function called return,

which we'll talk about in a second.
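To make the return rule concrete, here is a short sketch. Both function names are invented for this example; the second one uses return() only to leave the function early:

```r
# The value of the last expression evaluated is what the function returns.
# above_ten is a hypothetical example function.
above_ten <- function(x) {
    use <- x > 10
    x[use]            # last expression, so this is the return value
}

# return() exits the function immediately with the given value.
# safe_sqrt is also hypothetical.
safe_sqrt <- function(x) {
    if (x < 0) {
        return(NA)
    }
    sqrt(x)
}

above_ten(c(5, 12, 20))
safe_sqrt(9)
safe_sqrt(-1)
```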

Functions have what are called named arguments,

and the named arguments can potentially have default values.

A lot of these features are useful when

you're designing functions that may be used by other people.

For example, you may have a function that has a lot

of different arguments so you can tweak a lot of different things.

But most of the time, you don't have to change all those different arguments.

You may only care about one or two.

So it's useful for some of the arguments to have default values.
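A rough sketch of what default values buy you, using a made-up function called describe (its arguments digits, center, and label are assumptions for this example, not anything from a real package):

```r
# describe is a hypothetical function: only x is required, the other
# arguments have defaults that can be overridden one at a time.
describe <- function(x, digits = 2, center = mean, label = "value") {
    paste(label, "=", round(center(x), digits))
}

describe(c(1, 2, 4))                    # all defaults are used
describe(c(1, 2, 4), digits = 1)        # change just one argument
describe(c(1, 2, 4), center = median)   # or a different one
```

The caller only touches the one or two arguments they care about and leaves the rest alone.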

2:44

The formals function takes a function as input

and returns a list of all the formal arguments of that function.
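For instance, calling formals on the built-in sd function might look like this (the variable name f_args is just something I picked):

```r
# formals() returns the formal arguments of a function as a list.
f_args <- formals(sd)

names(f_args)    # the argument names of sd
f_args$na.rm     # the default value stored for na.rm
```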

Not every function call in R makes use of all the formal arguments.

For example, if a function has ten different arguments, you

may not have to specify a value for all ten of those arguments.

Function arguments can be missing, or they

may have default values that are used when they are not specified by the user.

R function arguments can be matched positionally or by name.

This is key when

you're writing a function and also when you're calling it.

For example, take a look at the function sd, which calculates the standard

deviation of a set of numbers. sd takes an input x, which is the name

of the argument and which is going to be a vector of data.

And there's a second argument called na.rm, and this controls whether

the missing values in the data should be removed or not.

The default is for na.rm to be equal to FALSE.

So by default, if you have missing data in the set of

numbers for which you want to calculate the

standard deviation, the missing values will not be removed.
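A quick sketch of what that default means in practice, with a small vector I made up that contains one missing value:

```r
# A small made-up vector with one missing value.
x <- c(2, 4, NA, 6)

sd(x)                  # na.rm defaults to FALSE, so the result is NA
sd(x, na.rm = TRUE)    # the NA is removed before the calculation
```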

So here I'm

simulating some data, just a hundred

normal random variables, and there's no missing data here.

If I just calculate sd on the vector,

it'll give me an estimate of the standard deviation.

If I say x equals mydata, that's the same thing.

Here I've named the argument, but otherwise

the data are the same, so it'll calculate the same standard deviation.

In the first example I didn't

name the argument,

so it defaulted to passing mydata as the first argument of the function.

4:29

In the next example here, I'm going to name both arguments.

I'm going to say x equals mydata, and na.rm equals FALSE.

That calculates the same thing as before.

Now when I name the arguments, I don't have to put them in any special order.

So for example, I could reverse the order of the arguments here.

Say na.rm equals FALSE first, and then say x

equals mydata second, and that will produce exactly the same

result because I've named the arguments.

Now, what happens if I name one argument and don't name the other?

4:56

Well, what happens is that the named argument is set, and

you can think of it as being removed from the argument list, and

then any other things that are passed will be matched

to the remaining function arguments in the order in which they come.

So for example, sd, after you remove the na.rm

argument, only has one more argument left, and so mydata

would be assigned to that argument.

So all these expressions return the exact same value.
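The calls described above can be reconstructed roughly like this. The seed and the result names (a, b, c1, d, e) are my own additions so the run is repeatable and the values can be compared:

```r
set.seed(1)              # arbitrary seed, only for repeatability
mydata <- rnorm(100)     # one hundred normal random variables

a  <- sd(mydata)                      # positional match to x
b  <- sd(x = mydata)                  # x matched by name
c1 <- sd(x = mydata, na.rm = FALSE)   # both arguments named
d  <- sd(na.rm = FALSE, x = mydata)   # named, in reverse order
e  <- sd(na.rm = FALSE, mydata)       # one named, the other positional

c(a, b, c1, d, e)        # five copies of the same value
```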

Although all these expressions are

equivalent, I don't recommend all of them equally.

For example, I don't necessarily recommend reversing the order of the

arguments just because you can, even though it's valid if you name them,

just because that can lead to some confusion.

5:50

And so, for example, the lm function here, which

fits linear models to data, has this argument list here.

The first is the formula, the second is

the data, and then subset, the weights, et cetera.

And you see that the first five arguments here don't have any default value,

so they are only set if the user specifies them.

But then the method, the model, the x argument, they all have

default values, so if you don't specify

them, lm will use those values by default.
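If you want to check that argument list yourself, one way (a sketch, the name lm_args is mine) is to look at the formals of lm:

```r
# Inspect the formal arguments of lm.
lm_args <- formals(lm)

names(lm_args)[1:5]   # the first five arguments, which have no defaults
lm_args$method        # default for method
lm_args$model         # default for model
lm_args$x             # default for x
```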

And so the following two function calls are equivalent.

I could specify the data first, and then the formula, and then the model,

and then the subset argument,

or I could specify the formula first, the data second,

the subset, and then say model is equal to FALSE.

Now the reason why the first one is okay is

because I matched the data argument by name.

You can imagine that that's kind of taken out of the argument

list now. Then y ~ x isn't specified by name,

so it's given to the first argument that hasn't already been matched,

which in this case is the formula.

Model equal to FALSE has been matched by name, so

I can kind of get rid of that from the argument list.

And then 1 through 100 has to be assigned

to the first argument that has not already been matched.

In this case formula was already matched, data was already matched,

and so the next one is subset.

So 1 to 100 gets assigned to the subset argument.

This is a somewhat confusing way to call lm,

and I don't recommend that you do it this way.

But I wrote it this way just to demonstrate

how positional matching and matching by name can work together.

A common usage for lm, though, is the second

version here, which says lm of y ~ x.

So there is a formula there.

And then the next one is mydata, which is the

data set from which you're going to grab the data.

Then comes the subset argument. So the first three arguments

are commonly specified every time you call lm.

But the rest you may or may not specify, and so

if you just want to specify one of the later arguments,

it's easier just to call it out by name.
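Here is a sketch of the two lm calls being compared. The data frame is simulated by me purely so the calls can run (the seed, the 200 rows, and the coefficients 1 and 2 are all assumptions); only the two lm calls themselves follow the lecture:

```r
set.seed(2)                                   # arbitrary seed
mydata <- data.frame(x = rnorm(200))          # made-up predictor
mydata$y <- 1 + 2 * mydata$x + rnorm(200)     # made-up response

# Mixed matching: data and model by name, formula and subset by position.
fit1 <- lm(data = mydata, y ~ x, model = FALSE, 1:100)

# The more common form: formula, data, subset in order, then model by name.
fit2 <- lm(y ~ x, mydata, 1:100, model = FALSE)

coef(fit1)
coef(fit2)    # same coefficients as fit1
```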

7:49

So, most of the time, named arguments are useful at the command line

when you have a long argument list and you want to use the defaults for everything

except for one of the arguments, which may be in the middle or near the end

of the list, and you

can't remember exactly which argument it

is, whether it's the fourth, or the sixth,

or the tenth argument on the argument list.

And so you just call it by name, and that way

you don't have to remember the order of the arguments on

the argument list.

Another example where this comes in handy is for plotting, because

many of the plot functions have very long argument lists.

8:33

Function arguments can also be partially matched,

which is useful primarily for interactive work,

not so much for programming.

When you call a function, if the argument has a very long name,

you can match it partially. So you can type part of the argument name,

and as long as there's a unique match there, the R

system will match the argument and assign the value to the correct one.

So the order of operations that

R uses is this: first it'll check for an exact match.

So if you name an argument,

it'll check to see if there's

an argument that exactly matches that name.

If there's no exact match, it'll look for a partial match.

And then if that doesn't work, it'll look for a positional match.
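That matching order can be seen in a small sketch. The function g and its argument names n and number are invented for this example; they were chosen so that one name is a prefix of the other:

```r
# g is a hypothetical function with two similarly named arguments.
g <- function(n, number = 10) {
    c(n = n, number = number)
}

# "n" is an exact match for the argument n, so it is never treated as a
# partial match for number.
g(n = 1)

# "num" matches no argument exactly, but it partially matches number.
# The unnamed 2 is then matched by position to the remaining argument n.
g(num = 5, 2)

# Partial matching with a built-in function: "na" is matched to na.rm.
sd(c(2, 4, NA, 6), na = TRUE)
```

This works nicely when typing interactively, but spelling out the full argument name is usually safer in code that you save.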

Â