0:03

This video is about the most important function in all of R, the str function.

This function is really handy.

It's really useful. I use it all the time.

And you can use it in all kinds of situations

just to help you out, to look at R objects.

So the idea behind the str function is that it's supposed to compactly display the

internal structure of an R object, so str

you can think of as meaning structure.

So it's a very

simple diagnostic function. It's very versatile.

And the idea is that you can use it as an alternative to summary.

You want to look at an object and see what it is

and what's in it.

You can use summary, which will often be very useful,

but str is another option.

It's particularly well suited for compactly

displaying large lists which may contain nested lists.

0:53

And also,

its goal is to produce roughly one line of output per basic object.

For example, if you give it a simple object

like a vector, it'll give you one line of output back.

It will print it to the console.

And so the basic goal of str is to answer the question, what's in this object?

I'm going to start up R here and I'm going to

just give a little demonstration of how the str function can work.

So here, you can apply str to itself and

see it's a function that takes an object. It can take any R object.
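As a rough sketch of this first step, calling str on str itself prints its argument list on one line. The `capture.output` call and the variable name `str_of_str` are additions for illustration only; they are not part of the demonstration in the video.

```r
# Apply str() to itself: it prints the function's signature on one line.
str(str)

# str() prints its result and returns NULL invisibly, so capture.output()
# (from utils) is used here to keep the printed text as character strings.
str_of_str <- capture.output(str(str))
print(str_of_str)
```

The printed line should show that str takes an `object` argument followed by `...`.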

So you can apply str to other functions.

So let's say I want to know what the lm function does.

So here, what it gives you is

the function arguments for the lm function.

So here you can see it's a very brief summary:

the first argument's a formula, the second argument's data, et cetera.

I can look at maybe the ls function and it

gives me the arguments for the

ls function.
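The two calls described here might look like the following. The `formals` lines are an extra, not something shown in the video; they are one way to get the same argument names back as an R object, and `lm_args` and `ls_args` are illustrative names.

```r
# str() on a function shows its arguments and their defaults.
str(lm)
str(ls)

# formals() returns the argument list as an R object; names() keeps
# only the argument names.
lm_args <- names(formals(lm))
ls_args <- names(formals(ls))
head(lm_args, 2)  # "formula" "data"
```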

1:46

So if you want to look at some data, though,

let's say I'm going to generate some normal random variables here, 100

of them, let's say with mean two and standard deviation four.

1:55

Now one thing you can do is just call summary on x,

and that will give you a five-number summary plus the mean.

So you get the mean, the median, the 25th and 75th percentiles, and the min and the max.

So that gives you a rough sense of what the range is and how

it varies.

You can also call str on x and it will give you a little bit more information.

So it'll give you one line of output.

It tells you that x is a numeric vector. There are 100 elements.

And then it'll give you the first five numbers in this vector.

So you can get a sense of what the data look like.
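A minimal sketch of this comparison is below. The `set.seed` call is an assumption added so the example is reproducible; the video does not mention a seed, so the actual numbers will differ from those on screen.

```r
set.seed(1)  # assumption: fixed seed for reproducibility, not from the video

# 100 normal random variables with mean 2 and standard deviation 4.
x <- rnorm(100, mean = 2, sd = 4)

# Five-number summary plus the mean.
summary(x)

# One line: the type (num), the length [1:100], and the first few values.
str(x)
```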

2:39

And it'll give me one line of output again.

So here, it tells me it's a factor. It's got 40 levels.

The first four of them are named 1, 2, 3 and 4.

So that's not particularly interesting.

And then here, it says the first couple of elements of

this factor all have the label one.

You can also call summary on a factor, and you

can see that the output's a little bit different.

And what this does is give you

the number of elements in each of the 40 different levels.

So that's another piece of information, but it's not

quite as compact an output as str gives you.
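The transcript does not show how the factor was created. The sketch below assumes it was built with `gl(40, 10)`, which gives 40 levels with 10 repetitions each; the repetition count and the name `f` are guesses chosen only to match the description of 40 levels whose first elements all carry the label 1.

```r
# Assumption: a factor with 40 levels, each repeated 10 times.
f <- gl(40, 10)

# One line: the number of levels, the first few level names, and the
# integer codes of the first few elements.
str(f)

# summary() on a factor gives the count of elements in each level,
# which is longer than the str() output.
summary(f)
```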

So you can use str on other types of data.

So here, I can load a data frame.

3:20

Here's the airquality data set.

So if I look at the airquality data

set, I can use the head function to look at the

first six rows, or I can call str

3:29

to get some different output. So here, it tells me it's a data frame.

It tells me that there are 153

observations, so 153 rows in this data frame,

of six variables, and then for each variable, it gives me a little output.

So it tells me that the name of the first variable is Ozone.

It's an integer

3:47

variable, and here are the first couple of observations.

You can see there are some NAs there, so that's useful to know.

The second variable is called Solar.R, and

it's also an integer, and you can see the first couple of values.

So the str output here is very useful for just getting a quick examination

of data that you might have in R and what the structure of different R objects is.
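The data frame step can be sketched as follows, using the airquality data set that ships with R in the datasets package. The `ozone_has_na` line is an illustrative addition that confirms the missing values str makes visible.

```r
# First six rows of the data frame.
head(airquality)

# Number of observations and variables, then one line per variable
# with its name, type, and first few values.
str(airquality)

# Illustrative check: Ozone contains missing values.
ozone_has_na <- anyNA(airquality$Ozone)
ozone_has_na
```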

4:18

That will be a 10 by 10 matrix. I'll call str on m.

See, it will give me a little bit more information.

So now it knows that it's a matrix.

It'll say that it's a two-dimensional array,

that it's got 10 rows and 10 columns, and here are the first couple of elements.

So that's going to be the first column that you're seeing there.

So if I just print out the first column here,

you'll see that that's what it's giving me in the str output.
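The transcript does not show how `m` was created. The sketch below assumes a 10 by 10 matrix filled with 100 standard normal values; the contents and the seed are assumptions, and only the dimensions come from the transcript.

```r
set.seed(1)  # assumption: fixed seed for reproducibility

# Assumption: 100 random normal values arranged as 10 rows by 10 columns.
m <- matrix(rnorm(100), nrow = 10, ncol = 10)

# Shows the type and both dimensions, then the first few elements.
str(m)

# Matrices are stored column by column, so the values str() shows
# are the start of the first column.
m[, 1]
```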

The last thing I'll do here

is create a little list by using the split function and see how

str can look at the list and give a compact summary of it.

So I'm just going to take this airquality

data frame and split it by the month.

So here I take airquality and split it by the Month variable.

5:03

So now if I call str on s you'll see,

well, there's a little bit of output that flies by.

You see now this is a list that contains five different data

frames, where each data frame corresponds to the data for a given month.

So the data were only collected over

five different months, so that's why there are only five elements.

So here you can see that month five, which is May, has

31 observations on six variables, and that's a

little bit of what the data look like.

And you'll see for June,

here, there are 30 observations on six

variables again, the same six variables, of course.

And that's what the data look like there.

And then for July, the data are here. And August and September.
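A sketch of the split step is below. The `rows_per_month` line is an illustrative addition that pulls out the observation counts str reports for each month.

```r
# Split the data frame into a list of data frames, one per month.
s <- split(airquality, airquality$Month)

# A list of 5 elements; each element is summarised like a data frame.
str(s)

# Illustrative: number of rows in each month's data frame.
rows_per_month <- sapply(s, nrow)
rows_per_month
```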

So you can see you can have a representation

of this split list that's not as compact

as it was before, but it's about as compact as you

can make it, and str will provide a very nice summary.

You can take a quick look at the data.

See if there are any problems. See if there are missing values and

get a sense of what to do next. So that's the str function.

I'll repeat again, I think it's the most useful function

in all of R and you can use it in all cases.

I encourage you to use it anytime you have

an R object and you don't know what's there.

Â