Once you've loaded your data into R, what you might want to do is manipulate

that data so you can set it up to be a tidy data set:

variables in the columns and observations in the rows, and

only the observations that you want to be able to analyze.

So the first part of this lecture is going to

review some of the information about subsetting

that you've probably seen in your R programming class, in

case you don't remember it off the top of your head.

So what I'm going to do is create a data frame here, and I'm going to

create that data frame with three variables that

are labeled var1, var2, and var3.

And then what I want to do is scramble

up the values of those variables so that they're not in a specific order.

And I'm going to make some of the values be missing.

So I'm going to make some of the values NA.

Once I've done all that, I see that this is the data set that I've created.

So, you can see in any of the

columns that the values aren't necessarily in order,

and you can see that for variable two,

there are a couple of missing values, shown as NA.
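
The setup just described might look roughly like the sketch below. The object name `x`, the seed, the value ranges (1 to 5, 6 to 10, 11 to 15), and which two entries of var2 are set to NA are all assumptions made for illustration; the transcript does not state them.

```r
# Arbitrary seed (an assumption), only so repeated runs give the same result
set.seed(42)

# Three variables; the value ranges are assumed, not taken from the lecture
x <- data.frame(var1 = sample(1:5),
                var2 = sample(6:10),
                var3 = sample(11:15))

# Scramble the row order so nothing is sorted
x <- x[sample(1:5), ]

# Make two of the var2 values missing (positions chosen arbitrarily)
x$var2[c(1, 3)] <- NA

# Print the resulting data frame
x
```

The exact numbers printed depend on the seed and on the R version, so no output is shown here.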

So, the first thing that I can do is subset a specific

column by doing x, open bracket, comma one, close bracket.

And what that'll do is pull out just the first column of that data frame.
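
As a small sketch of subsetting a column by position: the data frame below is a hand-written stand-in with made-up values, not the one shown in the lecture.

```r
# Stand-in data frame; the values are illustrative assumptions
x <- data.frame(var1 = c(2, 1, 3, 5, 4),
                var2 = c(NA, 10, NA, 6, 9),
                var3 = c(15, 11, 12, 14, 13))

# Leaving the row position empty keeps every row; 1 selects the first column
first_col <- x[, 1]

# With a single column, the result is a plain vector, not a data frame
first_col
```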

The other thing I can do, if I want to subset

by a column, is again do x, open bracket, comma,

and then use the variable name in quotes

before the close bracket to subset just that column.
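
The same idea with a column name instead of a position, again on a hand-written stand-in data frame whose values are assumptions:

```r
# Stand-in data frame; the values are illustrative assumptions
x <- data.frame(var1 = c(2, 1, 3, 5, 4),
                var2 = c(NA, 10, NA, 6, 9),
                var3 = c(15, 11, 12, 14, 13))

# The column name goes in quotes in the column position
by_name <- x[, "var1"]

# Selecting by name gives the same values as selecting by position
by_name
```

Using the name is usually less fragile than using the position, since it keeps working if the columns are later reordered.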

I can subset by both rows and columns at the same time.

So for example, this command here, x, open

bracket, one colon two, comma var2, will

output the first two rows of x, but only for the second column, var2.

So you can subset on both rows and columns at the same time.
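
A sketch of subsetting rows and a column together, using a hand-written stand-in data frame (the values are assumptions):

```r
# Stand-in data frame; the values are illustrative assumptions
x <- data.frame(var1 = c(2, 1, 3, 5, 4),
                var2 = c(NA, 10, NA, 6, 9),
                var3 = c(15, 11, 12, 14, 13))

# Rows 1 and 2 in the row position, the column name in the column position
first_two <- x[1:2, "var2"]

# Only the var2 values from the first two rows are returned
first_two
```

To keep more than one column, a character vector such as `c("var1", "var2")` can go in the column position instead.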

The other thing you can do is subset using logical statements.

So, for example, suppose I want to find all the rows

of x where variable one is less than or equal to three

and variable three is greater than 11. To pick out

those rows I can pass a logical expression like

that in the row position of the subset, and I end up with

just the rows where both of those conditions are met.
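
The "and" condition can be sketched as follows. The data frame is a hand-written stand-in with assumed values, so the rows returned here will not match the ones on the lecture slide.

```r
# Stand-in data frame; the values are illustrative assumptions
x <- data.frame(var1 = c(2, 1, 3, 5, 4),
                var2 = c(NA, 10, NA, 6, 9),
                var3 = c(15, 11, 12, 14, 13))

# & combines the two tests element by element; a row is kept only if both hold
both <- x[(x$var1 <= 3 & x$var3 > 11), ]

# With these values, rows 1 and 3 satisfy both conditions
both
```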

I can also use an or, so I can try to find

the rows where variable one is less than or equal to three or, using

this type of command here, variable three is greater than 15, and the result is this

data frame here where one or the other of those two conditions is met.
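
The "or" version, sketched on the same kind of hand-written stand-in data frame (the values are assumptions):

```r
# Stand-in data frame; the values are illustrative assumptions
x <- data.frame(var1 = c(2, 1, 3, 5, 4),
                var2 = c(NA, 10, NA, 6, 9),
                var3 = c(15, 11, 12, 14, 13))

# | keeps a row when at least one of the two tests is TRUE
either <- x[(x$var1 <= 3 | x$var3 > 15), ]

# With these values no var3 exceeds 15, so every kept row
# comes from the var1 <= 3 test (rows 1, 2 and 3)
either
```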