And then what I can do is actually create some covariates by

running the sva function on this data set.

So I pass it the expression data set, the model including the variable I care

about, and the model not including the variable I care about.

And then I tell it how many surrogate variables, or

how many surrogate batch effects it should estimate.

So then it's going to go through and

estimate those batch effects through an iterative procedure.
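
As a rough illustration of the two models that get handed to sva, the sketch below builds a full design matrix (an intercept plus an indicator for the variable of interest) and a null design matrix (intercept only). It is plain Python rather than R, the sample labels are made up, and the helper names full_design and null_design are hypothetical, not part of the sva package. It only handles a two-level variable.

```python
# Toy sketch of the two design matrices described above.
# The labels and helper names are illustrative assumptions.

def full_design(labels, reference):
    """Intercept column plus a 0/1 indicator for samples outside the reference group."""
    return [[1.0, 0.0 if lab == reference else 1.0] for lab in labels]

def null_design(labels):
    """Intercept-only model: it leaves out the variable of interest."""
    return [[1.0] for _ in labels]

# Hypothetical cancer status for six samples.
status = ["Normal", "Normal", "Cancer", "Cancer", "Normal", "Cancer"]

mod = full_design(status, reference="Normal")  # model including the variable we care about
mod0 = null_design(status)                     # model not including it

for row, row0 in zip(mod, mod0):
    print(row, row0)
```

The difference between the two matrices tells the procedure which signal to protect (here, cancer status) while it looks for other systematic variation.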

So now I can see that the object returned by sva has

an sv component, which contains the new covariates that have been created,

the potential batch effects. So, for example, I can plot,

or I can correlate those batch effects with the actual observed

batch variable, and I can see that the second estimated surrogate variable

is very highly correlated with the batch variable.

The first one isn't as correlated with the batch variable

as the second one.
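
The comparison just described can be sketched as a correlation between each surrogate variable and the observed batch. This is a minimal Python sketch under stated assumptions: the batch labels and surrogate variable values are invented, and coding batch as a number is a simplification (batch is really a categorical variable, so a regression on batch as a factor would be the more careful check).

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observed batch, coded numerically (a simplification).
batch = [1, 1, 2, 2, 3, 3]

# Hypothetical estimated surrogate variables, one list per column.
sv1 = [0.3, -0.4, 0.2, -0.1, -0.3, 0.3]
sv2 = [-0.5, -0.4, 0.0, 0.1, 0.4, 0.5]

r1 = pearson(sv1, batch)
r2 = pearson(sv2, batch)
print("sv1 vs batch:", round(r1, 3))
print("sv2 vs batch:", round(r2, 3))
```

In this toy data the second column tracks batch closely and the first does not, which mirrors the pattern described above.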

So then I can make a plot of that second surrogate variable

versus the batch variable.

And I can see that there's a relationship between the inferred

surrogate variable, that is, the inferred batch effect, and the observed batch effect.

And so then I can actually plot these

data points on top of that to see if that's true.

So what we've done here with SVA is not actually cleaning

the data set.

We've just identified new covariates that we now need to include in our model fit.

So we can combine them with our original model that had the cancer term in it.

And so we end up with a model fit that includes both

our estimated surrogate variables as well as our cancer status variables.

And so then I can use lm.fit or any of the other methods we talked about in

the previous lecture to get the model fit after you adjust for those variables.
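
The two steps above (adding the surrogate variables as extra columns of the design matrix, then fitting the model) can be sketched in plain Python. This is an illustration under stated assumptions, not what lm.fit does internally: it solves the normal equations by Gaussian elimination, which is a simpler swapped-in technique, and the design matrix, surrogate variable, and expression values are all made up. The helpers cbind, solve, and least_squares are hypothetical names.

```python
def cbind(a, b):
    """Join two matrices (lists of rows) column-wise."""
    return [ra + rb for ra, rb in zip(a, b)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical original model: intercept + cancer indicator.
mod = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
# Hypothetical surrogate variable, one column.
sv = [[-0.5], [0.4], [-0.3], [0.5], [0.1], [-0.2]]

modsv = cbind(mod, sv)  # model with cancer status and the surrogate variable

# Toy expression values for one gene, built from known coefficients
# (intercept 2, cancer effect 1.5, surrogate variable effect 3).
y = [2.0 + 1.5 * row[1] + 3.0 * row[2] for row in modsv]

beta = least_squares(modsv, y)
print([round(b, 6) for b in beta])
```

The second coefficient is the cancer effect after adjusting for the surrogate variable; in a real analysis this fit would be repeated for every gene.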

So now what I can do is I can see what the model fits look like

comparing SVA to ComBat, for example.

Here, I'm plotting SVA versus ComBat.
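
For readers without the plot, a comparison like this can also be summarized numerically. The sketch below uses invented per-gene coefficients (they are not results from the lecture) and measures how far the two sets of estimates sit from the line y = x, where the two methods would agree exactly.

```python
# Hypothetical per-gene cancer coefficients from the two adjusted fits.
sva_coef = [0.8, -1.2, 0.1, 2.3, -0.4]
combat_coef = [0.7, -1.0, 0.2, 2.1, -0.5]

# Vertical distance of each point from the identity line y = x.
diffs = [c - s for s, c in zip(sva_coef, combat_coef)]
mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)

# Number of genes where both methods agree on the direction of the effect.
same_sign = sum((s > 0) == (c > 0) for s, c in zip(sva_coef, combat_coef))

print("mean absolute difference:", round(mean_abs_diff, 3))
print("genes with the same sign:", same_sign, "of", len(sva_coef))
```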