One thing to keep in mind, again, is that when we apply a prediction algorithm to the test set, we can only use parameters that we estimated in the training set. In other words, when we apply this same standardization to the test set, we have to use the mean from the training set and the standard deviation from the training set to standardize the test set values.

What does this mean? It means that when you do the standardization, the mean will not be exactly zero in the test set, and the standard deviation will not be exactly one, because we've standardized by parameters estimated in the training set. But hopefully they'll be close to those values, even though we're not using the exact values computed from the test set.
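
To make that concrete, here is a minimal sketch in base R. The data are simulated stand-ins for a skewed variable like capitalAve, so the numbers are an assumption and not the values from the slides. The point is only that the mean and standard deviation come from the training set and are reused on the test set.

```r
# Simulated stand-ins for a skewed predictor such as capitalAve
# (assumption: these are not the lecture's actual data).
set.seed(1)
trainCapAve <- rexp(300, rate = 0.2)
testCapAve  <- rexp(100, rate = 0.2)

# Estimate the standardization parameters on the training set only.
trainMean <- mean(trainCapAve)
trainSd   <- sd(trainCapAve)

# Apply the same training-set parameters to both sets.
trainCapAveS <- (trainCapAve - trainMean) / trainSd
testCapAveS  <- (testCapAve  - trainMean) / trainSd

# Training set: mean 0 and sd 1, up to floating-point error.
mean(trainCapAveS)
sd(trainCapAveS)

# Test set: usually close to 0 and 1, but generally not exactly.
mean(testCapAveS)
sd(testCapAveS)
```

Note that nothing computed from the test set is used in the transformation itself.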

You can also use the preProcess function to do a lot of standardization for you. The preProcess function is built into the caret package. Here I'm passing it all of the training variables except for one, the 58th in the data set, which is the actual outcome that we care about. And I'm telling it to center every variable and scale every variable. That does the same transformation to the data that we talked about previously, where you subtract the mean and divide by the standard deviation.

You can see that by looking at the mean of the variable capitalAve, just like we did before. After using the preProcess function, the mean is zero and the standard deviation is one. So preProcess can be used to perform a lot of the preprocessing techniques that you used to have to do by hand.
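
A call along those lines might look like the sketch below. It assumes the caret package is installed, and it uses a small simulated data frame with hypothetical columns and the outcome in column 3, instead of the 58-column data set on the slide.

```r
library(caret)

# Hypothetical training data: two predictors plus the outcome in column 3
# (assumption: simulated values, not the lecture's data set).
set.seed(2)
training <- data.frame(
  capitalAve  = rexp(200, rate = 0.2),
  capitalLong = rnorm(200, mean = 50, sd = 10),
  type        = factor(sample(c("spam", "nonspam"), 200, replace = TRUE))
)

# Estimate centering and scaling parameters from the predictors only,
# leaving out the outcome column.
preObj <- preProcess(training[, -3], method = c("center", "scale"))

# Apply the transformation to the training predictors.
trainS <- predict(preObj, training[, -3])

mean(trainS$capitalAve)  # zero, up to floating-point error
sd(trainS$capitalAve)    # one, up to floating-point error
```

preProcess only estimates the parameters; it is the predict call that actually transforms the data.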

The other thing you can do is use the object created by preProcess to apply that same preprocessing to the test set. Here, preObj is the object from the previous slide, the one we created by preprocessing the training set.
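
As a rough sketch of that idea, again assuming caret is installed and using simulated data with hypothetical columns (outcome in column 3), the same preObj can be handed to predict along with the test set predictors. The makeData helper is made up for this illustration.

```r
library(caret)

# Hypothetical helper that simulates a small data set
# (assumption: not the lecture's data; outcome is column 3).
makeData <- function(n) {
  data.frame(
    capitalAve  = rexp(n, rate = 0.2),
    capitalLong = rnorm(n, mean = 50, sd = 10),
    type        = factor(sample(c("spam", "nonspam"), n, replace = TRUE))
  )
}

set.seed(3)
training <- makeData(300)
testing  <- makeData(100)

# Parameters are estimated on the training predictors only.
preObj <- preProcess(training[, -3], method = c("center", "scale"))

# The same object is then applied to the test predictors.
testCapAveS <- predict(preObj, testing[, -3])$capitalAve

mean(testCapAveS)  # usually near zero, but not exactly zero
sd(testCapAveS)    # usually near one, but not exactly one
```

This should give the same values as standardizing the test variable by hand with the training mean and standard deviation.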