Then you predict on new samples by using the predict function.

Again, it's a unified framework, so we just type predict.

We pass it the modelFit object that we got from the train function in

caret, and we tell it which data we would like it to predict on.

So in this case, the new data is the testing data.

When you do that, it will give you

a set of predictions that correspond to the responses,

and you can use those to try to evaluate

whether your model fit works very well or not.
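As a sketch, this is roughly what the predict step looks like, assuming the spam example from earlier in the lecture (the object names modelFit, training, and testing follow the lecture; the caret and kernlab packages are not part of base R and must be installed):

```r
# Sketch assuming the spam example from earlier in the lecture;
# requires the caret and kernlab packages.
library(caret)
library(kernlab)
data(spam)

# Recreate a training/testing split and model fit
# (object names assumed from the lecture).
set.seed(32343)
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)
training <- spam[inTrain, ]
testing <- spam[-inTrain, ]
modelFit <- train(type ~ ., data = training, method = "glm")

# Predict on the held-out testing data; returns a factor
# with the same levels as the outcome ("nonspam", "spam").
predictions <- predict(modelFit, newdata = testing)
head(predictions)
```

Because predict is a generic, the same call works regardless of which method you passed to train, which is the unified-framework point being made here.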

One way that you can do that is by calculating the confusion matrix,

so that's using the confusionMatrix function, and note the capital M there.

Don't miss that when you're typing confusionMatrix.

Then you pass in the predictions that you got from your model fit.

And then the actual outcome on the testing samples.

So in this case, that's the type variable, indicating whether each message was spam or ham.

And then it will report the confusion matrix.

So it'll give you a table showing the cases that you predicted to be nonspam

and that were actually nonspam, the cases that were actually

spam and you predicted to be spam, and so forth.

And then it gives you a bunch of summary statistics.

So for example, the accuracy, a 95 percent confidence interval for the accuracy,

and then a bunch of other measures of how well the predictions correspond to the true values.

So, for example, the sensitivity and the specificity.

So the confusionMatrix function wraps a bunch of different accuracy measures

that you might want to get out when you're evaluating the model fit.

We're going to cover a lot more in this class about how you actually apply the caret package.

But I found that these tutorials are actually very nice, and they can

be very useful for covering material that we don't cover in this class.

And there's also a very nice paper in the Journal of

Statistical Software that introduces the caret

package if you want further information.