In this video, we discuss how to perform cross validation for classification problems. In particular, we use a two-by-two table called a confusion matrix to assess classification performance. As for our regression problem, the first step of cross validation is data partitioning, where we randomly split the entire dataset by row into two sets. The training set is used to fit the models. The other set is called the validation set and is used to choose among different models.

We use different measures to assess prediction accuracy for regression and classification. For regression, we want the errors, or residuals, to be small in the validation set. For that purpose, we use the sum of squared errors or, equivalently, the root mean squared error on the validation data. For classification, those measures cannot be used. Instead, we look at a confusion matrix, which we will explain using an example.

Let's go back to the appointment data, where we randomly divide the data by row into a training set and a validation set using a 60/40 split. 4,478 rows are in the training set and 2,985 rows are in the validation set. Now, to assess the prediction accuracy of our model, we make a prediction of whether each appointment will be cancelled in the validation data. Note that the outcome of the logistic regression model is a predicted probability. How do we make a binary prediction using predicted probabilities? One way is to set a threshold value t between zero and one. When the predicted probability of a cancellation is above this threshold, we predict cancellation. Otherwise, we predict no cancellation. In this manner, we can arrive at a binary prediction from our predicted probabilities.

This immediately leads to the question: what value should we pick for t? The threshold value is an important parameter we have to choose. Choosing different values of t leads to different kinds of prediction errors. If we choose a large t, we will rarely predict cancellation.
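The thresholding step can be sketched in a few lines. This is a minimal illustration, not the course's actual code; the probability values below are hypothetical stand-ins for the output of a fitted logistic regression model.

```python
import numpy as np

# Hypothetical predicted cancellation probabilities from a logistic
# regression model, one per appointment in the validation set.
probs = np.array([0.05, 0.20, 0.35, 0.55, 0.80])

def classify(probs, t):
    """Predict cancellation (1) when the predicted probability exceeds t."""
    return (probs > t).astype(int)

# A large threshold rarely predicts cancellation...
print(classify(probs, 0.5))   # -> [0 0 0 1 1]
# ...while a smaller threshold predicts it more often.
print(classify(probs, 0.3))   # -> [0 0 1 1 1]
```

Moving the threshold from 0.5 down to 0.3 flips the 0.35 appointment from "no cancellation" to "cancellation", which is exactly the behavior the tradeoff discussion below turns on.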
Therefore, we will only detect a cancellation when the chance is very high. This will lead to more errors where we predict no cancellation but the appointment is actually cancelled. If we choose a small t, we will predict lots of cancellations. This allows us to detect more appointments that may be cancelled. However, we will also make more mistakes where we predict cancellation but the appointment is actually not cancelled. By default, we usually choose a threshold value of 0.5, where we predict the more likely outcome.

Here are the confusion matrices for two different threshold values. Note that cancellation is denoted by one. The matrix shows the observed, or actual, class and the predicted class. Actual = 0 means the observed status in the data is arrival, and actual = 1 means the observed status is cancellation. Similarly, predicted = 0 means we predicted the appointment will not be cancelled, and predicted = 1 means the appointment is predicted to cancel.

When the threshold is 0.5, 2,303 appointments that were not cancelled matched our prediction. However, 17 appointments that were not cancelled were predicted to cancel. 649 appointments that were cancelled were predicted not to cancel, whereas 16 cancelled appointments were correctly predicted. Note that very few appointments are predicted to cancel, even though there are more than 600 cancelled appointments in our validation data.

The confusion matrix changes considerably when we change the threshold to 0.3. In particular, many more appointments are predicted to cancel. Of the appointments that were not cancelled, 216 were predicted to cancel; 122 cancelled appointments were also correctly predicted. Therefore, with a smaller threshold, we correctly predict more cancelled appointments, but we also make more mistakes of predicting that non-cancelled appointments will cancel. This shows the tradeoff in picking the threshold.
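A confusion matrix like the ones above can be built directly by tabulating actual against predicted classes. The sketch below uses a tiny made-up validation set (six appointments with hypothetical probabilities), just to show how lowering the threshold shifts counts between the cells.

```python
import numpy as np

def confusion_matrix(actual, predicted):
    """2x2 confusion matrix: rows = actual class, columns = predicted class."""
    m = np.zeros((2, 2), dtype=int)
    for a, p in zip(actual, predicted):
        m[a, p] += 1
    return m

# Toy validation data (1 = cancelled); probabilities are hypothetical.
actual = np.array([0, 0, 0, 1, 1, 1])
probs  = np.array([0.10, 0.25, 0.40, 0.35, 0.60, 0.90])

# Lowering the threshold detects more cancellations (actual = 1,
# predicted = 1) but also flags more non-cancelled appointments.
print(confusion_matrix(actual, (probs > 0.5).astype(int)))  # [[3 0], [1 2]]
print(confusion_matrix(actual, (probs > 0.3).astype(int)))  # [[2 1], [0 3]]
```

At t = 0.5 one cancellation is missed; at t = 0.3 all three are caught, at the cost of one non-cancelled appointment being flagged, mirroring the tradeoff in the appointment data.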
The confusion matrix essentially shows the possible outcomes when we make binary predictions on the validation data. We would like to detect the class with value one. In the example we discussed, we would like to detect appointment cancellations. We usually call the events we would like to detect positives. The complement events, in this case non-cancellations, or arrivals, are called negatives.

A correct prediction can be either a true positive or a true negative. A cancelled appointment that is correctly predicted to cancel is called a true positive. Similarly, a non-cancelled appointment that is correctly predicted not to cancel is called a true negative. An incorrect prediction can be a false positive or a false negative. A cancelled appointment that is incorrectly predicted not to cancel is called a false negative. Similarly, a non-cancelled appointment that is incorrectly predicted to cancel is called a false positive. This terminology has its roots in the medical literature but is commonly used in data analytics.
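The four outcome types map directly onto the cells of the confusion matrix. As a sketch, using the threshold-0.5 counts from the appointment example (rows = actual with 0 = arrival and 1 = cancellation, columns = predicted):

```python
import numpy as np

def outcome_counts(m):
    """Unpack a 2x2 confusion matrix (rows = actual, cols = predicted)
    into the four outcome types."""
    tn, fp = m[0, 0], m[0, 1]   # actual arrival: correct / wrongly flagged
    fn, tp = m[1, 0], m[1, 1]   # actual cancellation: missed / detected
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Confusion matrix at threshold 0.5 from the appointment example.
m = np.array([[2303,  17],
              [ 649,  16]])
print(outcome_counts(m))  # 16 true positives, 649 false negatives, ...
```

Here the 649 false negatives are the cancellations we failed to detect, which is why the discussion above considers lowering the threshold.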