Welcome to Data Science Methodology 101 From Modeling to Evaluation - Evaluation!

Model evaluation goes hand-in-hand with model building; as such, the modeling and

evaluation stages are done iteratively.

Model evaluation is performed during model development and before the model is deployed.

Evaluation allows the quality of the model to be assessed, but it's also an opportunity

to see if it meets the initial request.

Evaluation answers the question: Does the model used really answer the initial question

or does it need to be adjusted?

Model evaluation can have two main phases.

The first is the diagnostic measures phase, which is used to ensure the model is working

as intended.

If the model is a predictive model, a decision tree can be used to evaluate whether

the answer the model outputs is aligned with the initial design.

It can be used to see where there are areas that require adjustments.

If the model is a descriptive model, one in which relationships are being assessed, then

a testing set with known outcomes can be applied, and the model can be refined as needed.
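The idea of applying a testing set with known outcomes can be sketched in a few lines of Python. Everything here is hypothetical — the scores, labels, and 0.5 cutoff are made-up illustration values, not the case-study data:

```python
# Minimal sketch of checking a model against a testing set with known
# outcomes. All values below are hypothetical illustration data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the known outcomes."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Known outcomes for the testing set (1 = yes, 0 = no) -- made-up values.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]

# Model scores for the same records -- made-up values.
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]

# Turn scores into yes/no predictions with an assumed 0.5 cutoff.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.75
```

If the accuracy on the held-out set falls short, the model is refined and re-tested, which is the iterative loop described above.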

The second phase of evaluation that may be used is statistical significance testing.

This type of evaluation can be applied to the model to ensure that the data is being

properly handled and interpreted within the model.

This is designed to avoid unnecessary second guessing when the answer is revealed.

So now, let's go back to our case study so that we can apply the "Evaluation" component

within the data science methodology.

Let's look at one way to find the optimal model through a diagnostic measure based on

tuning one of the parameters in model building.

Specifically, we'll see how to tune the relative cost of misclassifying yes and no outcomes.

As shown in this table, four models were built with four different relative misclassification

costs.

As we see, each increase in the value of this model-building parameter raises the true-positive

rate, or sensitivity, in predicting yes, at the expense of lower accuracy in predicting

no, that is, an increasing false-positive rate.
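The two rates behind this tradeoff come straight from the confusion counts. A minimal sketch — the counts for the two cost settings are hypothetical, chosen only to illustrate the pattern, not the case-study numbers:

```python
# True-positive rate (sensitivity) and false-positive rate from confusion
# counts. The counts below are hypothetical: they only illustrate that
# raising the relative misclassification cost of a missed "yes" pushes
# both rates up together.

def rates(tp, fn, fp, tn):
    """Return (true-positive rate, false-positive rate)."""
    tpr = tp / (tp + fn)  # sensitivity: yes cases correctly flagged
    fpr = fp / (fp + tn)  # no cases incorrectly flagged
    return tpr, fpr

# Low relative misclassification cost: a conservative model.
print(rates(tp=40, fn=60, fp=10, tn=90))   # (0.4, 0.1)

# Higher relative cost: more yes predictions, so both rates rise.
print(rates(tp=70, fn=30, fp=30, tn=70))   # (0.7, 0.3)
```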

The question then becomes, which model is best based on tuning this parameter?

For budgetary reasons, the risk-reducing intervention could not be applied to most or all congestive

heart failure patients, many of whom would not have been readmitted anyway.

On the other hand, if too few high-risk congestive heart failure patients were targeted,

the intervention would not be as effective in improving patient care as it should be.

So, how do we determine which model was optimal?

As you can see on this slide, the optimal model is the one giving the maximum separation

between the blue ROC curve and the red baseline.

We can see that model 3, with a relative misclassification cost of 4-to-1, is the best of the 4 models.

And just in case you were wondering, ROC stands for receiver operating characteristic;

the curve was first developed during World War II to detect enemy aircraft on radar.

It has since been used in many other fields as well.

Today it is commonly used in machine learning and data mining.

The ROC curve is a useful diagnostic tool in determining the optimal classification

model.

This curve quantifies how well a binary classification model performs in classifying the yes

and no outcomes when some discrimination criterion is varied.

In this case, the criterion is a relative misclassification cost.

By plotting the true-positive rate against the false-positive rate for different values

of the relative misclassification cost, the ROC curve helped in selecting the optimal

model.
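One way to put that selection rule into code: treat each candidate model as a (true-positive rate, false-positive rate) point and pick the one farthest above the diagonal baseline, i.e. the point with the maximum TPR minus FPR (known as Youden's J statistic). The four tuples below are hypothetical stand-ins for the four models, not the actual case-study values:

```python
# Pick the model whose ROC point sits farthest above the diagonal
# baseline (where TPR == FPR). The separation of a point is TPR - FPR,
# also called Youden's J statistic. The (cost, tpr, fpr) tuples are
# hypothetical stand-ins for the four models in the table.

models = [
    ("1-to-1", 0.45, 0.10),
    ("2-to-1", 0.60, 0.20),
    ("4-to-1", 0.85, 0.30),
    ("8-to-1", 0.90, 0.55),
]

def separation(model):
    """Vertical distance of a (cost, tpr, fpr) point above the baseline."""
    _, tpr, fpr = model
    return tpr - fpr

best = max(models, key=separation)
print(best[0])  # 4-to-1
```

With these illustrative numbers, the 4-to-1 model wins, matching the choice of model 3 described above.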

This ends the Evaluation section of this course.

Thanks for watching!