Welcome to Health Care Data Analytics: Risk Adjustment and Predictive Modeling. This is Lecture A. The component Health Care Data Analytics covers the application of data, statistical and quantitative analysis, and explanatory and predictive models to drive decisions and actions in health care.

The objectives for this unit, Risk Adjustment and Predictive Modeling, are to: define risk adjustment, predictive modeling, and validation of models in health care; identify the health care and other data needed to perform risk adjustment and predictive modeling; relate risk adjustment and population segmentation to the allocation of health care resources and health care redesign; discuss uses of risk adjustment and modeling in value-based models of care; delineate the use of health information technology in the creation, delivery, and evaluation of prediction models; and describe ethical considerations in risk adjustment and population management.

In this lecture, we'll examine a scenario that employs data analytics in a health care setting, define some of the terms frequently used in risk adjustment and data analytics, give an overview of how to perform risk adjustment and predictive modeling, describe several types of risk adjustment, and identify the data needed to perform risk adjustment and predictive modeling.

To understand the purpose of risk adjustment and predictive modeling, let's imagine a concrete scenario. In our scenario, Wanda is the Chief Analytics Officer at HealthWest. HealthWest consists of 2 hospitals, 90 clinics, 800 providers, and 350,000 patients. Her job is to improve value at HealthWest using its data, information, and knowledge. To tackle her job, Wanda employs three main strategies. Number one: improve the effectiveness of care and reduce harm from care by providing just-in-time data, information, and knowledge to decision makers.
There's a lot of data in HealthWest's system, and analysts could waste a lot of resources analyzing data that has no relevance. Instead, Wanda focuses on decisions that are being made now and on what information will affect those decisions. Her second strategy is to improve the allocation of resources by analyzing data. HealthWest's resources include its staff, its supplies and equipment, and its facilities. Allocated well, together they can do great things; allocated poorly, they can be wasteful. Wanda's third main strategy is to add value to care by increasing benefits and reducing costs to the health system. Better application of the same technology can achieve both.

The current problem Wanda is facing is that the health system's 30-day readmission rate is high. HealthWest will pay a penalty due to a new Medicare program that doesn't pay for readmissions that occur within 30 days of discharge. She needs to see what decisions are being made and what analytics could be useful. How can Wanda use data to help with this problem? First, she might want to examine the 30-day readmission rates to see whether they've been properly risk adjusted; while the rates are high, that may be due to predictable factors. Next, she could generate a predictive model to identify patients who are at risk for readmission. She knows the health system has an in-home monitoring program that could be deployed, but because there are so few patients who would really need it, and be responsive to it, deploying it unnecessarily would be wasteful. Instead, she'd like to use the program only for those who are at risk.

So far we've highlighted the two main terms of the lecture: risk adjustment and predictive modeling. Let's pause to define them more clearly. The term risk adjustment can be used in many contexts.
But when speaking in relation to health information technology, or HIT, risk adjustment refers to how we adjust the level of measured outcomes to account for risk factors of the patient, the environment, and the health care system. Historically, this calculation was performed by actuaries concerned with understanding the expected cost of a person or group of persons based on measurable factors. Currently, as health systems become responsible for the costs of the patients they see, risk adjustment is being used to facilitate alternative payment models and risk stratification. Alternative payment models pay a provider based on something other than just the count of services performed: they may pay a fixed, or capitated, amount regardless of services, or the payment may depend on the quality or performance of services.

The term predictive modeling refers to predicting an outcome, or the likelihood of an outcome, based on factors of the patient, the environment, and the health care system. One example is identifying which patients undergoing obesity surgery are most likely to have complications.

Risk adjustment and predictive modeling share several key concepts. First, both are concerned with outcomes. The outcomes can be measured levels, such as cost or blood pressure, or constructed values, such as whether a patient has been readmitted after an initial stay in the hospital. When designing a risk adjustment or predictive model, costs, also known as expenditures, are measured continuously from zero to any total amount. Other outcomes like readmission, mortality, or complications are events that are simply measured with a yes or no, and are typically coded as a one or a zero in the data. The other concept common to both risk adjustment and predictive modeling is the set of characteristics related to the outcome. These can be called factors, predictor variables, or independent variables. Some common examples in health care include age, sex, and marital status.
Additionally, diagnoses such as heart disease and diabetes represent a rich array of information about a person. For health spending, a diagram from Van de Ven and Ellis's work on risk adjustment suggests that most of the variation in health care expenditures is random and cannot be explained by systematic features. The chart does not empirically estimate the proportions, but it lists the different types of factors that could be included in risk adjustment: age and sex, health status, socioeconomic factors, provider, input prices, market power, and benefit plan. The last three apply more directly to health spending than to other outcomes.

The basic process of risk adjustment and predictive modeling can be summarized in four steps. Step 1: estimate the relationship between the factors and the outcome. If there is only one factor, such as sex, with two values, male and female, the mean outcome level for each value will work. If there is more than one factor, a regression model is needed. Step 2: predict the outcome for each observation, based on the mean values or the coefficients from the regression model in Step 1. The predicted outcome is the primary output of predictive modeling. Step 3: if risk adjustment is needed for a group of observations, such as a clinic, then for each group calculate the ratio of the mean predicted level from Step 2 to the mean actual level. This ratio is the risk-adjustment index value for the group's outcome. Step 4: multiply the ratio by the mean outcome level. This produces the risk-adjusted level of the outcome for the group.

There are three main types of risk adjustment worth noting. Retrospective risk adjustment uses factors in the previous period to predict previous-period outcomes. Concurrent risk adjustment uses factors up through and into the current period to predict the final level of the current period.
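As an illustration, the four steps described above can be sketched in a few lines of Python using NumPy's least-squares solver. The patients, factors (age and a diabetes flag), costs, and clinic assignments below are invented for demonstration only.

```python
import numpy as np

# Invented example data: two factors (age, diabetes flag), cost as the
# outcome, and a clinic identifier for grouping.
X = np.array([[45, 0], [60, 1], [70, 1], [50, 0], [65, 1], [55, 0]], dtype=float)
y = np.array([1200.0, 4500.0, 5200.0, 1500.0, 4800.0, 1700.0])  # cost
clinic = np.array([0, 0, 0, 1, 1, 1])  # which clinic each patient attends

# Step 1: estimate the relationship between factors and outcome.
# With more than one factor, a regression model is needed.
X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Step 2: predict the outcome for each observation from the coefficients.
y_pred = X1 @ coef

# Steps 3 and 4: for each clinic, take the ratio of mean predicted to
# mean actual outcome, then multiply by the overall mean outcome level
# to get the clinic's risk-adjusted outcome level.
risk_adjusted = {}
for c in np.unique(clinic):
    mask = clinic == c
    ratio = y_pred[mask].mean() / y[mask].mean()
    risk_adjusted[int(c)] = ratio * y.mean()
```

One property worth noting: in a least-squares model with an intercept, the predictions average out to the actual mean over the whole population, so the ratios only differ from one at the group level.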
In particular, concurrent risk adjustment can include the current level of the outcome of interest, even though it is not yet complete. Prospective risk adjustment uses factors from the previous period, including the outcome if available, to predict the future period's outcome. It's worth noting that some factors may not predict the future very well, despite being good retrospective factors, if the factor is unlikely to affect the outcome persistently. For instance, the diagnosis of a broken leg is likely to affect the retrospective and concurrent risk, but much less so the prospective risk: most patients will not need ongoing care if a broken leg heals properly.

Not all models are the same, and it's worth knowing how they perform. The most common measures used to evaluate risk adjustment and predictive models are the R-squared and the mean absolute prediction error. The R-squared is the percentage of the total variation explained by the factors in the model. The mean absolute prediction error, or MAPE, computes the difference between each outcome and its predicted level, then takes the mean of this error across all observations. Lower numbers are better for the MAPE.

To get a sense of how the R-squared relates to the data, the scatter plots of cost and risk score on the slide show the R-squared at three levels: 0.06, 0.16, and 0.93. As is evident in the third panel, an R-squared of 0.93 means the observed values on the y-axis lie close to the predicted values, which are shown on the blue line. Unfortunately, for health care outcomes most models are more similar to the first or second panels, and thus the risk score does not provide much information about what is likely to occur.

In predictive modeling it's typical to use the predicted score to categorize patients into different types. For instance, all patients with a risk score above a threshold of 90, on a scale of 0 to 100, may be considered high risk.
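The R-squared and MAPE described above can be sketched in plain Python. The actual and predicted cost pairs here are invented for illustration.

```python
# Invented actual and predicted cost pairs for six patients.
actual    = [1200.0, 4500.0, 5200.0, 1500.0, 4800.0, 1700.0]
predicted = [1400.0, 4200.0, 4900.0, 1600.0, 5100.0, 1900.0]

n = len(actual)
mean_actual = sum(actual) / n

# R-squared: share of the total variation explained by the model.
ss_total = sum((a - mean_actual) ** 2 for a in actual)
ss_resid = sum((a - p) ** 2 for a, p in zip(actual, predicted))
r_squared = 1 - ss_resid / ss_total

# MAPE as described in the lecture: the mean of the absolute
# differences between each outcome and its predicted level.
mape = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

print(f"R-squared = {r_squared:.3f}, MAPE = {mape:.1f}")
# R-squared = 0.979, MAPE = 233.3
```

An R-squared this high is unrealistic for health care costs; as the lecture notes, real cost models typically explain only a small fraction of the variation.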
After this threshold has been established, it's possible to calculate various measures of how the predicted category compares to the actual category. The most common measure is the sensitivity, or percentage of true positives: of those who actually are high risk, the share that were predicted to be high risk. Specificity refers to how well the predictive model rules out negatives, or patients who are not high risk. For example, a model could declare essentially all patients to be high risk, and the sensitivity would be very high; to be useful, though, the model also needs to rule out some patients. Thus specificity is calculated as the percentage of true negatives that were predicted as negative.

One additional term worth knowing for risk adjustment models is the C-statistic. This takes a little explaining, which is why you don't see the word C-statistic on the slide yet. As mentioned on the previous slide, a threshold is used when the predicted values are classified. Since the performance statistics will differ based on that threshold, it's common to calculate a statistic that captures the accuracy, both sensitivity and specificity equally, across all possible thresholds. This is displayed as the receiver operating characteristic, or ROC, curve. The area under that curve is known as the C-statistic, ranging between 0 and 100%. Values above 50% indicate the model performs better than randomly picking observations to exceed the threshold.

Factors and outcomes for estimating or predicting models can be gathered from many sources. Claims data are the records of services performed, exchanged between providers and payers. These data typically include the diagnoses and basic demographic information, such as age and sex. Enrollment data can provide additional demographic information that is not included in the claim record. Importantly, these data will include factors for individuals who have not had claims submitted on their behalf.
Finally, many organizations have electronic health records, which can include detailed clinician notes, lab values, and other measurements not found in claims or enrollment data.

As indicated previously, risk adjustment and predictive models need to be estimated, and for more accuracy, those estimates should be based on a large population. Because organizations may not have access to the data needed to estimate the models, or may lack the expertise necessary to estimate them, they frequently acquire existing models from vendors. The vendor sells software systems that have the coefficients embedded, sometimes even hidden, in the software, and that apply the coefficients to characteristics of the records at the organization. The organization is then able to obtain scores for each record to use for risk adjustment or predictive modeling. Can you guess why the software sometimes provides only a list of people above a threshold rather than the score itself? If the score were provided, the organization might be able to test different values, figure out the coefficient for each factor, and no longer need to purchase the product.

Some of the larger private vendors of risk adjustment models include the Symmetry product from Optum, which calls its scores Episode Risk Groups. The 3M company produces scores called Clinical Risk Groups. Verisk produces risk scores known by the abbreviation DxCG. Truven has a product called the Medical Episode Grouper. Some public entities have also created ways to adjust risk. The Centers for Medicare & Medicaid Services, or CMS, created the Hierarchical Condition Categories. Johns Hopkins University created the Adjusted Clinical Groups, or ACGs. The University of California, San Diego has maintained the Chronic Illness and Disability Payment System, or CDPS, which is targeted at the Medicaid population. There are many other risk adjustment models for sale and use. So which product is best?
A 2007 study by Winkelman and Mehmud for the Society of Actuaries compared various risk adjustment models using the performance statistics we discussed: the R-squared and the mean absolute prediction error. While many versions of the comparison were conducted, it's worth noting that the highest-performing risk score, Verisk's DxCG, using only diagnosis information along with age and sex, had an R-squared of 20.6, rising to 26.5 when previous cost information was included. The CDPS model was the lowest, with an R-squared of 14.9. The mean absolute prediction error is presented as a percentage of the prediction itself, and the errors often are nearly as large as the prediction itself. Lower MAPEs are better and generally track the R-squared.

Returning to Wanda's problem of adjusting her 30-day readmission rate for risk and trying to predict patients at risk of readmission, we're left with two general options: she could purchase a product from a vendor, or construct her own. With either approach, she would want to compare her risk-adjusted rates to the benchmarks against which she is judged and assess the size of the problem. Next, she can use predictive modeling features to identify cases that are responsive to an intervention. The data exercise associated with this unit provides a similar activity.

This concludes Lecture A of Risk Adjustment and Predictive Modeling. In this lecture, we showed that risk adjustment is the process of adjusting outcomes by patient variables such as age, health status, and other conditions, while predictive modeling attempts to predict the likelihood of an outcome. We discussed how validating the models requires comparing predictions to reality, using the R-squared, or proportion of variation explained, the mean absolute prediction error, the area under the curve, and other classification metrics. Finally, data come from many different sources, but the most common are health care claims, enrollment, and electronic health record data.
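To close, here is a small sketch of the classification measures summarized above: sensitivity and specificity at a single threshold, and the C-statistic, or area under the ROC curve, across all thresholds. The risk scores and outcomes are invented for illustration.

```python
# Invented predicted risk scores (0-100 scale) and actual outcomes.
risk_score = [95, 40, 88, 97, 20, 91, 55, 93]
high_risk  = [1,  0,  1,  1,  0,  0,  0,  1]   # 1 = actually high risk

# Classify using a threshold of 90, as in the lecture's example.
threshold = 90
predicted_pos = [1 if s >= threshold else 0 for s in risk_score]

tp = sum(p == 1 and a == 1 for p, a in zip(predicted_pos, high_risk))
fn = sum(p == 0 and a == 1 for p, a in zip(predicted_pos, high_risk))
tn = sum(p == 0 and a == 0 for p, a in zip(predicted_pos, high_risk))
fp = sum(p == 1 and a == 0 for p, a in zip(predicted_pos, high_risk))

sensitivity = tp / (tp + fn)   # share of true positives predicted positive
specificity = tn / (tn + fp)   # share of true negatives predicted negative

# C-statistic: the probability that a randomly chosen positive case
# outranks a randomly chosen negative case (ties count one half);
# this equals the area under the ROC curve.
pos = [s for s, a in zip(risk_score, high_risk) if a == 1]
neg = [s for s, a in zip(risk_score, high_risk) if a == 0]
pairs = [(p > q) + 0.5 * (p == q) for p in pos for q in neg]
c_statistic = sum(pairs) / len(pairs)

print(sensitivity, specificity, c_statistic)
# 0.75 0.75 0.9375
```

Note how the three numbers behave differently: raising the threshold would trade sensitivity for specificity, while the C-statistic stays fixed because it summarizes performance across all thresholds.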