A Risk Adjustment Model for Workers’ Compensation Claims

Part 3: Building Accordion’s Predictive Model

Nikolaos Vergos
accordionhealth
Apr 25, 2017

After laying the groundwork with the fundamentals of Workers’ Compensation in the United States and a whirlwind overview of Risk Modeling, it is now time to present our Risk Scoring model! It’s important to point out that, even though our case study focuses on Workers’ Compensation claims, the same methodology can be applied to create predictive models across a variety of insurance markets.

Data source

The data source we use for developing our risk model is the “Political Subdivision — Workers’ Compensation Alliance”. (“Alliance”, in short) Alliance was formed in 2006 by five risk pools that offer Workers’ Compensation coverage and claims services to public entities in Texas. Collectively, Alliance risk pools represent the second largest Workers’ Compensation carrier in the state. They provide Workers’ Compensation benefits to more than 3,000 public employers representing 500,000 employees. The risk pools represent diverse workplaces such as cities, counties, and other units of local government, schools, community colleges, water districts and authorities, community centers, etc.

For our initial model, the Alliance provided us with 3 years’ (2013–2015) worth of claims data for ~62,000 injured workers. This dataset contains the entire history of each claim, including all visits with medical care professionals, referrals to specialists, diagnostic and procedure codes associated with each encounter, billed and paid monetary amounts associated with each treatment, as well as indemnity payments and lost work days for each case.

Feature selection

As in every modeling endeavor, we put significant effort into preprocessing and feature selection, i.e., restructuring the raw data into a coherent data set that statistical learning techniques can then be applied to. Since our overarching goals are to develop versatile, portable risk scoring models and to broadly assess the financial risk and case severity associated with each case, we decided to distill basic information from each claim. This information includes demographic data, such as each claimant’s age and gender, and all diagnostic codes associated with each claim. Let us assume, as an example, that we are focusing on an individual knee injury case. Over the course of the case, the injured worker may visit several medical practitioners, each of whom records a “knee injury” diagnosis. We decided to focus on the “existence” of each condition rather than on how many visits the worker has made. Therefore, we converted each claim’s diagnostic history into binary features, with a value of 0 for the absence of a specific diagnosis and a value of 1 for its presence, regardless of the length of each case.
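The conversion to binary features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `binary_diagnosis_features`, the claim identifiers, and the sample codes are all hypothetical and do not come from the Alliance dataset.

```python
def binary_diagnosis_features(claim_diagnoses, feature_codes):
    """Map each claim to a 0/1 indicator per diagnosis in feature_codes.

    claim_diagnoses: dict of claim id -> list of diagnosis codes, one entry
    per encounter (so the same code may repeat many times).
    """
    features = {}
    for claim_id, codes in claim_diagnoses.items():
        present = set(codes)  # repeated diagnoses collapse into one
        features[claim_id] = [1 if code in present else 0 for code in feature_codes]
    return features


# Hypothetical claim histories: claim-A has three visits with the same code.
claims = {
    "claim-A": ["S83", "S83", "S83", "M25"],
    "claim-B": ["S31"],
}
feature_codes = ["M25", "S31", "S83"]

X = binary_diagnosis_features(claims, feature_codes)
print(X["claim-A"])  # [1, 0, 1]
print(X["claim-B"])  # [0, 1, 0]
```

Note that claim-A gets a single 1 for "S83" even though the code was recorded three times, which is exactly the "existence, not visit count" choice described above.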

Recall that diagnoses are codified using the ICD, a medical classification list maintained by the World Health Organization. The United States adopted the newest standard for clinical use, ICD-10, on October 1st, 2015. Diagnostic codes billed before that date use the ICD-9 classification. Although most of our claims data predate 10/01/2015, we decided to convert all diagnostic codes into the ICD-10 classification standard in order to make our process future-proof: our model is being updated with data from after 2015, so it is of great importance that diagnostic codes are represented in a consistent way. ICD-10 diagnostic codes follow a hierarchical structure that starts with general information about the diagnosis and continues with more detailed descriptions. For example, “M16.11” corresponds to “Unilateral primary osteoarthritis, right hip”, whereas “M16.12” corresponds to “Unilateral primary osteoarthritis, left hip”. It is easy to deduce that “M16” is all we need if we seek to capture a general “osteoarthritis of hip” diagnosis. ICD-10 codes can go into much greater detail, but the first three characters of an ICD-10 code always designate the category of the diagnosis. In the interest of clarity, brevity, and generalizability, we decided to focus on these three-character ICD-10 diagnostic categories as features for our risk models. How many diagnostic categories should we use, though?
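Reducing a full ICD-10 code to its category amounts to keeping the first three characters. A minimal sketch, assuming the codes are already in ICD-10 format (the ICD-9 to ICD-10 conversion is not shown); the helper name `icd10_category` is hypothetical:

```python
def icd10_category(code):
    """Return the three-character ICD-10 category of a full diagnostic code."""
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"not a valid ICD-10 code: {code!r}")
    return cleaned[:3]


print(icd10_category("M16.11"))  # M16
print(icd10_category("M16.12"))  # M16
```

Both the right-hip and the left-hip code collapse into the same "M16" category, so a model built on categories sees them as a single feature.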

Top ICD-10 diagnostic codes (denoted by the prefix X-) of our dataset. Notice the great detail after the first three “category” characters of each diagnostic code. Source: Accordion Health

Exploring the diagnostic codes present within the data set, we found more than 1,100 distinct ICD-10 categories. Even after merging similar diagnoses into their parent three-character ICD-10 category, using all categories as equally important predictive features would make our model unnecessarily complex, and it would be a surefire recipe for overfitting. As a matter of fact, we scored our Machine Learning algorithms with increasing numbers of ICD-10 categories included in the feature pool, and we ended up with very high-scoring models (R² ~ 0.9) when the majority of the ICD-10 categories were used. However, as we mentioned in the previous post, this probably just means that our algorithms did a very “good job” memorizing the details of the dataset, rather than developing strong predictive capabilities by discerning the signal from the noise.

Top 20 ICD-10 categories in our dataset. Source: Accordion Health

Since our goal is a clean, generalizable model that successfully identifies and predicts medical cost drivers within Workers’ Compensation claims, we decided to use the 130 most frequent ICD-10 categories as our model’s binary diagnostic features. This way, we still capture the majority of the most frequent diagnoses (the 130th one, “T73”, still appears more than 1,000 times within the dataset) without forcing our model to learn connections between less frequent diagnoses and claim cost.
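Picking the most frequent categories is a simple counting exercise. The sketch below assumes that frequency means the total number of times a category appears across all claim records; the function name `top_categories` and the sample data are hypothetical.

```python
from collections import Counter


def top_categories(claim_categories, n):
    """Return the n most frequent ICD-10 categories across all claims.

    claim_categories: iterable of lists, one list of categories per claim.
    """
    counts = Counter(cat for cats in claim_categories for cat in cats)
    return [cat for cat, _ in counts.most_common(n)]


# Hypothetical toy data: three claims and their diagnostic categories.
toy_claims = [["S83", "M25"], ["S83"], ["S83", "M25", "T73"]]
print(top_categories(toy_claims, 2))  # ['S83', 'M25']
```

In our setting `n` would be 130, and the resulting list would define the binary diagnostic feature columns.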

Let us point out that mapping entire sequences of diagnostic codes into categories indicating broader conditions is not a novel idea in Risk Adjustment; the Centers for Medicare & Medicaid Services (CMS) has developed quite sophisticated methodologies to group similar diagnoses into broad categories, such as “cardiovascular disease”. We are porting a similar idea into the Workers’ Compensation domain.

We developed individual models to predict each one of four different cost measures associated with each claim: billed amount, paid amount, indemnity payments, and cumulative lost days. Since these targets are numerical, the simplest reasonable way to model the relationship between the targets and the features is a flavor of linear regression.

Model

Equation for Linear Regression. Source: Wikipedia

The equation above illustrates the general framework for a regression-based perspective on risk scoring. Let y be the target for the i-th claim, x be the vector of p features (age, gender, and the 130 ICD-10 diagnostic categories, thus, in our case, p=132), β be the vector of the p+1 unknown regression coefficients, and ε be an unobserved random variable that adds noise to the linear relationship between the dependent variable (target) and regressors (features). The index i runs from 1 to n, where n corresponds to the number of claims in our dataset (~62,000). The first element of the β vector is called the intercept, and it corresponds to a baseline value for y in the absence of all features; it is prudent to require that the intercept be equal to 0 in our case.
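For readers who cannot see the figure, the standard linear regression model can be written in the notation of the paragraph above as follows. The subscripts on x and β are our own labeling of the individual features and coefficients.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
```

Here β₀ is the intercept, which we require to be 0, and p = 132 in our case.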

Fitting a simple linear regression equation to our data, and solving for the vector of coefficients, should be enough to generate a simple, yet strong predictive model for each one of the targets; however, model interpretability needs to be taken into account. Let us assume, for example, that we obtain a negative coefficient (value within the β vector) associated with a specific feature (diagnosis) x. Let our target y be the paid amount for the claim. Since we have chosen binary (0/1) features to denote the absence or presence of a specific ICD-10 diagnostic category, a negative value for this diagnostic category’s coefficient means that, in the presence of this diagnosis (x=1), the predicted paid amount for this claim (y) is decreased by the absolute value of the coefficient. Even though clinical expertise might instruct otherwise for specific cases, it is reasonable to assume that the presence of additional diagnoses, if correlated with the claim’s cost, leads to an increase, not a decrease, of the claim’s cost. Therefore, a second requirement for our model’s coefficients is that they are all non-negative.

Thankfully, our quiver of tricks includes a more advanced statistical learning algorithm that can satisfy both requirements of our problem: lasso (least absolute shrinkage and selection operator) regularized regression.[1] Without getting into too much technical detail, the lasso performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Common implementations also allow us to force a zero intercept and nonnegative coefficients in our linear regression analysis.
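To make the two constraints concrete, here is a minimal pure-Python sketch of a lasso fit with no intercept and nonnegative coefficients, using coordinate descent. It is an illustration, not the production code behind our models: the function name `fit_nonneg_lasso`, the penalty value, and the toy data are all assumptions.

```python
from itertools import product


def fit_nonneg_lasso(X, y, alpha, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * sum(b) subject to b >= 0.

    No intercept term is fitted, so an all-zero feature row predicts 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # feature never present; leave its coefficient at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # Soft-threshold, then clip at zero to enforce nonnegativity.
            new_bj = max(0.0, (rho - alpha) / col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data: three binary features. The third one lowers the cost of claims
# that also have the first one, so an unconstrained fit would give it a
# negative coefficient.
X = list(product([0, 1], repeat=3))
y = [1000 * a + 500 * b - 200 * a * c for a, b, c in X]

beta = fit_nonneg_lasso(X, y, alpha=15.0)
print([round(b, 2) for b in beta])  # approximately [880.0, 480.0, 0.0]
```

The third coefficient is pinned at exactly zero instead of going negative, and the penalty shrinks the other two slightly. With scikit-learn, the same pair of constraints can be expressed through the `fit_intercept=False` and `positive=True` parameters of its `Lasso` estimator.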

From targets to risk scores

After picking our regressor, we are ready to create our individual models that predict each claim’s cost targets: billed/paid amount, indemnity payments, cumulative lost days. With this information, how are we going to obtain the relative risk score of each claim, the powerful metric that will allow us to perform Risk Adjustment?

The basic idea behind actuarial risk score models is the transformation of the expenditure variable (our target in each case) into a risk score by rescaling the expenditure to a mean value of 1. After we fit our model and obtain the β vector of coefficients, we can calculate the predicted targets for each one of the claims. As a matter of fact, the basic metric for model evaluation we discussed in the previous post, R², can be directly calculated from the deviations of the predicted target values from the actual ones, across the data we used to train the model. Dividing each case’s predicted target by the mean predicted value from our model rescales the values around 1.

Risk scores with this design are easy to interpret, as a score of 1.0 is equivalent to a person whose healthcare costs (and, consequently, the severity of their case) are exactly equal to the mean value across the entire population. As in linear regression model estimation, the noise factor ε is assumed to be approximately normally distributed. The predicted risk score for each claim is calculated by multiplying each fitted regression coefficient by the claim’s value for the corresponding independent variable (model feature), summing the products, and rescaling the result by the mean predicted value as described above.
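The steps from coefficients to risk scores can be sketched as follows. The coefficient values, feature rows, and actual amounts are hypothetical toy numbers, and the function names are our own.

```python
def predict(X, beta):
    """Predicted target per claim: sum of coefficient * feature value."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def risk_scores(predicted):
    """Rescale predictions so that their mean is exactly 1."""
    mean_pred = sum(predicted) / len(predicted)
    return [p / mean_pred for p in predicted]


def r_squared(actual, predicted):
    """Coefficient of determination from the prediction errors."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical fitted coefficients for two binary diagnostic features.
beta = [1000.0, 500.0]
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
actual_paid = [1200.0, 400.0, 1500.0, 100.0]

predicted_paid = predict(X, beta)
scores = risk_scores(predicted_paid)
print(predicted_paid)  # [1000.0, 500.0, 1500.0, 0.0]
print(scores[2])       # 2.0
print(r_squared(actual_paid, predicted_paid))
```

The third claim carries both diagnoses, so its predicted cost is twice the mean prediction of 750 and its risk score is 2.0.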

A log-log scatterplot of the actual paid amount vs. the model’s predicted paid amount for each one of the 62,000 claims of our dataset. The model’s R² value is 0.519. Claims with red hue (above the diagonal) have higher predicted values for the paid amount than their true values. The converse is true for claims with green hue, under the diagonal. Source: Accordion Health

The predicted risk scores can then be aggregated by geographical area, or treating provider, and their averages can be used to risk-adjust healthcare expenditures to accurately account for case severity. As we saw in the previous post, Risk Adjustment can shed a whole new light on evaluating healthcare expenditures across all claims treated by the same provider, when case severity is taken into account. Initial appearances can be deceiving.

Risk-Adjusted expenditures across all shoulder injuries treated by a certain provider with an average risk score of 3.32. Source: Accordion Health Provider Performance and Utilization Evaluator (APPEAR)

Let us focus on the bar chart on the left: even though this provider initially appears to be costlier than the population average, when risk-adjusted to include case severity (claims with many diagnoses), the provider’s costs drop dramatically. This means that this provider is actually rather efficient in treating complex, costly cases!
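A rough sketch of the aggregation step is given below. It assumes one simple form of adjustment, dividing a provider’s mean paid amount by the mean risk score of that provider’s claims; the exact adjustment used in APPEAR is not spelled out here, and the provider names and amounts are invented for illustration.

```python
from collections import defaultdict


def risk_adjusted_cost_by_provider(claims):
    """claims: iterable of (provider, paid_amount, risk_score) tuples.

    Returns provider -> (mean paid, mean risk score, risk-adjusted mean paid).
    Assumption: adjusted cost = mean paid / mean risk score.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for provider, paid, score in claims:
        entry = totals[provider]
        entry[0] += paid
        entry[1] += score
        entry[2] += 1
    result = {}
    for provider, (paid_sum, score_sum, count) in totals.items():
        mean_paid = paid_sum / count
        mean_score = score_sum / count
        result[provider] = (mean_paid, mean_score, mean_paid / mean_score)
    return result


# Hypothetical claims: provider P1 treats much more severe cases than P2.
toy_claims = [
    ("P1", 8000.0, 4.0),
    ("P1", 4000.0, 2.0),
    ("P2", 2000.0, 0.5),
    ("P2", 1000.0, 0.5),
]
summary = risk_adjusted_cost_by_provider(toy_claims)
print(summary["P1"])  # (6000.0, 3.0, 2000.0)
print(summary["P2"])  # (1500.0, 0.5, 3000.0)
```

In this toy example P1 looks four times as expensive as P2 on raw averages, yet comes out cheaper once case severity is taken into account, which mirrors the pattern in the bar chart.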

Future-proofing our Models

This relatively straightforward analysis of a small dataset of Workers’ Compensation claims has given us a renewed view into the Workers’ Compensation domain, through Risk Scoring and Adjusting healthcare expenditures for the severity or case mix of a population. We evaluate the performance of individual providers by taking into account the relative risk of the populations they are treating, both as a whole, and in injury-specific cohorts. Predictive modeling has allowed us to accurately identify the diagnostic categories that drive the costs of healthcare. For example: ICD-10 category S31 has a coefficient of 3,575 in our model that predicts paid amount. This means that the presence of this diagnostic category within a claim bumps the predicted paid amount for this claim by $3,575! (For the record, S31 corresponds to “Open wound of abdomen, lower back, pelvis and external genitals”.)

Neither our in-house predictive engine nor the results we obtain from our analyses are static. As we obtain more data, the predictive capability of our models improves. We can develop more complex analytics that capture healthcare utilization trajectories and the efficacy of treatments with respect to costs and lost work days. Richer models including pharmacy claims (NDC codes) and other potential explanatory variables can be developed without sacrificing the simplicity of the basic ideas or the portability of our Risk Adjustment toolkit to diverse markets. We at Accordion are excited to develop these predictive engines, and to use top-of-the-line Machine Learning algorithms in order to facilitate the transition to Value-Based Care!

For more information, please contact info@accordionhealth.com

REFERENCES

[1] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.

Nikolaos Vergos
accordionhealth

Physics Ph.D. — Associate Director, Data Science @ Evolent Health