A Risk Adjustment Model for Workers’ Compensation Claims

Part 2: Fundamentals of Risk Modeling

Nikolaos Vergos
accordionhealth
Apr 19, 2017

In the last post of Accordion’s series on building a Risk Adjustment model for Workers’ Compensation claims, we briefly presented some background on Risk Adjustment models in Healthcare and outlined the differences between “traditional” Risk Adjustment models for Medicare/Medicaid-focused markets and the fundamentally different universe of Workers’ Compensation. In the next two installments, we switch gears to focus on our approach to building a Risk Adjustment model for Workers’ Compensation claims from the ground up.

Workers’ Compensation is Different from Health Insurance

Workers’ Compensation policies in the United States are usually regulated by individual states. Variants are available in all 50 states, but overall Workers’ Compensation is a fairly unique line of insurance business. It is also harder to model than other lines. Both medical and indemnity losses (cash payments for lost wages or permanent disability, such as the loss of sight, hearing, or a finger) are affected by many factors. These include the claimant’s demographic and socioeconomic status, their line of work, the type of injury, comorbidities, and the procedures performed.

Workers’ Compensation insurance is a highly diverse, highly fragmented industry with no dominant players, a multitude of competitive state funds (such as the State Compensation Insurance Fund in California), and strong regional risk pools. These factors have driven across-the-board interest in Workers’ Compensation, especially in using advanced predictive modeling techniques for several purposes. These include pricing, claim management, provider evaluation, fraud detection, and the detection of premium leakage: missing or erroneous underwriting information that undermines the insurers’ rating plans.[1]

Risk Modeling 101: Building our own Smart Algorithm

As we mentioned before, Risk Adjustment boils down to adjusting, or normalizing, healthcare-related costs so that they accurately reflect the health status and case severity of a given population. These population metrics are quantified with a single number, the relative risk score: a numerical representation of a member’s health status (the severity of a case) relative to everyone else in the population under consideration[2]. The average risk score of the entire population is set equal to 1, and individual members’ scores can range from 0 to values much higher than 1. The individual scores can also be aggregated by age, gender, geographic area, medical provider, and other dimensions of interest, both for analysis and for extracting actionable insights. It is worth mentioning that this method is similar to assigning the Experience Modifier used in Workers’ Compensation underwriting.
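
To make the normalization concrete, here is a minimal Python sketch. It assumes we already have a predicted cost for each claim from some model. Dividing by the population mean forces the average score to exactly 1. The hypothetical `average_by` helper then rolls the scores up to a dimension of interest, here made-up regions. The claim costs and region labels are illustrative only.

```python
from collections import defaultdict

def relative_risk_scores(predicted_costs):
    """Divide each claim's predicted cost by the population mean,
    so that the population's average score is 1."""
    mean_cost = sum(predicted_costs) / len(predicted_costs)
    return [cost / mean_cost for cost in predicted_costs]

def average_by(scores, groups):
    """Hypothetical helper: average the individual scores within each group
    (e.g., provider, region, age band)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += score
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Illustrative predicted costs (USD) for five claims, with made-up regions.
predicted = [1000.0, 2000.0, 3000.0, 500.0, 6000.0]
regions = ["north", "north", "south", "south", "west"]

scores = relative_risk_scores(predicted)
by_region = average_by(scores, regions)
print(scores)
print(by_region)
```

With these numbers the population mean is $2,500, so the five scores are 0.4, 0.8, 1.2, 0.2 and 2.4. The north region averages 0.6.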

These relative risk scores can generate rich insights both at the individual member level and at aggregate levels of interest. For example, if a medical claim for Workers’ Compensation is assigned a risk score of 0.5, this particular case is half as “severe” as the population average. It would therefore be expected to cost the insurer half as much. Conversely, a medical claim with a risk score of 3.0 is interpreted as three times as severe as the population average, and could reasonably be expected to cost three times as much. Similar logic applies to aggregate risk scores, e.g., the average risk score across all claims treated by the same provider, or within the same geographic area.

Example of a treating provider whose average risk score is 0.65 across all claims. The provider’s billed and paid amounts are considerably lower than the average across all providers. Once the risk score is taken into account, however, the provider’s cost measures align with the all-provider averages. This is because this provider generally treats less severe cases than his/her peers, which is reflected in lower billed and paid amounts. Indemnity payments are more complex to risk-adjust because not all Workers’ Compensation claims result in lost work days. Source: Accordion Health Provider Performance and Utilization Evaluator (APPEAR)
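
The risk adjustment in the figure can be sketched as an observed-to-expected (O/E) comparison. This minimal Python sketch assumes that a claim’s expected cost is its risk score times the population average paid amount. The function name and all numbers are hypothetical, chosen so that the provider’s average risk score matches the 0.65 in the example above.

```python
def risk_adjusted_comparison(paid, risk_scores, population_avg_paid):
    """Compare a provider's observed paid amounts with what its case mix predicts.

    Assumption: expected cost of a claim = risk score x population average paid.
    An O/E ratio near 1.0 means the provider costs about what its case mix predicts.
    """
    observed = sum(paid)
    expected = sum(score * population_avg_paid for score in risk_scores)
    avg_paid = observed / len(paid)
    avg_score = sum(risk_scores) / len(risk_scores)
    return {
        "avg_paid": avg_paid,
        "avg_risk_score": avg_score,
        # Dividing by the average risk score puts the provider on a
        # population-average (score = 1) footing.
        "risk_adjusted_avg_paid": avg_paid / avg_score,
        "o_e_ratio": observed / expected,
    }

# Hypothetical provider: three claims with an average risk score of 0.65,
# in a population where the average paid amount per claim is $4,000.
provider = risk_adjusted_comparison(
    paid=[2100.0, 2500.0, 3200.0],
    risk_scores=[0.5, 0.65, 0.8],
    population_avg_paid=4000.0,
)
print(provider)
```

The provider’s raw average paid amount ($2,600) looks 35% below the $4,000 population average. Dividing by the 0.65 average risk score brings it back to about $4,000, and the O/E ratio is about 1.0, mirroring the figure.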

How are these risk scores generated, though? The basic methodology behind most models is more or less the same. The “secret sauce” that differentiates truly intelligent models from their competitors lies in custom risk-scoring processes and top-of-the-line machine learning algorithms at the core of the prediction process.

All models start by classifying several kinds of claim data into clinically reasonable and statistically significant groups. These include medical diagnostic codes (as codified in the World Health Organization’s International Classification of Diseases, commonly known as the ICD), drug prescription data, and other claim information. This initial grouping (feature selection) can strongly determine the performance of the model. It is therefore very important that the team developing the model have both domain expertise and a robust grasp of Healthcare Analytics.

Once the data has been preprocessed, customized statistical learning techniques (such as regression) are applied to predict all or a subset of the targets associated with healthcare cost at the individual member level. These targets include the monetary costs (billed and paid amounts) associated with each Workers’ Compensation claim, the number of lost days and resulting indemnity payments, procedure utilization details, etc.
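
Below is a toy Python version of the two steps: grouping, then regression. Several parts are stand-ins for illustration:

- The three-entry `ICD_GROUPS` mapping is a hypothetical substitute for a real, clinically validated grouper.
- The claims and paid amounts are synthetic.
- Ordinary least squares (NumPy’s `lstsq`) replaces the customized learning techniques described above.

```python
import numpy as np

# Hypothetical grouping of ICD-10 category prefixes into clinical groups.
# A production grouper would be far larger and clinically validated.
ICD_GROUPS = {
    "S62": "wrist_hand_fracture",
    "M54": "back_pain",
    "S83": "knee_injury",
}
GROUP_NAMES = sorted(set(ICD_GROUPS.values()))  # fixed column order

def featurize(icd_codes, age):
    """Turn a claim's diagnosis codes and claimant age into one feature row:
    [intercept, one 0/1 indicator per clinical group, age]."""
    groups = {ICD_GROUPS[code[:3]] for code in icd_codes if code[:3] in ICD_GROUPS}
    indicators = [1.0 if name in groups else 0.0 for name in GROUP_NAMES]
    return [1.0] + indicators + [float(age)]

# Hypothetical training claims: (diagnosis codes, claimant age).
claims = [
    (["S62.0"], 30),
    (["M54.5"], 45),
    (["S83.5"], 28),
    (["S62.0", "M54.5"], 50),
    (["M54.5", "S83.5"], 38),
    ([], 41),
    (["S62.0", "S83.5"], 33),
]
X = np.array([featurize(codes, age) for codes, age in claims])

# Synthetic paid amounts built from known coefficients so the fit can be checked.
# Order: intercept, back_pain, knee_injury, wrist_hand_fracture, age.
true_coef = np.array([1000.0, 1500.0, 2000.0, 3000.0, 20.0])
paid = X @ true_coef

# Ordinary least squares as a stand-in for a customized learning technique.
coef, *_ = np.linalg.lstsq(X, paid, rcond=None)
predicted = X @ coef
risk_scores = predicted / predicted.mean()  # relative risk scores, mean = 1
print(np.round(coef, 2))
print(np.round(risk_scores, 3))
```

Because the synthetic paid amounts contain no noise, the fitted coefficients recover `true_coef`. With real claims, the fit would only approximate such relationships.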

Model Evaluation

How do we know whether our model is any good?

A widely accepted metric for model evaluation is the coefficient of determination, commonly referred to as R². It is a number that typically lies between 0 and 1. It quantifies how much of the total variation in healthcare costs within a population can be explained by the risk-scoring model. Weaker predictive models have values closer to 0, and stronger models, with better predictive performance, have values closer to 1.

A word of caution, however: a suspiciously high R² could merely mean that our model has done a great job memorizing all the information it has been fed. Such a model has no predictive capability for future, as-yet unseen claims. This is known as “overfitting”. An overfitted model fails to discover and describe the underlying relationship between its features (diagnostic/prescription data, claimant’s demographic information) and its target (facets of the healthcare cost at the member/claim level). Instead, it describes the “noise” rather than the “signal”, and it overreacts to minor fluctuations in the training data.

There are several statistical learning methods to prevent or mitigate overfitting. These include evaluating the model on held-out test data, cross-validation, and adding regularization terms to the objective function to penalize unnecessary, error-prone model complexity. The reader can rest assured that we at Accordion Health incorporate all these techniques in order to construct robust models!
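
The sketch below computes R² directly from its definition, 1 - SS_res / SS_tot. It uses R² to compare ordinary least squares with ridge regression (one form of regularization) on a held-out test set. Everything here is an assumption for illustration:

- The data are synthetic, with 40 candidate features of which only 3 matter.
- The penalty `alpha=10` is arbitrary.
- There is no intercept, because the simulated data have zero mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression minimizing ||y - Xw||^2 + alpha * ||w||^2.
    alpha = 0 reduces to ordinary least squares. No intercept (zero-mean data assumed)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data: 40 candidate features, only the first 3 drive the target.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 50, 200, 40
w_true = np.zeros(n_features)
w_true[:3] = [3.0, -2.0, 1.5]
X = rng.normal(size=(n_train + n_test, n_features))
y = X @ w_true + rng.normal(size=n_train + n_test)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

w_ols = fit_ridge(X_train, y_train, alpha=0.0)
w_ridge = fit_ridge(X_train, y_train, alpha=10.0)
for name, w in (("OLS", w_ols), ("ridge", w_ridge)):
    print(f"{name:>5}: train R2 = {r_squared(y_train, X_train @ w):.3f}, "
          f"test R2 = {r_squared(y_test, X_test @ w):.3f}")
```

Ordinary least squares always scores at least as high as ridge on the training data, since it minimizes training error by construction. When there are many features relative to the number of claims, as here, the regularized fit typically does better on the unseen test claims.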

Overfitted data: Noisy (roughly linear) data is fitted with both linear and polynomial functions. Although the polynomial function fits the data perfectly, the linear function can be expected to generalize better. In other words, if the two functions were used to extrapolate beyond the fitted data, the linear function would make better predictions. Source: Wikimedia Commons
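
The effect in the figure is easy to reproduce. This Python sketch uses synthetic data rather than the figure’s own. It fits ten noisy points from the line y = 2x + 1 twice: once with a straight line, and once with a degree-9 polynomial that passes through every point. It then compares both fits on new points just beyond the training range.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.1, size=x_train.size)

linear = Polynomial.fit(x_train, y_train, deg=1)
wiggly = Polynomial.fit(x_train, y_train, deg=9)  # interpolates all 10 points

# New points just beyond the training range, from the noiseless true line.
x_new = np.linspace(1.1, 1.5, 5)
y_new = 2.0 * x_new + 1.0

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

train_lin, train_poly = mse(linear, x_train, y_train), mse(wiggly, x_train, y_train)
test_lin, test_poly = mse(linear, x_new, y_new), mse(wiggly, x_new, y_new)
print(f"linear:   train MSE {train_lin:.2e}, extrapolation MSE {test_lin:.2e}")
print(f"degree 9: train MSE {train_poly:.2e}, extrapolation MSE {test_poly:.2e}")
```

The degree-9 polynomial reaches essentially zero training error, yet its extrapolations drift far from the true line. The straight line stays close.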

Predictive Modeling for Workers’ Compensation is a Unique Challenge

Risk models designed to fit a specific population (e.g., Medicare or Medicaid) or a particular geographic area can be woefully inadequate, or plainly wrong, when applied to different markets or other parts of the country. For example, the Centers for Medicare & Medicaid Services (CMS) maintain extremely complex risk models[3] for the Medicare population. As described in the previous post, these models map health conditions to Hierarchical Condition Categories (HCCs).

These models fail when applied to Workers’ Compensation claims, for two reasons. First, workplace injuries are fundamentally different from the chronic conditions prevalent in the Medicare population, such as diabetes and COPD. Second, the age spectrum differs significantly: an older Medicare population vs. a working-age population for Workers’ Compensation claims. These factors greatly affect the model-building process and prevent simple portability of models across diverse markets.

Furthermore, risk models need to be trained on large volumes of data. They also have to be updated frequently for two reasons: to extract the statistical trends in the data accurately while accounting for variance, and to capture longitudinal utilization trends. This recalibration process may be costly and complex, but it is a sine qua non for successful learning.
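
One simple way to picture recalibration is refitting on a rolling window of recent claims. The minimal Python sketch below makes two assumptions. First, it uses a hypothetical one-parameter model in which cost is proportional to a severity index. Second, it generates synthetic claims whose cost per unit of severity drifts upward 2% per month. It compares a model fit on all history with one refit on only the last six months, and scores both on the following month.

```python
import numpy as np

def fit_slope(severity, cost):
    """Least-squares fit of the one-parameter model cost = beta * severity."""
    return float(severity @ cost / (severity @ severity))

rng = np.random.default_rng(7)
month = np.repeat(np.arange(24), 50)               # 24 months x 50 claims
severity = rng.uniform(0.5, 3.0, size=month.size)  # hypothetical severity index
# Hypothetical drift: cost per unit of severity rises 2% per month.
beta_by_month = 1000.0 * 1.02 ** month
cost = beta_by_month * severity + rng.normal(scale=50.0, size=month.size)

history = month < 23                   # all claims before the evaluation month
recent = (month >= 17) & (month < 23)  # only the latest six months
holdout = month == 23                  # "future" claims to predict

def holdout_mse(beta):
    """Mean squared error of a fitted beta on the held-out month."""
    return float(np.mean((beta * severity[holdout] - cost[holdout]) ** 2))

beta_all = fit_slope(severity[history], cost[history])
beta_recent = fit_slope(severity[recent], cost[recent])
print(f"all history:   beta = {beta_all:.0f}, holdout MSE = {holdout_mse(beta_all):.0f}")
print(f"recent window: beta = {beta_recent:.0f}, holdout MSE = {holdout_mse(beta_recent):.0f}")
```

Because the all-history fit averages over older, cheaper months, it underpredicts the newest claims. The recently refit model tracks the drift more closely. Real recalibration would refit the full model, but a similar trade-off between data volume and recency applies.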

Benefits of Predictive Modeling for Workers’ Compensation

Why bother spending the time to develop a custom risk model for Workers’ Compensation? Such predictive modeling can be used to:

  • Successfully identify and predict medical cost drivers within Workers’ Compensation claims (e.g., specific diagnoses that result in costlier claims or prolonged absence from work).
  • Triage claims earlier and more effectively.
  • Measure the efficacy of treatments through the number of lost work days for each case, and identify treatments that correspond to speedier resolution.
  • Evaluate providers’ ‘performance’ and cost utilization across the patient populations they treat in a fair, risk-adjusted way.
  • Identify which provider would be most effective at treating a specific injured worker, given the worker’s age, gender, and diagnoses.
  • Set reserves for individual claims more accurately.
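
As a rough illustration of the triage and reserving uses, here is a Python sketch. It ranks claims by risk score, flags those above a review threshold, and attaches a naive reserve estimate. The threshold of 2.0, the rule reserve = risk score × population average cost, and the claim IDs are all hypothetical. A real reserving process would involve considerably more than this single multiplication.

```python
def triage_and_reserve(claim_scores, population_avg_cost, threshold=2.0):
    """Rank claims by risk score, flag high-risk ones for early review,
    and attach a naive reserve estimate.

    Hypothetical rule: reserve = risk score x population average cost.
    """
    queue = []
    ranked = sorted(claim_scores.items(), key=lambda item: item[1], reverse=True)
    for claim_id, score in ranked:
        queue.append({
            "claim_id": claim_id,
            "risk_score": score,
            "flag_for_review": score >= threshold,
            "reserve_estimate": round(score * population_avg_cost, 2),
        })
    return queue

# Hypothetical claims and their relative risk scores.
claim_scores = {"WC-001": 0.4, "WC-002": 3.1, "WC-003": 1.0, "WC-004": 2.2}
queue = triage_and_reserve(claim_scores, population_avg_cost=8000.0)
for row in queue:
    print(row)
```

The two claims scoring above 2.0 (WC-002 and WC-004) land at the top of the review queue, ahead of the two lower-risk claims.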

This process is not without challenges due to the unique nature of Workers’ Compensation. In the next post of this series, we will focus on our case study of building a longitudinal Risk Adjustment predictive model for a diverse population of Workers’ Compensation beneficiaries. Stay tuned!

For more information, please contact info@accordionhealth.com

REFERENCES

[1] Wu, Peter: “Predictive Modeling for Workers Compensation”, CAS Predictive Modeling Seminar, San Diego, CA, October 2008

[2] Milliman: “Workers’ compensation claims predictive modeling”

[3] CMS: “Risk Adjustment Methodology Overview”
