Does insurance have a fairness problem? The use of credit scores to set auto insurance premiums is a prime example of how non-driving factors push up costs for those who can least afford them.
The roll-out of machine learning to insurance, in theory, represents a chance to reset and create a more transparent system. That’s the mission of Cover’s CTO & Co-Founder, Anand Dhillon.
The challenge is that machine learning, like most new innovations, turns out to be a double-edged sword. What if algorithms don’t eradicate bias, but instead create new types of discrimination?
In an industry predicated on segmenting people into groups based on perceived risk, what is the end goal for insurtech engineers? Is it even an issue that developers can realistically solve?
Answering questions like these is one of the biggest technical — and ethical — challenges in the insurtech space.
We chatted with Anand to get his take.
COVER: First up: How is machine learning changing the way insurance companies do business?
AD: Pricing in insurance is all about data. Actuaries take a huge amount of claims history data, then look at attributes like age, zip code, gender etc. and assess the relationship between these characteristics and the number of claims that get filed.
An insurance premium is basically a representation of your perceived risk to the insurance company based on the attributes they analyzed.
So with machine learning, an algorithm takes over the role of assessing the relationship between these characteristics and claim history. Very basically, you’re allowing the algorithm to decide how much a person should pay.
COVER: So how does this pose a problem?
AD: The fundamental problem that you run into is if you are training from datasets and the dataset you are starting with has some sort of implicit bias in it.
Bias may already exist in the dataset: if someone has a lower credit score or is single, that may already be reflected in their accident history or claim history. When you use that data to train a machine learning algorithm, the algorithm will inherit a similar bias.
COVER: Does this mean the introduction of machine learning to insurance actually risks making things worse?
AD: It really depends on how it’s ultimately being implemented. If it’s being implemented through behavior-based systems using telematics, it’s probably making it better.
It also depends on how you optimize the system — whether you're optimizing it for the lowest premium, the lowest claim, etc. But if it basically takes the existing claims history and runs it through a machine learning algorithm, it could potentially make it worse.
COVER: So if the key is implementation, how do you safeguard against this happening?
AD: What you can do is make adjustments to the data that you put in. Let’s say you get everyone’s ethnicity. If you plug that in you’ll get a weighting based on race.
The easiest way to avoid that is to take race away from the input characteristic so that way the algorithm is blind to that.
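A minimal sketch of that blinding step in Python. The attribute and field names here are illustrative assumptions, not Cover's actual schema:

```python
# Remove protected attributes before the data ever reaches the
# training algorithm, so the model is blind to them by construction.
PROTECTED_ATTRIBUTES = {"race", "gender"}

def strip_protected(record: dict) -> dict:
    """Return a copy of the record without protected attributes."""
    return {k: v for k, v in record.items() if k not in PROTECTED_ATTRIBUTES}

applicant = {"age": 34, "zip_code": "94107", "gender": "F"}
features = strip_protected(applicant)  # {"age": 34, "zip_code": "94107"}
```

As the discussion below notes, blinding the inputs is only a first step; correlated proxies can leak the same information back in.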
COVER: How do you spot the biases in the first place?
AD: There’s existing libraries or third-party solutions that check for fairness in your existing algorithm.
There are a few things you can do in house, too. Let's say you want something to not be biased on gender: you can take the set of input characteristics, pull out gender, train the system, then put gender back in for a test subset and see if there's any variation between male and female.
As you're designing the system, you should be taking note of where it could potentially be biased. Once it's complete, you can test it to see if those biases exist and whether you need to make modifications.
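The train-without-the-attribute-then-compare test can be sketched like this. The disparity measure and the group labels are illustrative assumptions:

```python
from statistics import mean

def group_disparity(predictions, groups):
    """Mean predicted premium per group (e.g. gender), plus the gap
    between the highest and lowest group mean. If gender was excluded
    from training, a large gap flags residual bias."""
    buckets = {}
    for pred, grp in zip(predictions, groups):
        buckets.setdefault(grp, []).append(pred)
    means = {g: mean(vals) for g, vals in buckets.items()}
    return means, max(means.values()) - min(means.values())

# Hypothetical model output on a held-out test subset:
preds = [100.0, 102.0, 118.0, 120.0]
genders = ["M", "M", "F", "F"]
means, gap = group_disparity(preds, genders)  # gap == 18.0
```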
However, there’s a second order issue here. If, for example, race is correlated to some other characteristic in the dataset — so say race is correlated to zip code — how do you un-bias that?
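One way to probe that second-order issue is to measure how well a remaining feature predicts the removed one. A rough majority-vote version, with field names assumed for illustration:

```python
from collections import Counter, defaultdict

def proxy_strength(records, proxy_key, protected_key):
    """Fraction of records whose protected attribute can be guessed by
    taking the majority value within each proxy group. Values near 1.0
    mean the proxy (e.g. zip code) nearly encodes the protected
    attribute (e.g. race), so removing the attribute alone won't help."""
    groups = defaultdict(list)
    for r in records:
        groups[r[proxy_key]].append(r[protected_key])
    correct = sum(Counter(vals).most_common(1)[0][1]
                  for vals in groups.values())
    return correct / len(records)
```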
COVER: Then is removing biases from insurance a realistic goal for insurtechs?
AD: It’s not possible. It’s more about which characteristics you’re ok with being biased on and what characteristics you want to go out of your way to be fair on. In everything there’s going to be some level of bias, so you’re making a tradeoff.
The more data points that you have on a person and their vehicle information, the more personalized the quote is. The more characteristics that you remove, the more general the quote is.
Then, the issue you run into here is that bad drivers are being subsidized by good drivers.
COVER: Would these algorithms be any more transparent than the system we have now?
AD: There are different types of algorithms that you can use. Some are very much a ‘black box’ and you can’t really explain them.
There are other types that have more explainability built in. They are closer to something like a multivariate regression. If you want to explain it to someone non-technical, you'd use those more easily explainable algorithms that actually show how the model arrives at its output.
It basically becomes using machine learning to create the rating tables as opposed to creating a black box.
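In the simplest case, "using machine learning to create the rating tables" reduces to fitting relativities per factor level. A one-way sketch, with field names assumed for illustration:

```python
from collections import defaultdict

def rating_table(records, factor):
    """One-way relativities: average claim cost per level of a rating
    factor, divided by the overall average cost. The output is a
    transparent table rather than a black box."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[factor]] += r["claim_cost"]
        counts[r[factor]] += 1
    overall = sum(totals.values()) / sum(counts.values())
    return {lvl: (totals[lvl] / counts[lvl]) / overall for lvl in totals}

records = [
    {"territory": "urban", "claim_cost": 200.0},
    {"territory": "urban", "claim_cost": 200.0},
    {"territory": "rural", "claim_cost": 100.0},
    {"territory": "rural", "claim_cost": 100.0},
]
# rating_table(records, "territory") -> urban ~1.33, rural ~0.67
```

A real rating plan would fit all factors jointly (e.g. with a GLM), but the output format is the same kind of table.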
COVER: So how else can engineers build fairer ways of pricing insurance?
AD: The way things are shifting is towards using telematics, so actually tracking physical driving as opposed to rating on characteristics. Over time that will, in theory, be more reflective of how risky a driver someone is.
Then you can start taking into account things like the hours they drive. For example, if it's at night, it's harder to see, so they're probably a slightly higher risk. You can look at whether they tend to accelerate or brake hard, whether they take long trips or short trips, where they are driving, where they park their car, etc.
With those data-rich systems the pricing is more customizable to that individual person and less general based on what category they fall into.
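A rough sketch of turning raw trip logs into behavior features like those. The trip fields and the night-hours cutoff are assumptions, not a real telematics schema:

```python
def trip_features(trips):
    """Aggregate per-trip telematics into pricing features.
    Each trip: {"start_hour": 0-23, "harsh_brakes": int, "miles": float}."""
    total_miles = sum(t["miles"] for t in trips)
    if not trips or total_miles == 0:
        return {"night_fraction": 0.0, "harsh_brakes_per_100mi": 0.0,
                "avg_trip_miles": 0.0}
    # Treat 10pm-5am as night driving, when visibility is worse.
    night_miles = sum(t["miles"] for t in trips
                      if t["start_hour"] >= 22 or t["start_hour"] < 5)
    return {
        "night_fraction": night_miles / total_miles,
        "harsh_brakes_per_100mi":
            100 * sum(t["harsh_brakes"] for t in trips) / total_miles,
        "avg_trip_miles": total_miles / len(trips),
    }
```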
The tradeoff is that when switching from the current characteristic-based to more behavior-based pricing, some people’s rates will go down significantly and they’ll benefit. For other people, their rates will go up because they are actually bad drivers.
At the end of the day, the key thing is building fairness from the ground up.
As soon as we start down the machine learning path, we want to build the whole process to be as fair as reasonably possible.
That means at the beginning defining what the input characteristics are. What are the factors that we do want to affect the rate?
It’s building it into the process, having an eye — even before you start — for the characteristics you want the algorithm to have, then testing against these characteristics throughout the entire process.
With a normal machine learning algorithm, you have a dataset that you use to train your model and then get an output. What you can do at the beginning is design a dataset where you know that your algorithm is biased if you get certain results.
That way at each step during the process you can test your new algorithm against a detector dataset. Basically you’re using the technology that underlies the algorithm to also test the algorithm.
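A toy version of that detector idea: construct records where, by design, a protected attribute carries no signal, then check whether a candidate model still prices the groups differently. Every field name and threshold here is illustrative:

```python
from statistics import mean

def make_detector_dataset():
    """Synthetic records where claims depend only on miles driven,
    never on gender. Any model that prices the genders differently
    on this data has learned a bias."""
    return [{"gender": g, "annual_miles_k": m, "claim": 1 if m >= 15 else 0}
            for g in ("M", "F") for m in (5, 10, 15, 20)]

def passes_detector(model, data, tolerance=0.01):
    """model: a callable mapping a record to a predicted premium."""
    avg = {g: mean(model(r) for r in data if r["gender"] == g)
           for g in ("M", "F")}
    return abs(avg["M"] - avg["F"]) <= tolerance

data = make_detector_dataset()
fair = lambda r: 2.0 * r["annual_miles_k"]  # ignores gender -> passes
biased = lambda r: r["annual_miles_k"] + (5 if r["gender"] == "M" else 0)
# passes_detector(fair, data) -> True; passes_detector(biased, data) -> False
```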