In data science, the insurance industry reaches for the sky

Published in AmFam (American Family Insurance), Dec 6, 2019. 5 min read.

By Glenn Fung, American Family Insurance Data Science Research Director

The property and casualty insurance industry has a long-standing reputation for being traditional … even old-fashioned. An insurance company likely isn’t the first place that comes to mind when data scientists think about where they might sink their teeth into interesting, innovative and challenging work.

My experience at American Family Insurance challenges that myth.

Insurance companies have a tremendous amount of data. Customer data. Claim-loss data. Actuarial data. Even data about the weather. The list goes on and on.

To be useful, that data needs to be organized so it can be used in a wide variety of ways: to predict loss experience, for policy rating, to help customers prevent claims, to create new products — in short, to provide top-quality products and services customers expect.

At American Family Insurance, effective and ethical use of all this data requires using innovative — even cutting-edge — solutions that might surprise you. Here’s a closer look at just one.

Matching up customer records

We are always looking for new and better ways to use data to serve customers and grow our business. Recently, as American Family took a closer look at our ability to efficiently cross-sell and up-sell products and services to existing customers, we realized that a creative approach fueled by machine learning would likely help us.

As a first step, we needed to identify customers with multiple policies across different product lines. We quickly realized that was not an easy task, especially because, for legacy reasons, we have millions of customer records across multiple databases that store the customer information for the different product lines.

Our challenge was to develop a system that could be used easily by people who don’t know computer programming, machine learning or entity matching (the entities in this case being customers), but who do understand what it means for two records to match and can therefore label data pairs as a match or no match.

And we did it. Our talented data science and data engineering teams, in partnership with the University of Wisconsin-Madison, created a self-service entity-matching system we now call CloudMatcher, which is used across our company.

How CloudMatcher works

CloudMatcher uses machine learning, taking input from a person to teach the system how to match records. Before we created it, Ming Sun (director of data engineering), our teams and I hypothesized that machine-learning technology could enhance entity matching on a large scale. That hypothesis was confirmed by AnHai Doan, a professor in the Computer Sciences Department at UW-Madison, and his Ph.D. student Yash Govind.

As an aside, UW-Madison is an invaluable partner to us when it comes to computer science and data analytics. Having them in our hometown further enhances American Family’s opportunities, culture and data science capabilities.

When a user logs into CloudMatcher and uploads their datasets or databases, the system asks whether the two records in a data pair (for example, John Smith and John Smyth) refer to the same customer. The system records the answer and repeats the question with new pairs many times, a process known as active learning.

With active learning, the user starts with a few data pairs, tells the system whether each one is a match or not, and then moves on to the next pair. After enough rounds, the system learns to identify matching data pairs without human input.
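To make the idea concrete, here is a minimal sketch of an active-learning loop in Python. It is not CloudMatcher's implementation: the string-similarity score, the single learned threshold, the function names and the toy records are all assumptions chosen to keep the example small. The point it illustrates is that the system asks about the pair it is least sure of, so a few answers are enough to settle the rest.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled):
    """Put the cut-off midway between the most similar labeled non-match
    and the least similar labeled match."""
    matches = [score for score, is_match in labeled if is_match]
    non_matches = [score for score, is_match in labeled if not is_match]
    return (max(non_matches, default=0.0) + min(matches, default=1.0)) / 2


def active_learn(pairs, ask_user, budget):
    """Ask for at most `budget` labels, always about the least certain pair."""
    scores = {pair: similarity(*pair) for pair in pairs}
    unlabeled = list(pairs)
    labeled = []
    threshold = 0.5  # arbitrary starting guess (an assumption of this sketch)
    for _ in range(min(budget, len(unlabeled))):
        # Uncertainty sampling: pick the pair closest to the current cut-off.
        pair = min(unlabeled, key=lambda p: abs(scores[p] - threshold))
        unlabeled.remove(pair)
        labeled.append((scores[pair], ask_user(pair)))
        threshold = fit_threshold(labeled)
    return threshold


# Hypothetical toy records; TRUE_MATCHES stands in for a person's answers.
PAIRS = [
    ("John Smith", "John Smyth"),
    ("John Smith", "Jon Smith"),
    ("Acme Corp", "Acme Corporation"),
    ("John Smith", "Jane Doe"),
    ("Ann Lee", "Robert Brown"),
]
TRUE_MATCHES = set(PAIRS[:3])

asked = []


def simulated_user(pair):
    """Record which pairs were shown and answer from the known truth."""
    asked.append(pair)
    return pair in TRUE_MATCHES


threshold = active_learn(PAIRS, simulated_user, budget=3)
predicted = {pair for pair in PAIRS if similarity(*pair) >= threshold}
print(f"labels requested: {len(asked)}, predicted matches: {len(predicted)}")
```

A production system would typically use many similarity features and a richer model than one threshold, but the loop of asking, learning and asking again has the same shape.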

Here’s a quick graphical look at the system and its benefits.

CloudMatcher was tested by matching records from one database, which has about 5 million auto or home insurance customers, against our commercial lines’ primary named insured database, which has 110,000 records.

Using a traditional, rules-based system, matching would have required at least 500 billion labels.

With this test, it took one person 50 minutes to label 780 matches. The accuracy of those matches was calculated at 99.5 percent.

Using cloud-computing resources, the system matched all the rest of the records in five hours, at a computing cost of just $14. It successfully matched 95 percent of all potential matches.
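For a sense of scale, the figures from this test can be put side by side in a few lines of Python. This is back-of-the-envelope arithmetic, not part of CloudMatcher. It assumes that the label count quoted for a rules-based approach is on the order of one label per candidate pair, which is an interpretation rather than something stated here, and it imagines a person labeling nonstop at the pace observed in the test. The function name and dictionary keys are made up for illustration.

```python
def labeling_effort(left_records, right_records, labels, minutes):
    """Compare labeling every candidate pair with the labels actually given."""
    candidate_pairs = left_records * right_records
    minutes_per_label = minutes / labels
    # Hypothetical: one person labeling every pair, nonstop, at the same pace.
    exhaustive_years = candidate_pairs * minutes_per_label / (60 * 24 * 365)
    return {
        "candidate_pairs": candidate_pairs,
        "labeled_fraction": labels / candidate_pairs,
        "exhaustive_years": exhaustive_years,
    }


# Figures quoted in the test: 5 million records, 110,000 records,
# 780 labels provided in 50 minutes.
effort = labeling_effort(5_000_000, 110_000, labels=780, minutes=50)
print(f"candidate pairs: {effort['candidate_pairs']:,}")
print(f"fraction labeled by hand: {effort['labeled_fraction']:.2e}")
print(f"years to label every pair by hand: {effort['exhaustive_years']:,.0f}")
```

Under those assumptions there are 550 billion candidate pairs, and the 780 labels a person provided cover only a vanishingly small share of them.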

Most matching applications only take a sample of data. With an early version of CloudMatcher, we matched one of the largest datasets we have, approximately 30 million historical records from our customer database, against 900 million historical records in a consumer dataset we purchased from a vendor.

And as we go forward and move more of our data science projects to the cloud, we believe CloudMatcher will be invaluable. You might say the sky is the limit. (OK, I know that’s a bad joke.)

This is just one of many innovative data science projects we have underway at American Family Insurance. Take a look at the others we are involved in or have tackled in the past.

About the author

Glenn Fung is a data science and artificial intelligence expert at American Family Insurance.

For 12 years he has worked in the industry — including at Siemens, Amazon and American Family Insurance — developing and applying novel machine-learning techniques to solve challenging industry-related problems.

About Data Science at American Family

American Family Insurance has invested $20 million in UW-Madison data science initiatives — including $10 million in research over the next 10 years — and established a $10 million endowment to create the American Family Insurance Data Science Institute on its Madison, Wisconsin campus. We actively recruit and hire for data science roles across our enterprise.

American Family Insurance is committed to inspire and help you discover, pursue and protect your dreams. #DreamFearlessly