Searching user claims in auto insurance with Augmented Data Explorer

Ahsan Rehman
IBM Data Science in Practice
7 min readMar 4, 2020

Overview

Analysts, software testers and business users often need to find data based on specific conditions and fields that are spread across different tables. Accomplishing these tasks requires a few repetitive steps:

  • Joining tables and understanding the primary and foreign key relationships
  • Conditional criteria on columns for filtering specific details
  • Aggregating a specific field with a grouping condition

Getting this data requires an understanding of tables, the names of columns, and the relationships between tables. Accessing data from any database needs an understanding of complex SQL queries, which can take time to learn and write.

How Augmented Data Explorer helps

With IBM Db2 Augmented Data Explorer users can search fields using natural language, without having to learn the names of tables or how to join them. Furthermore, it has an explorer feature providing an interactive space to discover field relationships, along with their data types and visualizations.

For the table relationships, Augmented Data Explorer infers joins automatically and builds a directed graph of the relationships among the tables. It uses pre-defined relations if they’re available. It uses column names, data types, and values in order to infer relationships between tables.

Data Description

The data used for this example is related to the Auto Insurance industry. It was collected from multiple online sources (GitHub, Kaggle) and contains auto claims data from different states in the U.S. These datasets were then processed to create four tables:

  • Claims: Details of the auto claims
  • Customers: Customer information who bought the auto insurance policy
  • Policy: Details of the policy offered by the insurance company
  • Policy Assets: Details of the car covered under the policy

Descriptive statistics for the dataset

Getting started

The current version 0.11.0 (build 20196) of Augmented Data Explorer supports connections to IBM Db2. Here, we create a schema using IBM Db2 Warehouse on Cloud and upload the tables.

Figure 1: Load Data on IBM Db2 Warehouse on Cloud

The next step is creating a connection to Db2 in Augmented Data Explorer and selecting the tables to crawl. When crawling the selected datasets, it analyzes a sample of the data to build metadata, which it then uses when making real-time query recommendations.

The Magic of Augmented Data Explorer (Search using text and no SQL)

1. Explore the dataset

In the first step, we can see the four tables, around 50 fields, and their data types and how they are connected. We can see Claims connected to Customers via Policy and Policy Assets tables.

Figure 2: Augmented Data Explorer has an interactive view of the tables and columns, along with the joins between these tables.

2. Analyzing a potential target

During the data exploration, most of the times we are looking for specific targets. Our analysis moves around these targets and any insights and relationships about them can be leveraged. In this dataset we are focused on auto insurance. Two potential targets are Premium and Claims from this data. Let’s start with the Premium field by selecting it in the explorer tab.

Figure 3: With Augmented Data Explorer users select fields, producing a bar chart showcasing the comparison among categories/bins.

3. Univariate analysis for Target field

Once we have our potential target field, the next step is to run univariate analysis. Augmented Data Explorer generates a specific visualization for the field, along with a natural language generated (NLG) text. NLG quickly tells us the maximum, minimum value, standard deviation values, and the outlier values in this field. Here we see Premium field has 18 outliers and a mean of 1256.41.

Figure 4: Augmented Data Explorer provides Visualization & Natural Language Insights

4. Bivariate Analysis (Geographical Visualization)

An exciting part of data analysis is the ability to explore and visualize different data fields. During our analysis of auto insurance, we search for avg prem by st to see if there any trends by state. Augmented Data Explorer autocompletes the search query and its first recommendation Average of Premium by State is closest to what we are looking. This is ideal for situations where the user doesn’t know the full details of a column or table. Augmented Data Explorer automatically handles these details.

The second interesting part is Augmented Data Explorer recognizing State as a geographical field and creating a map to help users better understand the data. The NLG not only provides descriptive stats but also describes the results of bivariate analysis. In this example, the NLG points out that “Kansas” and “Missouri” have higher average premiums, and that the fields State and Average Premium don’t have meaningful relationships.

Figure 4: Augmented Data Explorer recognizes states and produces geographical visualization

5. Forecasting for time series data

We would like to find trends in claims over time specifically by month, although there is no month field in the dataset. When we search for avg claim by month, Augmented Data Explorer automatically finds the month feature from the Incident Date field and shows a visualization for time series data. It also gives an option to automatically calculate a forecast. Choosing this option gives the next 6 months’ forecast for average claims.

Figure 6: Augmented Data Explorer picks time-series trends and gives forecasting options
Figure 7: Forecast values of claims for the next 6 months

6. Analysis by demographics field

During the analysis of our target field Claim, we would like to run analysis by demographics. When we search avg claim by demographics, Augmented Data Explorer automatically shows features like Education, Gender, and Age for bivariate analysis by Claims. Furthermore, we see with Augmented Data Explorer ’s autocompletes feature, different types of Claims with demographic features.

Figure 8: Augmented Data Explorer supports keywords like “Demographics” and matches all relevant fields

7. Auto-Joins

There are times in the analysis when the required fields are spread across different tables. This situation requires understanding complex joins between these tables, but with Augmented Data Explorer everything is handled at the backend. Therefore, when we search avg pre by occu and educ, Augmented Data Explorer autocompletes the query and joins the columns from the Policy and Education fields. Augmented Data Explorer also gives natural language insights about these two categorical fields with the target Premium field.

Figure 9: Augmented Data Explorer auto-join feature

8. Table joins with condition

When it comes to user analysis, there are a lot of times when this analysis would involve conditional operators. In such situations Augmented Data Explorer not only supports multiple operators but also interprets them automatically. For example, if a user searches occup where prem > 1000, Augmented Data Explorer understands that Occupation and Premium are from two different tables and recognizes the “greater than” operator.

In the NLG insights, we see interesting results like “Doctor is smaller than expected. Blue Collar is especially large”. This can be particularly useful when you have a lot of categories, as there could be a chance the insights might be overlooked by the users.

Figure 10: Augmented Data Explorer joins with conditional values

9. Joins between multiple tables

So far, we have seen joining between two tables, but what if there are more than two tables and we want to add conditions with these joins. A search example might beclaim by gender where make with conditions like Model: Camaro and Age< 30. Augmented Data Explorer will join Claims, Customers and Policy Assets.

Another feature in Augmented Data Explorer is using the pills in search bar to select values from the field e.g. selecting Camaro in the Model field and values less than 30 from Age — thus making it easier to add new fields and also to select multiple values in a field.

Figure 11: Join between three tables
Figure 12: Augmented Data Explorer Pill feature for adding conditions
Figure 13 Augmented Data Explorer enables joins and conditional operators

Conclusion

In this data exploration process, we quickly got familiarized with different fields about the auto insurance dataset, potential target fields claim and premium, and their relationship with other fields. During this process, Augmented Data Explorer automatically connects the tables, interprets user searches by auto-completing fields, creates specific visualizations and summarizes useful details in natural language.

Another important feature of search is using pills to filter rows with conditional operators and aggregating these fields. Very quickly, we were able to identify the claim ID for particular policy assets and premium conditions.

IBM Db2 Augmented Data Explorer is now in beta. We invite you to try it out, and share your feedback with us!

--

--

Ahsan Rehman
IBM Data Science in Practice

Data Scientist @ibm | Co-founder @nustianusa | Mentor/Coach for DS/Analytics