Searching user claims in auto insurance with Augmented Data Explorer
Overview
Analysts, software testers and business users often need to find data based on specific conditions and fields that are spread across different tables. Accomplishing these tasks requires a few repetitive steps:
- Joining tables and understanding the primary and foreign key relationships
- Conditional criteria on columns for filtering specific details
- Aggregating a specific field with a grouping condition
Getting this data requires an understanding of tables, the names of columns, and the relationships between tables. Accessing data from any database needs an understanding of complex SQL queries, which can take time to learn and write.
How Augmented Data Explorer helps
With IBM Db2 Augmented Data Explorer users can search fields using natural language, without having to learn the names of tables or how to join them. Furthermore, it has an explorer feature providing an interactive space to discover field relationships, along with their data types and visualizations.
For the table relationships, Augmented Data Explorer infers joins automatically and builds a directed graph of the relationships among the tables. It uses pre-defined relations if they’re available. It uses column names, data types, and values in order to infer relationships between tables.
Data Description
The data used for this example is related to the Auto Insurance industry. It was collected from multiple online sources (GitHub, Kaggle) and contains auto claims data from different states in the U.S. These datasets were then processed to create four tables:
- Claims: Details of the auto claims
- Customers: Customer information who bought the auto insurance policy
- Policy: Details of the policy offered by the insurance company
- Policy Assets: Details of the car covered under the policy
Descriptive statistics for the dataset
Getting started
The current version 0.11.0 (build 20196) of Augmented Data Explorer supports connections to IBM Db2. Here, we create a schema using IBM Db2 Warehouse on Cloud and upload the tables.
The next step is creating a connection to Db2 in Augmented Data Explorer and selecting the tables to crawl. When crawling the selected datasets, it analyzes a sample of the data to build metadata, which it then uses when making real-time query recommendations.
The Magic of Augmented Data Explorer (Search using text and no SQL)
1. Explore the dataset
In the first step, we can see the four tables, around 50 fields, and their data types and how they are connected. We can see Claims connected to Customers via Policy and Policy Assets tables.
2. Analyzing a potential target
During the data exploration, most of the times we are looking for specific targets. Our analysis moves around these targets and any insights and relationships about them can be leveraged. In this dataset we are focused on auto insurance. Two potential targets are Premium and Claims from this data. Let’s start with the Premium field by selecting it in the explorer tab.
3. Univariate analysis for Target field
Once we have our potential target field, the next step is to run univariate analysis. Augmented Data Explorer generates a specific visualization for the field, along with a natural language generated (NLG) text. NLG quickly tells us the maximum, minimum value, standard deviation values, and the outlier values in this field. Here we see Premium field has 18 outliers and a mean of 1256.41.
4. Bivariate Analysis (Geographical Visualization)
An exciting part of data analysis is the ability to explore and visualize different data fields. During our analysis of auto insurance, we search for avg prem by st
to see if there any trends by state. Augmented Data Explorer autocompletes the search query and its first recommendation Average of Premium by State is closest to what we are looking. This is ideal for situations where the user doesn’t know the full details of a column or table. Augmented Data Explorer automatically handles these details.
The second interesting part is Augmented Data Explorer recognizing State as a geographical field and creating a map to help users better understand the data. The NLG not only provides descriptive stats but also describes the results of bivariate analysis. In this example, the NLG points out that “Kansas” and “Missouri” have higher average premiums, and that the fields State and Average Premium don’t have meaningful relationships.
5. Forecasting for time series data
We would like to find trends in claims over time specifically by month, although there is no month field in the dataset. When we search for avg claim by month
, Augmented Data Explorer automatically finds the month feature from the Incident Date field and shows a visualization for time series data. It also gives an option to automatically calculate a forecast. Choosing this option gives the next 6 months’ forecast for average claims.
6. Analysis by demographics field
During the analysis of our target field Claim, we would like to run analysis by demographics. When we search avg claim by demographics
, Augmented Data Explorer automatically shows features like Education, Gender, and Age for bivariate analysis by Claims. Furthermore, we see with Augmented Data Explorer ’s autocompletes feature, different types of Claims with demographic features.
7. Auto-Joins
There are times in the analysis when the required fields are spread across different tables. This situation requires understanding complex joins between these tables, but with Augmented Data Explorer everything is handled at the backend. Therefore, when we search avg pre by occu and educ
, Augmented Data Explorer autocompletes the query and joins the columns from the Policy and Education fields. Augmented Data Explorer also gives natural language insights about these two categorical fields with the target Premium field.
8. Table joins with condition
When it comes to user analysis, there are a lot of times when this analysis would involve conditional operators. In such situations Augmented Data Explorer not only supports multiple operators but also interprets them automatically. For example, if a user searches occup where prem > 1000
, Augmented Data Explorer understands that Occupation and Premium are from two different tables and recognizes the “greater than” operator.
In the NLG insights, we see interesting results like “Doctor is smaller than expected. Blue Collar is especially large”. This can be particularly useful when you have a lot of categories, as there could be a chance the insights might be overlooked by the users.
9. Joins between multiple tables
So far, we have seen joining between two tables, but what if there are more than two tables and we want to add conditions with these joins. A search example might beclaim by gender where make
with conditions like Model: Camaro and Age< 30. Augmented Data Explorer will join Claims, Customers and Policy Assets.
Another feature in Augmented Data Explorer is using the pills in search bar to select values from the field e.g. selecting Camaro in the Model field and values less than 30 from Age — thus making it easier to add new fields and also to select multiple values in a field.
Conclusion
In this data exploration process, we quickly got familiarized with different fields about the auto insurance dataset, potential target fields claim and premium, and their relationship with other fields. During this process, Augmented Data Explorer automatically connects the tables, interprets user searches by auto-completing fields, creates specific visualizations and summarizes useful details in natural language.
Another important feature of search is using pills to filter rows with conditional operators and aggregating these fields. Very quickly, we were able to identify the claim ID for particular policy assets and premium conditions.
IBM Db2 Augmented Data Explorer is now in beta. We invite you to try it out, and share your feedback with us!