The Journey of Data

Socure
The Socure Technology Blog

By Agnel J D’Cruz, Manager — Data Strategy and Acquisition

Introduction

There are two types of organizations — those that claim to be data-driven and those that are. At Socure, we interact with data from a huge array of sources, but what differentiates us from the rest is how we go about our data journey.

The problem

The Data Journey begins with a problem statement, in other words, “a problem looking for data.” The problem may come from a customer with a specific use case, a product manager with an idea, or an external trigger such as new legislation.

Where can I get data?

This is generally the longest and most time-consuming part of the process. Conversations are needed to identify what we’re looking for and whether direct data exists, or whether a derivative data source could serve as a proxy for the problem we need to solve. Once we’ve identified a source, we explore its coverage, recency, frequency and data attributes. Legal and compliance restrictions on usage are also factored in at this step.
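As a rough illustration of what exploring coverage and recency might look like, here is a minimal Python sketch. The record layout, the field names (`name`, `phone`, `last_updated`) and the `profile_source` helper are all hypothetical and chosen for the example; they do not describe Socure's actual tooling. Refresh frequency is not covered here, since it usually comes from the provider's delivery schedule rather than from the records themselves.

```python
from datetime import date

def profile_source(records, required_fields, as_of):
    """Summarize coverage and recency for a list of dict records (hypothetical layout)."""
    total = len(records)
    coverage = {}
    for field in required_fields:
        # A field counts as covered when it is present and non-empty.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        coverage[field] = filled / total if total else 0.0
    # Age in days of each record relative to the evaluation date.
    ages = [(as_of - r["last_updated"]).days for r in records if r.get("last_updated")]
    return {
        "rows": total,
        "coverage": coverage,
        "oldest_record_days": max(ages) if ages else None,
    }

# Made-up sample records from a candidate source.
sample = [
    {"name": "A", "phone": "555-0100", "last_updated": date(2024, 1, 10)},
    {"name": "B", "phone": "", "last_updated": date(2023, 6, 1)},
    {"name": "C", "phone": "555-0102", "last_updated": date(2024, 2, 20)},
    {"name": "", "phone": None, "last_updated": date(2022, 12, 31)},
]

report = profile_source(sample, ["name", "phone"], as_of=date(2024, 3, 1))
print(report)
```

A report like this gives the conversation with a provider something concrete to anchor on: which attributes are sparsely populated, and how stale the oldest records are.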

Test and prototype

“In God we trust; all others bring data.”

This quote, attributed to American statistician and business consultant W. Edwards Deming, captures the value of high-quality data. We need to ask the data questions about itself.

We test the data to ensure it meets our standards and answers the questions we started out with. For example, if we’re looking to cover gaps in a certain demographic, does the data deliver the expected coverage? We call this the champion-challenger approach: we run a new source (the challenger) against an existing source (the champion) to see which data is most useful. The data science teams work exhaustively in this phase to provide their recommendations. It is important to note that, when it comes to data, there may not be ONE winner; on occasion, a combination of data sources might be the right approach.

Another important factor we keep in mind is parity. Especially when we’re replacing a data source, do we get the same, if not better, performance from the newly acquired data source?
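The champion-challenger comparison and the parity check can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each source is reduced to the set of request IDs it can answer, "performance" is reduced to hit rate, and the names `hit_rate` and `compare_sources` are hypothetical. A real evaluation would look at far more than hit rate.

```python
def hit_rate(covered_ids, request_ids):
    """Fraction of requests for which a source returns a record."""
    if not request_ids:
        return 0.0
    return sum(1 for rid in request_ids if rid in covered_ids) / len(request_ids)

def compare_sources(champion, challenger, request_ids):
    """Compare an existing source (champion) with a new one (challenger)."""
    result = {
        "champion": hit_rate(champion, request_ids),
        "challenger": hit_rate(challenger, request_ids),
        # There may not be one winner: also score the two sources together.
        "combined": hit_rate(champion | challenger, request_ids),
    }
    # Parity: the challenger should do at least as well as the champion.
    result["parity"] = result["challenger"] >= result["champion"]
    return result

# Made-up evaluation set and coverage for each source.
requests = ["r1", "r2", "r3", "r4", "r5"]
champion = {"r1", "r2", "r3"}
challenger = {"r2", "r3", "r4", "r5"}

comparison = compare_sources(champion, challenger, requests)
print(comparison)
```

In this toy data the challenger beats the champion, yet the combination covers every request, which is the situation where using both sources may be the right call.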

The journey to production

A lot of work is needed to integrate a data source with a solution. This could mean building and testing models, testing system performance, testing historical scenarios and requests, testing for compatibility, checking against SLAs, and so on.

Once something is in production, most organizations celebrate and move on to the next project — but here at Socure we are just getting started.

Product teams track customer usage after a release, and look for anomalies in addition to other key performance metrics. We have to validate that data is performing to expectations, and that there aren’t unintended consequences. Oftentimes we uncover insights that act as feedback for our product team, who are always looking for opportunities to further refine our offerings.
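One simple way to look for anomalies in post-release usage is to compare each day's volume against a pre-release baseline. The sketch below uses a plain z-score with a fixed threshold; the technique, the threshold of 3.0, the `flag_anomalies` name and the numbers are all assumptions made for illustration, not a description of how Socure monitors its products.

```python
from statistics import mean, pstdev

def flag_anomalies(baseline, observed, threshold=3.0):
    """Return the days whose volume deviates strongly from the baseline."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A flat baseline: any change at all is worth a look.
        return [day for day, value in observed.items() if value != mu]
    return [
        day
        for day, value in observed.items()
        if abs(value - mu) / sigma > threshold
    ]

# Made-up daily request volumes before and after a release.
baseline_volumes = [100, 102, 98, 101, 99]
post_release = {"2024-03-01": 101, "2024-03-02": 140}

flagged = flag_anomalies(baseline_volumes, post_release)
print(flagged)
```

A flagged day is only a prompt for investigation: the spike might be an unintended consequence of the release, or simply a customer ramping up.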

Diamond in the rough?

There are instances where data is collected and teams are not sure what to make of it, or where key insights are hidden. Here, a strong Exploratory Data Analysis (EDA) culture comes into play: organizations should have strong practices and must also set aside time to review data that most “busy” people (aka everyone in every organization today) consider of no use. Data scientists or analysts must review this data to see whether there are hidden gems. Some questions that could be asked are:

  • Is there a pattern in usage by time of day, week, month or year?
  • Is there a type of request that is fired more frequently than others?
  • What are the upsell and cross-sell opportunities by customer segment, i.e. what services are similar customers not using?
  • Are there patterns in error messages?

Simply taking available data and looking at “What Happened” can lead to insightful discoveries which could translate into huge business opportunities.
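The questions listed above can often be answered with very little code. The Python sketch below runs them against a tiny, made-up request log; the log layout, the field names (`ts`, `customer`, `type`, `error`) and the helper names are hypothetical. The cross-sell helper is deliberately crude: it compares one customer against every service seen in the log rather than against a proper customer segment.

```python
from collections import Counter
from datetime import datetime

# Made-up request log entries.
request_log = [
    {"ts": "2024-03-04T09:15:00", "customer": "acme", "type": "verify", "error": None},
    {"ts": "2024-03-04T09:40:00", "customer": "acme", "type": "verify", "error": "TIMEOUT"},
    {"ts": "2024-03-04T14:05:00", "customer": "globex", "type": "monitor", "error": None},
    {"ts": "2024-03-05T09:20:00", "customer": "globex", "type": "verify", "error": "TIMEOUT"},
    {"ts": "2024-03-05T23:55:00", "customer": "acme", "type": "lookup", "error": "BAD_INPUT"},
]

def summarize(log):
    """Count requests by hour of day, by request type and by error message."""
    return {
        "by_hour": Counter(datetime.fromisoformat(e["ts"]).hour for e in log),
        "by_type": Counter(e["type"] for e in log),
        "by_error": Counter(e["error"] for e in log if e["error"]),
    }

def unused_services(log, customer):
    """Services seen anywhere in the log that this customer has never used."""
    all_types = {e["type"] for e in log}
    used = {e["type"] for e in log if e["customer"] == customer}
    return all_types - used

summary = summarize(request_log)
print(summary["by_hour"].most_common(1))   # busiest hour of the day
print(summary["by_type"].most_common(1))   # most frequent request type
print(summary["by_error"].most_common())   # error messages, most common first
print(unused_services(request_log, "acme"))
```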

Conclusion

Successful organizations are intentional about their data practices. Organizations should monitor changes and make improvements on a regular basis. Treating data as a one-off project that closes once a report or product increment is delivered will leave organizations behind in the quest to be data-driven and make unbiased business decisions.

Agnel J D’Cruz is a Data Acquisition Manager at Socure and helps drive data strategy for our KYC product. Agnel brings a wide range of experience, from consulting to business analysis to analytics solution delivery. His writing includes an HBR article, a co-authored book, and blogs on LinkedIn. He also hosts his own podcast.
