Picture credit: Pixabay

Enrich your pre-existing data and build lighter, more compact models!

A guide to evaluating vendors within data science organizations

Kirti Pande
5 min read, Aug 23, 2022


Data science and data intelligence have made their way into almost every business domain, bolstering decision making. However, every company is limited in the kind of data it can capture and in the infrastructure it has to support complex data models. This blog post discusses how companies can use data intelligence provided by external vendors to enrich their existing data and build lighter, more compact models. It also introduces the process of evaluating external vendors before onboarding them.

What are the benefits of vendor data?

A product or business owner, such as an e-commerce merchant, can access only internal data related to activities within their own ecosystem. However, access to customer behavior and information outside this ecosystem can provide a different perspective. Combined with the existing data, this additional information normally results in improved feature intelligence.

External vendor data is particularly beneficial for decisions on thin files, that is, customers with little history, or on customers who are new to the product.

This is because external data vendors normally build consortiums of data collected across many channels, such as banks, e-commerce, and social media. The type of data supplied nevertheless varies among vendors. Some simply provide raw data, while many others carry out feature engineering and/or provide scores based on their internal machine learning models. Vendors with sophisticated business models usually also require the merchant to share its own data to enrich the vendor’s consortium when using their services.

How can a merchant choose amongst various external data vendors in the market?

The author has worked on vendor evaluation for e-commerce payment risk management over the past few years and has driven the process end to end, including vendor research, proof of concept, evaluation, comparison, and onboarding. Here are the highlights of the process:

Problem assessment:

Step 1: Before reaching out to a vendor, the merchant should assess the problem and establish a good understanding of its data needs and business goals. This helps narrow down the search scope.

Step 2: Once the goals and the data needs are understood, an initial assessment of the vendors based on criteria around a merchant’s policies, internal onboarding process requirements, budget, etc. should be conducted. The business should participate in this assessment process. Some key points to keep in mind when screening these vendors:

a. The customer base of the vendor and whether it matches the merchant’s customer base

b. The age and status of the vendor and the product offered

c. The vendor’s reputation and the presence of any legal action against them

d. Whether the vendor’s product depends on technology providers that compete with the merchant; if so, the merchant’s concerns should be communicated before the POC stage

Step 3: Before moving on, the cost of the Proof-of-Concept and Evaluation stages should be finalized and approved by the business. The planned data sharing should also go through privacy and compliance assessment, especially if sensitive information is involved.

Proof of Concept:

Step 4: Next, the Proof-of-Concept process kicks off. The data science or analytics team prepares an encrypted data set according to the merchant’s policies and shares it with all vendors under evaluation. The data set should include an identifier for every entry, so that it can be linked back to the original or other records, and a timestamp (when the event occurred) for every entry. A merchant should never include confidential information, such as performance data, in the data set.
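As an illustration of Step 4, here is a minimal sketch of how such a data set might be assembled. The field names (transaction_id, event_time, is_fraud, chargeback_amount) and the helper functions are hypothetical, and the keyed hash is only a stand-in for whatever encryption or tokenization the merchant’s policies actually prescribe.

```python
import hashlib
import hmac

# Hypothetical names of confidential performance columns; adapt to your schema.
CONFIDENTIAL_FIELDS = {"is_fraud", "chargeback_amount"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Keyed hash of an identifier; only the key holder can reproduce the token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_poc_rows(rows, secret_key):
    """Return (shareable_rows, lookup).

    shareable_rows keep the timestamp and the attributes meant for the vendor,
    replace the internal identifier with a token, and drop performance columns.
    lookup maps each token back to the internal identifier and stays in-house.
    """
    shareable, lookup = [], {}
    for row in rows:
        if "transaction_id" not in row or "event_time" not in row:
            raise ValueError("every entry needs an identifier and a timestamp")
        token = pseudonymize(row["transaction_id"], secret_key)
        lookup[token] = row["transaction_id"]
        out = {
            key: value
            for key, value in row.items()
            if key not in CONFIDENTIAL_FIELDS and key != "transaction_id"
        }
        out["record_token"] = token
        shareable.append(out)
    return shareable, lookup
```

Keeping the lookup table internal lets the merchant join the vendor’s response back to its own records without exposing the original identifiers.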

Step 5: The vendor is expected to return point-in-time data, that is, data as it stood on or before the timestamp. The purpose of the Proof of Concept is to evaluate the vendor data as it was available when the particular transaction or data entry took place, simulating a real-time response from the vendor.
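One way to sanity-check the point-in-time requirement is sketched below. It assumes, hypothetically, that each returned record carries a vendor_data_as_of timestamp next to the merchant’s record_token and that all timestamps are ISO 8601 strings; real vendor responses may expose this differently or not at all.

```python
from datetime import datetime


def find_leaky_records(merchant_rows, vendor_rows):
    """Return the tokens whose vendor data is dated after the merchant's event."""
    event_time = {
        row["record_token"]: datetime.fromisoformat(row["event_time"])
        for row in merchant_rows
    }
    leaky = []
    for row in vendor_rows:
        observed = datetime.fromisoformat(row["vendor_data_as_of"])
        # Data observed after the event would not have been available in real time.
        if observed > event_time[row["record_token"]]:
            leaky.append(row["record_token"])
    return leaky
```

Records flagged this way would overstate the vendor’s value, so they are worth raising with the vendor before any lift is measured.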

Evaluation:

Once the vendor has appended its own data intelligence to the data set and returned it to the merchant, what are the next steps?

Step 6: Generally speaking, there are four major aspects to evaluating the vendor’s data:

i. Check for a lift in the performance or essential KPIs of the existing models/solutions while consuming the vendor’s data directly

ii. Ensemble the existing models/solutions with feature engineering on the vendor’s data and check for any lift in the performance or essential KPIs. This improvement in performance can be measured by assessing lift/gains charts as well as the improvement in the precision and recall of the solutions.

Note that, because the variables behind the existing models may be highly correlated with the new vendor data, it is not unusual for a merchant to observe a decline relative to the baseline performance.

iii. The data coverage or response rate is also a critical evaluation metric to consider. A vendor whose data has a higher coverage rate than the others is naturally more appealing.

iv. A Return-on-Investment analysis should be carried out to assess whether the benefits or lift in performance are worth the investment in onboarding the vendor.
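The evaluation criteria above can be sketched with a few small helpers. This is only an illustration: the function names, the single score threshold, and the ROI definition of (benefit - cost) / cost are assumptions, and a real evaluation would normally use the merchant’s own KPIs and full lift/gains charts.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when scores at or above the threshold are flagged."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def coverage_rate(vendor_values):
    """Share of records for which the vendor returned a value (None = no hit)."""
    if not vendor_values:
        return 0.0
    return sum(1 for v in vendor_values if v is not None) / len(vendor_values)


def simple_roi(incremental_benefit, vendor_cost):
    """ROI as (benefit - cost) / cost, both in the same currency and period."""
    return (incremental_benefit - vendor_cost) / vendor_cost
```

Computing precision_recall once on the baseline scores and once on the vendor-enriched scores, at the same threshold and on the same records, gives a like-for-like view of the lift.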

Step 7: The additional latency introduced by the time the vendor takes to respond in real time, and its effect on the SLA, should be discussed with the engineering team. After all, a merchant does not want a real-time vendor integration to adversely impact customer experience.

Step 8: The technical support offered by the vendor after implementation should be discussed before finalizing the vendor.
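To ground the latency discussion in Step 7, a rough check like the following can help. It assumes the vendor call is made sequentially within the existing flow and simply adds the vendor’s 95th-percentile latency to the current one, which is a conservative approximation rather than an exact end-to-end figure; the function names and the choice of the 95th percentile are illustrative.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no latency samples provided")
    ordered = sorted(values)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def within_latency_budget(vendor_latencies_ms, current_p95_ms, budget_ms):
    """Rough check: current p95 plus vendor p95 must fit within the SLA budget."""
    return current_p95_ms + percentile(vendor_latencies_ms, 95) <= budget_ms
```

The latency samples would typically come from the vendor’s sandbox or from timing the Proof-of-Concept calls.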

Implementation:

Step 9: Once a vendor has been finalized, it is always beneficial to incorporate the vendor data in a staging environment first for thorough end-to-end testing before going into production. All involved stakeholders should take part in this process to confirm there are no errors on their side.

Step 10: Once the vendor integration is complete and the data goes live, a merchant should continue monitoring and set up alerts for any drop in the response from the vendor due to technical issues or outages.

Lastly, it is good practice to re-evaluate the value of the vendor data every couple of months. Touching base with the vendor regularly and understanding new observations in customer behavior from their perspective is also beneficial.

A well-functioning data enrichment process is fundamental to providing a successful customer experience for a product in this data-centric world. Numerous vendors now provide access to a variety of data (geographical, behavioral, financial, etc.), but designing the right data enrichment strategy and choosing the appropriate vendor are what make this process truly add value to one’s business needs and objectives.
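The monitoring in Step 10 could start from something as simple as the sketch below. It assumes the merchant logs each vendor call as either a returned value or None when the vendor did not respond; the function names and the default 10-percentage-point tolerance are illustrative choices, not recommendations.

```python
def response_rate(responses):
    """Share of recent vendor calls that returned data (None = no response)."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r is not None) / len(responses)


def should_alert(recent_responses, baseline_rate, max_drop=0.10):
    """True when the recent response rate falls more than max_drop below baseline."""
    return baseline_rate - response_rate(recent_responses) > max_drop
```

In practice the check would run on a rolling window of recent calls and feed whatever alerting tool the engineering team already uses.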
