Effective Field-Testing Strategies for AI Models — Part I

Srihari Nalabolu
5 min read · Aug 25, 2021


Contents

  1. What is field testing?
  2. Where does it fit in the life cycle of an AI project?
  3. Process of field testing
  4. Stability Testing & A/B Testing
  5. Applications of field testing
  6. Advantages and challenges

1. What is field testing?

Field testing is an experimentation process in which an AI intervention is run alongside the traditional process at the same time, and the success of the AI intervention is measured through a quantifiable outcome.

2. Where does it fit in the life cycle of an AI project?

There are multiple stages in operationalizing an AI product. The first stage is demand management, where the business teams and the data science teams work together to define a goal and establish a proof of concept with the AI solution. Here, the success metric could be how well the business problem is formulated in AI terms and/or the efficacy of the initial solution.

In the next step, proof of value needs to be established. This enables buy-in from the business teams if the solution is viable. To evaluate whether the AI model works on a myriad of data points, the model must be tested. One such method is field testing.

Figure 1: Life cycle of a typical Data science & AI project

We use field testing between Proof of Value and Pre-Industrialization to assess the value potential of the AI intervention in the business process.

However, field testing also continues after Industrialization, to check the efficiency of newer versions of the AI model and to monitor drift in the process.

3. Process of Field testing

There are various steps involved in field testing, depending on the business process:

  1. Stability Testing of the AI model
  2. A/B Testing

4. Stability Testing & A/B Testing

4.1 Stability Testing

Testing the AI model across multiple datasets is important to establish its consistency and efficacy. Stability testing helps surface issues with the data, changes in the business processes over time (if any), and data drift across the variables used for testing, and it checks the quality of the model's predictions. In short, stability testing provides an overall assessment of the AI model.

Once stability testing is completed, we generally freeze the best version of the AI model for further testing in the process. One such process is A/B testing.

As part of stability testing, the AI model needs to be evaluated over multiple iterations; one such example is shown in the following diagram.

Figure 2: Template of Stability Testing

Multiple iterations can be performed by rolling the training and validation window: train on n months of data, test the AI model on the subsequent k months, and then slide the window forward. This approach works when the data has a time component and helps identify drift in both the target and the data.
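For illustration, below is a minimal Python sketch of such a rolling-window stability test. The column names (month, target), the window lengths, and the choice of model and metric are assumptions made for the example, not a prescribed setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def rolling_stability_test(df, n_train_months=12, k_test_months=3):
    """Slide a train/validation window month by month and record the metric per iteration."""
    # Assumed layout: 'month' is a sortable period column, 'target' is binary,
    # and every other column is a model feature.
    months = sorted(df["month"].unique())
    feature_cols = [c for c in df.columns if c not in ("month", "target")]
    results = []

    for start in range(len(months) - n_train_months - k_test_months + 1):
        train_months = months[start : start + n_train_months]
        test_months = months[start + n_train_months : start + n_train_months + k_test_months]

        train = df[df["month"].isin(train_months)]
        test = df[df["month"].isin(test_months)]

        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(train[feature_cols], train["target"])

        auc = roc_auc_score(test["target"], model.predict_proba(test[feature_cols])[:, 1])
        results.append({"train_end": train_months[-1], "test_end": test_months[-1], "auc": auc})

    # A large spread in the metric across iterations hints at data or target drift.
    return pd.DataFrame(results)
```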

Similarly, decide on the best stability testing strategy based on your business problem and data.

4.2 A/B Testing

The process of A/B testing involves running the trained AI model on two different scenarios. In our experience, field testing tends to give better results when performed on real-time data.

When we feed the data to the AI model, the population is split into two arms, a test arm and a control arm, each containing half of the dataset. “The main goal of A/B testing is to check whether there is a significant lift in the business metric when the AI intervention is made.”

The datasets in the control and test arms must have similar characteristics for an effective, unbiased evaluation of the AI model. The control arm proceeds as usual without any AI intervention, and its output is recorded after the field-testing period. During the same period, interventions based on the AI outcome are made on the test arm, and both the output and the intervention actions are recorded. The difference in the business metric between the control and test arms is then calculated, and the lift (or dip) in that metric is used to assess the whole process.

Figure 3: Flow chart of A/B Testing
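As a rough illustration of the split and lift calculation described above, the Python sketch below randomly assigns records to control and test arms and compares a business metric between them. The 50/50 random split and the conversion column name are assumptions for the example; in practice the split may be stratified to keep the arms comparable.

```python
import numpy as np
import pandas as pd

def split_arms(df, seed=42):
    """Randomly assign half of the records to the control arm and half to the test arm,
    so that both arms have similar characteristics on average."""
    rng = np.random.default_rng(seed)
    in_control = rng.permutation(len(df)) < len(df) // 2
    return df[in_control].copy(), df[~in_control].copy()  # (control, test)

def lift_in_metric(control, test, metric_col="conversion"):
    """Relative lift (or dip) of the test arm, with AI intervention, over the control arm."""
    control_rate = control[metric_col].mean()
    test_rate = test[metric_col].mean()
    return (test_rate - control_rate) / control_rate

# Usage (hypothetical): control, test = split_arms(customer_df)
# ... run the field test, record outcomes for both arms ...
# print(lift_in_metric(control, test))
```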

One iteration is generally not sufficient to evaluate the outcome of A/B testing; hence, multiple iterations are required to establish its efficacy.

There are many methods to assess the A/B testing outcome. One method uses confidence intervals, where the statistical significance of the difference between the two arms' outputs is evaluated at an agreed confidence level (90%, 95% or 99%).

Figure 4: Control Group Vs Test Group evaluation template
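One possible way to run this check in Python is a two-proportion z-test from statsmodels, sketched below. The choice of test and the use of a rate metric (e.g., conversion) are assumptions for the example; for continuous metrics a t-test may be more appropriate.

```python
from statsmodels.stats.proportion import proportions_ztest

def ab_significance(control_successes, control_total,
                    test_successes, test_total, confidence=0.95):
    """Test whether the difference in a rate metric between the test and control arms
    is statistically significant at the agreed confidence level (90%, 95% or 99%)."""
    stat, p_value = proportions_ztest(
        count=[test_successes, control_successes],
        nobs=[test_total, control_total],
    )
    return {"z_stat": stat, "p_value": p_value, "significant": p_value < 1 - confidence}

# Hypothetical numbers: 1,200 of 10,000 converted in the test arm vs 1,050 of 10,000 in control.
print(ab_significance(1050, 10000, 1200, 10000, confidence=0.95))
```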

5. Applications of Field Testing in AI projects

Field testing can be conducted in almost all AI projects. Some examples are listed below:

  • Evaluating predictions of payments made to a business client over a defined period
  • Exposing a user interface (UI) to users after adding new features to it
  • Measuring the effect of campaigns after building a recommendation engine
  • Evaluating the efficacy of newly launched medicines in the market

6. Advantages and Challenges of Field Testing

Advantages

  • Assesses the stability of the AI model
  • Helps evaluate the necessity of the AI model in the business process
  • Increases the confidence of business stakeholders in deploying the AI model
  • Reduces losses by avoiding additional project costs when the business evaluation criteria are not met

Challenges

  • Additional effort is needed from both the data science and business teams
  • The test and control arms must have similar data characteristics to avoid bias in the process
  • AI predictions may not be acted on in a timely manner
  • Stringent timelines to evaluate the field-testing outcome

In the next blog post, we will describe how field testing can be done using Python.

About authors

Srihari Nalabolu is a data scientist with experience in multiple domains across healthcare & pharmaceutical, banking and telecom industries. He is interested in Anomaly Detection, Recommender Systems, Text Mining and Computer Vision.

Adithya N is a data scientist currently working in the healthcare & pharmaceutical industry. He is interested in Economics, Machine Learning and AI product management.

Disclaimer: This article is based on the authors' experience working on different types of data science & AI problems across multiple domains.
