Effective Field-Testing Strategies for AI Models — Part I

Srihari Nalabolu
5 min read · Aug 25, 2021


Contents

  1. What is field testing?
  2. Where does it fit in the life cycle of an AI project?
  3. Process of field testing
  4. Stability Testing & A/B Testing
  5. Applications of field testing
  6. Advantages and challenges

1. What is field testing?

Field testing is an experimentation process in which an AI intervention is run alongside the traditional process at the same time, and the success of the AI intervention is measured through a quantifiable outcome.

2. Where does it fit in the life cycle of an AI project?

There are multiple stages in operationalizing an AI product. The first stage is demand management, where the business teams and the data science teams work together to define a goal and establish a proof of concept with the AI solution. Here, the success metric could be how well the business problem is formulated in AI terms and/or the efficacy of the initial solution.

In the next step, proof of value needs to be established. This enables buy-in from the business teams if the solution is viable. To evaluate whether the AI model works on a myriad of data points, the model must be tested. One such method is field testing.

Figure 1: Life cycle of a typical Data science & AI project

We use field testing between Proof of Value and Pre-Industrialization to assess the value potential of the AI intervention in the business process.

However, field testing also continues after Industrialization, to check the efficiency of newer versions of the AI model and to monitor drift in the process.

3. Process of Field testing

There are various steps involved in field testing, depending on the business process:

  1. Stability Testing of the AI model
  2. A/B Testing

4. Stability Testing & A/B Testing

4.1 Stability Testing

Testing the AI model across multiple datasets is important to establish its consistency and efficacy. Stability testing helps surface issues with the data, changes in the business processes over time (if any), and data drift across the variables used for testing, and it checks the quality of the model's predictions. In short, stability testing provides an overall assessment of the AI model.

Once stability testing is completed, we generally freeze the best version of the AI model for further testing in the process. One such process is A/B testing.

As part of stability testing, the AI model needs to be evaluated over multiple iterations; one such example is shown in the following diagram.

Figure 2: Template of Stability Testing

Multiple iterations can be performed by rolling the training and validation window: train on n months of data, test the AI model on the subsequent k months, and then slide the window forward. This approach works when the data has a time component and helps identify drift in both the target and the data.
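For illustration, below is a minimal Python sketch of such a rolling-window stability test. The column names (month, target), the window lengths, and the choice of model and metric are assumptions made for the example, not a prescribed setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def rolling_stability_test(df, n_train_months=12, k_test_months=3):
    """Slide a train/validation window month by month and record the metric per iteration."""
    # Assumed layout: 'month' is a sortable period column, 'target' is binary,
    # and every other column is a model feature.
    months = sorted(df["month"].unique())
    feature_cols = [c for c in df.columns if c not in ("month", "target")]
    results = []

    for start in range(len(months) - n_train_months - k_test_months + 1):
        train_months = months[start : start + n_train_months]
        test_months = months[start + n_train_months : start + n_train_months + k_test_months]

        train = df[df["month"].isin(train_months)]
        test = df[df["month"].isin(test_months)]

        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(train[feature_cols], train["target"])

        auc = roc_auc_score(test["target"], model.predict_proba(test[feature_cols])[:, 1])
        results.append({"train_end": train_months[-1], "test_end": test_months[-1], "auc": auc})

    # A large spread in the metric across iterations hints at data or target drift.
    return pd.DataFrame(results)
```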

Similarly, decide on the best stability testing strategy based on your business problem and data.

4.2 A/B Testing

The process of A/B testing involves running the trained AI model on two different scenarios. In our experience, field testing tends to give better results when performed on real-time data.

When we feed the data to the AI model, the population is split into two arms, a test arm and a control arm, each containing half of the dataset. “The main goal of A/B testing is to check whether there is a significant lift in the business metric when the AI intervention is made.”

The datasets in the control and test arms must have similar characteristics for an effective, unbiased evaluation of the AI model. The control arm proceeds as usual without any AI intervention, and its output is recorded after the field-testing period. During the same period, interventions based on the AI outcome are made on the test arm, and both the output and the intervention actions are recorded. The difference in the business metric between the control and test arms is then calculated, and the lift (or dip) in that metric is used to assess the whole process.

Figure 3: Flow chart of A/B Testing
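As a rough illustration of the split and lift calculation described above, the Python sketch below randomly assigns records to control and test arms and compares a business metric between them. The 50/50 random split and the conversion column name are assumptions for the example; in practice the split may be stratified to keep the arms comparable.

```python
import numpy as np
import pandas as pd

def split_arms(df, seed=42):
    """Randomly assign half of the records to the control arm and half to the test arm,
    so that both arms have similar characteristics on average."""
    rng = np.random.default_rng(seed)
    in_control = rng.permutation(len(df)) < len(df) // 2
    return df[in_control].copy(), df[~in_control].copy()  # (control, test)

def lift_in_metric(control, test, metric_col="conversion"):
    """Relative lift (or dip) of the test arm, with AI intervention, over the control arm."""
    control_rate = control[metric_col].mean()
    test_rate = test[metric_col].mean()
    return (test_rate - control_rate) / control_rate

# Usage (hypothetical): control, test = split_arms(customer_df)
# ... run the field test, record outcomes for both arms ...
# print(lift_in_metric(control, test))
```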

One iteration is generally not sufficient to evaluate the outcome of A/B testing; hence, multiple iterations are required to establish its efficacy.

There are many methods to assess the A/B testing outcome. One method uses confidence intervals, where the statistical significance of the difference between the two arms' outputs is evaluated at an agreed confidence level (90%, 95% or 99%).

Figure 4: Control Group Vs Test Group evaluation template
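One possible way to run this check in Python is a two-proportion z-test from statsmodels, sketched below. The choice of test and the use of a rate metric (e.g., conversion) are assumptions for the example; for continuous metrics a t-test may be more appropriate.

```python
from statsmodels.stats.proportion import proportions_ztest

def ab_significance(control_successes, control_total,
                    test_successes, test_total, confidence=0.95):
    """Test whether the difference in a rate metric between the test and control arms
    is statistically significant at the agreed confidence level (90%, 95% or 99%)."""
    stat, p_value = proportions_ztest(
        count=[test_successes, control_successes],
        nobs=[test_total, control_total],
    )
    return {"z_stat": stat, "p_value": p_value, "significant": p_value < 1 - confidence}

# Hypothetical numbers: 1,200 of 10,000 converted in the test arm vs 1,050 of 10,000 in control.
print(ab_significance(1050, 10000, 1200, 10000, confidence=0.95))
```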

5. Applications of Field Testing in AI projects

Field testing can be conducted in almost all AI projects. Some examples are listed below:

  • Evaluating predictions of payments made to a business client over a defined period
  • Exposing a user interface (UI) to users after adding new features to it
  • Measuring the effect of campaigns after building a recommendation engine
  • Evaluating the efficacy of newly launched medicines in the market

6. Advantages and Challenges of Field Testing

Advantages

  • Assesses the stability of the AI model
  • Helps evaluate the necessity of the AI model in the business process
  • Increases the confidence of business stakeholders in deploying the AI model
  • Reduces losses by avoiding additional project costs when the business evaluation criteria are not met

Challenges

  • Additional effort is needed from both the data science and business teams
  • The test and control arms must have similar data characteristics to avoid bias in the process
  • AI predictions may not be acted on in a timely manner
  • Stringent timelines to evaluate the field-testing outcome

In the next blog post, we will describe how field testing can be done using Python.

About authors

Srihari Nalabolu is a data scientist with experience in multiple domains across healthcare & pharmaceutical, banking and telecom industries. He is interested in Anomaly Detection, Recommender Systems, Text Mining and Computer Vision.

Adithya N is a data scientist currently working in the healthcare & pharmaceutical industry. He is interested in Economics, Machine Learning and AI product management.

Disclaimer: This article is based on the authors' experience working on different types of data science & AI problems across multiple domains.
