Fast vs. Slow: Which Lane Should You Pick for Delivering AI to Your Business Users?

Shaikh Quader
Published in Analytics Vidhya
4 min read · Jul 15, 2021

In 2017, I was leading a data science project at IBM. We wanted to build a machine learning (ML) model that would identify a short list of customers at higher risk of not renewing their product licenses. Based on this predictive insight, our client team would decide which customers to proactively talk to first.

Using historical customer records and the knowledge of our business experts, we developed a decent ML model within two months. Next, we needed to decide how often we should run the model to generate predictions. We had two options: (1) real-time inferencing and (2) offline (batch) inferencing. Which option was right for our AI use case?

Real-time Inferencing

In this strategy, as soon as a new record arrives, the ML model computes its prediction and delivers it to the user right away. Real-time inferencing is the default choice when ML results need to be delivered immediately to a live user based on fresh data. For example, the Uber Eats app likely uses real-time inferencing when it recommends restaurants to a live customer based on their current location, past orders, and the most recent search activities from the current session.

[Figure: Real-time inferencing]

For a real-time scoring solution, the model execution environment needs to be online 24x7. This approach provides speed at the expense of running ML infrastructure around the clock.
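To make the idea concrete, below is a minimal sketch of what a real-time scoring service could look like, using Flask and a scikit-learn model; the model file, route, and payload fields are hypothetical and not from our actual project.

```python
# Minimal sketch of a real-time scoring endpoint (hypothetical model file, route, and fields).
# The service stays online 24x7 and returns a prediction for each incoming request.
import joblib
import pandas as pd
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("renewal_risk_model.joblib")  # hypothetical pre-trained classifier

@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON payload containing one customer's input features.
    features = pd.DataFrame([request.get_json()])
    risk = model.predict_proba(features)[0, 1]  # probability of non-renewal
    return jsonify({"non_renewal_risk": float(risk)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```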

Batch Inferencing

With batch scoring, at a fixed interval (e.g., hourly, daily, weekly), the model receives a batch of observations and computes all of their predictions at once. The scoring process then saves the predictions to a database, from which they are served to users later. If the business doesn’t need to act on the predictions immediately, batch scoring can be the right option.

[Figure: Batch inferencing]

This approach doesn’t need a 24x7 ML runtime; it needs the runtime only while the batch job executes. Batch scoring can therefore save organizations IT costs.
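As a rough illustration, a batch scoring job could look like the sketch below: it reads a batch of records from a database, scores them all in one pass, and writes the predictions back to a table. The connection string, table names, and columns are assumptions made for the example.

```python
# Minimal sketch of a batch scoring job (connection string, tables, and columns are hypothetical).
# It runs on a schedule, scores all records at once, then exits, so no 24x7 ML runtime is needed.
import joblib
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@dbhost/customers")  # hypothetical database
model = joblib.load("renewal_risk_model.joblib")  # hypothetical pre-trained classifier

# 1. Pull the latest batch of customer records.
customers = pd.read_sql("SELECT * FROM customer_features", engine)

# 2. Score the whole batch in one call.
features = customers.drop(columns=["customer_id"])
customers["non_renewal_risk"] = model.predict_proba(features)[:, 1]

# 3. Save the predictions for later consumption by dashboards or reports.
customers[["customer_id", "non_renewal_risk"]].to_sql(
    "customer_risk_predictions", engine, if_exists="replace", index=False
)
```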

How did we choose the right deployment option for our ML model?

We asked our stakeholders the following questions to help pick the right inferencing strategy for our ML model:

a. How frequently is the model’s input data updated?

Our model generated predictions from data stored in a database. The database administrator told us that a script ran once per week to bring new customer records into the database. This meant that the model’s input features would not change during the seven days between weekly database updates. From this, we concluded that if the model generated predictions for the same customer multiple times within a week, those predictions would remain the same.

b. How will the users consume the predictions?

Next, we wanted to find out how often the users would review the model’s prediction for any customer during the week. We spoke to several members of the client team, each of whom managed relationships with a subset of the customers. Every morning they would review the list of customers assigned to them and pick one or a few to focus on that day. At the time, they relied on intuition, e.g., the revenue attached to each customer, to make this selection. After integrating our ML model with their dashboard, they would use the model’s output, the predicted non-renewal risk, to help select the customers to focus on each day. Throughout the week, they would review the model’s prediction for each of their assigned customers several times.

From these insights, we figured out that our model’s predictions would remain constant during the seven days between weekly database updates, while the users would review those predictions multiple times over the same week. We concluded that our AI use case didn’t need real-time inferencing; offline batch scoring was the right fit. We created an inferencing process that computed predictions for all customers soon after the weekly database refresh, saved them into a database table, and served them to the users throughout the week. With this solution, we avoided redundant model computations and saved on IT costs with an offline deployment.
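On the serving side, the dashboard only needed to read the precomputed predictions back from the table. A minimal sketch of that lookup, reusing the hypothetical table names from the batch example above, might look like this:

```python
# Minimal sketch of serving precomputed predictions (hypothetical connection and table names).
# No model execution happens at read time; the dashboard simply queries the stored scores.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@dbhost/customers")  # hypothetical database

def get_risk_scores(customer_ids):
    # Fetch the stored non-renewal risk for the given customers.
    predictions = pd.read_sql(
        "SELECT customer_id, non_renewal_risk FROM customer_risk_predictions", engine
    )
    return predictions[predictions["customer_id"].isin(customer_ids)]
```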

Shaikh Quader

A machine learning researcher who lost 50 lbs of weight and experiments with self-discipline, habits, creative thinking, learning, and wellbeing.