Best practices @TDF: Basic models to accelerate the machine learning delivery

Benjamin Goehry
TotalEnergies Digital Factory
11 min read · Jun 16, 2021

with Alaa Bakhti and Yann Hal

When developing Machine Learning powered products, our goal is not to provide a proof of concept of Data Science features, but to deliver a product that people will be using in their daily work.

Delivering Machine Learning based applications has some specificities compared to traditional software delivery. Estimating the time needed to develop a model for an ML feature is hard, because whether the model will reach the expected performance (e.g., 90% recall) is uncertain: the model and features required to achieve this performance, if it is reachable at all, are unknown in advance. However, estimations are of great importance for defining the product roadmap and having visibility on the time required for feature development. Therefore, we need to find a way to make them anyway.

Another particularity of ML delivery is that the changes that we need to manage are not only in the code, like in software delivery, but also in the data and the models. After deploying an ML model in production, its performance may decay over time because of data or concept drift. As a result, this model should be updated or replaced.

Over the past years, software delivery expertise has matured through methodologies and best practices (e.g., Agile, DevOps). Since the delivery of ML powered products has its own specificities, some of these practices can be reused as-is while others need to be adapted or replaced. In this article, we present one practice that helps accelerate the delivery of ML products. It will be followed by a series of articles covering other best practices of ML delivery.

Applying the Pareto principle, deliver 80% of the value with 20% of the effort: deploying a simple model with 60% performance in one week is better than a non-deployed model with 99% performance in four months. The most important point is that the “simple model” is deployed. In an emerging auto industry, the opposite would be like building the car that accelerates the fastest without building the road to test it on. One day roads will be required, and that day it may turn out that roads have turns; people will then realize that the car cannot steer and that the product does not meet the specifications. It is only a metaphor, but the point stands: build the infrastructure that collects feedback from the start.

CIOs and IT leaders find it hard to scale AI projects because they lack the tools to create and manage a production-grade AI pipeline. (COSTELLO K. & RIMOL M. — Gartner)

End-to-End Machine Learning workflow

Building infrastructure

One of the common failures in ML projects is spending too much time training and/or fine-tuning a model in the laboratory in the early stages of the project. Since the model artifact is a cornerstone of an ML project, lingering on it inevitably creates a bottleneck for everything built around it.

Traditionally, before going into delivery, data scientists spend a lot of time on the modeling phase: visualization, exploration and training until the desired model performance is reached. Reaching this performance usually takes months, and no value is generated during this time since the model is still under development. It also puts the whole project at risk: the other members of the team need to know what the model ‘looks like’ in order to build the other parts of the machine learning pipeline. Waiting for the perfect model to deploy also prevents the team from gathering feedback on the model in production conditions, the deployment pipeline, the inference pipeline, production data quality or even from users. Furthermore, much of this feedback may lead to a redefinition of the MVP objectives: the users’ needs may change, new feature ideas may appear, or the predictions may turn out to be unactionable, e.g., the application must send a prediction every five minutes, but we find out that the model takes ten minutes to compute one.

Focusing on model building and experimentation instead of deploying the resulting code may also create technical debt and more issues to tackle later: rewriting scripts so they can run in production, adapting the code to real-time data and production workflows, and integrating the project into the existing ecosystem.

Another thing to keep in mind is that reaching the desired performance in the laboratory does not guarantee the same results in production. When this gap appears, users may lose confidence in data tools altogether, and it may be hard to convince the Product Owner to invest in continuous delivery, monitoring or further model improvement given the time already spent.

In a few words, having a model in production as fast as possible should be a priority. And it does not have to be sophisticated: we can start with a simple one.

Simple model

The definition of a simple model is highly subjective. In our case, we consider a model simple if it does not take long to implement (typically no more than a few days) and requires little or no exploration or model training. Simple models can be split into two categories: rule-based models and heuristic-based models.

A first simple model that can be considered in a Data Science project is a business rule: a rule that is already in use and already generates business value. By this definition, the Data Scientist does not have to spend any time on data visualization, exploration or training to get a first working model. The team can use this model to put the infrastructure in place, for the reasons explained above, while still having a model that generates value from the start, as sketched below.
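As an illustration, here is a minimal sketch of a rule-based model wrapped behind a predict() interface so that the rest of the pipeline can treat it like any other model. The rule and the threshold value are purely hypothetical.

class BusinessRuleModel:
    """Existing business rule exposed through a model-like interface."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # value provided by the business experts

    def predict(self, values):
        # Flag every observation that exceeds the business-defined threshold
        return [1 if value > self.threshold else 0 for value in values]


model = BusinessRuleModel(threshold=100.0)
print(model.predict([80.0, 120.0, 95.0]))  # [0, 1, 0]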

A second category of simple models is heuristic-based models. A heuristic technique is, according to Wikipedia, “any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation”. In weather forecasting, for example, predicting that tomorrow’s weather is the same as today’s already gives a high accuracy (at least in some countries). Though simple, this heuristic is straightforward to implement and does not require any particular visualization, exploration or training. Neither perfect nor optimal, it nevertheless reaches a short-term goal, a happy medium around which the infrastructure can be built, as with the rule-based model.
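A minimal sketch of such a persistence heuristic, assuming the history is an ordered list with the most recent observation last:

def persistence_forecast(history):
    """Tomorrow's forecast is simply the last observed value."""
    return history[-1]


past_temperatures = [18.2, 19.0, 17.5, 18.8]
print(persistence_forecast(past_temperatures))  # 18.8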

A question may arise about the difference between a rule-based and a heuristic-based model; both may indeed be the same. In this article, we differentiate a rule-based model, already used in the business and already known to generate business value, from a heuristic model, which has not yet proven its performance. Sometimes, however, the rule-based model is too sophisticated and we need to switch to a simpler one. The user may rely on a set of highly complex rules: the business may have existed for many years and incremented its set of rules over time, leading to more and more complexity. In this case, implementing them all would take too much time and defeat the purpose of starting a new project with a simple model to speed up delivery. If this happens, we can either restrict the set of rules and select only a few simple, relevant ones, or use a heuristic-based model instead, possibly inspired by the business rules.

Breaking News: VADER takes off

In the following, we present a practical use case (called VADER) for gas and electricity consumption and carbon footprint management that we worked on at the Total Digital Factory. During the development of this use case, we used simple baseline models at the start of the project.

Refineries need to forecast their gas and electricity consumption for two reasons. The first one is the so-called gas delivery capacity, which can be seen as the maximum amount of gas a refinery can consume per day. If a refinery consumes more gas than this capacity on a given day, it has to pay heavy penalties. However, the energy coordinator can buy extra capacity before 2 p.m. the same day to avoid the penalties, provided he knows in advance that the refinery will consume more than usual. The goal of the use case was to inform the energy coordinator each day before 2 p.m. whether he needs to buy more capacity, how much, and the penalty amount at risk. The first baseline (and the only model we kept for this purpose in the end) takes the mean consumption of the last six hours and multiplies it by 24 to get a forecast for the day. If this forecast is greater than the capacity, we email the user.
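A minimal sketch of this capacity-alert baseline, assuming an hourly consumption series; the variable names and alerting mechanism are illustrative only:

import pandas as pd

def forecast_daily_consumption(hourly_consumption: pd.Series) -> float:
    """Extrapolate the mean of the last six hourly readings to a full day."""
    return hourly_consumption.iloc[-6:].mean() * 24

def needs_extra_capacity(hourly_consumption: pd.Series, capacity: float) -> bool:
    """True when the forecast exceeds the contracted delivery capacity."""
    # If True, notify the energy coordinator (e.g., by email)
    return forecast_daily_consumption(hourly_consumption) > capacity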

Refineries also need to send next-day energy consumption forecasts to their energy providers. Here the goal is to inform the user of this forecast each day at 10 a.m. For a long time, these forecasts were determined by the coordinator’s expertise, based on the consumption history and the information he had about the refinery’s operations. The use case aims at automating this procedure with a data-driven approach based on machine learning models. However, before using ML models we deployed a simple baseline, namely that tomorrow’s energy consumption is the same as yesterday’s. This baseline alone already improved the business performance metric by 10% compared with the manual forecasts. This baseline, as well as the one for the capacity alert, required only a few lines of code, but allowed a quick deployment of a working product and quick feedback from the users.
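As an illustration, here is a sketch of such a day-ahead persistence baseline evaluated over a history of daily consumption; the lag parameter, series names and error metric are assumptions, and the exact lag used in production depends on which day is fully observed at forecast time:

import pandas as pd

def persistence_baseline(daily_consumption: pd.Series, lag_days: int = 1) -> pd.Series:
    """Prediction for day D = consumption observed lag_days before D."""
    return daily_consumption.shift(lag_days)

def mean_absolute_error(actual: pd.Series, predicted: pd.Series) -> float:
    """Simple error metric to compare the baseline with the manual forecasts."""
    return (actual - predicted).abs().dropna().mean()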

When neither business rules nor heuristics can be developed, a third kind of simple model can be used: the dummy model. As its name suggests, this kind of model makes predictions based on very simple rules. In regression, it returns for example the mean or the median of the training set. In classification, it returns for example the most frequent class or always the same label. Below is a code example for the DummyRegressor from scikit-learn; classification works the same way using DummyClassifier from the same library.
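A minimal example (the training values are purely illustrative):

import numpy as np
from sklearn.dummy import DummyRegressor

# Toy training data: the dummy model ignores the features entirely
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([10.0, 12.0, 11.0, 13.0])

# Always predict the mean of the training targets
dummy = DummyRegressor(strategy="mean")
dummy.fit(X_train, y_train)

print(dummy.predict([[5.0]]))  # [11.5], the mean of y_train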

In contrast to the two kinds of models mentioned previously, there is not much hope of generating business value with a dummy model, even though it can serve as a baseline that does better than pure luck. However, the main goal of developing the product’s infrastructure can still be satisfied, so the dummy model should be considered a last resort before diving into more complex models.

On the added value

Deliver 80% of value with 20% effort: a simple deployed model with 60% performance in 1 week is better than a non-deployed model with 99% performance in 4 months.

We presented three kinds of simple models. Starting a Data Science project with these models is motivated by the many immature and unfinished Machine Learning products in the industry (as Deborah Leff, CTO for data science and AI at IBM, said on stage at Transform 2019, and as Gartner reports with 47% of projects remaining prototypes). The barriers to delivering Machine Learning products are numerous. They lead to a loss of money and of confidence from people and businesses that are not yet accustomed to the Machine Learning sphere. By using simple models from the beginning, we can take a new look at Machine Learning products and provide value from the start of the project.

Regarding the business value, the simple models may not give the optimal performance, but may still give a significant proportion of what the user expects.

The business value obtained is one thing, but the added technical value is even greater. Once a first model is developed, the team can build the whole infrastructure needed to deliver the final product. The user gets a sense of what the product will look like and can give feedback, whether on the performance and the associated metrics, on features he initially wanted but no longer needs, or on new features he now wishes for, which gives everyone a better product vision.

Finally, a Machine Learning Model needs to evolve. The data change over time, the models need to be monitored, re-trained, or changed. This was mentioned in the introduction, and there will be an article focusing on these issues. To put it simply, models need to be continuously evaluated and compared to baselines. Having the infrastructure in place, we can use the simple model as baseline comparison for future models and ease the detection of changes.

VADER: the models were improved iteratively. Starting from the baselines, we ended up with neural networks that significantly improved performance.

Mock model

Having a simple model ready to be pushed to production allows us to start building the infrastructure. But there is a not-so-rare case where the team only has access to a sample of the data and full access is delayed. This is a perfect example of a situation where building a simple model is relevant, but the simple model is not a catch-all: it merely moves the bottleneck from “waiting for a good model” to “waiting for the data to come”. However, the data preparation pipeline is not required to set up the inference infrastructure.

Mocking a model avoids waiting for data, de-risks the implementation and enables the automation of some of the model pipelines (e.g., inference, serving, integration). But first, what is a mock? In a few words, a mock is a fake object that simulates a behavior, in our case a model’s behavior. It is particularly useful for simulating an object whose behavior is likely to change (e.g., when adding methods or during integration tests).

From a pipeline-building point of view, a model is a contract between the data preparation pipeline and the inference pipeline. We can therefore simulate an object that returns a randomly filled output in the expected format (e.g., a binary vector).

Since we can simulate the output of a model, we decouple the inference pipeline and everything downstream from the need for data, and from that point we no longer have to wait for the data to start building them.

An easy way to mock a model is to use the BaseEstimator class from scikit-learn.
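A minimal sketch of such a mock, assuming the inference pipeline expects a binary prediction vector; the class name and output format are illustrative only:

import numpy as np
from sklearn.base import BaseEstimator

class MockModel(BaseEstimator):
    """Honours the fit/predict contract but returns random predictions."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def predict(self, X):
        # Randomly filled output matching the expected format (a binary vector)
        return np.random.randint(0, 2, size=len(X))


model = MockModel()
print(model.predict([[0.1, 0.2], [0.3, 0.4]]))  # e.g. [1 0]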

Takeaway

In this article, we presented some tips to accelerate the delivery of Machine learning products and to generate value from day one:

  • Instead of starting with a long model building step (especially when using deep learning) at the start of the project, prefer deploying a simple model (e.g., business rules, simple heuristics) as fast as possible to get both business value and users’ feedback.
  • De-risk the implementation and automation of the model infrastructure (e.g., inference pipeline) from the start of the project to accelerate the deployment of new models. As a result, we get feedback on the new models as early as possible. To do that, we only need either the simple model that we already put in place in the 1st step or a model mock.
  • The model building and improvement should follow an iterative process.

These practices let users experience the model improvements, familiarize them with the specificities of ML powered products and, most importantly, give them more confidence in the product and its delivery.
