How did we put our sales forecasting solution for croissants into production?

A data science journey, from notebooks to a deployed product — Part II


PART II: Deploy and scale


What is it?

At Artefact, we are so French that we have decided to apply Machine Learning to croissants.
In this second article of a two-part series, I will dive into the deployment and maintenance of our models in production. If you missed the first one, about data crunching, feature engineering, cannibalization and our favorite model CatBoost, here is the link.

We will talk about some best practices in MLOps such as CI/CD, reproducibility, monitoring and maintenance, as well as our choices in terms of pipeline orchestration and the tools we picked within the GCP ecosystem.
This article's goal is to share end-to-end feedback on how we deployed an ML model in production, and to give you some tips based on real-life projects to help you avoid the mistakes we made and speed up your own deployments.
I hope you will like it, enjoy the reading!

Who is it for?

  • Data Scientists, ML Engineers and data lovers


What will you find?

  • Good practices in terms of MLOps
  • Tips and tricks to help you save time and deploy your models faster
  • Nice libraries and tools we discovered during our journey

Machine Learning Operations, or MLOps

Some definitions

The concept of deploying a model into production may have many definitions. Here we are talking about the process of switching from code executed in notebooks to a fully automated version which automatically updates your data, trains models and infers predictions. The deployment also includes how we exposed the results so that the end user can actually use them.

From notebooks to Python scripts

As mentioned before, the first step is to switch from notebooks to Python scripts. It may take some time depending on the quality of your notebooks and your code.

Here are some good practices to accelerate this operation:

  • Develop a linear notebook: you shouldn't have to run cell 4 before cell 3
  • Use proper naming for your variables and your functions from the beginning
  • Package your code in functions as soon as possible and avoid redundancy
  • Don't be shy about using logging or decorators. Check out scikit-lego, which has pretty cool features and decorators.
  • Add markdown titles and subtitles, it will make the notebook more readable
  • Import all libraries at the beginning of your script and use a virtual environment
  • Avoid hard-coded paths; use libraries such as pathlib instead
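To make the last points concrete, here is a minimal sketch combining a small logging decorator with pathlib-based paths. The step name and file layout are hypothetical, not from our actual codebase:

```python
import logging
from functools import wraps
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_step(func):
    """Log the start and end of a pipeline step."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("Starting %s", func.__name__)
        result = func(*args, **kwargs)
        logger.info("Finished %s", func.__name__)
        return result
    return wrapper

# Paths built from the working directory instead of hard-coded strings
DATA_DIR = Path.cwd() / "data" / "raw"

@log_step
def load_sales(filename: str) -> Path:
    # Hypothetical step: resolve a raw data file inside the data directory
    return DATA_DIR / filename

print(load_sales("croissants.csv"))
```

Wrapping each pipeline step this way gives you free execution logs once the code leaves the notebook.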

During this process you will likely ask yourself: should I package my code into classes, only functions, or a mix of both? Some parts of your code definitely belong in classes, your database connectors for instance; for the rest I don't have a strong opinion on the subject.

Nevertheless, try to avoid writing long scripts; if you can't, break them into different files or factorize them.

Finally, even if you are in a rush, take the time to write your unit tests as soon as possible! It will help you identify hidden errors and save you a tremendous amount of time when refactoring or optimizing your code.

A small section about debugging. Data scientists like to work with notebooks because it is easier to debug code, inspect variables, plot things, etc. When you transform everything into Python files, debugging becomes a bit more challenging. Here are some useful tools for debugging:

  • pdb helps you inspect your code at a specific location: `import pdb; pdb.set_trace()`
  • The rich library, which also has a traceback functionality that makes stack traces much more readable
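For instance, dropping a pdb breakpoint just before a suspicious line lets you inspect every variable in scope. The function below is purely illustrative:

```python
import pdb

def forecast_share(qty_sold: int, total_qty: int) -> float:
    """Hypothetical helper: one product's share of total sales."""
    # Uncomment the next line to drop into the debugger before the division,
    # e.g. to check total_qty is not zero:
    # pdb.set_trace()
    return qty_sold / total_qty

print(forecast_share(30, 120))  # → 0.25
```

Once in the debugger, `p total_qty` prints the variable, `n` steps to the next line, and `c` resumes execution.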

Another question that will come up while transforming your notebook is the architecture of your repository.

Obviously everyone has their own preferences, and sometimes the architecture is imposed by your company. But you can find guidelines and templates such as cookiecutter or the Kedro library.

Finally, I will point out the importance of having proper documentation. Proper documentation starts with clean code: someone reading your code should be able to understand most of it. We created a documentation folder containing an overview of the project, schemas of the workflow, explanations of our ETLs, etc. It may also be interesting to add your data science work, what worked and what didn't, so that anyone taking over the project has an overview of what has been done.

Reproducibility and versioning with MLflow

One of the biggest challenges when industrializing your Machine Learning solution is reproducibility: given the exact same raw input, your reproduced model should give you the same output. More importantly, reproducibility is crucial to check that the deployed model is identical to the one developed in your notebook, i.e. that no errors appeared during industrialization.

There are different ways to approach this challenge:

  • Set a seed when training your ML model and when using functions with a random component
  • If you interact directly with a data warehouse, make sure the data will not change over time, or make an extract into a flat file

Be careful if you use a GPU: perfect reproducibility is generally not possible, so one option is to disable the GPU, check that the outputs match, and then switch it on again.
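A seed-setting helper can be as simple as the sketch below. It only covers the standard library; in a real project you would also seed numpy (`np.random.seed`) and pass a fixed seed to your model (for example CatBoost's `random_seed` parameter):

```python
import os
import random

SEED = 42

def set_global_seed(seed: int = SEED) -> None:
    """Fix the stdlib sources of randomness so runs are repeatable."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

# Re-seeding yields identical random draws
set_global_seed()
first = [random.random() for _ in range(3)]
set_global_seed()
second = [random.random() for _ in range(3)]
assert first == second
```

Calling this helper at the top of every entry point (training, inference, tests) keeps the notebook and the industrialized scripts comparable.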

Reproducibility while working with databases… It is really hard to have a perfect match between your Prod, Preprod and Dev databases. You may change some ETL logic at some point, fix bugs, make a partial or full reload, etc. One way we addressed this challenge was to create a dashboard comparing the most important tables of our databases:

  • Number of rows
  • Number of NULL values per columns
  • SUM(), AVG(), MAX(), MIN() of some important columns such as: revenues, qty_sold, is_in_promo, price, etc.

It is definitely not a perfect solution, but it is better than nothing and already gives you good insights.
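The per-table profile behind such a dashboard boils down to a handful of aggregates. Here is a sketch using an in-memory SQLite database as a stand-in for a real warehouse; table and column names are illustrative:

```python
import sqlite3

def table_profile(conn, table: str, column: str) -> dict:
    """Summary stats used to compare the same table across Dev / Preprod / Prod."""
    cur = conn.execute(
        f"SELECT COUNT(*), SUM({column} IS NULL), "
        f"SUM({column}), AVG({column}), MAX({column}), MIN({column}) "
        f"FROM {table}"
    )
    n_rows, n_nulls, total, avg, mx, mn = cur.fetchone()
    return {"rows": n_rows, "nulls": n_nulls, "sum": total,
            "avg": avg, "max": mx, "min": mn}

# Toy example: a tiny sales table with one missing price
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (price REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(1.0,), (2.0,), (None,)])
print(table_profile(conn, "sales", "price"))
```

Running the same profile against each environment and diffing the dictionaries immediately flags reloads or ETL changes.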

Finally, as explained in the previous article, we used MLflow as an experiment tracking tool. The final reproducibility check was to verify that the models trained from .py files (the industrialized version) gave the same results as the ones trained in notebooks. The MLflow open source project also provides tools to deploy ML models and a central model registry, but we didn't use them during this project.
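Independently of the tracking tool, the check itself can be a simple element-wise comparison of the two prediction vectors, with a small tolerance for floating-point noise. This is a minimal sketch, not our actual validation code:

```python
def predictions_match(notebook_preds, script_preds, tol=1e-9) -> bool:
    """True if the industrialized model reproduces the notebook predictions."""
    if len(notebook_preds) != len(script_preds):
        return False
    return all(abs(a - b) <= tol for a, b in zip(notebook_preds, script_preds))

# Identical outputs pass, any drift fails
print(predictions_match([10.2, 7.5], [10.2, 7.5]))
print(predictions_match([10.2, 7.5], [10.2, 7.9]))
```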

Why having a CI/CD pipeline is important

As mentioned before, you should test your code as soon as possible. But let's start with why you need to test your code:

  • Helps you to spot errors in your code
  • Forces you to think about edge cases, how is your code going to respond to them?
  • Helps you when refactoring your code
  • Makes your life easier when deploying a new model or feature
  • Facilitates reproducibility
  • Helps you to point out any code regression
  • Spend less time doing repetitive tasks

If you don't know where to start, check out libraries such as pytest and notions like unit tests, MagicMock, integration tests, qualification tests and patching.
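A pytest-style test file is just functions prefixed with `test_` containing plain assertions. The metric below is hypothetical, only there to show the shape of a unit test and an edge case:

```python
def forecast_accuracy(actual: float, predicted: float) -> float:
    """Hypothetical metric: 1 minus relative absolute error, floored at 0."""
    if actual == 0:
        return 0.0
    return max(0.0, 1 - abs(actual - predicted) / actual)

# pytest would discover these automatically when run as `pytest test_forecast.py`
def test_perfect_forecast():
    assert forecast_accuracy(100, 100) == 1.0

def test_edge_case_zero_actual():
    # Edge case: a day with no sales must not raise ZeroDivisionError
    assert forecast_accuracy(0, 10) == 0.0

test_perfect_forecast()
test_edge_case_zero_actual()
print("all tests passed")
```

Writing the zero-sales test first is exactly the kind of edge-case thinking the bullet list above is about.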

Our CI/CD was split into three different parts:

1. Continuous Integration

It consists of integrating changes made to the code on a continuous basis, in order to immediately detect and correct any errors. It prevents the integration problems that can arise when multiple developers work on the code at the same time. This part includes the unit tests and the linter. Having a linter forces you to write clean code and ensures a unified style within the team.


2. Continuous Delivery

Closely related to Continuous Integration, Continuous Delivery refers to keeping your application deployable at any point. Here we test the application at a higher level: qualification tests, for instance, can check that our solution runs end to end or that we don't encounter any model regression. It is strongly recommended to run these tests on a separate, equivalent environment, automatically created for the occasion.


3. Continuous Deployment

Finally, after having checked that our release is stable, coherent and passes all the tests, we need to swap it with the old model. That is what Continuous Deployment is for: being able to deploy continually. This automated process puts the latest release into the production environment, and may include additional actions such as updating the production infrastructure.


These three different steps can be automated and managed by tools such as Jenkins or Cloud Build.

Coding is a collaborative activity. To be efficient and avoid errors, you need an automated pipeline that merges the team's latest modifications and deploys them into production quickly.

Here is an illustration of our CI/CD pipeline:

I explained in my previous article that we were working in feature teams. In a feature team combining Data Scientists and Data Engineers, you may wonder who should write all this. My personal opinion is that developers should write their own unit tests, whether they are Data Scientists or Data Engineers. As for the CD, it should be designed by the whole technical team, but Data Engineers are more likely to have the proper skills.

Logging & Monitoring with Stackdriver and Great Expectations

One of the keys to the success of this project was the adoption of our new solution. The operatives needed to believe in and trust our predictions, and apply them. We demonstrated on historical data that our solution was better than the old one, but what if the model deteriorates over time? How could you spot that and react? Monitoring will help you tackle this challenge.

But what should you monitor in your product, and how?

Right after we deployed our first model, we built a dashboard allowing us to monitor our solution. Here is a non-exhaustive list of our most important metrics:

  • Execution times: all workflows, updates of the databases, data preparation, model training, etc.
  • Timestamp of the end of the workflow
  • RMSE on the train and validation datasets during weekly updates of the model
  • Daily Forecast Accuracy of our predictions
  • Number of warnings raised during the workflow

These metrics are calculated at a daily level and plotted as time series, which makes it really easy to spot anomalies.
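Two of these metrics can be sketched in a few lines. The exact forecast accuracy formula we used is not reproduced here; the `1 - WMAPE` definition below is one common choice, shown for illustration:

```python
import math

def rmse(actual, predicted) -> float:
    """Root mean squared error, recomputed after each weekly retraining."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def daily_forecast_accuracy(actual, predicted) -> float:
    """1 - weighted MAPE: an illustrative accuracy definition."""
    total = sum(actual)
    return 1 - sum(abs(a - p) for a, p in zip(actual, predicted)) / total

# Three stores' croissant sales for one day (toy numbers)
actual = [120, 80, 100]
predicted = [110, 85, 100]
print(rmse(actual, predicted))
print(daily_forecast_accuracy(actual, predicted))  # → 0.95
```

Appending each day's value to a time series is what makes sudden drops stand out on the dashboard.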

Most of these metrics were stored as logs, split into two categories.

The first category of logs tells you which task is running, along with useful context such as Data Preparation or ML Inference. The second ones are stored in Stackdriver, the logging tool from GCP. We can then filter the ones we want and export them to BigQuery thanks to its sink functionality. From there, we can easily create a dashboard, with tables automatically updated during the run of our scripts.

To spot data anomalies, we also added a module checking the consistency of our data during the workflow. It is based on the really cool Great Expectations library (you should definitely check it out), which allows you to monitor your data quality and create automatic reports: whether some features have values above or below a threshold, have missing values, etc. You can trigger it during your workflow or even with GitHub Actions.
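Great Expectations provides these checks out of the box (expectations like `expect_column_values_to_be_between` or `expect_column_values_to_not_be_null`). To show the idea without depending on the library, here is a plain-Python stand-in with an illustrative column name:

```python
def check_column(values, name, min_value=None, max_value=None, max_null_frac=0.0):
    """Return a list of data-quality issues for one column (empty list = OK)."""
    issues = []
    nulls = sum(v is None for v in values)
    if nulls / len(values) > max_null_frac:
        issues.append(f"{name}: too many missing values ({nulls})")
    for v in values:
        if v is None:
            continue
        if min_value is not None and v < min_value:
            issues.append(f"{name}: value {v} below {min_value}")
        if max_value is not None and v > max_value:
            issues.append(f"{name}: value {v} above {max_value}")
    return issues

# A croissant priced at 15.0 breaks the expected [0, 10] range
print(check_column([1.2, 0.9, 15.0], "price", min_value=0, max_value=10))
```

In practice, the library version also produces HTML reports and validation summaries, which is why it is worth adopting rather than hand-rolling checks like this.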

Let's look at an example to illustrate why data monitoring is important. What if, for some reason, a new process is added or a new way of collecting the data is developed, and we are not informed? It may impact the distribution of our data and we might not spot it! One of our most important features is the price: if prices suddenly go up significantly, predictions will be highly impacted and this will certainly lead to poor forecasts. Having this module helps you identify data changes and react before your model deteriorates, or even crashes.

Finally, we also deployed a Grafana dashboard on Kubernetes, based on Prometheus, to monitor our computing infrastructure and clusters. It is not a vital step at the beginning of a project, but it is definitely useful in the long term for product maintenance and for monitoring KPIs such as RAM usage, network pressure, etc.

Pipeline orchestration with Kubernetes

There are many possible approaches to deploying and orchestrating your solution: Airflow, Luigi, Cloud Scheduler, Google Kubernetes Engine (GKE), a simple crontab on a virtual machine, etc. We decided to go for GKE, the managed version of Kubernetes on GCP. For those who don't know Kubernetes, it is an open-source container orchestration system for automating application deployment, scaling and management. With Kubernetes we were able to control our compute power, with auto-scaling of our resources depending on the task. We chose GKE mainly due to company policy, but also for the ease of managing our compute power, and thus the pricing.

As a result, we deployed different Docker images, and thanks to cron jobs on GKE we could start and run the associated containers.

A small tip: be careful when you set the starting time of your cron jobs. The time indicated may not be your own time zone but the one of the machine running the scheduler. In addition, daylight saving changes will shift the execution start. Take this into account in your deployment policy.

Exposing results with a REST API

Our solution only becomes useful when the end user can access it. As mentioned in the previous article, the client already had a forecasting tool in production, and any change to it obviously had a cost. To build a robust new solution, you need to benchmark it with the end users as soon as possible. Adding our solution to the existing tool would have demanded a lot of IT resources and money, and we were not even sure it was working properly.

As a result, the first step was to send daily emails to the managers with our forecasts saved in Excel sheets. This workaround was perfect for us: we didn't depend on any other team, it was easy to implement, and it allowed us to test the solution quickly.

However, Excel files can't be a long-term solution. We wanted a solution that could be accessed by other teams and departments while remaining compliant with security requirements and SLAs. The final solution was a REST API, which allows any application to query our predictions and allows us to manage access.
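Framework aside, the core of such an API is a handler that looks up a prediction and returns a status code with a JSON body. The endpoint shape, keys and data below are hypothetical, only there to sketch the contract:

```python
import json

# In-memory stand-in for the prediction store behind the API
FORECASTS = {
    ("store_42", "2021-05-01"): {"product": "croissant", "predicted_qty": 180},
}

def handle_get_forecast(store_id: str, date: str):
    """Handler behind a hypothetical endpoint GET /forecasts/<store_id>/<date>."""
    key = (store_id, date)
    if key not in FORECASTS:
        return 404, json.dumps({"error": "no forecast for this store/date"})
    return 200, json.dumps(FORECASTS[key])

status, body = handle_get_forecast("store_42", "2021-05-01")
print(status, body)
```

In a real deployment this handler would sit behind a web framework and read from the production database rather than a dictionary.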

Finally, we integrated our REST API into the company's API manager, which provides a single access point to all the available APIs and handles authentication and TLS. The advantage of this approach is to decouple the deployment of our model from the exposition layer, making it easier to manage SLAs and security.

Key Takeaways

  • Take the time to write proper code from the beginning, especially in notebooks
  • Test your code, unit tests are not optional
  • Reproducibility is a cornerstone of ML deployment
  • Useful monitoring is key for long term projects and maintenance

To conclude, I would like to point out that the success of this project was first and foremost a team effort: we succeeded in mixing the skills and profiles of business specialists, product owners, Data Scientists and Data Engineers!

Thanks for reading!

You made it this far, I hope you had a great time and learned something!
To know more about us you can check our other articles here or check our open source projects.
If you have any questions, or points you want to discuss / disagree with, I would love to hear from you.