How to industrialize machine learning with classic CI/CD on-premise with Docker

Matthieu Paret
Published in Lifen.Engineering
Sep 4, 2018


Preamble

Lifen aims to facilitate the transmission of medical documents between the different healthcare actors: hospitals, clinics and doctors. To achieve this goal, we use machine learning to extract relevant information from each document, for example the sender, the recipient, the patient, etc.

Machine learning consists of building a model based on previous data (i.e. learning), which is then used to make predictions on future data. Our ML team builds the applications that handle both the training work and the serving of predictions from a previously learned model.

Since we handle healthcare data, we are subject in France to legislation (HADS) which currently prevents us from running machine learning workloads on cloud services (IaaS, PaaS or SaaS), as none of them were compliant with this legislation as of July 2018 (rumor has it that this is going to change soon).

Even if we cannot use the cloud for the machine learning workload itself, we’ll see how we leverage a SaaS CI/CD solution to perform the several steps needed to serve ML predictions in production.

The steps

We deliberately simplify the pipeline here by leaving out some steps.

Build base image:

We use Docker to build a base image from a new source code version. This image will be used both to train the model and, later, to serve predictions.
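As a minimal sketch (the registry path is a placeholder), the build step can be as simple as tagging the image with the commit SHA, so every trained model can be traced back to the exact source version:

```bash
# Build the base image from the current source checkout.
# "registry.example.com/ml-base" is a hypothetical registry path;
# $CI_COMMIT_SHA is the commit SHA exposed by GitLab CI.
docker build -t registry.example.com/ml-base:$CI_COMMIT_SHA .
docker push registry.example.com/ml-base:$CI_COMMIT_SHA
```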

ML model training:

We trigger the learning step, which also saves the generated model. We reuse the base image built above and run the Python learning command inside it, again using Docker (see the sketch after the list below).

  • As this requires running Docker inside Docker, we need to run our CI container in “privileged” mode. This is not always possible, and it also introduces a security risk, as it is similar to running an application as the root user on your server. We mitigate the risk by controlling and building the source of the application we run (i.e. the base image mentioned above).
  • We also often need a context and/or several other applications to train the model. To achieve this, we use docker-compose so we can run other applications alongside our ML code.
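Assuming a hypothetical train.py entry point and compose file, the training step might look like this:

```bash
# Run the training command inside the base image (names and paths are hypothetical).
# The trained model is written to a mounted volume so the next step can pick it up.
docker run --rm \
  -v "$(pwd)/models:/app/models" \
  registry.example.com/ml-base:$CI_COMMIT_SHA \
  python train.py --output /app/models/model.pkl

# When training needs supporting services (e.g. a database),
# docker-compose can bring them up alongside the training container:
docker-compose -f docker-compose.train.yml up --abort-on-container-exit
```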

Build serving image

We build a new Docker image from our base image, copying the trained model inside it. The web app can now serve predictions from the trained model.
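A minimal sketch of this step (the Dockerfile content, model path and serve.py entry point are assumptions):

```bash
# Generate a small Dockerfile that layers the trained model on top of the base image.
# The unquoted heredoc expands $CI_COMMIT_SHA now, pinning the base image to the same SHA.
cat > Dockerfile.serving <<EOF
FROM registry.example.com/ml-base:$CI_COMMIT_SHA
COPY models/model.pkl /app/models/model.pkl
CMD ["python", "serve.py", "--model", "/app/models/model.pkl"]
EOF

docker build -f Dockerfile.serving -t registry.example.com/ml-serving:$CI_COMMIT_SHA .
docker push registry.example.com/ml-serving:$CI_COMMIT_SHA
```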

Deploy to production

We deploy this new serving image into production using our usual deployment tool. We can easily update or roll back to any Docker tag.
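Because every image is tagged, a rollback is just re-deploying an older tag. Deployment commands vary by tool; here is a plain-Docker sketch (container name and port are placeholders):

```bash
# Deploy: pull the new serving image and replace the running container.
docker pull registry.example.com/ml-serving:$CI_COMMIT_SHA
docker stop ml-serving || true
docker rm ml-serving || true
docker run -d --name ml-serving -p 8080:8080 \
  registry.example.com/ml-serving:$CI_COMMIT_SHA

# Rollback: run the same commands with a previously deployed tag.
```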

Re-training

Because the base image is built from a given source code version, we can train it against new data and serve a new model while keeping the exact same source code version.

We can trigger the training automatically with a scheduled CI pipeline, or manually. We can do whatever the CI/CD system is able to do (calling APIs, for example).
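For example, with GitLab’s pipeline trigger API (the project ID, branch and token below are placeholders), a re-training run can be kicked off from anywhere:

```bash
# Trigger a new pipeline on the master branch via the GitLab API.
# TRIGGER_TOKEN and the project ID (42) are placeholders.
curl -X POST \
  -F "token=$TRIGGER_TOKEN" \
  -F "ref=master" \
  "https://gitlab.example.com/api/v4/projects/42/trigger/pipeline"
```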

How do we access the data required by the learning step?

This is the hardest part. The data we use is private and sensitive, so we need to run the CI/CD runner on our on-premise servers in order to properly define the security rules allowing the learning application to access the data.

At the beginning we were not able to run the CI on-premise, so the “go to production” process was much more complicated:

This pipeline involved 4 different projects: the machine learning code, the scripts that effectively triggered the learning on our production server, and 2 other projects to run these scripts. It also involved interacting with our deployment tool and calling the CI/CD API.

All this complexity is now removed by running a CI/CD runner directly on our servers. We are currently using the GitLab CI runner. CircleCI, in its enterprise version, also allows running the runner on-premise (but costs more $$).
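Registering such a runner is a one-liner (the URL and registration token are placeholders); note the privileged flag needed for the Docker-in-Docker training step mentioned earlier:

```bash
# Register an on-premise GitLab runner with a privileged Docker executor.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "docker:stable" \
  --docker-privileged
```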

NB: If you are using a framework designed to be industrialized (like TensorFlow), you should have native tools for some of these steps.

Thanks to Felix Le Chevallier (our AI lead at Lifen) for helping me build these pipelines!
