Using Kubernetes Init Containers to decouple the deployment of machine learning applications from their models
--
A couple of days ago, I worked on a Proof of Concept (PoC). It was basically a web service that needed a machine learning model for word embedding in order to do its job. Thanks to one of my colleagues, a first version of the PoC was almost complete and it already worked in our local development environments (i.e. our laptops). The machine learning model files lived directly inside the project directory, so the code read the model from the local file system. In other words, both the model and the application were in the same place.
Now, since the PoC had to be used by some users during an event, the goal was to deploy it and expose it to the world. Because it was a PoC, simplicity and fast development were the main drivers for the deployment. In other words, the goal was to deploy the entire project somewhere as it was, without any change to the code or any other configuration overhead. I think Kubernetes is great for that because of its portability and configuration reusability.
So, after pulling the first version of the PoC from our code repository, I built a Docker image with all the needed project files (including both the code and the model files), set up a new Kubernetes cluster on Google Kubernetes Engine (GKE), wrote the proper Kubernetes resource files (in this case a Deployment, a Service and an Ingress) and ran the deployment pipeline. Then everything was up and running on Kubernetes without any change to the code, ready to serve the entire world over HTTPS and our custom domain name.
The entire setup took me less than 30 minutes of work. Awesome. However, there was something I didn’t like: I didn’t want to keep both the model files and the code inside the same Docker image. It seemed like bad practice to me and, moreover, since our model was a couple of gigabytes (and a model can easily be much larger), every time I made even a slight change to the code I had to build, push and deploy the entire image to GKE. This conflicts with the goal of fast deployment and, more generally, with any CI/CD strategy. It seems we can do better, doesn’t it?
Thus, I decided to try to decouple the application from its model files, but still without changing the code or adding any configuration/operation overhead. I found the answer in Kubernetes Init Containers and this is what I want to talk about in this post. Note: I’m sure that there are also more sophisticated solutions (e.g. Google ML Engine, Kubeflow) but in this case the requirement was to minimize the changes to the PoC code and the time spent on the deployment as well, in order to meet our deadline :)
Purpose of this post
To sum up, in this post I’d like to share a straightforward guide on how to effectively deploy an application which uses a machine learning model. In particular, we’ll leverage Kubernetes Init Containers to decouple the deployment of the application from the deployment of the model itself.
Prerequisites
In this post I won’t start from scratch, so a little familiarity with some concepts is required. In particular, the prerequisites are:
- Familiarity with Docker and Kubernetes
- Basic knowledge of Google Cloud Platform (GCP), mainly about Google Kubernetes Engine (GKE) and a few commands of gcloud
- Docker installed locally, a GCP account with a GKE cluster already set up
Note: In this post I’m going to be using GKE and, more in general, the Google Cloud environment. However, you can follow the same steps with any other Kubernetes environment.
Kubernetes Init Containers
So, let’s begin. First of all: what are Init Containers? As the Kubernetes official documentation states, Init Containers are specialized Containers that run before app Containers and can contain utilities or setup scripts not present in an app image. Init Containers are exactly like regular Containers, except:
- They always run to completion
- Each one must complete successfully before the next one is started
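As a quick illustration (not part of the PoC), here is a minimal Pod spec with two Init Containers; Kubernetes runs init-first to completion, then init-second, and only then starts the app container. The names and images are just placeholders:

# Minimal illustration of Init Container ordering (names and images are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  - name: init-first
    image: busybox
    command: ["sh", "-c", "echo first init step"]
  - name: init-second
    image: busybox
    command: ["sh", "-c", "echo second init step"]
  containers:
  - name: app
    image: nginx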
Based on this, here are the steps we can follow to decouple the application from its model:
- Package the application and the model in two different images using Docker
- Push them to Google Container Registry
- Configure the pod initialization in order to load the model files into the application container
1) Create the two Docker images
We use two different Dockerfiles. The first one is used to build the image of the application without the model files (we ignore them using .dockerignore). The second Dockerfile is used to build an image containing only the model files. Here are the details.
1.1) Application
The application is implemented in Python using Flask as the web framework. Therefore, our first Dockerfile is very simple and similar to any Python application which exposes a service over a specific port (in our case, port 8000).
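The original Dockerfile is not reproduced here, but a minimal sketch could look like the following. The file names app.py and requirements.txt, the base image and the start command are assumptions; adapt them to your project.

# Sketch of the application Dockerfile (file names and base image are assumptions)
FROM python:3.6-slim

WORKDIR /usr/src/app

# Install the Python dependencies first, to take advantage of the Docker build cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code; the models/ folder is excluded via .dockerignore
COPY . .

EXPOSE 8000
CMD ["python", "app.py"]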
Important: remember not to include the model files in this image. You can simply define a .dockerignore file containing the model path (in our case, models/), as shown below.
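For example, a one-line .dockerignore is enough (assuming the model files live under models/):

# .dockerignore
models/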
Now you can build the application image by running:
docker build -t gcr.io/<gcp-project-id>/faq-engine:latest .
1.2) Model
The Dockerfile which builds the model image is pretty straightforward: it starts from Alpine (a minimal Docker image based on Alpine Linux) and it just copies the local model files (i.e. local folder /model) into the container directory /usr/src/d2v_model. That’s it.
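The original file is not shown here either, but a sketch along those lines would be (the source folder is relative to the build context; the Alpine tag is an assumption):

# Sketch of the model image Dockerfile
FROM alpine:3.8

# Copy the local model files into the image
COPY model/ /usr/src/d2v_model/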
Again, you can now build the model image with the command:
docker build -t gcr.io/<gcp-project-id>/doc2vec-model-it:1.0 .
2) Push the two images to Google Container Registry
So, we built our two Docker images locally. You can check if you actually have them with the command docker images. Now we have to push them to Google Container Registry. You can do this with:
gcloud docker -- push gcr.io/<gcp-project-id>/faq-engine:latest
gcloud docker -- push gcr.io/<gcp-project-id>/doc2vec-model-it:1.0
Now you should see the two images in Google Container Registry.
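You can also check from the command line with:

gcloud container images list --repository=gcr.io/<gcp-project-id>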
3) Configure pod initialization
This is the core part of this post. We want to configure the pod initialization in order to load the model files from the model container to the application container, before the application container starts. This is done simply by defining a proper deployment.yaml file for Kubernetes. Here is a sample file that leverages Init Containers.
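The original file is not embedded here, but a minimal sketch of such a deployment.yaml could look like the one below. The resource names and labels are assumptions, while the images, the port and the paths match the ones used in the previous steps:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: faq-engine
spec:
  replicas: 1
  selector:
    matchLabels:
      app: faq-engine
  template:
    metadata:
      labels:
        app: faq-engine
    spec:
      # Shared volume: created empty when the Pod starts, filled by the Init Container
      volumes:
      - name: model-volume
        emptyDir: {}
      # Init Container: copies the model files into the shared volume
      initContainers:
      - name: model-loader
        image: gcr.io/<gcp-project-id>/doc2vec-model-it:1.0
        command: ["sh", "-c", "cp -r /usr/src/d2v_model/* /models/"]
        volumeMounts:
        - name: model-volume
          mountPath: /models
      # Application container: mounts the same volume where the code expects the model
      containers:
      - name: faq-engine
        image: gcr.io/<gcp-project-id>/faq-engine:latest
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: model-volume
          mountPath: /usr/src/app/models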
The main parts of the file are:
Volumes: we define a simple volume called model-volume. emptyDir is a particular type of volume that is created empty when a Pod is assigned to a Node, and it exists as long as that Pod is running on that node.
initContainers: here we finally use the Init Container. With this configuration, we are basically asking Kubernetes to:
- Pull our model image from Google Container Registry (the one we pushed before)
- Mount the above-mentioned volume on /models
- Execute a command which copies all the files contained in /usr/src/d2v_model to the folder /models (which refers to the mounted volume)
At the end of this process, we basically have our model files inside the mounted volume.
Containers: this is the usual spec in a deployment file, describing the application containers. The only part related to our topic is the volumeMounts field. As we did for the Init Container, we mount the same volume into the container of our Python application, which already reads the model files from exactly the directory we mount the volume to (i.e. /usr/src/app/models).
Now you just need to apply the deployment.yaml configuration to your Kubernetes cluster and watch the magic happen: run kubectl apply -f deployment.yaml.
Here are some commands that are useful to see what Kubernetes is doing after the configuration is applied. In our case, the application was up and running in less than a minute.
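For example, the following standard kubectl commands (the pod name is just a placeholder) let you follow the Init Container and then the application container:

# Watch the Pod go through the Init phase and then reach the Running state
kubectl get pods -w

# Inspect the events of a specific Pod, including the Init Container steps
kubectl describe pod <pod-name>

# Check the Service and the Ingress exposing the application
kubectl get service
kubectl get ingress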
Next steps & improvements
What if we wanted the model files to be retained indefinitely, no matter what happens to a pod? Try Kubernetes Persistent Volumes.
Unlike non-persistent volumes (such as the emptyDir volume we used above), which are tied to the pod’s life cycle, persistent volumes continue to exist even when the pod is deleted. They can also be pre-populated with the data an application needs and shared between several pods, which could be useful for retaining and sharing heavy machine learning model files.
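As a starting point, here is a minimal sketch of a PersistentVolumeClaim; the name and the requested size are assumptions, and on GKE it would dynamically provision a persistent disk:

# Sketch of a PersistentVolumeClaim (name and size are assumptions)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

# In deployment.yaml, the emptyDir volume would then be replaced by:
#   volumes:
#   - name: model-volume
#     persistentVolumeClaim:
#       claimName: model-pvc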