Computer Vision Pipeline with Kubernetes

Sylvain Gavoille
4 min read · Mar 25, 2022


namR enriches the knowledge of French buildings with Computer Vision. We produce a multitude of attributes (characteristics attached to an entity — building, parcel, etc.) using various sources such as aerial imagery. The idea is to build Deep Learning models from a few thousand buildings using in-house-tagged labels or existing labels from open data. In a second step, the models are deployed on the whole French territory, which represents more than 35 million images to process (i.e. 4 TB of data to deal with). This second step is the focus of this post.

The challenge is to run inference at low cost and in a short amount of time (less than a day). To do this, we chose to orchestrate our inference pipeline using Kubernetes on Google Cloud Platform (GCP). Integrating the pipeline into a cloud platform like GCP was a no-brainer: the many managed services it offers save us a lot of development time. These include Cloud Storage buckets for storage, Google Kubernetes Engine for managing our Kubernetes cluster, BigQuery, Memorystore and Cloud SQL for our databases, Google Container Registry for container images, and many others.

The inference pipeline

The main idea of the inference pipeline (see Fig. 1) is the following:

  1. Consider the geometry of a building or a parcel of interest attached to an address. Geometries are derived from open data provided by the IGN (France’s National Geographic Institute). The geometry is attached to the address using a tool called a geocoder.
  2. From the geometry, we search our image database, via a VRT, for the image surrounding the geometry. This construction step was covered in a previous post (in French).
  3. We then process the image using algorithms such as our Deep Learning models, which return either a geolocated vector (in the case of segmentation, as in the example below on the prediction of roof sections) or a label (in the case of classification).

Fig. 1: Inference pipeline for the roof slope model
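
To make step 2 more concrete, here is a minimal sketch of how a geometry can be turned into a pixel window to read from a georeferenced mosaic. It is not our production crop service: the `GeoTransform` class, the `crop_window` function, the margin and the 0.5 m pixel size are all hypothetical, and the raster is assumed to be north-up with square pixels.

```python
from dataclasses import dataclass
from math import ceil, floor


@dataclass(frozen=True)
class GeoTransform:
    """Georeferencing of a north-up raster (hypothetical helper)."""
    origin_x: float    # x coordinate of the raster's left edge, in map units
    origin_y: float    # y coordinate of the raster's top edge, in map units
    pixel_size: float  # side of a square pixel, in map units


def crop_window(polygon, transform, margin=0.0):
    """Return (col_off, row_off, width, height) of the pixel window covering
    the polygon's bounding box, enlarged by `margin` map units on each side."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    min_x, max_x = min(xs) - margin, max(xs) + margin
    min_y, max_y = min(ys) - margin, max(ys) + margin

    col_start = floor((min_x - transform.origin_x) / transform.pixel_size)
    col_stop = ceil((max_x - transform.origin_x) / transform.pixel_size)
    # Rows grow downwards while y grows upwards, hence the inverted subtraction.
    row_start = floor((transform.origin_y - max_y) / transform.pixel_size)
    row_stop = ceil((transform.origin_y - min_y) / transform.pixel_size)
    return col_start, row_start, col_stop - col_start, row_stop - row_start


# Made-up 10 m x 10 m footprint on a made-up mosaic with 0.5 m pixels.
footprint = [(10.0, 90.0), (20.0, 90.0), (20.0, 80.0), (10.0, 80.0)]
mosaic = GeoTransform(origin_x=0.0, origin_y=100.0, pixel_size=0.5)
window = crop_window(footprint, mosaic, margin=2.0)
print(window)
```

The margin keeps some context around the building, which is what "the image surrounding the geometry" means in practice. The resulting window is the kind of offset and size one would pass to a raster reader opened on the VRT.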

From this simple illustration, two initial services emerge: the crop service, which extracts the image, and the inference service, which extracts the relevant information. Two other services complete the pipeline. The first orchestrates the selection of the geometries and of the model to use, both of which the user specifies through a SQL database. The second stores the attribute predictions in a BigQuery database; after post-processing, these predictions are delivered to customers. All these services are implemented in Python and containerised, with their images stored on Google Container Registry.

Orchestrating with Kubernetes

As seen above, we have a multitude of services that need to communicate with each other, to scale up and down according to the load, to be deployed on suitable machines and to interact with different GCP services. The obvious solution was to use Kubernetes as a tool to orchestrate the containers and interaction with GCP services.

In terms of cloud services, we use the following databases:

  • A PostgreSQL database, hosted on Cloud SQL, for storing geometries and associated metadata and for triggering the inference pipeline;
  • Memorystore for Redis, used as a queuing system between the different services in the cluster, in order to have low-latency, high-throughput access to data;
  • BigQuery, for storing millions of predictions of different labels or geometries. One can easily create an external connection to the PostgreSQL database to attach the predicted attribute to the entity;
  • A Cloud Storage bucket for images through the VRT mechanism.

Fig. 2: Inference pipeline
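
The way the four services hand work to each other through queues can be sketched as follows. This is only an illustration: plain in-memory `deque` objects stand in for the Redis queues, a Python list stands in for the BigQuery table, and every function name, field name and the dummy `predict` callable are assumptions made for this example.

```python
from collections import deque

# In-memory stand-ins for the Redis queues (assumption: one queue per hop).
crop_queue = deque()
inference_queue = deque()
writer_queue = deque()
predictions_table = []  # stand-in for the BigQuery table


def orchestrate(entity_ids, model_name):
    """Orchestrator: enqueue one task per selected geometry."""
    for entity_id in entity_ids:
        crop_queue.append({"entity_id": entity_id, "model": model_name})


def crop_worker():
    """Crop service: attach a placeholder image to each task and pass it on."""
    while crop_queue:
        task = crop_queue.popleft()
        task["image"] = f"crop-of-{task['entity_id']}"
        inference_queue.append(task)


def inference_worker(predict):
    """Inference service: run the model and forward the prediction."""
    while inference_queue:
        task = inference_queue.popleft()
        writer_queue.append({
            "entity_id": task["entity_id"],
            "model": task["model"],
            "prediction": predict(task["image"]),
        })


def writer_worker():
    """Writer service: persist predictions in the results table."""
    while writer_queue:
        predictions_table.append(writer_queue.popleft())


orchestrate(["building-1", "building-2", "building-3"], "roof_slope")
crop_worker()
inference_worker(predict=lambda image: f"label-for-{image}")
writer_worker()
print(len(predictions_table), "predictions written")
```

Because each service only knows about its input and output queue, each one can be scaled independently, which is what makes the queue lengths a natural signal for autoscaling.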

For Kubernetes management, we use Google Kubernetes Engine, which lets us easily specify the machine types and secure access to our services by defining different nodes. Each Kubernetes Job and Deployment is assigned to the appropriate nodes. For the inference service in particular, we use machines with GPUs and serve our models with TorchServe. To give an order of magnitude, inference on a GPU is on average about 40 times faster than on a CPU of the same generation.

For all Deployments, such as the crop service, the BigQuery writer and the inference service, we can enable node autoscaling so that we only use the resources needed while inference is running. To scale the services themselves up and down, we use KEDA, which monitors the load on the Redis queues.
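
As a rough illustration of queue-based scaling, the sketch below computes a replica count proportional to the queue length and clamps it between a minimum and a maximum. It is a simplification of what an autoscaler does, not KEDA's actual implementation, and the function name, the target of 1,000 items per replica and the cap of 14 replicas are assumptions chosen for the example.

```python
from math import ceil


def desired_replicas(queue_length, target_per_replica, min_replicas, max_replicas):
    """Proportional scaling rule: one replica per `target_per_replica`
    pending items, clamped to the [min_replicas, max_replicas] range."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical settings: 1,000 queued images per replica, at most 14 replicas.
for pending in (0, 2_500, 1_000_000):
    print(pending, "->", desired_replicas(pending, 1_000, 0, 14))
```

With a minimum of zero, an empty queue scales the service down completely, so GPU nodes are only paid for while there is work to do.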

Key takeaways

The infrastructure put in place makes it possible to process 4 TB of image data in less than 24 hours, using 14 GPUs for inference. This represents an average of about 450 images per second for the models. A full inference run over the whole of France costs about $300.
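
These figures can be cross-checked with a few lines of arithmetic. The inputs are the rounded numbers quoted in this post (35 million images, 4 TB taken as 4 × 10^12 bytes, 450 images per second, 14 GPUs, $300), so the derived values are orders of magnitude rather than measurements.

```python
IMAGES = 35_000_000          # "more than 35 million images"
DATA_BYTES = 4 * 10**12      # 4 TB, assuming decimal terabytes
THROUGHPUT = 450             # images per second, all GPUs combined
GPUS = 14
COST_USD = 300

hours_needed = IMAGES / THROUGHPUT / 3600
images_per_gpu_per_second = THROUGHPUT / GPUS
avg_image_kb = DATA_BYTES / IMAGES / 1000
cost_per_million_images = COST_USD / (IMAGES / 1_000_000)

print(f"run time: {hours_needed:.1f} h")
print(f"per-GPU throughput: {images_per_gpu_per_second:.1f} images/s")
print(f"average image size: {avg_image_kb:.0f} kB")
print(f"cost: ${cost_per_million_images:.2f} per million images")
```

At 450 images per second, 35 million images take a little under 22 hours, which is consistent with the "less than 24 hours" figure. Each GPU handles roughly 32 images per second, and the cost works out to under $10 per million images.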

Beyond that, the implementation of a Kubernetes infrastructure organized in the form of micro-services allows easy maintenance and simplifies the allocation of computing resources.

This infrastructure work helps us deploy our algorithms more safely and automatically, which in turn makes it easier and faster to produce meaningful data. Stay tuned for other posts on this topic.