Tutorial: how to deploy CVAT, a great open-source labeling platform, on OpenShift

Augustin Hoff
MAIF Data Design Tech etc.
Mar 2, 2023 (5 min read)

CVAT is one of the best, if not the best, free open-source tools for annotating images and videos. And understandably so: in addition to being backed by a big company, Intel, it supports all major annotation use cases, offers a nice UI with some degree of authentication, and lets you speed up your work with features such as interpolation or semi-automatic annotation.

The project is primarily installed with docker-compose, although some basic helm-charts have been available since v1.4.0. Here we consider version v2.1.0.

The goal is to show you how to modify the base images and helm-charts to make this tool work on a corporate OpenShift platform. It is best practice to run pods without root privileges, and CVAT does not comply with this requirement out of the box.

An overview of the product services

Contrary to what the documentation implies, an ingress is mandatory. It routes traffic to the correct services (front and back) and exposes the necessary routes. Redis is used for caching and Postgres stores, among other things, user data. The latter is better hosted externally to ensure the data is persistent. You might also mount external cloud storage to easily access the data you want to annotate, but that will be for another post. In any case, a number of environment variables need to be specified for the backend image (cvat-server), such as CVAT_POSTGRES_xxx or UI_HOST/PORT.

It looks like this :)

Cvat Overview

The essential modifications!

The main idea is to allow a non-root user with a random uid on OpenShift to run the services.

Cvat-server

The key here is to:

  • launch the supervisord entrypoint without root privileges
  • make it possible for the non-root user to write temporary files and logs in specific directories such as /tmp/supervisord/ or /tmp/components.

In order to do so we allow all users within the django group to write in these places. This gives us the following Dockerfile:

Then we make sure the pod is launched with the correct gid: in the cvat-backend deployment.yaml file, we add the runAsGroup key to the securityContext and set it to 1000.

Indeed the default user (django) created in the base image has gid=1000:

uid=1000(django) gid=1000(django) groups=1000(django)
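
The securityContext change described above can be sketched as follows. This is a minimal fragment under stated assumptions, not the chart's actual file: only the runAsGroup value of 1000 comes from the text, and its placement at pod level (rather than container level) is a choice made here for illustration.

```yaml
# Fragment of the cvat-backend deployment.yaml (surrounding structure is assumed).
spec:
  template:
    spec:
      securityContext:
        # Run the pod with the django group (gid=1000) so that the
        # random-uid user can write to the group-writable directories
        # prepared in the image, e.g. /tmp/supervisord/.
        runAsGroup: 1000
```

The uid is left unset on purpose, so OpenShift can still assign a random one.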

Cvat-ui

Here we want to use an unprivileged nginx version to run this frontend pod. We pick nginxinc/nginx-unprivileged which includes the following changes:

  • The default NGINX listen port is not 80 anymore
  • The default NGINX user directive in /etc/nginx/nginx.conf has been removed
  • The default NGINX PID has been moved from /var/run/nginx.pid to /tmp/nginx.pid
  • The _temp_path variables have been changed to /tmp/

Instead of rebuilding the image from scratch, we copy the necessary configuration and data from the official CVAT image into the unprivileged nginx image. The owner remains the default nginx user (gid=100).

This gives us the following Dockerfile:

It is important to note that the CVAT nginx server then runs on port 8000 instead of port 80. This has to be specified in the OpenShift ui service so that the correct ports are mapped. Carrying out this change in the overall schema:

Again, I would advise running the pod with the gid of the default user, which is 100 for this service. The script 10-listen-on-ipv6-by-default.sh needs it to complete; otherwise you might see this message in the logs:

10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)

This isn’t critical to launch the nginx app but it’s better if everything is running as expected.
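
To tie together the two Cvat-ui settings discussed above (port 8000 and gid 100), here is a minimal sketch of a deployment and its service. The resource names, labels, image name and the service port 80 are all hypothetical; only the container port 8000 and the runAsGroup value of 100 come from the text.

```yaml
# Sketch only: names, labels and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cvat-frontend
spec:
  selector:
    matchLabels:
      app: cvat-frontend
  template:
    metadata:
      labels:
        app: cvat-frontend
    spec:
      securityContext:
        runAsGroup: 100          # gid of the default nginx user, as stated above
      containers:
        - name: cvat-frontend
          image: my-registry/cvat-ui-unprivileged:v2.1.0   # hypothetical image
          ports:
            - containerPort: 8000   # nginx no longer listens on port 80
---
apiVersion: v1
kind: Service
metadata:
  name: cvat-frontend-service
spec:
  selector:
    app: cvat-frontend
  ports:
    - name: http
      port: 80           # port exposed by the service (assumed)
      targetPort: 8000   # port the unprivileged nginx listens on
```

Keeping port 80 on the service side means the ingress does not need to know that the container port changed.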

Redis/Postgres

Nothing special here. Plenty of examples exist on how to deploy Redis on OpenShift if needed. Postgres might be hosted externally (in a virtual machine, for instance).

OPA

The Open Policy Agent enforces some rules across the stack (access to resources, rights to perform some actions, …). These rules are defined in the cvat/apps/iam/rules directory of the repository and should be exported by running the following command from inside the helm-chart directory:

find ../cvat/apps/iam/rules -name "*.rego" -and ! -name '*test*' -exec basename {} \; | tar -czf rules.tar.gz -C ../cvat/apps/iam/rules/ -T -

The only change I would recommend for this service is to add the resulting rules.tar.gz file as a secret in your namespace. The volumes part of the deployment.yaml becomes:
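
A minimal sketch of such a volumes section follows, assuming the secret is called cvat-opa-rules and is mounted under /rules. The secret name, volume name, container name and mount path are hypothetical and should be adapted to your chart.

```yaml
# Fragment of the OPA deployment.yaml; all names and the mount path are hypothetical.
# The secret could be created beforehand with:
#   oc create secret generic cvat-opa-rules --from-file=rules.tar.gz
spec:
  template:
    spec:
      containers:
        - name: cvat-opa
          volumeMounts:
            - name: cvat-opa-rules
              mountPath: /rules
              readOnly: true
      volumes:
        - name: cvat-opa-rules
          secret:
            secretName: cvat-opa-rules
            items:
              # Expose the single archive as /rules/rules.tar.gz
              - key: rules.tar.gz
                path: rules.tar.gz
```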

Ingress

This part might be specific to my organization, but we are using HAProxy and not nginx. The current ingress.yaml file in the CVAT repository is intended for nginx, but it can be adapted quite easily. First, we have to remove all nginx annotations. Then, instead of using regexes to specify the paths, we loop over the prefixes, which for the backend are:

"/api" "/git" "/tensorflow" "/auto_annotation" "/analytics" "/static" "/admin" "/documentation" "/dextr" "/reid"

After adding this list of strings to values.yaml (under ingress.backend.paths, for instance), the ingress template could look like this:
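
Here is a minimal sketch of a Helm ingress template that loops over the prefixes. It assumes the list lives under ingress.backend.paths as suggested above. The ingress.host value, the service names and the port numbers are hypothetical and must match your own chart.

```yaml
# templates/ingress.yaml (sketch). Assumed values.yaml layout:
#   ingress:
#     host: cvat.example.com      # hypothetical
#     backend:
#       paths: ["/api", "/git", "/static", "/admin"]   # and the other prefixes
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}-ingress
spec:
  rules:
    - host: {{ .Values.ingress.host | quote }}
      http:
        paths:
          # One Prefix rule per backend path, no regex and no nginx annotation.
          {{- range .Values.ingress.backend.paths }}
          - path: {{ . | quote }}
            pathType: Prefix
            backend:
              service:
                # "$" reaches the root context from inside the range loop.
                name: {{ $.Release.Name }}-backend-service   # hypothetical name
                port:
                  number: 8080                               # assumed backend port
          {{- end }}
          # Everything else goes to the frontend.
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ .Release.Name }}-frontend-service   # hypothetical name
                port:
                  number: 80                                 # assumed service port
```

Using pathType: Prefix keeps the template independent of the ingress controller, which is the point of dropping the nginx-specific regexes.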

Final thoughts

Looking at your exposed route you should be able to see the index page :)

Creating an admin user should be straightforward from there. You can even create it at pod start by running the Django command below, provided you supply your credentials through the environment variables DJANGO_SUPERUSER_USERNAME, DJANGO_SUPERUSER_EMAIL and DJANGO_SUPERUSER_PASSWORD:

python3 ~/manage.py createsuperuser --noinput
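
One possible way to provide these variables without hard-coding credentials is to read them from a secret. The fragment below is a sketch under that assumption: the secret name cvat-superuser and its keys are hypothetical, and only the three variable names come from the text.

```yaml
# Fragment of the cvat-backend container spec; secret name and keys are hypothetical.
env:
  - name: DJANGO_SUPERUSER_USERNAME
    valueFrom:
      secretKeyRef:
        name: cvat-superuser
        key: username
  - name: DJANGO_SUPERUSER_EMAIL
    valueFrom:
      secretKeyRef:
        name: cvat-superuser
        key: email
  - name: DJANGO_SUPERUSER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: cvat-superuser
        key: password
```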

Hope this helps and good luck in your own implementation!
