Deploying Valhalla routing engine on Kubernetes using Valhalla Operator

Itay Ankri
5 min read · Oct 21, 2022

Valhalla is an OpenStreetMap (OSM) compatible open-source routing engine. It is highly optimized and efficient at runtime. As part of my job as a Software Engineer at Autofleet Systems, I designed and implemented our in-house routing infrastructure. After some research I decided to use Valhalla for the job, and deploying it turned out to be quite frustrating.

The problem

As a Software Engineer who works with Kubernetes on a daily basis, I wanted to use it as our routing service’s infrastructure, but as good as Valhalla is at routing, its ecosystem is not very cloud-oriented.
Even though the project supplies a Docker image called valhalla/valhalla, the way to use it is not very intuitive. After some searching I found an easy way to run Valhalla as a container: another Docker image called gisops/valhalla. All you need to do in order to run this image is supply it with a PBF file (a compressed OSM file). The container will extract the data out of the PBF and pre-process it into a set of files that represent graph edges and vertices. The problem arose when I had to implement this solution in production.
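
For example, running the image locally might look like the following. This is a sketch based on the image’s documented conventions rather than a command from the original setup: the mounted directory (/custom_files) is an assumption, and 8002 is Valhalla’s default service port, so check the image’s README before relying on it.

# Put your .pbf file in ./custom_files first
docker run -dt -v $(pwd)/custom_files:/custom_files -p 8002:8002 gisops/valhalla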

I ran the container with the whole of North America’s PBF. At first it worked smoothly: the container pre-processed the file for a few minutes and the pod started. But when I scaled my Kubernetes deployment up to more than one pod, I noticed that the pre-processing phase runs again on each new pod, wasting time and resources; we are talking about ~20 minutes of pre-processing. This is not a pod startup time that can be tolerated in a production environment. I tried using SSDs instead of regular disks so that the container could load the PBF and write to the disk faster. That improved the startup time by about 50%, which is good, but still not fast enough.

The Solution

I realized that the problem was not the hardware. The problem was that the pre-processing phase happens inside the container itself. So I created a new volume and mounted it to a Kubernetes Job that pre-processed the PBF. Once the Job completed, I created a new Deployment with the same volume mounted to its pods. By doing that I reduced the pods’ startup time to almost zero.
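
To make the idea concrete, here is a rough sketch of that manual setup: a shared ReadWriteMany volume, a one-off Job that pre-processes the PBF into it, and a Deployment whose pods mount the already-processed data. The resource names, sizes, and image-specific details (the mount path, and however the container is told to build tiles and exit rather than serve) are illustrative assumptions, not the exact manifests I used.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: valhalla-tiles
spec:
  accessModes: ["ReadWriteMany"]  # several serving pods read the same tiles
  resources:
    requests:
      storage: 50Gi  # depends on the size of your PBF
---
apiVersion: batch/v1
kind: Job
metadata:
  name: valhalla-preprocess
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: preprocess
          image: gisops/valhalla
          # Assumption: the container pre-processes the PBF into the
          # mounted directory and then exits instead of serving.
          volumeMounts:
            - name: tiles
              mountPath: /custom_files  # assumed data directory
      volumes:
        - name: tiles
          persistentVolumeClaim:
            claimName: valhalla-tiles
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: valhalla
spec:
  replicas: 3  # scaling up no longer re-triggers pre-processing
  selector:
    matchLabels:
      app: valhalla
  template:
    metadata:
      labels:
        app: valhalla
    spec:
      containers:
        - name: valhalla
          image: gisops/valhalla
          ports:
            - containerPort: 8002  # Valhalla's default port
          volumeMounts:
            - name: tiles
              mountPath: /custom_files  # tiles are already built here
      volumes:
        - name: tiles
          persistentVolumeClaim:
            claimName: valhalla-tiles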

At this point I knew this is how things should work, but the whole methodology of creating volumes and manually scheduling the Jobs and the Deployments felt too clumsy. Then it hit me! Why not automate things and manage my Valhalla instances with a Kubernetes operator?

Valhalla Operator Introduction

Operators are software extensions to Kubernetes that take advantage of the fact that Kubernetes allows users to extend its APIs by creating custom resources. The rationale behind this concept is that if a user creates a custom resource, it might have a specific behavior that only the user knows, and it therefore needs a custom controller to manage it. That’s exactly our case here.

The added value of an operator in this case is:

  • Efficient deployment and compute resource utilization, just as described above.
  • It automates the whole flow. Instead of creating multiple Kubernetes resources and scheduling them manually, all you need to do is create a single custom resource.
  • Once all the resources are in place, the operator will constantly monitor and reconcile them if necessary.

Let’s have a quick demonstration

Step 1 — Creating a Kubernetes cluster

First things first, make sure you have a Kubernetes cluster up and running. If you already have one you may want to jump to the next step. If you don’t, I recommend using Kind — a tool for running Kubernetes clusters locally.
In order to create a Kubernetes cluster with Kind we will create a Kind configuration file:

You can find this Kind configuration file here.
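
In case the link is unavailable, a minimal Kind configuration with one control-plane node and two workers looks like this (a sketch; the linked file may differ):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane  # runs the Kubernetes control plane
  - role: worker
  - role: worker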

Once you have the tool installed and the configuration file is ready, let’s create a cluster by running:

kind create cluster --name demo --config <path_to_your_kind_config>

Step 2 — Setting up an NFS infrastructure

The next thing is setting up an NFS infrastructure in our cluster so that we can allocate and mount storage to our Pods. As in the previous step, if your cluster already has a storage provisioner you can skip this step.
Our NFS infrastructure consists of two components:

  • An NFS server
  • A storage provisioner

In order to create an NFS server just run the following command:

kubectl create -f https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/deploy/example/nfs-provisioner/nfs-server.yaml

Make sure your NFS server is up and running by viewing its logs:

kubectl logs $(kubectl get pods | grep nfs | awk '{print $1}')

Note! If your NFS server fails with the error “exportfs: /exports does not support NFS export”, you will probably need to change your Docker “storage-driver” setting. Docker uses overlayfs by default; we will set it to vfs. In order to change this setting you need to edit a file called daemon.json and then restart the Docker daemon.
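
On most Linux hosts the file lives at /etc/docker/daemon.json (create it if it does not exist). A minimal version that only sets the storage driver looks like this:

{
  "storage-driver": "vfs"
}

Keep in mind that switching storage drivers hides your existing local images and containers until you switch back. After saving the file, restart the daemon, e.g. with systemd:

sudo systemctl restart docker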

Next, we will use a Helm chart called csi-driver-nfs as our storage provisioner, so make sure you have Helm installed too.

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v3.1.0

Last thing is creating a new Kubernetes StorageClass that our storage provisioner can work with:

kubectl create -f https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/deploy/example/storageclass-nfs.yaml
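
For reference, the manifest at that URL defines a StorageClass backed by the nfs.csi.k8s.io provisioner, roughly like the following (based on the upstream example at the time of writing; note how the server parameter points at the NFS server Service created earlier):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.default.svc.cluster.local  # the NFS server from the previous step
  share: /
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1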

Step 3 — Installing the operator

At this point you should have a Kubernetes cluster with a functioning storage provisioner.

Now to the exciting part! Running the following command will install a Custom Resource Definition called Valhalla on your cluster, along with a new namespace called valhalla-system that contains the operator:

kubectl apply -f https://github.com/itayankri/valhalla-operator/releases/latest/download/valhalla-operator.yaml

Now, all you have to do is create a Valhalla Custom Resource and watch the magic happen. We will achieve that by creating a YAML file that describes our resource:

You can find this sample file here.
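
If the link is unavailable, a resource of this general shape should give you the idea. The apiVersion, field names, and PBF URL below are illustrative assumptions on my part; the linked sample file in the operator’s repository is the authoritative schema. Save the file as example.yaml:

apiVersion: valhalla.itayankri/v1alpha1  # assumed group/version; check the installed CRD
kind: Valhalla
metadata:
  name: valhalla-sample
spec:
  # Assumed fields: where to fetch the PBF from, how many serving
  # pods to run, and how much storage to allocate for the tiles.
  pbfUrl: https://download.geofabrik.de/europe/andorra-latest.osm.pbf
  minReplicas: 1
  maxReplicas: 3
  persistence:
    storageClassName: nfs-csi
    storage: 10Gi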

Once the file is ready, create the resource in our cluster by running:

kubectl apply -f example.yaml

That’s it! All you need to do now is sit back and watch the operator do the rest of the work for you:

watch -d -n1 "kubectl get all"

Disclaimer

I hope you find my operator interesting and that it will help you scale your Valhalla deployments.
However, the operator is still in its alpha stage, so any feedback will help me understand which fixes or improvements are needed in order to make it better.
