Deploying OpenStreetMap services through Kubernetes

Getting something out of your small, local laboratory to face the outside world is both exciting and scary. It’s scary because you don’t know exactly what will happen, and exciting for the same reason.

Unfortunately, launching a service that is a tad more complicated than a static website can also be daunting. Granted, it has become easier with cloud platforms such as Google Cloud and their extremely generous trial program (run $300 worth of VMs, SQL databases and more, at no cost). No longer do I have to beg my ISP for a static IP address and fiddle with my router to make sure port forwarding is configured correctly.

Hop on board the Google Cloud with a free trial

Having built my OpenStreetMap-based train tracking service out of Docker containers, I naturally wanted to deploy those containers with the same technology. Deploying directly with Docker is still in beta, but a different service has been rocking it for quite some time: Kubernetes, on Google Container Engine.

Introducing the architecture

Deployment strategies are closely tied to the chosen architecture. If you develop a SaaS application, choosing something custom over a multitenant architecture would be rather surprising. On the other hand, when piecing something together out of many existing pieces of open source software, being able to pick and mix programming languages, operating systems and databases is a blessing.

Rough architecture model, outlining external dependencies and internal system composition

The system is composed of PostGIS for storing route data and relations, plus an Elasticsearch instance for querying train stations. The OpenStreetMap data behind this is periodically imported from Geofabrik in a fashion originally demoed by OpenRailwayMap. Finally, Mapbox handles tile processing, which it does excellently, through its upload API.

Learning Docker & co.

My learning path with Docker is far from complete; I’m taking in new pieces of information as I go. At first, merely being able to launch a container and execute some commands against it felt great and was sufficient. To make the PostGIS and Elasticsearch services persistent, learning more about Docker volumes was essential. It turns out a volume isn’t so bad if you think of it as a remotely connected hard drive that you can attach and detach at will. Giving multiple services access to the same volume still gives me the creeps, so I stayed away from that direction.

Having become reasonably adept at Docker, I found that running Kubernetes required additional learning steps. Being familiar with containers, services and volumes is essential for this project. Interestingly, the best guide I have found so far is The Children’s Illustrated Guide to Kubernetes.

Creating the containers

Kubernetes comes with an excellent CLI overview, a quick-start guide for Docker users and a cheat sheet. I used all of them eagerly to figure out how to get services running. My most-used command option was surely -o yaml, which prints the resource in YAML form so that I could save it to a file for inspection and storage. In fact, my development practice was to run these commands and then kubectl create -f <filename> to create the captured resource.
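The capture-then-create loop can be sketched in Python. This is only a sketch under assumptions: the deployment name train-api, the image path and the file name are hypothetical, and --dry-run=client is the spelling used by recent kubectl releases (older releases took a bare --dry-run). The helpers only build the argument lists; actually running them needs kubectl and a configured cluster.

```python
import shlex


def capture_command(name, image):
    """Build a kubectl call that prints a Deployment as YAML without creating it."""
    return [
        "kubectl", "create", "deployment", name,
        "--image", image,
        "--dry-run=client",  # assumption: recent kubectl; older releases used --dry-run
        "-o", "yaml",
    ]


def create_command(filename):
    """Build the kubectl call that creates whatever the captured file describes."""
    return ["kubectl", "create", "-f", filename]


if __name__ == "__main__":
    # Hypothetical name, image and file name, for illustration only.
    capture = capture_command("train-api", "gcr.io/my-project/train-api:v1")
    print(shlex.join(capture) + " > train-api.yaml")
    print(shlex.join(create_command("train-api.yaml")))
```

Keeping the captured file around is the useful part: it can be inspected, edited and committed before anything touches the cluster.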

After tagging my API image I pushed it to Google’s Container Registry. Since it needs no persistence, publishing it was relatively simple: I combined a deployment with a service, and made it redundant merely by configuring the number of replicas the deployment should run.
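To make the deployment-plus-service pairing concrete, here is a minimal sketch that builds both manifests as Python dicts and prints them as JSON, which kubectl create -f accepts alongside YAML. The name train-api, the image path, the ports and the replica count are all hypothetical, and apps/v1 is the current API group for deployments, not necessarily the one in use when this was written.

```python
import json


def deployment(name, image, replicas, port):
    """Return an apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # redundancy comes from this single number
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


def service(name, port, target_port):
    """Return a LoadBalancer Service that routes to pods labelled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; a List wrapper lets both objects live in one file.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            deployment("train-api", "gcr.io/my-project/train-api:v1", 3, 8080),
            service("train-api", 80, 8080),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

The service finds its pods purely through the shared app label, so the deployment can be scaled up or down without touching the service.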

Getting a container with an attached volume, and therefore real persistence, up and running required a bit more magic. Luckily this road was already paved: I stumbled upon an excellent article about how to run Postgres in Google Container Engine. It seemed to imply that it would be best to create a separate disk beforehand, mount it to a newly launched VM, format it with ext4 and only then run the creation commands.

Order does matter: the persistent volume should be created first, then the claim on it, because without those the deployment wouldn’t know where to store its data. Lastly, the service exposes the database to the outside world.
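The ordering can be sketched as a small sort over manifests. This is an illustration under assumptions, not the exact files I used: the names postgis-disk, postgis-volume and postgis-claim and the 10Gi size are hypothetical, and the deployment and service entries are reduced to their kind and name because only their position matters here.

```python
# Storage first, then the claim on it, then the workload that mounts
# the claim, and finally the service that exposes the workload.
CREATION_ORDER = ["PersistentVolume", "PersistentVolumeClaim", "Deployment", "Service"]


def persistent_volume(name, disk_name, size):
    """Return a PersistentVolume backed by a pre-created, ext4-formatted GCE disk."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "gcePersistentDisk": {"pdName": disk_name, "fsType": "ext4"},
        },
    }


def persistent_volume_claim(name, volume_name, size):
    """Return a claim that binds to the named volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",  # empty string: bind to an existing volume
            "volumeName": volume_name,
            "resources": {"requests": {"storage": size}},
        },
    }


def in_creation_order(manifests):
    """Sort manifests by kind so each object's dependencies come before it."""
    return sorted(manifests, key=lambda m: CREATION_ORDER.index(m["kind"]))


if __name__ == "__main__":
    # Deliberately listed in the wrong order; all names are hypothetical.
    manifests = [
        {"kind": "Service", "metadata": {"name": "postgis"}},
        {"kind": "Deployment", "metadata": {"name": "postgis"}},
        persistent_volume_claim("postgis-claim", "postgis-volume", "10Gi"),
        persistent_volume("postgis-volume", "postgis-disk", "10Gi"),
    ]
    for manifest in in_creation_order(manifests):
        print(manifest["kind"], manifest["metadata"]["name"])
```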

I used the same approach for my Elasticsearch instance, where only some required environment variables were different.
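As a sketch of that reuse, one helper can produce the pod spec for both services, with only the image, mount path and environment changing. Everything specific below is an assumption for illustration: the image names, the claim names and the placeholder password are hypothetical, and the environment variables are ones the official Postgres and Elasticsearch images understand, not necessarily the ones I set.

```python
def pod_spec(name, image, mount_path, claim_name, env):
    """Return a pod spec with one container that mounts a claimed volume."""
    return {
        "containers": [
            {
                "name": name,
                "image": image,
                # Kubernetes expects env as a list of name/value pairs.
                "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
        ],
    }


# Hypothetical images, claims and values; only the shape is the point.
postgis = pod_spec(
    "postgis",
    "postgis/postgis",
    "/var/lib/postgresql/data",
    "postgis-claim",
    {
        "POSTGRES_PASSWORD": "change-me",  # placeholder; a Secret is the safer home
        # A subdirectory avoids the lost+found folder on a fresh ext4 disk.
        "PGDATA": "/var/lib/postgresql/data/pgdata",
    },
)

elasticsearch = pod_spec(
    "elasticsearch",
    "elasticsearch",
    "/usr/share/elasticsearch/data",
    "elasticsearch-claim",
    {"discovery.type": "single-node", "ES_JAVA_OPTS": "-Xms512m -Xmx512m"},
)

if __name__ == "__main__":
    for spec in (postgis, elasticsearch):
        container = spec["containers"][0]
        print(container["name"], [e["name"] for e in container["env"]])
```

Either dict would slot into the template.spec field of a deployment manifest.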

Consuming the API with a demo JavaScript front-end over Mapbox

Next steps

A better approach would be to use internal networking, placing all the containers mentioned in one pod, but I still felt at least somewhat proud of having pushed these services live. Next up: an improved service with station-to-station routing, and updating the deployment.

Did I miss any great articles on onboarding for Kubernetes? Did I over-complicate matters or did I take too many shortcuts? Other glaring oversights? Any feedback is welcome.