Database Migration When Your Service Is Running in Kubernetes

Jonas TM · Published in CodeX · 4 min read · Sep 4, 2022


Database migration is a topic every developer has to face as soon as a service needs a permanent persistence layer. Relational databases require a schema, and that schema usually evolves over time as the service that relies on it changes. Because of this, sooner or later you will be faced with a database migration.

The most common approach is to use a migration tool (or to build one from scratch) that creates an additional table holding the current database version. When a migration runs, some kind of database lock is usually taken to avoid parallel execution of the migration.
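As a rough illustration of what such a tool does under the hood, here is a minimal sketch in Go using database/sql and a PostgreSQL advisory lock. The table name, lock key, and example migration are made up for this sketch and not taken from any particular tool:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq" // PostgreSQL driver (assumed choice)
)

// migrate sketches what a typical migration tool does: take a database-wide
// lock, read the recorded schema version, apply pending changes, bump the
// version, and release the lock again.
func migrate(ctx context.Context, db *sql.DB) error {
	// Block other instances from migrating at the same time.
	if _, err := db.ExecContext(ctx, "SELECT pg_advisory_lock(4711)"); err != nil {
		return err
	}
	defer db.ExecContext(ctx, "SELECT pg_advisory_unlock(4711)")

	var version int
	if err := db.QueryRowContext(ctx, "SELECT version FROM schema_migrations").Scan(&version); err != nil {
		return err
	}
	if version < 2 {
		// Example pending migration: add a column, then record the new version.
		if _, err := db.ExecContext(ctx, "ALTER TABLE users ADD COLUMN email TEXT"); err != nil {
			return err
		}
		if _, err := db.ExecContext(ctx, "UPDATE schema_migrations SET version = 2"); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	if err := migrate(context.Background(), db); err != nil {
		log.Fatal(err)
	}
}
```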

Running services in a Kubernetes context can make a database migration tricky.

Let's have a look at possible solutions for where and when to run the migration.

Option 1: During Service Startup

A lot of migration tools are available as libraries that can be used directly in the code of your favorite language.

The easiest approach is to run the database migration during the startup phase of your service.

The advantage is that the setup is easy and no additional configuration is needed. The problem is that as soon as you run more than one pod of your service, you can run into difficulties: one pod will try to start and run the migration but fail because another pod is already running it. The service then does not start up, throws error events in the cluster, and is restarted repeatedly until the migration is finished by the other pod. Another problem can be a big migration that exceeds the configured start-up time limit, causing Kubernetes to conclude that your service is unable to start.
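To make this concrete, here is a minimal sketch of running the migration during startup with the golang-migrate library. The migration path, environment variable, and Postgres driver are assumptions for this sketch, not something prescribed by the article:

```go
package main

import (
	"errors"
	"log"
	"os"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/postgres" // database driver
	_ "github.com/golang-migrate/migrate/v4/source/file"       // read migrations from disk
)

func main() {
	// Apply all pending migrations before the service starts serving traffic.
	m, err := migrate.New("file:///migrations", os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("creating migrator: %v", err)
	}
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		// With multiple pods, this is where a second instance typically fails
		// or blocks while another instance holds the migration lock.
		log.Fatalf("running migrations: %v", err)
	}

	// ... start the HTTP server / service here ...
}
```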

Option 2: Use initContainers to Run a Migration

Kubernetes offers a feature called initContainers, which can be defined in a deployment's pod template. InitContainers are executed before the application containers of each pod start (e.g. when doing a deployment). A pod of the deployment will not start its application containers until all initContainers have finished successfully. If an initContainer does not complete successfully, it is restarted until it does.

If we run the database migration in an initContainer, it is executed separately, before the service starts. This has the advantage of separating the service from the database migration logic. However, since every pod gets its own initContainer, we run into the same problem as with Option 1: as soon as we deploy multiple pods at the same time, only one can run the migration at a time, and the others will fail and restart multiple times.
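For illustration, this is roughly what such a deployment could look like, expressed here with the Go types from the official Kubernetes API packages instead of YAML. The image names, labels, secret, and migration command are assumptions for the sketch:

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

// exampleDeployment sketches a Deployment whose pods run the database
// migration in an initContainer before the actual service container starts.
func exampleDeployment() *appsv1.Deployment {
	labels := map[string]string{"app": "my-service"}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "my-service"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(3),
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					// Runs on every pod start, before the service container below.
					InitContainers: []corev1.Container{{
						Name:  "db-migrate",
						Image: "migrate/migrate",
						Args:  []string{"-path", "/migrations", "-database", "$(DATABASE_URL)", "up"},
						Env: []corev1.EnvVar{{
							Name: "DATABASE_URL",
							ValueFrom: &corev1.EnvVarSource{
								SecretKeyRef: &corev1.SecretKeySelector{
									LocalObjectReference: corev1.LocalObjectReference{Name: "db-credentials"},
									Key:                  "url",
								},
							},
						}},
					}},
					Containers: []corev1.Container{{
						Name:  "my-service",
						Image: "registry.example.com/my-service:latest",
					}},
				},
			},
		},
	}
}

func main() {
	// In practice the spec would be rendered to YAML or applied via client-go.
	_ = exampleDeployment()
}
```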

Option 3: Run the Migration in a CD Pipeline Before Deployment Is Triggered

Most developers nowadays use some kind of CI/CD pipeline to build their services and deploy them to a Kubernetes cluster. With this in mind, it is also possible to run the migration tool as a step in the continuous deployment pipeline before the Kubernetes deployment is triggered.

This is a viable solution; however, it might not be possible if the database is only accessible from inside the Kubernetes cluster. It also means that the CD pipeline needs access to the database secrets, so they might have to be stored in two locations instead of one central place (depending on the CI/CD solution).

Option 4: Run the Database Migration as a Kubernetes Job

In Option 2 we covered initContainers as a way to run a migration before pod startup. Instead of initContainers, one can also use a Kubernetes Job for the same purpose. However, this adds a bit more complexity, since Kubernetes provides no native way to wait for a Job to finish before starting pods.

To overcome this, one can run a small helper as an initContainer that repeatedly checks whether the migration Job has finished successfully. Only after the Job has finished does the initContainer complete successfully and allow the pods to start.

Attention: For this use case, the initContainers need the access rights (RBAC) to read the state of the migration Job.
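A minimal sketch of such a check, written with the official Go client (client-go); the namespace, Job name, and polling interval are assumptions:

```go
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Intended to run as the initContainer: it polls the migration Job and exits
// with code 0 once the Job has completed, which lets the pod's service
// containers start.
func main() {
	cfg, err := rest.InClusterConfig() // uses the pod's service account
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	for {
		job, err := clientset.BatchV1().Jobs("my-namespace").Get(ctx, "db-migration", metav1.GetOptions{})
		if err != nil {
			log.Printf("could not read Job status: %v", err)
		} else if job.Status.Succeeded > 0 {
			log.Println("migration Job finished, letting the pod start")
			return
		}
		time.Sleep(5 * time.Second)
	}
}
```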

A tutorial on how to set up initContainers together with a Kubernetes Job can be found in this article:

If you are using tooling on top of Kubernetes, there might be simpler solutions. With Helm, for example, the same can be achieved with Helm hooks, completely without initContainers.

Other Mentions

There are some other options that might be worth a look but are not commonly used:

  • SchemaHero: Lets you define your database schema as Kubernetes configuration in YAML and takes care of migrations for you when you change that configuration. Currently it does not offer data migrations (only schema changes).
  • Migration-Operator: A Kubernetes operator that does the same thing as Option 4, but in a more automated and easily configurable way. No enterprise support.

Conclusion

In general, I would not recommend using Option 1 or 2 if your application is running with multiple instances (pods). In these cases, Options 3 and 4 should be preferred to avoid errors due to parallel or long-running migrations. As always, the solution has to fit your use case, which might have other pre-conditions and requirements.

Do you know another option? Share it in the comment section!
