Migrating a monolithic / legacy app and DB to Docker and Kubernetes

A step-by-step retrospective, with the relevant code, so you can replicate this method to migrate legacy / monolithic apps to Docker and Kubernetes.

All the code referred to in this guide is open source and available in the repositories referenced throughout.

Introduction

Midburn is a regional Burning Man event in Israel; in 2017 it was the 2nd largest regional event, with 11,000 attendees.

We also have an active tech team which provides services that help manage this event: ticketing, camp and volunteer management, and all the other tech requirements.

The monolithic app

Among the apps the Midburn tech team maintains is a Drupal-based environment which handles many core services that most other services depend on.

This Drupal environment is monolithic in several ways:

  • It’s all installed manually on a single m4.xlarge AWS EC2 instance.
  • The code is only available on the server, and there is no clear distinction between code, configuration and data.
  • The DB is highly dependent on the code and configuration; for example, installing a Drupal module makes changes to both the DB and the configuration.
  • It contains a lot of historical and unrelated data that is hard to differentiate from the relevant data.
  • Making changes requires specific knowledge that few people possess.

This monolithic architecture is hard to maintain in the long term because every change is difficult and risky.

The solution

Docker and Kubernetes are great tools that make it possible to provide a quick solution for this monolithic app, one which solves many of these problems.

You can see the end result in those repositories.

Download the server data

Poking around on the server reveals 3 directories where the code / configuration is stored. I used Google Cloud Storage to dump all the relevant data from the server.

It’s easy to install the gcloud tools on the server using either the yum or apt-get methods. Once you have the gcloud tools installed you can log in and copy the data to the cloud:

$ ssh -i secret.pem ubuntu@1.2.3.4 
:~$ gcloud auth login 
:~$ gcloud config set project GOOGLE_PROJECT_ID
:~$ DATE=$(date +%Y-%m-%d)
:~$ gsutil -qm cp -cUPR /etc gs://bucket-name/profiles-etc-dump-${DATE}/
  • -q: quiet, otherwise it outputs every copied file
  • -m: copy in parallel
  • -c: continue in case of errors
  • -U: skip unhandled file types
  • -P: preserve POSIX attributes (file ownership and permissions)
  • -R: recursive

One more thing we need from this server is to figure out which distribution it runs and which system dependencies are needed, so we can replicate them in our Dockerfile.
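For example, on an Ubuntu / Debian server a few standard commands are enough to record this (a quick sketch; the output file names are just suggestions):

$ lsb_release -a                       # distribution name and release
$ cat /etc/os-release                  # same information if lsb_release is not installed
$ dpkg -l > installed-packages.txt     # list of installed system packages
$ ls /etc /opt /home                   # locate the code / configuration / data directories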

Create the server container

In this case it was easy: the app uses a Bitnami stack that installs all its requirements locally under /opt/bitnami, so we just need to figure out the base distribution, which here is Ubuntu Trusty.

You can see the full Dockerfile here.

FROM ubuntu:trusty

Install gcloud and copy the downloaded data

RUN apt-get update && apt-get install -y curl &&\
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" &&\
    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list &&\
    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - &&\
    apt-get update && apt-get install -y google-cloud-sdk
COPY secret-k8s-ops.json .
RUN gcloud auth activate-service-account --key-file=secret-k8s-ops.json;\
    gcloud config set project midbarrn;\
    addgroup --system --gid 1000 bitnami;\
    adduser --system --uid 1000 --gid 1000 bitnami;\
    mkdir -p /opt/bitnami;\
    gsutil -qm cp -cUPR gs://midburn-k8s-backups/profiles-production-2018-01-16/etc /;\
    gsutil -qm cp -cUPR gs://midburn-k8s-backups/profiles-production-2018-01-16/home/bitnami /home/;\
    gsutil -qm cp -cUPR gs://midburn-k8s-backups/profiles-production-2018-01-16/opt/bitnami /opt/;\
    rm -f secret-k8s-ops.json

The files are private, so the Docker build copies a secret Google service account key file, uses it to download the files, and then deletes it. It also creates the bitnami user and group up front, and uses the -P flag, so that file ownership and permissions are preserved.

I use the Google Cloud Container Builder and Container Registry services to build and store the image privately.

gcloud config set project PROJECT_ID
gcloud container builds submit --tag gcr.io/PROJECT_ID/REPO_NAME:TAG .

When done, pull the image

gcloud docker -- pull gcr.io/PROJECT_ID/REPO_NAME:TAG

Check the repository for a more advanced cloudbuild script and configuration.
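For reference, a minimal cloudbuild.yaml is just a Docker build step plus the image to push; something like this simplified sketch (the repository version adds more configuration):

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/REPO_NAME:TAG', '.']
images:
- 'gcr.io/$PROJECT_ID/REPO_NAME:TAG'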

Download the database dump

mysqldump --host=db --port=3306 --protocol=tcp --user=root --password="${MYSQL_ROOT_PASSWORD}" --all-databases > dump.sql

Copy the dump to Google Storage:

gsutil cp dump.sql gs://bucket-name/db-dump-$(date +%Y-%m-%d)/

Create the database container

Download the database dump

gsutil cp gs://your-bucket/dump.sql ./dump.sql

Run a DB server, mounting the dump file and the DB data directory locally

docker run -d --name=mysql \
  -e MYSQL_ROOT_PASSWORD=123456 \
  -v $(pwd)/dump.sql:/dump.sql \
  -v $(pwd)/db:/var/lib/mysql \
  mysql:5.6

Import the DB dump data

docker exec -it mysql bash
echo "create database midburndb;" | mysql --host=127.0.0.1 --user=root --password=123456
cat ./dump.sql | mysql --host=127.0.0.1 --user=root --password=123456 --one-database midburndb

When the import is done, copy the data directory to Google Storage:

gsutil -qm cp -PR ./db/ gs://bucket-name/db-data-dir/

Now you can create a Dockerfile, similar to the server Dockerfile, which copies the data directory from Google Storage. You can see the full DB Dockerfile here. It's important to copy the data to a directory which is not a volume; otherwise it will be empty or overwritten when the container starts.
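A minimal sketch of that idea (the target path and startup handling here are assumptions; see the linked Dockerfile for the real version):

FROM mysql:5.6
# /var/lib/mysql is declared as a VOLUME in the mysql image, so anything copied
# there at build time is hidden by a fresh, empty volume when the container starts.
# Copy the pre-populated data directory to a non-volume path instead:
COPY ./db /var/lib/mysql-data
# at startup, either point MySQL at this directory (datadir setting) or copy it
# into /var/lib/mysql from an entrypoint script before mysqld starts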

Run the environment

I use Docker Compose to set up a local development / testing environment. You can see the full docker-compose.yml here.
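The compose file essentially just wires the two private images together; a rough sketch (service and image names are placeholders, and the published port is only an example):

version: '3'
services:
  db:
    image: gcr.io/GOOGLE_PROJECT_ID/DB_IMAGE_NAME:TAG
  drupal:
    image: gcr.io/GOOGLE_PROJECT_ID/DRUPAL_IMAGE_NAME:TAG
    ports:
    - "80:80"
    depends_on:
    - db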

Because the images contain private data, they are stored in the private Google registry. You should pull them first using the authenticated gcloud CLI:

gcloud docker -- pull gcr.io/GOOGLE_PROJECT_ID/IMAGE_NAME:TAG

Start the environment:

docker-compose up

Making modifications and debugging locally

When debugging locally, if you want to rebuild the image without re-downloading all the data dumps, you can comment out the download lines and just base the image on another image which already has the data dumps. You can see this in the Dockerfile.
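Schematically the Dockerfile then looks like this (the image name below is a placeholder for a previously built image that already contains the data dumps):

# FROM ubuntu:trusty
FROM gcr.io/GOOGLE_PROJECT_ID/DRUPAL_IMAGE_NAME:PREVIOUS_TAG
# ...gcloud install and gsutil download lines commented out...
# only the fast steps (configuration overrides, entrypoint) remain below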

When configured this way, to rebuild and re-run the local environment, just run:

docker-compose up --build

You can see in the Dockerfile some of the modifications I made:

  • Modified configuration files: you can copy the original files from the data dumps or from the Docker image, then overwrite them in the Dockerfile.
  • Added a Docker entrypoint (see here) which starts the required services, keeps the container running and outputs logs; a rough sketch follows below.
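A sketch of such an entrypoint, assuming the Bitnami stack layout from the data dumps (the control script and log paths are assumptions):

#!/usr/bin/env bash
# start the required services using the Bitnami control script
/opt/bitnami/ctlscript.sh start
# keep the container in the foreground and output the web server logs
exec tail -f /opt/bitnami/apache2/logs/access_log /opt/bitnami/apache2/logs/error_log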

When making these modifications, the build completes locally in a few seconds (once you have the source images), allowing for very fast development and testing cycles.

The DB image can be developed in the same way; in this case it was a simple MySQL DB, so not much else was needed.

Deployment to Kubernetes

You can see all the Kubernetes Helm templates here.
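Deploying is then a standard Helm install / upgrade of the chart; roughly like this (the release name and chart path are placeholders, and the actual deployment goes through the scripts described in the Continuous Deployment section below):

helm upgrade --install drupal ./charts/drupal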

Some notable features of these templates:

  • The Drupal pod uses a configmap to override a configuration file (a sketch of this pattern follows the list).
  • The DB pod optionally accepts an SQL dump file to import from, instead of running the pre-populated image.
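As an illustration of the first point, a configmap entry can be mounted over a single file using subPath; a sketch with assumed names and paths (not the actual template values):

apiVersion: v1
kind: ConfigMap
metadata:
  name: drupal-settings
data:
  settings.php: |
    <?php
    // overridden Drupal settings go here
---
# fragment of the Drupal deployment's pod spec
spec:
  containers:
  - name: drupal
    image: gcr.io/GOOGLE_PROJECT_ID/DRUPAL_IMAGE_NAME:TAG
    volumeMounts:
    - name: settings
      mountPath: /opt/bitnami/apps/drupal/htdocs/sites/default/settings.php
      subPath: settings.php
  volumes:
  - name: settings
    configMap:
      name: drupal-settings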

Continuous Deployment

I use some templates and scripts I collected in sk8s, which also depend on the main Midburn Kubernetes environment at midburn-k8s.

Check out the Drupal app's travis.yml and continuous_deployment.sh scripts.
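The general pattern is a Travis build that runs on every push and triggers the deployment script only for master branch builds; a sketch of that pattern (not the actual file contents; the project ID variable is a placeholder):

sudo: required
services:
- docker
script:
- docker build -t "gcr.io/${GOOGLE_PROJECT_ID}/drupal:${TRAVIS_COMMIT}" .
- if [ "${TRAVIS_BRANCH}" == "master" ] && [ "${TRAVIS_PULL_REQUEST}" == "false" ]; then ./continuous_deployment.sh; fi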