Developing with Data in Kubernetes

Dec 18, 2019

Two months ago, we launched the open-source Titan Project designed to help developers manage their data like code. With familiar git-like semantics for committing and checking out state, along with sharing data via push and pull, it created a new way to work with structured data on your laptop.

But as we engaged with our nascent community and users, one question we kept getting was “How does this work with Kubernetes?” While developers can and do use databases on their laptop, they are increasingly using Kubernetes for CI/CD environments, automated testing, and staging environments. How can we check out known data into those environments, and preserve data state when something goes wrong?

I’m pleased to say that with the latest release of Titan, we now have beta support for Titan-managed repositories running in Kubernetes. It works with any Kubernetes environment that supports the volume snapshot APIs and has a CSI driver supporting the alpha snapshot APIs (beta API support is coming soon). To get started, simply create a new Kubernetes context instead of the default docker context.

$ titan context install -t kubernetes
Initializing titan infrastructure ...
Checking docker installation 100% │██████████│ 100/100 
Starting titan server docker containers 100% │██████████│ 100/100
Titan cli successfully installed, happy data versioning :)
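If you are unsure whether a cluster exposes the snapshot APIs at all, a quick check along the following lines can help before installing the context. This is only a sketch: it assumes kubectl is installed and pointed at the target cluster, the helper name has_snapshot_api is made up for illustration, and it does not verify that your CSI driver actually implements snapshots.

```shell
#!/bin/sh
# Hypothetical helper: reads a list of resource names on stdin and succeeds
# if the VolumeSnapshot resource from the snapshot API group is among them.
has_snapshot_api() {
  grep -q '^volumesnapshots\.snapshot\.storage\.k8s\.io'
}

# Only query the cluster when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  if kubectl api-resources --api-group=snapshot.storage.k8s.io -o name \
       --request-timeout=10s 2>/dev/null | has_snapshot_api; then
    echo "snapshot API found"
  else
    echo "snapshot API not found (or cluster unreachable)"
  fi
fi
```

Finding the resource only shows that the API is registered; whether snapshots succeed still depends on the CSI driver and snapshot class configured in the cluster.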

Once configured, starting a Titan repository requires knowing only the name of the docker image you want to run:

$ titan run mongo
Creating repository mongo
Creating titan volume v0 with path /data/configdb
Creating titan volume v1 with path /data/db
Waiting for volumes to be ready
Creating mongo deployment
Waiting for deployment to be ready
Forwarding local ports

Here, we’ve started with the mongo:latest image, created a persistent volume claim for each volume, and instantiated a stateful set along with a service that maps each exposed port to the container. To top it off, Titan forwards ports to the local machine by default for easy access, and provides simple start/stop controls:

$ mongo --quiet --eval 'db.people.insert({name: "Grace Hopper"})'
WriteResult({ "nInserted" : 1 })
$ titan stop mongo
Stopping port forwarding
Updating deployment
Waiting for deployment to stop
Stopped mongo
$ titan start mongo
Updating deployment
Waiting for deployment to be ready
Starting port forwarding

We can then treat this repository like any other Titan repository, committing and checking out state:

$ titan commit -m "first commit" mongo
Commit 19051f18a77c4f599ebef542486b5d41
$ titan log mongo
commit 19051f18a77c4f599ebef542486b5d41
User: Eric Schrock
Email: Eric.Schrock@delphix.com
Date: 2019-12-14T20:17:48.524441Z
first commit
$ mongo --quiet --eval 'db.people.insert({name: "Ada Lovelace"})'
WriteResult({ "nInserted" : 1 })
$ mongo --quiet --eval 'db.people.find({},{_id:0})'
{ "name" : "Grace Hopper" }
{ "name" : "Ada Lovelace" }
$ titan checkout mongo
Checkout 19051f18a77c4f599ebef542486b5d41
Stopping port forwarding
Updating deployment
Waiting for deployment to be ready
Starting port forwarding
$ mongo --quiet --eval 'db.people.find({},{_id:0})'
{ "name" : "Grace Hopper" }

Under the hood, Titan creates a volume snapshot for each commit, creates new persistent volume claims from those snapshots, and patches deployments to roll over to the new state. You can also add and push to remotes:

$ titan remote add s3://my-bucket/titan mongo
$ titan push mongo
Pushing 19051f18a77c4f599ebef542486b5d41 to 'origin'
Waiting for volumes to be ready
Starting job
Creating archive for /data/configdb
Pushing archive for /data/configdb
Creating archive for /data/db
Pushing archive for /data/db
Push completed successfully
$ titan remote log mongo
Commit 19051f18a77c4f599ebef542486b5d41
User: Eric Schrock
Email: Eric.Schrock@delphix.com
Date: 2019-12-14T20:17:48.524441Z
first commit
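To make the snapshot mechanics behind commit and checkout more concrete, here is a rough sketch of the kind of Kubernetes objects involved: a VolumeSnapshot taken from a persistent volume claim, and a new claim restored from that snapshot. This is not Titan's actual output. The function names, object names, snapshot class, and size are all hypothetical, and the manifests use the current v1 snapshot API, whose field names differ from the alpha API available when this was written.

```shell
#!/bin/sh
# Print a VolumeSnapshot manifest capturing one PVC (roughly, one commit).
# $1 = snapshot name, $2 = source PVC, $3 = snapshot class (all hypothetical)
snapshot_manifest() {
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $1
spec:
  volumeSnapshotClassName: $3
  source:
    persistentVolumeClaimName: $2
EOF
}

# Print a PVC manifest populated from a snapshot (roughly, one checkout).
# $1 = new PVC name, $2 = snapshot name, $3 = requested size
restore_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $3
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: $2
EOF
}

# Example with made-up names; only prints the manifests.
snapshot_manifest mongo-v1-19051f18 mongo-v1 csi-snapclass
echo "---"
restore_manifest mongo-v1-checkout mongo-v1-19051f18 1Gi
```

Piping the output to kubectl apply -f - would create the objects on a cluster with a snapshot-capable CSI driver. The workload would then need to be pointed at the restored claim, which is the "patching" step described above.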

If this were CI/CD infrastructure about to be torn down after a test, you could later clone that state to run in a local docker context:

$ titan context install -t docker
Initializing titan infrastructure ...
Checking docker installation 100% │██████████│ 100/100 
Starting titan server docker containers 100% │██████████│ 100/100
Checking if compatible ZFS is running
Checking if compatible system ZFS is available
Checking if compatible compiled ZFS is available
Checking if precompiled ZFS is available for '4.9.184-linuxkit'
Creating shared mounts
Creating storage pool
Titan cli successfully installed, happy data versioning :)
$ titan clone s3://my-bucket/titan -n docker/debug-mongo
Creating repository debug-mongo
Creating docker volume debug-mongo/v0 with path /data/configdb
Creating docker volume debug-mongo/v1 with path /data/db
Running controlled container debug-mongo
Pulling 19051f18a77c4f599ebef542486b5d41 from 'origin'
Pulling archive for /data/configdb
Extracting archive for /data/configdb
Pulling archive for /data/db
Extracting archive for /data/db
Pull completed successfully
Stopping container debug-mongo
Checkout 19051f18a77c4f599ebef542486b5d41
Starting container debug-mongo
19051f18a77c4f599ebef542486b5d41 checked out
$ docker ps | grep mongo
d71c1e2f381d        mongo               "docker-entrypoint.s…"   19 seconds ago       Up 2 seconds        0.0.0.0:27017->27017/tcp   debug-mongo
$ mongo --quiet --eval 'db.people.find({},{_id:0})'
{ "name" : "Grace Hopper" }
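Tying this back to the CI/CD scenario, a test job could commit and push the data state only when the tests fail, so there is something to clone afterwards. The wrapper below is a minimal sketch, not part of Titan: run_and_preserve and the TITAN override variable are hypothetical names, it uses only the commit and push commands shown earlier, and it assumes a remote has already been added to the repository.

```shell
#!/bin/sh
# Run a test command; if it fails, commit and push the repository's data
# state so it can be cloned elsewhere for debugging.
# $1 = Titan repository name, remaining arguments = the test command.
# TITAN is a hypothetical override for the titan binary (useful for dry runs).
run_and_preserve() {
  repo=$1
  shift
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    ${TITAN:-titan} commit -m "state after failed test (exit $status)" "$repo" &&
      ${TITAN:-titan} push "$repo"
  fi
  # Propagate the test's own exit status to the caller.
  return "$status"
}

# Dry run: print the titan commands instead of executing them,
# using `false` as a stand-in for a failing test suite.
TITAN="echo titan"
run_and_preserve mongo false || echo "tests failed with status $?"
```

In a real pipeline you would drop the TITAN override and replace false with the actual test command. The function returns the test's exit status either way, so the job still fails when the tests do.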

Kubernetes support is still in beta, with known limitations that will be addressed in a future release. But we’re going to need the community to help us figure out some of the deeper questions around the operational model and use cases. Do you want the Titan metadata to reside within the cluster instead of on your laptop? Do you want Titan to work with helm charts, native operators, or other orchestration tools? Head on over to the Titan Community to join the discussion and contribute feedback, ideas, or code!

Happy data versioning!

Written by Eric Schrock