Titan: Better Development with Data

Eric Schrock
Sep 27 · 4 min read

I’ve spent the better part of the last decade working with teams and organizations to better leverage data in their software development lifecycle. Through that experience, I’ve seen firsthand how teams move faster and deliver better software when they have easy access to high-quality data. At Delphix, we’ve been focused on solving this for complex enterprise databases, but the problem is much broader.

As DevOps tools and practices have matured, developers are increasingly doing development and testing entirely on their laptops. Even with ready access to cloud resources, there’s nothing like the quick iteration cycle on your laptop, and being able to take that experience on a plane without having to worry about network connectivity.

This is all well and good, but many applications require some kind of persistent data store to be useful. No number of mocks can replace the value of working with a real datastore — if you can swing it.

The rise of docker has provided a ubiquitous means to deliver complex software like databases, with thousands of off-the-shelf images at your disposal. Even cloud data stores like DynamoDB are runnable in a container. And yet the state of the art for managing persistent containers has barely budged since docker’s inception. Efforts like flocker and dotmesh have faded or pivoted, and while a new wave of efforts is forming around the Kubernetes Container Storage Interface, they’re largely focused on managing data in production and CI/CD clusters, not on developers working on their laptop.


This led us to ask our customers, friends, and colleagues a simple question: “Would it be useful if you could manage data like code on your laptop?” When the answer was a resounding yes, we set off to build a new solution that would combine the simplicity of docker, the familiarity of git, and the power of versioned data. We quickly realized this was not just a Delphix product, but something that benefits (and should be built together with) a broad developer community. After six months of incubation, I’m pleased to share Titan with the world.

Titan is an open source project for developers to manage their data like code. Titan makes it easy to run your favorite database in a docker container on your laptop, but with the power of versioning the underlying data. Titan’s git-like CLI enables developers to clone, commit, checkout, push, and pull data just like code, making it easy to roll back to a previous state, build a test data library, or share a structured dataset with collaborators.

With Titan, you can easily clone a database, complete with schema and real data, right on your laptop:

$ titan clone s3://titan-data-demo/hello-world/dynamodb hello-world
$ aws dynamodb scan --endpoint-url http://localhost:8000 \
    --table-name messages | jq -r '.Items[0].message.S'
Hello, World!

Want to run some tests while being able to repeatedly reset back to a known good state? Create a commit and check it out at your convenience:

$ titan commit -m "starting state" mongo
b040cfe3-aae5-42b2-a41c-6fe2e2baad1c
$ … run tests …
$ titan checkout -c b040cfe3-aae5-42b2-a41c-6fe2e2baad1c mongo
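
If you reset often, the commit, test, and checkout steps above can be wrapped in a small shell function. This is only a sketch: it uses just the `titan commit -m` and `titan checkout -c` invocations shown above, the name `run_with_reset` is made up for illustration, and it assumes the commit ID is the last line that `titan commit` prints.

```shell
# Hypothetical helper (not part of Titan): run a command against a repository,
# then reset the repository's data to the state it had before the command ran.
run_with_reset() {
    local repo="$1"
    shift

    # Record the starting state. Assumption: the commit ID is the last line
    # of output from `titan commit`, as in the example above.
    local commit_id
    commit_id="$(titan commit -m "starting state" "$repo" | tail -n 1)"
    if [ -z "$commit_id" ]; then
        echo "could not determine commit ID for $repo" >&2
        return 1
    fi

    # Run the caller's command and remember its exit status.
    local status=0
    "$@" || status=$?

    # Restore the recorded state whether or not the command succeeded.
    titan checkout -c "$commit_id" "$repo" || return 1

    return "$status"
}
```

With this in your shell, something like `run_with_reset mongo ./run-tests.sh` (where `run-tests.sh` stands in for your own test script) would leave the `mongo` repository back at its starting state and return the test script's exit status.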

Run into an issue and want to share the state with someone else to help debug? Push it to a remote repository where they can clone or pull it down locally:

$ titan remote add ssh://user@host/data postgres
$ titan commit -m "something's wrong" postgres
93b10f55-a3e2-48af-b3b7-9f573d4b10d0
$ titan push postgres

All of this works with any docker container with volumes, no special connectors or plugins required.


Titan has the potential to be a game-changer. As Robert Reeves, CTO of Datical, put it:

Setting up and tearing down databases for developers has been the bane of the dev workflow. Not only do developers have to decide WHERE and HOW to run the database but they have to struggle with the configuration.

Of course, containers are perfect for local development, but until Titan, applying the dev workflow to the data just didn’t happen. Now, we can apply the same dev workflow we have for code (clone, branch, commit, etc.) to the data. Seriously, it doesn’t get easier than “titan clone s3://titan-data-demo/hello-world/postgres hello-world” to have a fully functional data-populated Postgres instance on your machine.

At Delphix we’re thrilled to be a part of bringing this community to life and are eager to invest in Titan as well as our offerings around it. Titan is an open-source community at its core and welcomes anyone who wants to be a part of it. We’re still a young project, with plenty of rough edges and visions for the future, but we’re excited for you to join us. So head on over to titan-data.io to give it a whirl, and join the community to share your experiences and help improve Titan for developers around the world!

Written by Eric Schrock

Chief Technology Officer @delphix. Father, creator of things, and self-proclaimed data geek. He/him.
