The key to great database Continuous Integration is disposable databases
Database Continuous Integration (CI) is often the final piece of the CI story for a project, and with good reason: it is hard to do, and much harder to do in a way your team will describe as ‘great’.
Having worked with numerous companies over the years to implement Redgate’s Compliant Database DevOps solution, I have seen many of the challenges teams face with database changes, and the trade-offs between moving fast and maintaining quality.
Database bottlenecks
There is no way to escape database bottlenecks. Even with a perfect database development process, database CI, and automated database deployments, your production target is something you can’t deploy to in parallel, which means that changes need to be queued or batched. Eliminating all the bottlenecks preceding production is the goal, and if you can do that while maintaining confidence in the changes being made to production, then you are truly among the elite.
The cause of the vast majority of database bottlenecks is obvious: data. Having data (including test data) anywhere outside of production brings with it questions that need to be answered:
- Where will it be stored?
- How do I transfer it?
- How long will it take?
- How is it maintained?
- Who needs access?
- Do I need to obfuscate it?
The answers to these questions often lead to one of two scenarios:
- Shared environments
- Developing against no or limited data
While developing against no or limited data might not seem like a bottleneck, in most cases it simply creates bottlenecks later. You either end up testing those changes on a shared environment down the line, or deploying changes to production without the confidence required to maintain quality.
Disposable databases
A database that you can easily create on demand, use, and delete afterwards sounds perfect, and containers make that relatively easy today. At some point, however, those same questions above still need answering: without data, the experience for your team is diminished, and ultimately the quality and/or speed of your changes is the sacrifice.
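To make the disposable part concrete, here is a minimal sketch using Docker and PostgreSQL purely as an example; the image tag, credentials, and naming scheme are illustrative assumptions, not a prescribed setup:

```python
# A minimal sketch of a disposable database: start a throwaway PostgreSQL
# container, use it, then delete it. Assumes Docker is installed; the
# image, credential, and name are illustrative.
import subprocess
import uuid

def create_disposable_db() -> str:
    """Start a throwaway PostgreSQL container and return its name."""
    name = f"ci-db-{uuid.uuid4().hex[:8]}"  # unique name per run
    subprocess.run(
        ["docker", "run", "-d", "--rm",
         "--name", name,
         "-e", "POSTGRES_PASSWORD=ci-password",  # throwaway credential
         "-P",                                   # publish to a random free host port
         "postgres:15"],
        check=True,
    )
    return name

def destroy_disposable_db(name: str) -> None:
    """Stop the container; --rm means stopping also removes it."""
    subprocess.run(["docker", "stop", name], check=True)

db = create_disposable_db()
try:
    ...  # create the schema and run tests against the container
finally:
    destroy_disposable_db(db)
```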
Looking at database CI in more detail, a disposable database is crucial to stop CI itself becoming a bottleneck. CI should be able to run in parallel while remaining isolated from any other runs, with the only bottleneck being the number of available build agents! For this reason, the vast majority of database CI I have seen relies on no or very limited data. The database is created on an existing database instance available to the build agent, on an instance on the build agent itself, or in a container, and the schema is then created from the scripts in source control, including the new changes. Sometimes tests are run against some small sample data, and then the database or container is deleted.

This is a great thing to do and a great start, but it isn’t ‘great’ database CI. We’ve gained confidence that things like syntax are correct, and possibly the ordering of scripts, but what will happen when we deploy to production and data is involved? Typically another step is then used to deploy to an existing shared testing database with data, but this is itself a bottleneck, with the added complexity of deciding when to run it. Deploying to this database before merge inevitably leads to the environment being corrupted for everyone, but after merge feels too late in the process.
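For reference, here is a sketch of that typical data-free CI step, assuming the disposable container from the sketch above, SQL scripts in a hypothetical `db/migrations` folder in source control, and pytest as the test runner (all names are assumptions for illustration):

```python
# A sketch of the typical data-free CI step: wait for the disposable
# database to accept connections, apply the schema scripts from source
# control in order, then run the tests. Folder and test-suite names are
# assumptions for illustration.
import pathlib
import subprocess
import time

def wait_until_ready(container: str, timeout: float = 30.0) -> None:
    """Poll pg_isready inside the container until it accepts connections."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        probe = subprocess.run(
            ["docker", "exec", container, "pg_isready", "-U", "postgres"])
        if probe.returncode == 0:
            return
        time.sleep(1)
    raise TimeoutError(f"{container} did not become ready in time")

def build_schema(container: str, scripts_dir: str) -> None:
    """Apply every SQL script in filename order; any failure fails the run."""
    for script in sorted(pathlib.Path(scripts_dir).glob("*.sql")):
        with open(script) as sql:
            subprocess.run(
                ["docker", "exec", "-i", container,
                 "psql", "-U", "postgres", "-v", "ON_ERROR_STOP=1"],
                stdin=sql, check=True)

db = "ci-db-1a2b3c4d"                # the container from the previous sketch
wait_until_ready(db)
build_schema(db, "db/migrations")    # the schema, including the new changes
build_schema(db, "db/sample_data")   # optional small sample data
subprocess.run(["pytest", "tests/db"], check=True)  # hypothetical test suite
```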
Going back to the title of this story, it really should read:
“The key to great database Continuous Integration is disposable databases, with your data”
Being able to quickly provision a database with production or production-like data, and tear it down after use, really is the key to great database CI. With every commit or pull request you can test the impact of your changes on a database with data representative of production, gaining confidence that quality is being maintained. When this can be done in parallel, each test isolated from every other, it allows you to pair quality with speed.
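One way this can look, as a hedged sketch: bake a masked, production-like copy of the data into an image on a schedule (the registry path below is hypothetical), then give each commit its own isolated copy:

```python
# A sketch of per-commit provisioning with production-like data, under the
# assumption that a masked copy of production is baked into a Docker image
# nightly (the registry path below is hypothetical). Each commit gets its
# own isolated copy, so parallel CI runs never collide.
import os
import subprocess

def provision_for_commit(sha: str) -> str:
    """Start an isolated database pre-loaded with masked production data."""
    name = f"ci-db-{sha[:12]}"  # unique per commit, so runs stay isolated
    subprocess.run(
        ["docker", "run", "-d", "--rm", "--name", name, "-P",
         "registry.example.com/ci/masked-prod:nightly"],
        check=True)
    return name

container = provision_for_commit(os.environ["GIT_COMMIT"])  # set by many CI servers
try:
    # deploy the new changes and test their impact against real-shaped data
    subprocess.run(["pytest", "tests/db"], check=True)
finally:
    subprocess.run(["docker", "stop", container], check=True)  # tear down
```

Baking data into an image works well enough for small databases, but at production scale the copy itself becomes the bottleneck, which is exactly the problem dedicated tooling addresses.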
Fortunately, there are technologies available that make this possible. Spawn is in preview, and you can sign up today for free to unlock your database CI pipelines.