Has this ever happened to you: it’s late Friday afternoon, and you make a little change to your backend codebase. Maybe just a one-liner, just to fix a typo. You take that change and push it through to development, staging, and finally production. But alas, everything breaks and you don’t know why.
This should never happen (seriously). Rules and processes exist for this exact reason. Let’s step through another, more realistic version of this scenario:
- You realize you need to make a minor fix.
- You VPN into your development environment, make your change, and run some basic tests.
- The tests pass, and you commit your changes in version control.
- You then deploy to development and staging environments, and the QA folks give the green light for a production deployment.
- After deploying to production, everything breaks and you don’t know why.
So what went wrong? Maybe your development environment didn’t match your production environment (version mismatches, incorrect or stale data, etc.). Maybe your tests weren’t sufficient. Either way, something was clearly different between your development and production environments.
This is where the importance of accurate testing comes into play. Your tests are only as good as the environment and data you’re running against. If there are any discrepancies between your testing and production environments, there is room for error.
The question is, how do you improve automated testing and speed up development of your backend services, while taking into account the following:
- Security: Avoid having developers connect into centralized data stores, regardless of the environment.
- Ease of development & testing: Don’t introduce roadblocks or issues that make your engineering teams less efficient. Your goal is to make them more efficient.
- Separation of environments: Don’t test or develop directly against production systems.
- Movement of large amounts of data: It’s not uncommon for production databases to contain terabytes of data, making it impractical to simply copy all of this data directly.
FinTech Studios’ solution was to mock our entire backend infrastructure with lightweight Docker images.
We create automated snapshots of our entire backend infrastructure, using live data. This allows us to use real, representative data for testing and development of our REST API and other backend services. In developing this system, we had the following goals:
- Create a perfect snapshot of our entire backend infrastructure, including precise version matching.
- Use real (potentially obfuscated) data.
- Have enough data to be useful, but not so much that the resulting images become unwieldy. We aimed to keep the entire stack under 5GB when compressed.
- Make updates simple and semi-automated.
- Integrate with CI/CD.
To meet these requirements, we utilized Docker and Docker Compose. We created individual Docker images for each service, and then orchestrated the services using Docker Compose. All developers (and CI/CD systems) need to do is authenticate with our container registry, then run a quick `docker-compose up`. That’s it!
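To make this concrete, here is a minimal sketch of what such a `docker-compose.yml` might look like. The service names, image names, registry URL, and ports are all hypothetical placeholders, not FinTech Studios’ actual configuration:

```yaml
# Hypothetical compose file for a mocked backend stack.
# Each image is a pre-built snapshot pulled from a private registry.
version: "3"
services:
  postgres:
    image: registry.example.com/mock/postgres:latest
    ports:
      - "5432:5432"
  elasticsearch:
    image: registry.example.com/mock/elasticsearch:latest
    ports:
      - "9200:9200"
  api:
    image: registry.example.com/mock/api:latest
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - elasticsearch
```

With a file like this checked into the repository, a single `docker-compose up` brings up the whole mocked stack on any developer machine or CI runner.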
After our initial implementation, we found one glaring issue: if we used the Docker `COPY` instruction to build database dumps into our images, restoration would then be performed each time the image was run. Restoration alone could take several minutes for each image: certainly not an option for CI/CD integration!
Thankfully, Docker 17.05 and higher provides a new option: multi-stage builds. This feature allows us to chain together multiple images, using results and resources from previous stages. This is very important, because we can separate the process of database restoration out of our deployed image. As a result, data restoration is not performed each time the image is started. Let’s use Postgres to illustrate this process.
We use two stages: the first stage will restore the raw database dump. We then copy the resulting Postgres data directory into the second stage:
```dockerfile
FROM postgres:9.6.1-alpine AS donor
ENV PGDATA=/pgdata
COPY "my-data/*" "/tmp/"
COPY "dump-restore/*" "/docker-entrypoint-initdb.d/"
RUN /docker-entrypoint.sh --help

FROM postgres:9.6.1-alpine
ENV PGDATA=/pgdata
COPY --chown=postgres:postgres --from=donor /pgdata /pgdata
```
Let’s go through this line-by-line, starting with the first stage:
- Start with a basic Postgres image (labeled as `donor`).
- Set the `PGDATA` environment variable. This tells Postgres which directory to store its database files in.
- Copy the database dumps from the local `my-data` folder into the image’s `/tmp` folder.
- Copy the database restore scripts from the local `dump-restore` folder to the image’s `/docker-entrypoint-initdb.d/` folder. We took the approach of using several shell scripts to do this, but Postgres does offer the option to automatically run SQL dumps placed in the `/docker-entrypoint-initdb.d/` folder.
- Then call the `docker-entrypoint.sh` script to get things started.
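For concreteness, one of those restore scripts might look roughly like the sketch below. The database name and dump filename are placeholders, and the exact commands depend on your dump format (`pg_restore` for custom-format dumps, `psql` for plain SQL):

```shell
#!/bin/sh
# Hypothetical init script, e.g. dump-restore/01-restore.sh.
# The official Postgres image runs scripts in /docker-entrypoint-initdb.d/
# exactly once, when the data directory is first initialized.
set -e

# Create the target database, then restore a custom-format dump from /tmp.
# "mydb" and "mydb.dump" are placeholder names.
createdb -U "$POSTGRES_USER" mydb
pg_restore -U "$POSTGRES_USER" -d mydb --no-owner /tmp/mydb.dump

# Remove the raw dump so it doesn't bloat the donor stage's data layer.
rm -f /tmp/mydb.dump
```

Because these scripts run during the `RUN /docker-entrypoint.sh` step, the restore happens at build time, inside the donor stage.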
If you were to deploy this image, every time the image is run, you would have to wait for Postgres to restore the data we placed in `/tmp`. This doesn’t solve the restoration-wait issue.
A second stage takes care of this issue:
- Start with the same image as the first stage: `postgres:9.6.1-alpine`.
- Set the `PGDATA` environment variable again.
- Finally and most importantly, use a `COPY` command to copy the first stage’s data directory into the second stage. This is the magic line: when the image is built, the first stage does the heavy lifting of performing the database restore. When the second stage is deployed, it already has the restored database. Think of it as a time-memory tradeoff: the final image will be slightly larger than the first (because of indexes), but will take vastly less time to start. Perfect!
Although we showed how to do this with Postgres, the concept remains the same across whatever database services you need. Simply use the first stage to perform the data restoration, and then copy the restored database directly into the second stage.
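As one illustration of carrying the pattern over, a sketch of the same two-stage approach for MongoDB might look like the following. The MongoDB version, paths, and dump layout are assumptions for the example, not our exact setup:

```dockerfile
# Hypothetical two-stage build for a MongoDB snapshot image.
# Stage 1 ("donor") performs the restore at build time.
FROM mongo:3.6 AS donor
COPY "my-data/dump/" "/tmp/dump/"
# Start mongod in the background, restore the dump into the default
# /data/db directory, then shut the server down cleanly.
RUN mongod --fork --logpath /var/log/mongod.log \
    && mongorestore /tmp/dump \
    && mongod --shutdown

# Stage 2: ship only the restored data directory.
FROM mongo:3.6
COPY --chown=mongodb:mongodb --from=donor /data/db /data/db
```

As with Postgres, the deployed image starts with its data already in place, so containers come up in seconds rather than minutes.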