Build for the cloud with Docker

When Amazon Web Services first started popularizing cloud services, they weren’t quite the first to sell virtualized servers in place of managed hosting. However, they were among the first to sell them as an API, and moved very quickly to provide all sorts of services with API provisioning. This really changed the picture from the still-dominant “buy or rent a server” model, but software designed to take full advantage of this is still in the minority.

Do yourself a favor: build your software for the future, not the past. Make it easy to provision in the cloud, and with Docker it's easy to make it provisionable to any cloud. This is what we did for IndoorAtlas. Doing so enabled us to deploy the same software, originally built for and deployed to Microsoft Azure, to a new deployment behind the Great Firewall of China, and finally also to Amazon Web Services. Thanks to Docker and Mesos, we were able to turn a manually deployed system, built by spinning up virtual machines from a custom image with local configuration that differed in every deployment, into a set of Docker images we could configure from a central directory and have Marathon orchestrate onto an optimally small infrastructure.

Docker comes with a lot of great how-to documentation, and many have already extolled the virtues of using containers, so repeating that here would be of little value. In short, you get standardized deployment images containing not only your software binaries themselves, but also the operating system configured as it needs to be. And unlike, for example, Amazon Machine Images (AMIs), which you could also build, you get the ability to run the same container image in practically any cloud, or even in your local development environment, along with much faster start-up and lower resource overhead. These latter points make Docker great for development, where you're going to be changing and restarting your binaries frequently.
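
To make this concrete, here is a minimal sketch of what such an image definition can look like. The base image, file name, and port below are illustrative assumptions, not details from any particular deployment:

    # Illustrative Dockerfile; the base image, jar name, and port are assumptions.
    FROM eclipse-temurin:17-jre
    COPY build/libs/positioning-api.jar /app/positioning-api.jar
    EXPOSE 8080
    CMD ["java", "-jar", "/app/positioning-api.jar"]

Everything the service needs, from the operating system userland up to the binary itself, is captured in that one image, and docker build produces the same artifact whether it will run on a laptop, on Azure, or on AWS.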

But how does this relate to how you should be designing and building your software architecture? That's a less-discussed area, but one very much connected to another large software trend: microservices.

It used to be the case that software was written as one large monolithic block of interdependent code, leading to hard-to-debug, unintended side effects where a change in one module breaks another. This happens to every successful, evolving application if you're not constantly careful about refactoring and managing those links. Good software has always been designed around APIs, both externally and internally. But when it's deployed as one blob, it's easy to miss where components depend on each other beyond their APIs. Microservices are a design approach intended to help with this. When each component is deployed separately, other components have to use its API; there's no other access available. Without lightweight containers like Docker, approaching system design this way was extremely expensive in both resource overhead and deployment time. Not so any longer.

Your architecture diagram probably contains at least a dozen or so blocks you have used to describe the logical functions of the system to someone. In larger systems, you can easily have over a hundred such blocks. In the past, it was common in practice to deploy many of them on one server. Don't. Instead, make sure each of them has a well-defined API and build a separate Docker container for each, as in the sketch below. Docker containers are so lightweight that it's easy to deploy them this way even on a local development instance. Of course, you can also deploy them via your CI system to a development instance in the cloud, whichever is your preference.
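
As a sketch of what that looks like in practice, each block gets its own image, and the pieces are wired together over a network, even on a developer's laptop. The service names, paths, ports, and environment variable below are made up for illustration:

    # Build one image per logical block (names are hypothetical).
    docker build -t positioning-api:dev services/positioning-api
    docker build -t session-store:dev services/session-store

    # Run them as separate containers on a shared network; the API service
    # can reach the store only through the endpoint the store exposes.
    docker network create devnet
    docker run -d --name session-store --network devnet session-store:dev
    docker run -d --name positioning-api --network devnet -p 8080:8080 \
        -e SESSION_STORE_URL=http://session-store:8081 positioning-api:dev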

Either way, register those containers with your orchestration system (again, I can warmly recommend Marathon, but of course you could also use the orchestration tools of your cloud of choice, such as Amazon ECS) and let the orchestration tool deploy them. Automatic service discovery will help them find each other, and changing one does not require rebuilding and redeploying everything else. And because you now have a well-defined API with no shortcuts around it, the regression testing target is much clearer, so unintended side effects become both easier to detect and, one should hope, less frequent.
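
With Marathon, registering a container boils down to posting an app definition to its REST API. The sketch below assumes a hypothetical service and registry, and the resource figures and health check are placeholders you would tune per service:

    {
      "id": "/positioning-api",
      "instances": 2,
      "cpus": 0.5,
      "mem": 256,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "registry.example.com/positioning-api:1.4.2",
          "network": "BRIDGE",
          "portMappings": [{ "containerPort": 8080, "hostPort": 0 }]
        }
      },
      "healthChecks": [{ "protocol": "HTTP", "path": "/health", "portIndex": 0 }]
    }

Submitting that definition to Marathon's /v2/apps endpoint is enough for it to schedule the containers onto the Mesos cluster, keep the declared number of instances running, and restart them when they fail; a service discovery layer such as Mesos-DNS then lets the other containers find them by name.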