One+ year of feedback from using Docker and Swarm in DEV and QA environments

TL;DR How we have used Docker and Swarm for over a year without major issues, and how we configured the whole setup to achieve cloud- and on-premise-compliant scalability at the developer level.

Context

At ekino, we like building great software, using the latest services, tools and concepts to improve the end-user experience. We mostly work on web and mobile applications (but not only), with a team of 200+ passionate people.

At the end of November 2014, I started a new project for one of our clients. We basically came in to build a new Linux-based Java API, backed by a Neo4j cluster, inside an existing Windows-based .NET application backed by SQL Server. We agreed on an interface between the new Linux stack and the legacy Windows stack, so we could develop side by side in confined environments and successfully deliver that heterogeneous application.


Timeline

At first I used Chef to provision our servers, so KitchenCI came into the picture for on-push testing of the cookbooks. The Jenkins/KitchenCI stack was installed on a VM, so I had to use the Docker driver (virtual machine inception is not a good idea), and I was pleased to see how fast and lightweight the testing process was.
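For context, switching Test Kitchen to the Docker driver is a small configuration change. A minimal sketch (the cookbook name and platform are placeholders, not our actual project):

```shell
# Install the Docker driver for Test Kitchen
gem install kitchen-docker

# Point .kitchen.yml at the docker driver instead of a VM-based one
cat > .kitchen.yml <<'EOF'
driver:
  name: docker
provisioner:
  name: chef_zero
platforms:
  - name: ubuntu-14.04
suites:
  - name: default
    run_list:
      - recipe[my_cookbook::default]   # hypothetical cookbook name
EOF

# Converge + verify + destroy, now inside containers
kitchen test
```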

I decided to extend Docker usage (but not for microservices, as that wasn't the end goal of my mission). Knowing Docker was reliable for its build-ship-deploy capabilities, I started to build public images for our required engines (reverse proxy, Oracle Java, Neo4j cluster) and used bind-mounted volumes for project-specific versioned packages (configuration, JARs, database).
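In practice that split looks something like this (image name and host paths are hypothetical, for illustration only):

```shell
# Public, generic engine image; everything project-specific is bind-mounted
# from a versioned package unpacked on the host
docker run -d --name neo4j-1 \
  -v /opt/project/neo4j/conf:/var/lib/neo4j/conf \
  -v /opt/project/neo4j/data:/var/lib/neo4j/data \
  my-public-repo/neo4j:2.2
```

The generic image can be rebuilt and shared publicly, while the bind mounts carry the per-project, per-version state.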

Now each deployment consists of 1) removing all existing packages and containers and 2) deploying the new ones. Immutable infrastructure, here we come.
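A minimal sketch of those two steps (container and image names are assumptions, not our actual deployment script):

```shell
# 1) remove all existing containers of the environment
docker rm -f api-1 neo4j-1 neo4j-2 proxy-1 2>/dev/null || true

# 2) deploy the new ones from freshly pulled images
docker pull my-registry/api:1.4.2
docker run -d --name api-1 my-registry/api:1.4.2
```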

At this point I stopped using Chef to provision application engines, as they are now self-contained thanks to Docker. I just kept it to ensure basic server state: users, pubkeys, NTP…

It was around March 2015 when Swarm 0.1 was released (check out the latest 1.1 slides from core developer Victor Vieux), and I decided to take a look at it. Basically, at that time and version, I saw Swarm as a Docker proxy: it implements almost all Docker server endpoints (required to forward the requests) and adds some clustering features like discovery, filters and strategies.

swarm is a docker proxy
swarm is a docker server++
A whole new idea came to mind: I could use Swarm to change the way we saw environments for developers!

Rethink Environments

From the OPS point of view, the more environments there are, the more complex the work is. From the DEV point of view, the more environments there are, the more workflows can fit in: integration, acceptance, load testing, interoperability, presales demos, etc.

Docker Swarm brings the white flag. DevOps style !

Using Docker Swarm in combination with Docker labels, each Docker server instance can be attached to an environment. Swarm brings environments to the application/developer level, just as Docker brought networking to the application/developer level.
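Concretely, attaching a node to an environment means starting its daemon with a label (the label key "environment" is my convention, not something imposed by Docker):

```shell
# On each node: start the Docker daemon with an environment label.
# On Debian/Ubuntu this typically goes into the DOCKER_OPTS line of
# /etc/default/docker rather than being typed by hand.
docker daemon -H tcp://0.0.0.0:2375 --label environment=integration
```

The Swarm manager sees these labels and can be asked to schedule containers only on nodes carrying a given one.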

So I can now ask the client's OPS team to provide me with a pool of servers, managed and monitored as a whole: the environment group. With a tiny Jenkins job that changes the Docker daemon configuration, DEVs are now free to (re)define environments within an environment group.

5 static environments
2 static environment groups + 5 dynamic environments

This setup is on-premise compliant (QA, PRESALES) and cloud-ready, as the move will be transparent (PREPROD, PROD).


Enough theory ? Let’s take a look at the implementation now.


Implementation: Discovery

I use Swarm's file:// discovery backend, as it appeared to be the simplest way to define nodes in a dynamically sized pool while keeping the setup 100% internal.
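A minimal sketch of that setup (the IPs and file path are assumptions). The discovery file simply lists one Docker daemon endpoint per line:

```shell
# Node inventory for the Swarm manager: one <ip>:<port> per line
cat > /tmp/swarm_cluster <<'EOF'
10.0.0.11:2375
10.0.0.12:2375
10.0.0.13:2375
EOF

# The manager watches this file, so adding or removing a line resizes the
# pool without any external key/value store:
#   swarm manage -H tcp://0.0.0.0:3375 file:///tmp/swarm_cluster
```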


Implementation: Filters

I use constraints to select the environment (that's the magic trick!) and affinities to prevent containers of the same cluster from running on the same node.
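With the standalone Swarm of that era, both filters are expressed as environment variables on `docker run`. A sketch, assuming an "environment" node label and Neo4j containers named `neo4j-*` (image and names are illustrative):

```shell
# Constraint filter: only schedule on nodes labeled environment=integration
docker -H tcp://swarm-manager:3375 run -d --name neo4j-1 \
  -e constraint:environment==integration \
  my-registry/neo4j:2.2

# Affinity filter: keep cluster members apart -- refuse any node already
# running a container whose name matches neo4j-*
docker -H tcp://swarm-manager:3375 run -d --name neo4j-2 \
  -e constraint:environment==integration \
  -e affinity:container!=neo4j-* \
  my-registry/neo4j:2.2
```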


Implementation: Tags

Finally, the Jenkins job that updates the Docker server configurations basically does the following:
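In spirit it is a two-liner; this sketch assumes a Debian-style defaults file and my "environment" label key, so adapt both to your distribution:

```shell
#!/bin/sh
# Re-tag this node into a new environment, then restart Docker so the
# Swarm manager picks it up under the new label.
ENV_NAME="${1:-integration}"
CONF=/etc/default/docker   # file holding DOCKER_OPTS on Debian/Ubuntu

sed -i "s/environment=[^ \"]*/environment=${ENV_NAME}/" "$CONF"
service docker restart
```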

Result

DEVs now have the ability to scale any environment up or down to test various configurations:

  • Adjust cluster size
  • Reduce compute nodes
  • etc..
After DEVs change the environment layout

One-year feedback conclusion


Systematic Deployment

To achieve our fully automated immutable infrastructure, the whole stack is deployed at once for each environment! This means that even if you just change a port number in a configuration file, you'll end up removing all assets and containers for that environment. Yes. But a full deployment takes about 3 minutes… First steps towards continuous deployment!

Reliability

Today every developer and consultant deploys multiple times a day, and no team member is afraid to deploy anymore (do you wish you could say the same right now? :) ).

Yes, errors do appear, but mostly because a container respawns too quickly (creating the new one fails because the old one is not yet known to be actually deleted). Just restart the deployment job and it's fixed (3 minutes…).

Networking and Service Discovery

As services may move around between nodes, we needed to add a Consul layer for service registration and discovery.
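One common way to wire this up is a registrator sidecar on each node that watches the local Docker socket and registers containers in Consul. A sketch (the Consul address is an assumption, and this is one option rather than necessarily the exact component we ran):

```shell
# Watch the local Docker daemon and register/deregister every container's
# published ports as Consul services
docker run -d --name registrator \
  -v /var/run/docker.sock:/tmp/docker.sock \
  gliderlabs/registrator:latest \
  consul://consul.service.local:8500
```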


What’s next


Docker Datacenter

Two days ago, Docker announced the availability of their Docker Datacenter product line. It is the combination of two products:

  • DTR (Docker Trusted Registry): on-premise image management and storage
  • UCP (Universal Control Plane): on-premise Docker application management

It requires a CS subscription but can be tested with a free 30-day trial.

With a private registry, networking, clustering, high availability… this solution goes way beyond the setup I have today!

Docker Datacenter Helper

To make the most of my free but time-limited trial, I created a GitHub project to automate the building of a Docker Datacenter (DDC) infrastructure. Feel free to fork it, improve it, try it:

Github Project Page : frntn/docker-datacenter-helper

Two last things: Ekino is hiring ;) and you can follow us on twitter or on …twitter