At Qlector, we are committed to developing and delivering high-quality software, and we follow the engineering practices laid out in the Twelve-Factor App methodology.
In the following post, we describe how we introduced Continuous Delivery. By doing so, we’ve reduced waste and the costs of development, as well as increased quality. First we will briefly describe the practice and principles, and later dive into details about how we implemented it. Throughout the post we point to relevant articles which were useful to us when thinking about problems and designing a pipeline.
Definitions, principles and metrics
CI & CD2: how do we define them?
The two practices are related and represent successively higher stages of build automation. Continuous Integration is the practice of building the code and executing its tests on each commit on an independent server. This server should guarantee a clean environment where code is built from scratch, so that no test fails or passes because of leftover state on the machine where it runs. We may also need to build in different environments, to ensure the same code does not behave differently depending on the environment.
Continuous Delivery takes this further: for each commit we build a binary, which can be deployed to production with the push of a button — making this decision a business one. Continuous Deployment means we deploy the binary we get from every commit — no longer requiring human intervention.
Not every software project can introduce continuous deployment to a PROD environment. But introducing it into other stages greatly helps us better understand deployments and makes them less painful, shortens the feedback loop, and avoids investing time in tasks that can be automated. By doing this frequently in a no-risk environment, we put constant stress on the delivery and deployment process, thus reducing its fragility and turning it into a low-risk activity in production.
Continuous Delivery is based on 5 principles:
- build quality in: the later we find defects, the higher the cost. We aim for a short feedback cycle in order to find and fix issues as quickly as possible
- work in small batches: this gives us quicker feedback and less surface to explore in case of issues, allowing us to come up with fixes shortly after issues are detected
- computers perform repetitive tasks, people solve problems: people should invest their time into creative activities and automate where possible in order to have time to do so. This not only increases process quality (less human error over time), but also greatly motivates people, so they can devote themselves to meaningful work
- relentlessly pursue continuous improvement: continuous delivery is not a one-time effort but an attitude: we can always improve the process, and doing so increases overall quality and productivity
- everyone is responsible: the goal is to build a better product and that requires teamwork. Better quality and delivery are not a matter for isolated teams, but extend to everyone in the company.
How to measure it
In every company, each initiative needs a success metric: a KPI that tells us how we are performing. How do we measure continuous delivery?
The State of DevOps report and Accelerate book highlight that in order to predict and improve the performance of a team, we only need to measure four key metrics:
- lead time,
- deployment frequency,
- mean time to restore (MTTR), and
- change fail percentage.
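As a rough illustration of how these four metrics could be derived from deployment records, here is a minimal sketch. The `Deployment` record, its field names, and the simplified definitions (lead time measured from commit to deployment, restore time measured from the failed deployment) are assumptions made for this example, not the way the report or our tooling computes them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one deployment (all field names are assumptions)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached the target environment
    failed: bool = False                    # did the change cause a failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarize the four key metrics over a period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "lead_time_hours": mean(lt.total_seconds() for lt in lead_times) / 3600,
        "deployments_per_day": len(deployments) / period_days,
        # None when nothing failed (or nothing has been restored yet).
        "mttr_hours": (
            mean(rt.total_seconds() for rt in restore_times) / 3600
            if restore_times else None
        ),
        "change_fail_percentage": 100 * len(failures) / len(deployments),
    }


if __name__ == "__main__":
    start = datetime(2020, 1, 6, 9, 0)
    history = [
        Deployment(start, start + timedelta(hours=3)),
        Deployment(start, start + timedelta(hours=5), failed=True,
                   restored_at=start + timedelta(hours=6)),
    ]
    print(four_key_metrics(history, period_days=7))
```

Tracking these numbers per week or per sprint is usually more informative than a single snapshot, since the trend shows whether the pipeline is actually improving.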
These metrics were also highlighted as worth pursuing by the latest ThoughtWorks Technology Radar.
Designing the pipeline
When we committed ourselves to building a continuous delivery pipeline, we sought principles to design and build it on a good foundation. These principles are:
- to build packages once,
- deploy the same way everywhere,
- smoke test deployments, and
- keep environments similar.
Enforcing immutability has many benefits and thus we paid attention to it across different stages:
- we use Docker to provision build slaves on demand: these are created for a single build and destroyed when finished, ensuring a clean environment. To make sure that the right slaves are created, we label slave types and associate them to certain Docker images and this way control the environment on which builds run.
- to make sure binaries are built once, are accessible and remain immutable, we use a binaries repository management tool. Our choice was JFrog Artifactory.
- there is a long way from an immutable binary to an immutable environment for the running application. We use Docker to bridge it: for every build we retrieve the latest binary, install it on a Docker image, and release it. This way we can start any version anywhere, making it easy to deploy the latest version, roll back if an issue is found, or reproduce a specific version with the same environment.
- Jenkins holds a lot of information. Over the last year we saw a great new feature developed that allows us to specify Jenkins Configuration as Code (JCasC). We gave it a try and coded all our plugins and pipelines using this feature. Although it is not mature yet, we find it very promising and agree it is the way to go.
What does our pipeline look like?
Whenever a commit is pushed to GitHub, Jenkins starts a build for that project. A new slave is created with the right environment for it, the code is retrieved from the repository using GitHub Deploy Keys, and it is built and its tests are run.
If all tests pass, the pipeline creates a binary, pushes it to JFrog Artifactory, and kills the slave.
Another slave then picks the latest build from our binaries repository manager and installs it on a predefined Docker image, releasing a new version to Docker Hub. Every new image is tagged with two tags: the corresponding version and ‘latest’, so we can always retrieve the latest version without knowing its version number.
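The double-tagging step can be sketched as a small helper that derives both image references from a version string. The function name and the `example/app` repository are hypothetical. One detail worth noting is that Docker tags only admit letters, digits, underscores, periods and hyphens, so a version carrying a `+` (as SemVer build metadata does) needs adjusting; replacing `+` with `-` is an assumption of this sketch, not a rule taken from our pipeline.

```python
import re
from typing import List

# Docker tags: letters, digits, underscores, periods and hyphens; they must not
# start with a period or hyphen and are at most 128 characters long.
_DOCKER_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_tags(repository: str, version: str) -> List[str]:
    """Return the two references to push: the versioned one and 'latest'."""
    tag = version.replace("+", "-")  # assumption: '+' is not allowed in a tag
    if not _DOCKER_TAG.fullmatch(tag):
        raise ValueError(f"not a valid Docker tag: {tag!r}")
    return [f"{repository}:{tag}", f"{repository}:latest"]
```

For example, `image_tags("example/app", "1.4.0")` yields the references `example/app:1.4.0` and `example/app:latest`.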
Finally, we issue the corresponding notifications on success or failure. Here we implemented different policies regarding how to notify build status. We usually notify only when the status changes, to avoid notification overload (and the resulting loss of attention from receivers). Notifications are issued to a Slack channel everyone is subscribed to.
As a standard practice, important stats on time per pipeline stage and their status are displayed by the Jenkins Build Pipeline, and the current status of each pipeline can be viewed on our Jenkins Build Monitor.
War stories on building the pipeline
Things are easier said than done :) But we all proudly remember hard times, which allow us to display resourcefulness and tenacity towards solving pressing issues we face. Building the pipeline was no exception and we would like to share our experience on some issues we faced when developing it.
Slaves provisioning with Docker
We decided to communicate over ssh and customized the Jenkins Docker ssh slave template for that. Best practice is not only to have a unique instance per build, but also to issue a new pair of keys to log in to each of the slaves, so that no one can log in to them except the master. We achieved this by requesting ssh-key injection when configuring Docker agent templates.
Docker in Docker (DinD) configuration
We explored using a DinD configuration, but finally decided to run Jenkins master on a dedicated server and only dockerize its slaves (except those building Docker images). We found a good summary of potential issues and links to further relevant resources in this post.
In our final architecture, code building and tests, up to publishing the binary to the binaries repository manager, run inside Docker slaves. A non-dockerized slave then retrieves the binary and proceeds to build and publish the Docker image.
Smoke testing the image
After building our Docker images, we make sure the app starts with its associated services as expected. Only after these checks pass do we push the image to the Docker registry, which adds another quality gate. Images we build are then removed from the local slave to avoid accumulating waste on disk.
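A smoke test of this kind often boils down to polling the freshly started container until it answers, and giving up after a bounded number of attempts. The sketch below illustrates the idea. The helper names and the health URL in the usage note are hypothetical, and our actual checks also cover the associated services.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10,
                       delay_seconds: float = 1.0) -> bool:
    """Poll `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            # E.g. a refused connection while the app is still booting:
            # treat it as "not ready yet" and retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


def http_ok(url: str, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a check that succeeds when `url` answers with HTTP 200."""
    def check() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    return check
```

A pipeline step could then call `wait_until_healthy(http_ok("http://localhost:8080/health"))` (a made-up endpoint) and push the image only when the result is `True`.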
Build notifications: implementation and policies
Notifications are a central feature of the pipeline: the right message and configuration may well determine its success. Shortening the feedback loop means clearly communicating what the pipeline was built to find out: whether the changes we introduced are good or something needs to be fixed.
We decided to report build status through Slack. In our experience, a notification for every build results in many irrelevant messages, and the overload erodes developers’ attention. Alternative policies are to notify only about bad builds or only on status change; we implemented both.
We are grateful to Betterment for sharing their experience on this topic and decided to share a code snippet with policies and message formatting as well. Bonus? We provide a variety of emojis for each build status, so that messages are not always the same :)
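To make the policies concrete, here is a minimal, illustrative sketch of the decision logic and message formatting. It is written for this post's reader rather than copied from our pipeline: the policy names, status strings, and emoji lists are all assumptions.

```python
import random
from enum import Enum
from typing import Optional


class Policy(Enum):
    ALWAYS = "always"
    FAILURES_ONLY = "failures_only"
    ON_STATUS_CHANGE = "on_status_change"


# Hypothetical emoji pools, so that messages are not always the same.
EMOJIS = {
    "SUCCESS": [":tada:", ":white_check_mark:"],
    "FAILURE": [":fire:", ":x:"],
}


def should_notify(policy: Policy, current: str, previous: Optional[str]) -> bool:
    """Decide whether a build with status `current` deserves a message."""
    if policy is Policy.ALWAYS:
        return True
    if policy is Policy.FAILURES_ONLY:
        return current != "SUCCESS"
    # ON_STATUS_CHANGE: the first build has nothing to compare against,
    # so it is reported.
    return previous is None or current != previous


def format_message(job: str, build_number: int, status: str) -> str:
    """Format a one-line status message with a randomly picked emoji."""
    emoji = random.choice(EMOJIS.get(status, [":grey_question:"]))
    return f"{emoji} {job} #{build_number}: {status}"
```

With the status-change policy, a run of green builds produces a single message, and the next message arrives only when a build breaks or recovers.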
Building context into Docker image profiles
Among continuous delivery pipeline patterns we find the build packages once pattern: by deploying the same code we tested, we eliminate the package as a source of failure. Since we may need to deploy containers from the same image in different contexts, each with its own configuration, we built in support for profiles. Depending on the active profile we load a different configuration, so the same build behaves as configured in each context.
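One simple way to picture profiles is a set of per-profile overrides layered on top of shared defaults, with the active profile chosen when the container starts. The sketch below assumes made-up setting names, profile names, and an `APP_PROFILE` environment variable; none of them come from our actual images.

```python
import os
from collections import ChainMap
from typing import Mapping

# Hypothetical settings shared by every profile.
DEFAULTS = {"log_level": "INFO", "db_host": "localhost", "debug": False}

# Hypothetical per-profile overrides; anything not listed falls back to DEFAULTS.
PROFILES = {
    "dev": {"log_level": "DEBUG", "debug": True},
    "prod": {"db_host": "db.internal"},
}


def load_config(profile: str) -> Mapping[str, object]:
    """Return the configuration for `profile`, layered over the defaults."""
    if profile not in PROFILES:
        raise KeyError(f"unknown profile: {profile!r}")
    # ChainMap looks keys up in the profile first, then in the defaults.
    return ChainMap(PROFILES[profile], DEFAULTS)


if __name__ == "__main__":
    # The same image picks its behavior from the environment at startup.
    config = load_config(os.environ.get("APP_PROFILE", "dev"))
    print(dict(config))
```

Because the binary and the image stay the same, only the selected profile differs between contexts.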
How do we trace changes across the whole pipeline? Versioning may help us …
How we handle versioning
An important issue in software development is versioning, whose purpose is to communicate changes in the binaries we deliver. Versioning can be a great challenge: it must convey meaningful information to humans, yet we should be able to delegate its creation to machines. How did we address this issue at Qlector?
Versioning can be addressed broadly, to convey information about
- the magnitude of the changes
- whether the build is a final release or a work in progress (WIP)
- the latest commit, so that we can quickly trace a binary back to the corresponding state of the codebase.
By providing a deterministic heuristic, we ensure the same version will be produced under the same circumstances.
Magnitude of changes
When speaking about the magnitude of change, we mean whether the changes introduced in the new version break compatibility with the previous API, add new features, and/or fix reported issues. This information is best conveyed following the SemVer convention, which, due to its clarity, has become a standard. SemVer also supports adding pre-release and build metadata identifiers, features we used to convey additional meaningful information such as the commit.
Definitive vs WIP releases
It would be nice to be able to know, just by reading a version, if it corresponds to a definitive release (milestone) or to some work in progress. For this purpose we borrowed the concept of SNAPSHOT release from Maven. In addition to the SemVer number, we add the SNAPSHOT tag to WIP releases.
Tracing the build back to a git commit
Similar to the SNAPSHOT tag, we may also add a short git commit tag. This way we can trace each build version back to the commit that generated it. By doing so, we have a reference to triage issues in the codebase when reported from a deployed environment.
When developing a pipeline, we need a mechanism to automate versioning: something that applies the rules above and proposes a proper version for the next release. To do so, we created a script that proposes a version based on the latest git tag, the hash of the commit it points to, and the hash of the latest commit:
- if the latest commit is the same as the tagged one, we return the tag as the version
- if the latest commit is not the same as the tagged one, we increase by one the minor value of the version retrieved from the git tag and add the SNAPSHOT tag
- in all cases, we add the short hash of the current commit to provide traceability between the build version and the version control system.
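The rules above can be sketched as a pure function that takes the tag and the two commit hashes as inputs. This is an illustration rather than our actual script: the exact output format (`MAJOR.MINOR.PATCH[-SNAPSHOT]+<short hash>`), the 7-character short hash, the optional `v` prefix on tags, and resetting the patch number when the minor is bumped (as SemVer prescribes) are assumptions.

```python
import re

# Accepts tags such as "1.2.3" or "v1.2.3" (the "v" prefix is an assumption).
_SEMVER_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def propose_version(latest_tag: str, tag_commit: str, head_commit: str) -> str:
    """Propose a build version from the latest tag and the current commit.

    latest_tag  -- most recent version tag in the repository
    tag_commit  -- full hash of the commit that tag points to
    head_commit -- full hash of the commit being built
    """
    match = _SEMVER_TAG.fullmatch(latest_tag)
    if match is None:
        raise ValueError(f"tag is not MAJOR.MINOR.PATCH: {latest_tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    short_hash = head_commit[:7]

    if head_commit == tag_commit:
        # Building the tagged commit itself: a definitive release.
        return f"{major}.{minor}.{patch}+{short_hash}"
    # Work in progress towards the next minor version.
    return f"{major}.{minor + 1}.0-SNAPSHOT+{short_hash}"
```

Since the function depends only on its inputs, the same tag and commits always yield the same version, which is the deterministic behavior described earlier.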
Faster is better: speeding up the feedback loop
At Qlector we greatly value agility: the shorter the feedback loop, the quicker we develop features, identify issues, and remediate them.
After setting up our continuous delivery pipeline, we saw two opportunities for speeding up:
- perform continuous deployment on non-risk environments and learn best practices to make it painless in production
- optimize our dev environment, so that local versions of services are created and means are provided to gain feedback on new code
Staying current with dev dependencies
Staying current with dependency versions is important for security (avoiding known vulnerabilities) and to minimize the upgrade gap: the longer we delay, the more changes we need to absorb in our code, whereas incremental updates avoid big refactorings. Despite this, not all teams and projects update regularly. Research on this topic found that some practices may help teams update dependencies regularly.
In our case we decided not to implement automated PRs for version changes. Instead, a regularly scheduled job tests the updates and notifies us whether our code still builds without issues.
Towards an agile dev environment
Even with a continuous delivery or deployment pipeline, we spend most time coding on a local environment. Anything we can do to speed up this process will likely have a great impact on productivity.
One of the principles for continuous delivery is to have the least possible difference between environments. This stands for development as well. Having this in mind, we decided to create an image identical to the one to be deployed to production, that would mount the code from the repository, install required dependencies, and watch for code changes. On code change, it recompiles the sources, runs tests, and provides immediate feedback if something is not behaving as expected.
Once we migrated to this scheme, it also reduced the time newcomers need to set up their environment: by issuing two commands they had everything up and running. It also helped us ensure things work as they would in production, and that nothing works or breaks in the local environment because of code or configuration leftovers.
We found many opportunities for improvement, enforcing standard practices such as continuous compilation and testing and working towards a lean setup. This way we significantly reduced setup times and got quicker feedback on new code developed.
The extra mile: dockerized development environments!
Continuous delivery follows some principles inherent to a lean approach, such as continuous improvement by removing all kind of waste. In our case we did this by automating repetitive tasks and focusing only on tasks that drive value — those that require human creativity and skill. Can we build a dev environment to achieve this?
By following this lean principle, we identified features such a platform should satisfy:
- the best scenario we can aim for would be a zero knowledge setup for anyone working on the project: just by cloning the code, opening an editor or IDE and executing some script we should have everything ready to develop, view the app, and get feedback. This should not take longer than a minute or two :)
- the environment should not impose constraints to the developers: everyone should be able to work on the OS and IDE of their choice, removing unnecessary learning curves
- we should replicate the same environment as in production to make sure no other issues arise outside those that may happen in production in the same conditions. By doing so, we standardize running OS, dependencies (at OS and application packages), components location, encodings, as well as users and permissions
- we should prevent issues due to stale conditions on the developer side such as stale packages, configurations or files. Sometimes this makes things work locally but they are impossible to replicate on other machines
- make it configurable: the developer may decide whether all or only some modules and services should run, and how (for example, whether changes are watched and the code recompiled, linted and/or tested on change). When starting the environment, should we resume from where we left off or, for example, recreate the database? Should we recreate the container?
- provide configurations, certificates and credentials defaults, so that the developer does not need to worry about them until some specific change is required
- provide tools and means to ease debugging
We achieved this by using a hierarchy of Docker images that provides the same environment for development, CI, and production. It enables us to start all services as defined in Docker Compose.
In development we mirror code from the git repository into the container and run the modules inside it, while the production image is created from the same base image with the released binaries persisted into it.
To avoid mismatches in docker-compose definitions, we provide a base definition and modular overwrites which keep track of specific changes at DEV or PROD. These definitions are re-generated on each run, to ensure we never run on a stale setup. We prevent stale conditions on containers by providing means to recreate containers as well as associated volumes if the developer requires it.
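The idea of a base definition plus modular overwrites can be illustrated with a recursive merge of nested dictionaries, which is roughly what regenerating a definition on each run amounts to. The sketch below is an assumption about the general technique, not our generator. Note also that Docker Compose's own multi-file merge treats some list-valued options differently, whereas this sketch simply lets the overwrite replace lists.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `override` is layered on top of `base`.

    Nested dicts are merged key by key; any other value (including lists)
    from `override` replaces the one in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical service definitions, for illustration only.
    base = {"services": {"app": {"image": "example/app:latest",
                                 "environment": {"PROFILE": "prod"}}}}
    dev = {"services": {"app": {"environment": {"PROFILE": "dev"},
                                "volumes": ["./src:/app/src"]}}}
    print(deep_merge(base, dev))
```

Keeping the environment-specific part small makes it easy to see at a glance what actually differs between DEV and PROD.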
Application dependencies for development are not persisted into Docker images, but are locked by specifying the required version numbers. The development environment caches downloaded dependencies, which avoids repeated downloads and ensures quick startups.
Settings for developer environments (which modules should run and if we should watch code changes) as well as credentials and SSL certificates are generated with defaults, but can be overridden by developers at any time.
When watching for changes, we make sure required modules are continuously compiled and the developer may request additional checks through configuration (test and/or lint the code).
Debugging is one of the most important tasks in development. Time invested in making it easier, gaining visibility, or just creating shortcuts to relevant directories saves developers time.
To provide console access, we added two commands that enable us to log in to containers and attach tmux sessions with predefined window layouts over all running processes and databases. Panels are grouped based on interacting components so that we can concentrate on a given screen to observe behavior, diagnose issues, and work on a solution.
Since debugging sometimes requires a separate set of tools that are not part of the running applications, we developed a specific container with them, that attaches to the running environment and provides utility scripts that may help with the task. Scripts we develop to assist diagnosing a given situation in a better way or shorter time are included in that Docker image, to boost team productivity.
Failures may occur at any step of the stack and it is important to have visibility across all stages. If a request is failing, we should be able to tell whether it is due to bad Nginx mappings or because a service is failing to respond. Our configuration provides the ability to make requests through the whole stack or to jump in at any stage to diagnose the issue.
There is light at the end of the pipeline!
Implementing a continuous delivery pipeline was a great journey which is still ongoing as we seek to improve our pipeline up to deployments. Some benefits we gained from it are:
- a quick feedback loop while developing the product,
- simplified configurations across multiple environments,
- short setup times,
- measurable quality and actionable items from reports, and
- dockerized images that work in any environment — released for every working commit we push.
Even if it may feel a bit scary at first, the small batches principle turns out to help you develop faster and better.
Have you had a similar experience or are you setting up your pipeline? Ping us — we will be glad to hear about your experience! Thinking about a new job? We are always looking for the best professionals!