Simple, Repeatable & Free: An Open Source Software Delivery Pipeline (Part I)

Dan Stieglitz
Dec 14, 2020 · 6 min read
Photo by tian kuan on Unsplash

If you’re developing software, one of the most important processes you need to have in place is a solid build & delivery system. It sounds obvious, but it’s often overlooked: several key operations must be repeatable and precise to ensure quality software. Two examples are repeatable builds (can I re-deploy the exact same code I deployed last week, even if I’ve since pushed new changes, breaking ones of course?) and differences (what’s different about this build compared with the last one? Which issues are fixed in this version?).

Software packages often rely on other packages and form a dependency tree. Changes to software APIs in one package can break downstream dependencies and wreak havoc with production systems. In complex environments like Kubernetes, disciplined versioning and release procedures make the difference between a reliable system and an unpredictable, broken mess.
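To make the dependency problem concrete, here is a minimal Python sketch of the check a dependency manager performs when deciding whether an upgraded package is safe to pull in, assuming the package follows Semantic Versioning (introduced below). The function names `core` and `is_compatible` are hypothetical, pre-release and build suffixes are simply ignored, and the handling of 0.y.z versions borrows the "caret" convention used by tools such as npm and Cargo rather than anything prescribed in this article.

```python
from typing import Tuple


def core(version: str) -> Tuple[int, int, int]:
    """Return (MAJOR, MINOR, PATCH), discarding any pre-release or build suffix."""
    # "1.4.2-develop.7+abc123" -> "1.4.2"
    base = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in base.split("."))
    return major, minor, patch


def is_compatible(required: str, candidate: str) -> bool:
    """True if `candidate` can replace `required` without a breaking API change."""
    req, cand = core(required), core(candidate)
    if req[0] == 0:
        # 0.y.z means "initial development": anything may change, so (like the
        # common caret convention) only accept versions with the same MINOR.
        return cand[0] == 0 and cand[1] == req[1] and cand >= req
    # From 1.0.0 on, only a MAJOR bump signals a breaking change.
    return cand[0] == req[0] and cand >= req


print(is_compatible("1.4.0", "1.9.2"))  # True: new features and fixes only
print(is_compatible("1.4.0", "2.0.0"))  # False: major bump, may break callers
```

A downstream service that pinned 1.4.0 can safely pick up 1.9.2, but an automatic upgrade to 2.0.0 is exactly the kind of change that breaks production systems.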

There are plenty of vendor solutions that manage this process end-to-end and some that provide parts of a complete solution. In this set of articles, we’ll outline an open source approach using completely free components and some open source code from Stainless AI (also free). The approach outlined here is a great way to set up a high quality software delivery pipeline (we use it ourselves!) at a very low cost, and it works for teams large and small. If your team has more sophisticated or regulatory-driven requirements, you may want to explore vendor solutions that offer automated compliance options or add-on services.

This article reviews some basics. In subsequent parts, we’ll dive into more technical details and some code, but first it’s important to establish what we mean by a few basic ideas.

For a more in-depth look at some of these concepts and how to implement them in practice, we’ve published a document that details our internal version of Gitflow here:

https://dev-flow.readthedocs.io/en/latest

Concepts

Versioning

Versioning is the cornerstone of any quality software delivery pipeline. At Stainless AI we use Semantic Versioning (https://semver.org). Semantic versioning ascribes meaning to each part of a version, so when a number changes you know whether the new version contains fixes (a patch difference), new backward-compatible features (a minor revision), or significant changes that break compatibility with previous versions (a major release). Check out the link above for the complete specification. Some folks refer to these as “breaking-feature-fix” revisions to reflect the semantics of the numbering system.

Components of a semantic version string
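The components of a version string map directly onto a small data structure. Below is a minimal Python sketch, not code from the Stainless AI library: `Version`, `precedence_key` and `bump` are hypothetical names, the parsing regex is the one published in the Semantic Versioning 2.0.0 specification, and `bump` is deliberately simplified (it always increments, even when starting from a pre-release).

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Regex from the Semantic Versioning 2.0.0 specification (https://semver.org);
# re.ASCII stops \d from matching non-ASCII digits.
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int
    prerelease: Tuple[str, ...] = ()  # e.g. ("develop", "7")
    build: Tuple[str, ...] = ()       # e.g. ("abc1234",); ignored for ordering

    @classmethod
    def parse(cls, text: str) -> "Version":
        match = SEMVER_RE.fullmatch(text)
        if match is None:
            raise ValueError(f"not a semantic version: {text!r}")
        major, minor, patch, pre, build = match.groups()
        return cls(int(major), int(minor), int(patch),
                   tuple(pre.split(".")) if pre else (),
                   tuple(build.split(".")) if build else ())

    def __str__(self) -> str:
        text = f"{self.major}.{self.minor}.{self.patch}"
        if self.prerelease:
            text += "-" + ".".join(self.prerelease)
        if self.build:
            text += "+" + ".".join(self.build)
        return text

    def precedence_key(self):
        """Sort key following the spec's precedence rules (build metadata ignored)."""
        # Numeric identifiers compare numerically and rank below alphanumeric ones.
        pre = tuple((0, int(i), "") if i.isdigit() else (1, 0, i)
                    for i in self.prerelease)
        # A release (no pre-release part) outranks any pre-release of the same core.
        return (self.major, self.minor, self.patch, 0 if pre else 1, pre)

    def bump(self, part: str) -> "Version":
        """Next release: 'major' = breaking, 'minor' = feature, 'patch' = fix."""
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")


print(Version.parse("1.4.2").bump("minor"))  # 1.5.0
```

Sorting with `key=Version.precedence_key` reproduces the ordering example from the specification, 1.0.0-alpha < 1.0.0-alpha.1 < 1.0.0-beta.2 < 1.0.0-beta.11 < 1.0.0-rc.1 < 1.0.0, which is why the pre-release part can safely carry build numbers.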

Environments

Software never lives on its own. It always lives in an environment: physical (hardware/OS), virtual, or within a framework like Flask, Grails or Kubernetes. Those environments have their own versions of things, and your software counts on certain assumptions about its environment being true: does that API call exist? Does it return what I expect? Is that feature present? Your software pipeline needs well-defined environments to grow and function properly. At a minimum, you need a development environment and a production environment. In the former, your team can experiment and try out new things; in the latter, things should change infrequently and always reflect a release (see below).

Snapshots & Releases

Now that we have meaningful version numbers and defined environments, how do we use them to manage software delivery? The next concept is the software release: when is a version actually a version, and when is it just a work in progress? Semantic versioning lets us mark a build as a pre-release (a snapshot) using the pre-release part of the version string, as in 1.0.0-alpha.1. This indicates that the version is destined for a non-production environment and is still in flux. Once a set of features has been completed, we remove the pre-release part and produce a release. Note that:

  • Once a release is tagged and deployed, it’s never revised. If you need to fix something, you need to release a patch version.
  • Sometimes the line between “is it a fix” and “is it a new feature” is blurry; developers need to make this call.
  • If possible, use software to prevent pre-releases from being deployed into production environments.
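The last rule is easy to enforce in code. Here is a minimal Python sketch of a pre-deployment gate that refuses to send a pre-release to production; `check_deployable`, `DeploymentRefused` and the environment names in `PRODUCTION_ENVIRONMENTS` are hypothetical, and the version regex is a simplified form of the full semver grammar.

```python
import re

# Hypothetical names of environments that may only receive releases.
PRODUCTION_ENVIRONMENTS = {"production", "prod"}

# MAJOR.MINOR.PATCH, optional -pre-release, optional +build (simplified grammar).
VERSION_RE = re.compile(
    r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?",
    re.ASCII,
)


class DeploymentRefused(Exception):
    """Raised when a build must not be deployed to the requested environment."""


def check_deployable(version: str, environment: str) -> None:
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise DeploymentRefused(f"{version!r} is not a semantic version")
    # Build metadata (+...) does not make a version a pre-release; only -... does.
    is_prerelease = match.group(4) is not None
    if is_prerelease and environment.lower() in PRODUCTION_ENVIRONMENTS:
        raise DeploymentRefused(
            f"pre-release {version} may not be deployed to {environment}")


check_deployable("1.4.2", "production")              # a release: allowed
check_deployable("1.5.0-develop.17", "development")  # a pre-release in dev: allowed
```

Calling a gate like this as the first step of every deploy job turns a policy that depends on people remembering it into one that fails the build instead.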

Automation

Now that we’ve established some concepts and set up some environments, we want to take our favorite open source tools and use their features to turn our concepts into a concrete delivery pipeline that:

  • Maximizes our productivity by automating error-prone tasks as much as possible; and
  • Leverages the native features of our tools to implement our concepts of versioning, environments & snapshots/releases.

The heart of any development pipeline is the source code repository. We’ll use the repository’s structural elements (commits, branches and tags) as a database that drives the automation of our versioning system through the pipeline. From the state of the repository alone (its “coordinates”), we can automatically compute a version number for any build.
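One way to read those coordinates is Git’s own `git describe --tags --long`, which prints the most recent reachable tag, the number of commits since that tag, and the abbreviated commit hash, as in `1.4.2-5-g3f9c2ab`. The Python sketch below assumes that approach; it is not necessarily how the Stainless AI library gathers this information, and `RepoCoordinates`, `parse_describe` and `read_coordinates` are hypothetical names.

```python
import subprocess
from typing import NamedTuple


class RepoCoordinates(NamedTuple):
    last_tag: str   # most recent tag reachable from HEAD
    distance: int   # commits between that tag and HEAD (0 = HEAD is tagged)
    commit: str     # abbreviated commit hash of HEAD


def parse_describe(output: str) -> RepoCoordinates:
    """Parse `git describe --tags --long` output such as '1.4.2-5-g3f9c2ab'."""
    # Split from the right: the tag itself may contain hyphens
    # (for example a pre-release tag such as 1.5.0-rc.1).
    tag, distance, sha = output.strip().rsplit("-", 2)
    if not sha.startswith("g"):
        raise ValueError(f"unexpected git describe output: {output!r}")
    return RepoCoordinates(tag, int(distance), sha[1:])


def read_coordinates(repo_dir: str = ".") -> RepoCoordinates:
    """Ask git for the coordinates of HEAD (the repository needs at least one tag)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--long"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_describe(result.stdout)
```

A distance of 0 means the commit being built is exactly the tagged commit, which is the situation a release build should be in; any other distance means we are building work that came after the last tag.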

Commit

A commit (or revision) is a specific set of changes to the code with a unique identifier and a descriptive message. It is the “unit of work” for software delivery. Commits live on one or more branches.

Branches

We leverage source repository branches to manage how code flows through our development pipeline. Branches let us separate code into silos with specific meanings, for example:

  • “main” (or “master,” although this terminology is being revised) to mean “this is the latest release”;
  • “develop” to mean “this is the latest work in progress”; and
  • a “myfeature” branch to mean “this is a specific feature being worked on” (replace myfeature with a descriptive name).
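Because each branch name carries a meaning, a build can work out its role from the branch alone. The following Python sketch assumes the three kinds of branch listed above and the branch-name conventions shown there; `BranchRole`, `classify_branch` and `prerelease_label` are hypothetical names. Since semver pre-release identifiers may only contain `[0-9A-Za-z-]`, the sketch also turns a branch name like `feature/login_page` into a legal identifier.

```python
import re
from enum import Enum


class BranchRole(Enum):
    RELEASE = "release"   # main (or master): the latest release
    DEVELOP = "develop"   # the latest work in progress
    FEATURE = "feature"   # anything else: a single feature being worked on


def classify_branch(name: str) -> BranchRole:
    if name in ("main", "master"):
        return BranchRole.RELEASE
    if name == "develop":
        return BranchRole.DEVELOP
    return BranchRole.FEATURE


def prerelease_label(name: str) -> str:
    """Turn a branch name into a legal semver pre-release identifier."""
    # Replace every run of characters outside [0-9A-Za-z-] with a single hyphen.
    label = re.sub(r"[^0-9A-Za-z-]+", "-", name).strip("-")
    if not label:
        raise ValueError(f"cannot derive a label from branch {name!r}")
    if label.isdigit():
        # All-digit identifiers are numeric and may not have leading zeros;
        # a prefix keeps them alphanumeric.
        label = "branch-" + label
    return label


print(classify_branch("feature/login_page").value)  # feature
print(prerelease_label("feature/login_page"))       # feature-login-page
```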

There are several popular branching flows, such as Gitflow (https://nvie.com/posts/a-successful-git-branching-model), and your team will typically settle on a modified flow that works for them (ours is detailed at https://dev-flow.readthedocs.io/en/latest). Our goal here is to align the repository’s branch structure with our desired versioning structure.

Tags

A tag is a unique, repository-wide label on a specific commit. It has no branch associated with it; it refers to the state of the entire repository at that commit. In our system:

  • Each release gets a tag
  • Each pre-release uses the last release tag to compute its version, appending branch and build number information to construct the appropriate pre-release tag
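The pre-release rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library’s actual algorithm: `prerelease_version` is a hypothetical name, the branch label is assumed to be already sanitized, and the sketch bumps the core version *before* appending the branch and build number. That design choice matters because semver ranks a pre-release below its release, so `1.4.2-develop.17` would sort *before* the existing release `1.4.2`, while `1.4.3-develop.17` sorts after it.

```python
import re
from typing import Optional

RELEASE_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)", re.ASCII)
# An alphanumeric identifier (at least one non-digit), per the semver grammar.
LABEL_RE = re.compile(r"[0-9A-Za-z-]*[A-Za-z-][0-9A-Za-z-]*", re.ASCII)


def prerelease_version(last_release: str, branch_label: str, build_number: int,
                       bump: str = "patch", commit: Optional[str] = None) -> str:
    """E.g. last release '1.4.2', label 'develop', build 17 -> '1.4.3-develop.17'."""
    match = RELEASE_RE.fullmatch(last_release)
    if match is None:
        raise ValueError(f"last release tag is not MAJOR.MINOR.PATCH: {last_release!r}")
    if LABEL_RE.fullmatch(branch_label) is None:
        raise ValueError(f"invalid pre-release label: {branch_label!r}")
    if build_number < 0:
        raise ValueError("build number must be non-negative")
    major, minor, patch = (int(g) for g in match.groups())
    # Bump the core first so the pre-release sorts after the last release.
    if bump == "major":
        major, minor, patch = major + 1, 0, 0
    elif bump == "minor":
        minor, patch = minor + 1, 0
    elif bump == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown bump: {bump!r}")
    version = f"{major}.{minor}.{patch}-{branch_label}.{build_number}"
    if commit:
        version += f"+{commit}"  # build metadata: informative, ignored for ordering
    return version


print(prerelease_version("1.4.2", "develop", 17))  # 1.4.3-develop.17
```

Because the build number is a numeric identifier, build 11 correctly sorts after build 9 on the same branch.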

As we all know, policies are meaningless unless they are enforced, so something needs to enforce our versioning policies. Most source code repositories let you freely tag revisions with versions but don’t enforce any particular policy. At Stainless AI we use Jenkins (http://jenkins.io) to automate our builds and releases. Jenkins is a multi-node, scriptable build platform that supports many other infrastructure components through a robust plugin ecosystem. Combined with a Jenkins shared library, it gives us a powerful automated versioning platform that uses only the repository state to tag our versions automatically and accurately through the pipeline, removing one more error-prone task from developers’ plates.

Tying it All Together

Combining our concepts (versioning, environments, and snapshots/releases) with the building blocks our repository provides (commits, branches, and tags), we can write a library that versions our software builds automatically, based solely on the state of the repository and the commit we’re trying to build. We’ll set up some rules for our development team (enforcing them in software whenever possible):

  • All releases must be built from the master/main branch
  • All releases must be tagged in the repository with a valid semantic version

Once our team adheres to these rules, we can calculate the version for any build from the state of the repository at that commit, and check that the commits we are actually building correspond to the versions we’ve tagged. By introducing automation and lightening our developers’ load, we can reduce errors in our delivery pipeline and improve team morale and productivity, not to mention the quality of our software!
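Both rules, plus the tag-to-commit check, fit in one small function. The Python sketch below is hypothetical (`release_violations` is not part of the Stainless AI library) and assumes the caller has already looked up the branch name, the release tag, the commit the tag points at (for example via `git rev-list -n 1 <tag>`), and the commit being built (`git rev-parse HEAD`). It also assumes release tags are bare `MAJOR.MINOR.PATCH` strings with no `v` prefix.

```python
import re
from typing import List, Optional

RELEASE_BRANCHES = {"main", "master"}
# A release tag is a bare MAJOR.MINOR.PATCH: no pre-release part allowed.
RELEASE_TAG_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)", re.ASCII)


def release_violations(branch: str, tag: Optional[str],
                       tagged_commit: Optional[str], head_commit: str) -> List[str]:
    """Return every rule violation; an empty list means the release may proceed."""
    problems = []
    if branch not in RELEASE_BRANCHES:
        problems.append(f"releases must be built from main/master, not {branch!r}")
    if tag is None:
        problems.append("the commit being released has no tag")
    else:
        if RELEASE_TAG_RE.fullmatch(tag) is None:
            problems.append(f"tag {tag!r} is not a valid release version")
        if tagged_commit != head_commit:
            problems.append(f"tag {tag!r} points at {tagged_commit}, "
                            f"but the build is of {head_commit}")
    return problems


for problem in release_violations("develop", "1.2.0-rc.1", "def456", "abc123"):
    print(problem)
```

Returning the full list, rather than stopping at the first failure, lets a pipeline report everything a developer needs to fix in one run.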

The Library

Our shared library can be found at the link below. We’ll dive into some detailed use cases in future installments of this series. Let us know if you have any questions!

http://github.com/stainlessai/jenkins-semci

The Signal

Practical notes on artificial intelligence, machine learning and computer vision from Stainless AI

Written by Dan Stieglitz

Dan is the CEO of Stainless AI, Inc., which provides cognitive computing solutions to businesses through machine learning and artificial intelligence.
