Flow, feedback & continuous improvement — release engineering at Loblaw Digital
“Release engineering is the process responsible for taking the individual code contributions of developers and bringing those to the end user in the form of a high quality software release.” (Bram Adams and Shane McIntosh, Modern Release Engineering in a Nutshell: Why Researchers Should Care)
At Loblaw Digital we are all about delivering amazing digital experiences on all platforms and devices. When it comes to mobile devices, that’s no small feat. Our never-fading passion for keeping the customer first has led to the natural emergence of an Engineering Productivity team, and specifically a Productivity and Release Engineering team for mobile app development (PREM).
Productivity and release engineering is the discipline of not only removing obstacles from the path leading to quick software production and release, but also increasing the pace down that path:
“Speed is essential because there is an opportunity cost associated with not delivering software.” p11 Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation by Jez Humble & David Farley
So naturally questions arise: How quickly and easily can you share an app demo with stakeholders? How quickly and easily can you resolve a production issue by releasing a fix? That raises the next question: how long does it take to release the app into production after making only a one-line code change?
Some teams try to answer these questions long after they write their application — way too late for the serious software product developer. From the beginning to the end of each product development iteration your eyes should be on the customer. PREM is all about helping teams keep this focus.
The release engineering in PREM considers the three pillars, or principles, of DevOps: flow, feedback, and continuous improvement. It’s not accurate, though, to think of release engineering as just DevOps, even though the roles and responsibilities in each domain often overlap. DevOps does not include version control and source control management, build automation and test automation, nor does DevOps focus on building tools to increase developer productivity and the speed of software delivery and deployment. DevOps has a larger scope, whereas release engineering focuses on a continuous integration and deployment pipeline with an input of developer source code and an output of a signed binary product that’s ready for release.
Flow is about the left-to-right movement of work from development through to operations, then out to the customer. In this context, work is the engineering and development on the code base, the code base being the product. Left-to-right refers to how continuous integration and continuous delivery (CI/CD) pipelines are typically drawn when being designed and analyzed: from left to right rather than top to bottom.
We want fast flow of work and try to limit the work in progress. It’s of paramount importance to understand the flow of work and to automate as much toil as possible.
Automation plays a critical role here. Anything considered toil should be automated. By toil I mean the manual, repetitive, automatable tasks that are part of the workflow. Toil comes in various forms, so a few other criteria help to identify it:
- Reactive rather than strategic or proactive. An example could be having to handle instant message or email alerts. This toil is interrupt-driven and can be distracting and disruptive, taking away focus from higher-value tasks.
- Void of enduring value (i.e. no permanent improvement is gained from repeating the task several times)
- Scalable (does the task load or complexity grow linearly or faster as the project grows?). If the automation is still relevant as the project grows, then the toil was worth automating.
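The criteria above can be turned into a rough checklist for ranking automation candidates. This is only a sketch: the `Task` fields, the `toil_score` function, the threshold of four criteria, and the sample task names are all assumptions made for illustration, not a description of how PREM actually scores its work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """A workflow task described by the toil criteria (hypothetical model)."""
    name: str
    manual: bool
    repetitive: bool
    automatable: bool
    reactive: bool            # interrupt-driven rather than planned
    enduring_value: bool      # does repeating it leave a permanent improvement?
    grows_with_project: bool  # does the load grow as the project grows?


def toil_score(task: Task) -> int:
    """Count how many of the six toil criteria the task meets (0 to 6)."""
    return sum([
        task.manual,
        task.repetitive,
        task.automatable,
        task.reactive,
        not task.enduring_value,
        task.grows_with_project,
    ])


def automation_candidates(tasks: List[Task], threshold: int = 4) -> List[str]:
    """Return names of tasks meeting at least `threshold` criteria, highest score first."""
    ranked = sorted(tasks, key=toil_score, reverse=True)
    return [t.name for t in ranked if toil_score(t) >= threshold]


# Made-up sample tasks, purely for illustration.
tasks = [
    Task("upload build to testers by hand", True, True, True, False, False, True),
    Task("design a new caching layer", True, False, False, False, True, False),
    Task("answer build-failure alerts", True, True, True, True, False, True),
]
print(automation_candidates(tasks))
```

A simple count like this treats every criterion as equally important. A team could just as well weight the criteria, for example giving interrupt-driven work a higher weight because of the focus it costs.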
Here’s a common CI/CD pipeline process including the very important test regime:
- Application must be compiled and built. The build should be hermetic, that is, consistent, repeatable, and insensitive to the software installed on the build machine.
- Unit and integration tests must pass
- A certain level of test coverage and other relevant metrics must be met before moving forward. Static code analysis is often used to measure the application code quality
- Functional acceptance tests must pass (for example, user interface and end-to-end tests)
- Non-functional tests must pass (for example, performance and security tests)
- Distribute application binary for exploratory testing or employee-based beta testing
- Distribute application for release to the customers
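The stages listed above can be sketched as a sequence of gates, where each stage must pass before the next one runs and the flow stops at the first failure. The `run_pipeline` function, the stage names, and the placeholder checks are hypothetical; a real pipeline would call the actual build, test, and distribution tools in place of the lambdas.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages left to right and stop at the first failure."""
    report: List[str] = []
    for name, check in stages:
        passed = check()
        report.append(f"{name}: {'passed' if passed else 'FAILED'}")
        if not passed:
            # Stop the flow so a defect never reaches the later stages.
            return False, report
    return True, report


# Placeholder checks standing in for real build and test tooling.
example_stages: List[Stage] = [
    ("build", lambda: True),
    ("unit and integration tests", lambda: True),
    ("coverage and static analysis", lambda: True),
    ("functional acceptance tests", lambda: False),  # simulated failure
    ("non-functional tests", lambda: True),
    ("beta distribution", lambda: True),
    ("customer release", lambda: True),
]

ok, report = run_pipeline(example_stages)
print("pipeline passed" if ok else "pipeline stopped")
for line in report:
    print(line)
```

In this example the simulated failure in the functional acceptance tests stops the run, so the non-functional tests and both distribution stages never execute.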
I think of test automation first and foremost when it comes to pipeline feedback. However, feedback is really about receiving some kind of indication of the state of the software product as it flows through the CI/CD pipeline. It’s commonly thought of as the information that flows from right to left at all stages of the pipeline. The pipeline is a big part of the technology value stream, and the faster problems are detected and resolved (stopping the flow when needed due to defects), the higher the pipeline performance. One of our major goals here is to shorten and amplify the feedback loop.
Having quick feedback supports a culture of experimentation and organizational learning. Continuous improvement comes from continuous learning across Loblaw Digital (LD) with the aim of creating a collective consciousness of all teams, so that anyone in the organization can access this wisdom when performing work.
The only chance of consistently and quickly releasing great software products is by standing on these three pillars or principles.
By identifying the bottlenecks and problem areas in the workflow used to develop a software product, and addressing them, we can boost productivity. It’s through our feedback systems that we can pinpoint problems and apply fixes and improvements.
Every day we try to see our applications from our customer’s point of view. Simply put, our application comprises executable code, a specific configuration, a host environment, and data. Since the PREM team’s focus is on mobile app release engineering and the mobile app CI/CD pipeline, we pay more attention to the executable code, the app configuration, and the host environment, which in this context is the physical mobile device the executable code runs on. The data coming from the backend API, although important, is beyond the scope of the mobile release engineering pipeline.
If any of these four components changes, the application has essentially changed, and every change needs to be verified. And not only does every change need to be verified, but changes in the application also need to be released to the app user quickly.
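One way to picture "a change in any component is a change in the application" is to fingerprint each of the four components and compare fingerprints between two versions. This is a minimal sketch under stated assumptions: the dictionary layout, the `fingerprint` and `changed_components` functions, and the sample values are all hypothetical and chosen only to show the idea.

```python
import hashlib
import json
from typing import Dict, List

# The four components of an application named in the text.
COMPONENTS = ("code", "configuration", "environment", "data")


def fingerprint(app: Dict[str, object]) -> Dict[str, str]:
    """Return a SHA-256 digest for each component's JSON representation."""
    return {
        name: hashlib.sha256(
            json.dumps(app[name], sort_keys=True).encode("utf-8")
        ).hexdigest()
        for name in COMPONENTS
    }


def changed_components(old: Dict[str, object], new: Dict[str, object]) -> List[str]:
    """List the components whose fingerprint differs between two versions."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    return [name for name in COMPONENTS if old_fp[name] != new_fp[name]]


# Made-up versions of an app: only a configuration flag differs.
v1 = {
    "code": "commit-a",
    "configuration": {"feature_flag": False},
    "environment": {"os": "example-os 1"},
    "data": {"schema": 1},
}
v2 = dict(v1, configuration={"feature_flag": True})

print(changed_components(v1, v2))  # any non-empty result means re-verification is needed
```

Even though no code changed between the two sample versions, the configuration change alone is enough to mark the application as changed and in need of verification.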
LD prides itself on responding quickly to customer needs, and this global pandemic has shown that more than ever, so the release velocity metric is an important one. As our apps grow in complexity, so can our release processes. Releases can become painful and very time consuming, especially when the CI/CD pipeline takes a backseat or is practically non-existent; this is similar to taking a waterfall approach to designing and building a CI/CD pipeline.
When considering the passionate people at LD devoted to quickly and efficiently releasing software, and in particular the PREM team, consider what Dinah McNutt says in Release Engineering: How Google Builds and Delivers Software:
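Release velocity can be measured in more than one way. As a rough sketch, assume it is tracked as the lead time from a code change being committed to that change being released, summarized by the median. That definition, the function names, and the timestamps below are assumptions for illustration only, not LD's actual metric or data.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, Tuple


def lead_time_hours(committed_at: datetime, released_at: datetime) -> float:
    """Hours between a change being committed and being released."""
    return (released_at - committed_at).total_seconds() / 3600


def median_lead_time_hours(changes: Iterable[Tuple[datetime, datetime]]) -> float:
    """Median commit-to-release lead time over (committed_at, released_at) pairs."""
    return median(lead_time_hours(c, r) for c, r in changes)


# Made-up (committed, released) timestamps, purely for illustration.
sample_changes = [
    (datetime(2021, 1, 4, 9, 0), datetime(2021, 1, 4, 15, 0)),   # 6 hours
    (datetime(2021, 1, 5, 9, 0), datetime(2021, 1, 6, 9, 0)),    # 24 hours
    (datetime(2021, 1, 7, 10, 0), datetime(2021, 1, 7, 12, 0)),  # 2 hours
]
print(median_lead_time_hours(sample_changes))
```

The median is used here rather than the mean so that a single slow release does not dominate the figure. Tracking the slowest releases separately would show where the pipeline hurts most.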
“Release engineering is a specific job function at Google. Release engineers work with software engineers (SWEs) in product development and SREs to define all the steps required to release software — from how the software is stored in the source code repository, to build rules for compilation, to how testing, packaging, and deployment are conducted.”