Splitting Build and Test Infrastructure
Disclaimer: This post was originally written a year ago while heavily sleep-deprived and living off grande lattes. It is mostly just an opinionated rant with very little weight and should be read with a pinch of salt. Enjoy!
“You should never, never doubt something that no one is sure of.” ― Roald Dahl, Charlie and the Chocolate Factory
This post is born of conversations with developers and managers, and some personal observations, around Build Infrastructure, Test Infrastructure and the benefits of separating some of their components.
In my previous role as a Build Engineer at Demonware I spent 5 years maintaining Build Infrastructure (BI) and Test Infrastructure (TI). This post is an attempt to clarify the characteristics of, and differences between, BI and TI. I’ve spent a lot of time building “one-size-fits-all” immutable environments to satisfy both the build and test functions. In the absence of a QA team this approach did the job, but it was incredibly wasteful and may hinder the ability to scale in the future.
While BI and TI share characteristics, they differ greatly in terms of investment, ownership, risk, life-cycle, elasticity and complexity.
“We have so much time and so little to do. Strike that, reverse it.” ― Willy Wonka
What is Build Infrastructure?
From my own experience Build Infrastructure consisted of the tooling, services and environments required to create a deployable artifact from source.
Some core components of BI may include:
- Source Control Repository
- Build Environment
- Artifact Repository
A “service” or “application” can be built and deployed using the core components listed above. The CI server has intentionally been omitted. While the CI server is the preferred path for builds to take, we should still be able to build and deploy a service without it in the case of an outage.
These components satisfy the what, how and where of the build process. BI can be thought of as a factory. The raw material is code. The machinery is the build process and the created artifact is the product. What about Quality Control? BI is not responsible for the quality of the artifacts being built. Garbage in, garbage out. The raw material should be checked before reaching the factory and all products should be tested before reaching the customer.
We are only interested in the comparison between the Build Environments and Test Environments in this post.
Build Environment (BE)
Some of the common characteristics of a good Build Environment include:
Reproducibility

The environment, services and build dependencies need to be available to allow builds to be re-built byte-for-byte in the future. Relying on third-party dependencies, hosted on a third-party platform, to build a core service may work fine but introduces risk.
Availability

The source control repository is critical. This may sound obvious, but from past experience much source control tooling is not made highly available due to additional cost, complexity or the feeling that “It’ll be grand”. The build environments are also important, but they should be defined in code and can be re-built within minutes if required. Similarly, the destination repository/registry should be defined in code, with its data recoverable from regular backups within minutes.
Transparency

There should be no black boxes, hacks or workarounds within BI. The structure, status and origin of BI should be available to every user. Each user should have the ability to trace their build from source, through the build process, to the artifact’s destination.
Consistency

This is a combination of reproducibility and availability. The user should receive a predictable and consistent experience during the build process, regardless of when or where the build runs. This is particularly challenging if builds rely on third-party tooling and libraries which may change upstream.
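One concrete check for reproducibility and consistency is to build the same source twice, independently, and compare the artifacts byte-for-byte. A minimal sketch in Python; the function names and the idea of comparing SHA-256 digests are illustrative, not part of any particular pipeline:

```python
import hashlib
from pathlib import Path

def artifact_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of an artifact, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_reproducible(first_build: Path, second_build: Path) -> bool:
    """Two independent builds of the same source should match byte-for-byte."""
    return artifact_digest(first_build) == artifact_digest(second_build)
```

Running this as a scheduled pipeline step is one cheap way to catch an upstream dependency silently changing underneath you.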
Developer : “Where are the builds running?” Build Engineer : “No idea”
Portability

Using tools such as Docker and Ansible, the entire BI should be portable. This portability should be exercised as regularly as checking the integrity of backups. Change is the only constant, after all. Using pipelines to deploy services nightly and to test build environments helps to protect against code rot.
“Quality is free, but only to those who are willing to pay heavily for it.” ― Tom DeMarco & Timothy Lister, Peopleware: Productive Projects and Teams (1987)
What is Test Infrastructure?
From my own experience Test Infrastructure consisted of the tooling, services and environments required to verify that an artifact satisfies the specified requirements.
The core components of TI may include:
- Source Control Repository
- Test Environment
- Artifact Repository
These core components are a simplification of the input, test-execution and output functions.
A comprehensive TI consists of many other supporting services such as a test scheduler, environment orchestration, centralized logging, metrics gathering and reporting.
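To make the scheduler piece less abstract, here is a minimal sketch assuming testsuites and environments are just named strings. A real scheduler would also weigh suite duration, environment load and resource requirements; this one only balances suite counts:

```python
from collections import defaultdict

def schedule(testsuites, environments):
    """Assign each testsuite to an environment, round-robin.

    Illustrative only: a production scheduler would consider suite
    runtimes, environment capacity and affinity, not just counts.
    """
    plan = defaultdict(list)
    for i, suite in enumerate(testsuites):
        plan[environments[i % len(environments)]].append(suite)
    return dict(plan)
```

For example, `schedule(["auth", "billing", "matchmaking"], ["env-1", "env-2"])` spreads the three suites across the two environments.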
“Testing is an infinite process of comparing the invisible to the ambiguous in order to avoid the unthinkable happening to the anonymous.” — James Bach
Test Environment (TE)
The TE, and arguably the testsuites, should be designed to be highly scalable and runnable across a distributed platform. From experience, I’ve never been asked for more resources to build artifacts. More resources for testing, however, is quite a common request. In the past our response to such requests was to increase the resources available to the build environments to handle resource-hungry testsuites.
Modifying build environments to support testing was problematic for two reasons:
Glass ceiling of resources
When do we stop adding more resources? As we increase the resources per environment, the cost of build and test increases.
Reproducing environments locally becomes more difficult
As the build and test environments become more resource-hungry, the ability to run them locally becomes more challenging.
“We cannot solve our problems with the same thinking we used when we created them.” — Albert Einstein
Build Environments (BE) vs Test Environments (TE)
The BE is typically lean and provides just enough resources for the build function. A BE may be a container, a VM or bare metal, but it is not generally required for the BE to be part of a distributed system or to scale on demand.
The TE is more complex. The TE should be capable of running any number of tests on a single environment or across hundreds/thousands of environments. This scalability needs to happen seamlessly, as part of the test setup or preferably during execution as resources become available.
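One common way to get that “scale during execution” behaviour is a shared work queue that environments pull from, so faster or newly added environments simply take on more tests. A hedged Python sketch, where the environments are threads and the “execution” is a stand-in:

```python
import queue
import threading

def run_distributed(tests, environments):
    """Run tests by letting each environment pull work from a shared queue.

    Capacity is used as it becomes available rather than fixed up-front:
    a faster (or late-joining) environment just pulls more tests.
    """
    work = queue.Queue()
    for test in tests:
        work.put(test)

    results = {}
    lock = threading.Lock()

    def worker(env):
        while True:
            try:
                test = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this environment is done
            outcome = f"ran on {env}"  # stand-in for real test execution
            with lock:
                results[test] = outcome

    threads = [threading.Thread(target=worker, args=(env,)) for env in environments]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The same pull-based shape applies whether the “workers” are threads, containers or whole hosts.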
The TE should be as close to Production and Staging as possible. If Production and Staging are running in a distributed fashion then so should the tests. The BE has no such requirement.
This next point is purely anecdotal. A misconfiguration in the BE is a lot more forgiving than a misconfiguration in the TE. I have seen entire test cycles invalidated due to the wrong test dependency being defined in a config file. In one case the wrong build was tested due to a misconfiguration, and the build deployed to production was untested.
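A cheap guard against that last failure mode is to gate deployment on the digest of the exact artifact that was tested. A sketch, assuming some record of tested digests exists; the shape of that record is invented here for illustration:

```python
def safe_to_deploy(artifact_digest, tested):
    """Refuse to deploy an artifact unless that exact build passed testing.

    `tested` is assumed to be a record kept by the test cycle, mapping
    artifact digests to a result string; its structure is hypothetical.
    """
    result = tested.get(artifact_digest)
    if result is None:
        raise RuntimeError("artifact was never tested; refusing to deploy")
    if result != "passed":
        raise RuntimeError(f"artifact failed testing: {result}")
    return True
```

Keying the check on a content digest, rather than a tag or build number, is what prevents the “tested one artifact, deployed another” scenario.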
Test Infrastructure Challenges
Test Infrastructure has some additional challenges:
Cost

Test Infrastructure can become incredibly expensive (especially when developers forget to clean up hundreds of AWS instances), and the cost per test cycle will increase without continuous optimization of products, tests and environments. As more tests are added, more testsuites are run and the complexity of the supporting services increases, so too will the cost of each test cycle. It isn’t all doom and gloom, though. As the cost of compute drops, teams can leverage the “portability” characteristic to move tests between providers to find the best price.
Ownership

Test Infrastructure is constantly changing, and this requires ownership. Test environments, test dependencies and testsuites need to be managed with the same rigor as the product being shipped. Ideally TI would be managed by a QA team, but in the absence of QA the ownership may fall to the developers. Developers owning and sharing TI may lead to a number of positive outcomes:
- Better understanding of the test cycle and its dependencies
- Help identify areas of optimization
- Help identify overlapping test requirements between products
- Reduce risk of stale environments/dependencies
Risk

Test Infrastructure which is under-resourced and unowned can leave you open to the following risks:
- Stale test environments
- Stale test dependencies
- Deploying untested code to production
I’ve come across (or may have been responsible for) these 3 scenarios in the past and, while no one died, it was quite embarrassing. You may be testing one artifact and deploying another. Test environments and test dependencies are always changing, and they need to be rebuilt regularly. Needless to say, every TE should be defined in code.
Investment

A change in how and where tests are being run will require investment from each team. Many testsuites are written to run on a single environment and may require development time to allow them to run in a distributed manner.
“I remember the days when QA testers were treated almost as second-class citizens and developers ruled the software world. But as it recently occurred to me: we’re all testers now.” — Joe Colantonio
Change is coming
Build functions previously performed using CI/CD services such as Bamboo and Jenkins are being integrated into source control systems. GitLab and Bitbucket are offering built-in CI/CD functions. Will GitHub follow?
What will this mean for tooling such as Bamboo and Jenkins? Jenkins, which many treat as an automation engine as well as a CI server, will continue to evolve. Work being done by CloudBees, in particular Blue Ocean, will make the Jenkins UX beautiful and a joy to use. The Pipeline Editor in Blue Ocean looks awesome. Hosted CI/CD solutions such as CodeShip, CircleCI, Travis and Drone will continue to thrive and reduce the cost of building. Jenkins X is also gaining traction and is worth an evaluation imo.
Build Infrastructure is reasonably static compared to Test Infrastructure. Build requirements change less regularly, build environments change less regularly and there is a lesser need to scale on-demand. If a Build Environment is producing containerized artifacts from source do we really care if it has a slightly older version of Docker installed when compared to production?
Test Infrastructure is much more fluid and unforgiving. Parity between Test Infrastructure and Production requires coordination, communication and buy-in from multiple teams and stakeholders.
From experience, Build Environments are less expensive to operate. We could have met our build requirements using a few dozen t2.micros on AWS. This is not the case for test environments, which are often more complex to set up and require more resources for longer periods.
Using tools such as Docker, Ansible and Docker Swarm, developers are being empowered to create production-like test environments from code, deployable across any platform.
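As a rough sketch of what “test environment from code” can look like, here is a hypothetical Compose file; the image names, tag and replica count are all invented for illustration, not taken from any real setup:

```yaml
# Hypothetical docker-compose.yml for a production-like test environment.
version: "3.7"
services:
  api:
    image: registry.example.com/game-api:1.4.2   # the exact artifact under test
    deploy:
      replicas: 3            # mirror production's distributed topology (swarm mode)
  db:
    image: postgres:11
    environment:
      POSTGRES_DB: api_test
  tests:
    image: registry.example.com/api-testsuite:1.4.2
    depends_on: [api, db]
```

Because the whole environment is declared in one file, it can be rebuilt from scratch for every test cycle, which addresses the stale-environment risks above.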
If you made it this far then thanks very much for reading. Feel free to call out BS in the comments.