Everything-as-Code: lessons from Applications-as-Code

Ruud Schoonderwoerd
Nationwide Technology
Apr 13, 2021 · 8 min read

With the advent of virtualisation and cloud computing it is now possible to represent virtually everything as code. Or perhaps I should say “everything virtual”, as the one thing it doesn’t include is physical tin. “Everything-as-Code” can refer to the domains of infrastructure, platform software configuration, networks, CI/CD pipelines, monitoring tools, test artefacts, and indeed entire test environments as well as the production environment.

Everything-as-Code (or “EaC” from here onwards) has the potential to make the jobs of IT engineers easier, and to generate benefits in terms of traceability, consistency, repeatability, collaboration, governance, and security.

To gain these benefits, it needs to be done well. As you can imagine, EaC is a huge area requiring specialist skills in each of these domains. As a result there are many challenges (more on this later). This article, however, is about a common theme that binds them together: the fact that everything is code. To do EaC well, we need to apply our experience with Applications-as-Code. For example, a common EaC anti-pattern is that, as part of the adoption of something-as-code, existing manual configuration tasks are converted into a script and proudly declared “automated”. This misses the mark. Applications aren’t just scripts. Applications are software. They are engineered and maintained as such.

What does this mean for EaC?

Code is version controlled and maintained in a code repository

Version control systems are universally adopted by application developers to manage code. They provide traceability and a single point of truth. Their branch and merge capabilities enable collaboration between teams of engineers.

Without version control, EaC can mean script files residing in different locations, no certainty about which version is the latest tested one, and a cascade of issues following from that.
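
To make this concrete, here is a minimal sketch of configuration living in a repository rather than on a device: a hypothetical set of firewall rules expressed as plain Python data, with unit tests a pipeline can run on every commit. The rule names and fields are invented for illustration.

```python
# firewall_rules.py - hypothetical firewall configuration kept in version control.
# The repository, not the device, is the single point of truth for these rules.

FIREWALL_RULES = [
    {"name": "allow-https", "port": 443, "protocol": "tcp", "action": "allow"},
    {"name": "allow-ssh-admin", "port": 22, "protocol": "tcp", "action": "allow"},
    {"name": "deny-all", "port": "*", "protocol": "*", "action": "deny"},
]


def test_deny_all_is_last():
    """The catch-all deny rule must always be evaluated last."""
    assert FIREWALL_RULES[-1]["action"] == "deny"


def test_rule_names_are_unique():
    names = [rule["name"] for rule in FIREWALL_RULES]
    assert len(names) == len(set(names))
```

Because the rules are code, every change is a commit: diffable, reviewable, and traceable to an author.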

Code has a CI/CD pipeline

In application development, the CI/CD pipeline ensures that code is converted into deployable build artefacts, and that it is automatically tested, quality checked, and assessed for security vulnerabilities before being deployed to production. A well-engineered CI/CD pipeline provides traceability and tight control over what happens in each environment.

The development of code for infrastructure, platforms, network configs, and even CI/CD pipelines themselves will benefit from pipelines of their own, ensuring that testing and quality checks take place before deployment. These pipelines also provide traceability: knowing at all times what is deployed where.

There are differences from application CI/CD pipelines. In particular, there may be constraints with regard to environments. For example, a physical base infrastructure test environment may be needed to test infrastructure-as-code. The principles, however, will be the same.
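
As an illustration of the idea (a sketch, not any particular CI product), the stages might be strung together like this; the stage commands are placeholders to swap for your own lint, test, scan, and deploy tools:

```python
# pipeline.py - a minimal sketch of CI/CD stages for infrastructure code.
# Each stage must pass before the next runs; any failure stops the deployment.
import subprocess
import sys

STAGES = [
    ("lint", ["terraform", "validate"]),                  # placeholder syntax checks
    ("test", ["pytest", "tests/"]),                       # placeholder unit tests
    ("security", ["checkov", "-d", "."]),                 # placeholder security scan
    ("deploy", ["terraform", "apply", "-auto-approve"]),  # placeholder deployment
]

def run_pipeline() -> None:
    for name, command in STAGES:
        print(f"--- stage: {name} ---")
        if subprocess.run(command).returncode != 0:
            sys.exit(f"stage '{name}' failed; aborting before anything is deployed")
    print("all stages passed; what is live is exactly what this commit describes")

if __name__ == "__main__":
    run_pipeline()
```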

Code is reviewed, both by humans and code quality tools

CI/CD pipelines can make sure that code is reviewed before being deployed. These reviews can be manual or automated, or, in most cases, a mixture of both. A pipeline can be configured such that no unreviewed code makes it to production.

In the context of EaC, a key point is that it is the code that is being reviewed, and not the thing that it results in (i.e. what ended up being deployed). With version control, code reviews, and CI/CD pipelines in place, engineers shouldn’t need to log in to a server, firewall, or network switch to check or change its configuration. Instead they log in to the code repository to read the code, and make any corrections there. As these corrections are tracked in the version control system and deployed via a CI/CD pipeline that forces reviews and testing, engineers can be fully confident that the code works, and that it is what has been deployed.
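
As one hypothetical illustration (many teams get this from their Git platform’s branch protection rules instead), a pipeline step could refuse to deploy any commit that lacks a Reviewed-by trailer:

```python
# review_gate.py - sketch of a pipeline step that blocks unreviewed code.
import subprocess
import sys

def head_reviewers() -> list[str]:
    """Return the Reviewed-by trailer values on the current HEAD commit."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%(trailers:key=Reviewed-by,valueonly)"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    reviewers = head_reviewers()
    if not reviewers:
        sys.exit("HEAD has no Reviewed-by trailer: refusing to deploy unreviewed code")
    print(f"reviewed by: {', '.join(reviewers)}; OK to proceed")
```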

Tests are automated, and automatically run

Mature CI/CD pipelines execute application tests automatically. This automation should include preparation of the environment, stubs, and test data.

The same approach can be applied to EaC. The challenge here is that tests of shared platform changes need to provide the confidence that applications continue to work. One way of achieving this is to run the continuous integration tests of these applications against the platform. As these tests are code, in the code repository, they could be made accessible to shared platform engineers.
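
A sketch of that idea: the platform pipeline checks out each dependent application’s repository and runs its test suite against the candidate platform. The repository URLs and the PLATFORM_ENDPOINT variable are invented for illustration:

```python
# platform_ci.py - sketch: run dependent applications' own CI tests
# against a candidate platform change before it goes live.
import os
import subprocess
import sys

# Hypothetical application repositories whose CI tests gate platform changes.
APP_REPOS = [
    "https://git.example.com/payments-service.git",
    "https://git.example.com/accounts-service.git",
]

def run_app_tests(repo_url: str, platform_endpoint: str) -> bool:
    """Clone an application repo and run its tests against the candidate platform."""
    workdir = repo_url.rsplit("/", 1)[-1].removesuffix(".git")
    subprocess.run(["git", "clone", "--depth=1", repo_url, workdir], check=True)
    result = subprocess.run(
        ["pytest", "tests/"],
        cwd=workdir,
        # Hypothetical variable the apps' tests read to locate the platform under test.
        env={**os.environ, "PLATFORM_ENDPOINT": platform_endpoint},
    )
    return result.returncode == 0

if __name__ == "__main__":
    candidate = sys.argv[1]  # e.g. the endpoint of the freshly built candidate platform
    if not all(run_app_tests(repo, candidate) for repo in APP_REPOS):
        sys.exit("an application's CI tests failed against the candidate platform")
```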

Of course it is possible that these CI tests don’t provide enough coverage and something still breaks in live. The book “Software Engineering at Google” describes an interesting approach to this dilemma. Riffing on the famous line from one of Beyoncé’s songs, they say “if you liked it, then you shoulda put a CI test on it”. What this means is that it is the responsibility of the application owner to provide full CI coverage, and that a successful CI test for a platform change is enough for that change to go live. The Beyoncé rule creates a positive feedback loop: it gives application engineers a significant incentive to ensure the completeness of their CI tests, and shared platform engineers gain the confidence to put platform changes live.

How does this apply to tests-as-code? In other words, how do you test tests? The objective of tests is to find defects, so to truly prove a test, you need to run it against defective code. There are now tools available to facilitate “mutation testing”: the deliberate introduction of defects into code, to verify that the tests pick up on them.
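
Real tools (mutmut is one example for Python) automate this, but the principle fits in a few lines. The hand-rolled sketch below mutates a trivial function and checks that the “test suite”, here a single assertion, notices:

```python
# mutation_demo.py - a hand-rolled illustration of mutation testing.

ORIGINAL = """
def add(a, b):
    return a + b
"""

def run_tests(source: str) -> bool:
    """Return True if the test passes against the given source code."""
    namespace: dict = {}
    exec(source, namespace)              # load the (possibly mutated) code
    return namespace["add"](2, 3) == 5   # the "test suite": one assertion

# The original code should pass its test...
assert run_tests(ORIGINAL)

# ...and every mutant should fail it. A surviving mutant reveals a gap in the
# tests: they never exercised the behaviour the mutation broke.
for mutated_op in ["-", "*"]:
    mutant = ORIGINAL.replace("+", mutated_op)
    assert not run_tests(mutant), f"mutant '{mutated_op}' survived: tests are too weak"

print("all mutants killed: the test genuinely proves the behaviour")
```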

Recovery is redeployment, and it is needed less often

Applications are not directly tinkered with in production or any other environment. If there is an issue, an application’s code is changed to fix it, and the application is redeployed.

Outside of the world of applications, without EaC, integrated IT solutions are typically made to work by troubleshooting the environments they run in, and making direct adjustments to configurations in those environments. Engineers then need to make sure that these changes are reflected in a “master configuration”, but this step can be forgotten. If a server stops working, it often needs to be recovered to a previous state, but how can we know that state with certainty? That’s why IT organisations take backups of servers even if there’s no data on them.

With EaC everything we deploy is based on carefully managed and version controlled code. There will no longer be a need for server backups to recover to a configuration that worked in the past. Instead, a recovery is a redeployment.

Such recoveries will also be needed less often. Servers rarely stop working spontaneously; frequently, human error is at play. When configuration is managed as code, with controls such as CI/CD pipelines in place to automatically test and deploy that configuration, the scope for human error is reduced. Nobody should need to change server or network configuration by hand in the first place, which restricts the need for server recovery to hardware failures.
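
A sketch of the shift: recovery stops being a restore-from-backup procedure and becomes a re-run of the same versioned deployment code. The tag name is invented, and terraform stands in for whatever applies your infrastructure code:

```python
# recover.py - sketch: with EaC, recovery is just redeployment.
# Instead of restoring a server backup of uncertain provenance, we re-run the
# same versioned deployment code that built the server in the first place.
import subprocess

def deploy(version: str) -> None:
    """Placeholder: apply the infrastructure code at a given tagged version."""
    subprocess.run(["git", "checkout", version], check=True)
    subprocess.run(["terraform", "apply", "-auto-approve"], check=True)

def recover(last_known_good: str) -> None:
    """Recovery and deployment are the same operation, from the same code."""
    deploy(last_known_good)

if __name__ == "__main__":
    # Hypothetical tag; any commit the pipeline marked as tested would do.
    recover("release-2021-04-13")
```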

Code = Documentation, and Documentation = Code

In organisations with traditional waterfall methodologies, infrastructure, platforms, and networks are typically specified using detailed design documents, which go through manual review and governance processes. As you can imagine, keeping these design documents in sync with what is actually being built, and vice versa, is such a big challenge that engineers cannot be certain of the documents’ correctness.

With the practices discussed here, the need for such detailed specifications is reduced. The code is the master. When code is changed, it is subject to review processes enforced by a CI/CD pipeline. Any detailed documentation that is needed must be closely aligned to the version of the system being documented. Keeping detailed documentation in a separate location is counterproductive, as the versions will soon diverge.
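
One way to guarantee that alignment is to generate the documentation from the code itself, so the two cannot drift. Reusing the hypothetical firewall rules from earlier, a pipeline step could emit the human-readable view on every build:

```python
# generate_docs.py - sketch: documentation generated from the code itself.
from firewall_rules import FIREWALL_RULES  # the hypothetical module from earlier

def rules_as_markdown() -> str:
    """Render the firewall rules as a Markdown table."""
    lines = ["| Rule | Port | Protocol | Action |", "| --- | --- | --- | --- |"]
    for rule in FIREWALL_RULES:
        lines.append(
            f"| {rule['name']} | {rule['port']} | {rule['protocol']} | {rule['action']} |"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    # A pipeline step can write this next to the code, versioned with it.
    with open("FIREWALL.md", "w") as f:
        f.write(rules_as_markdown())
```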

Environments are code, and automatically generated

On the path to live, application code is typically tested in different environments. The first stages, unit and component testing, may happen on a developer machine or a small sandpit environment. Testing a whole system requires a larger environment, as it includes components built by other engineers. As the path to live continues, the demands on environments increase. Testing an integrated solution may require environments with systems built by other teams. If you’re working in an organisation that has existed for long enough, some of these systems may be years, or even decades, old. Integrated non-functional and performance test environments have to mirror the production environment as closely as possible, including copies of these older systems.

The further along the path to live you are, therefore, the more likely it is that these environments are subject to contention with other initiatives that also require testing. Environment management teams spend a great deal of time managing the allocation of environments to projects. It is a tough job, as it relies on projects estimating when they will be ready to use an environment, and these estimates often change.

If everything that comprises an environment is represented as code, it can be generated at the click of a button. Provided the target base infrastructure is available to run it on (which, if you are in a cloud environment, is typically the case), there is no environment contention.
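
As a sketch, “the click of a button” can literally be a function call; terraform and its workspace commands stand in here for whatever provisioning tooling you use, and the variable name is invented:

```python
# environment.py - sketch: an entire disposable test environment as code.
import subprocess
import uuid

def create_environment() -> str:
    """Provision a disposable environment and return its unique name."""
    name = f"test-env-{uuid.uuid4().hex[:8]}"
    # Placeholder: apply the environment's code against a fresh workspace.
    subprocess.run(["terraform", "workspace", "new", name], check=True)
    subprocess.run(
        ["terraform", "apply", "-auto-approve", f"-var=env_name={name}"], check=True
    )
    return name

def destroy_environment(name: str) -> None:
    """Tear the environment down again once testing has finished."""
    subprocess.run(["terraform", "workspace", "select", name], check=True)
    subprocess.run(
        ["terraform", "destroy", "-auto-approve", f"-var=env_name={name}"], check=True
    )

if __name__ == "__main__":
    env = create_environment()
    print(f"environment {env} is ready; no contention, no waiting list")
```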

Integration with old legacy systems that are subject to contention remains a challenge, but this is where it pays to invest in highly accurate stubs of these systems (stubs-as-code, of course), as well as full test coverage during the earlier system test phase. This will significantly reduce the time needed in an integrated legacy environment.
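
A stub-as-code can be as simple as a small, versioned service that mimics the legacy system’s contract. The endpoint and response below are invented; a real stub would mirror the actual interface as faithfully as possible:

```python
# legacy_stub.py - sketch: a "stub-as-code" stand-in for a contended legacy system.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class LegacyMainframeStub(BaseHTTPRequestHandler):
    """Mimics the legacy system's API closely enough for integration tests."""

    def do_GET(self):
        if self.path == "/accounts/12345":  # hypothetical endpoint and payload
            body = json.dumps({"account": "12345", "balance": 100.0}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Versioned alongside the tests that depend on it, and deployable into any
    # generated environment instead of the real, contended legacy system.
    HTTPServer(("localhost", 8080), LegacyMainframeStub).serve_forever()
```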

Every engineer is a software engineer

The adoption of everything-as-code has the potential to generate huge advantages if it is done properly. Code is software, so you need to apply the software engineering skills and techniques that have proven themselves in the area of application development.

Therefore, if an organisation decides to adopt EaC, the engineers in the fields of infrastructure, networks, DevOps, security, testing, and so on need to become expert software engineers.

Challenges

Hopefully I’ve convinced you that to do Everything-as-Code well and gain its benefits, you need to apply what we know about Applications-as-Code. However, it is worth bearing in mind that this isn’t a silver bullet. This is a paradigm shift, after all, and with that come many other challenges.

There is a plethora of tools and coding languages out there, often designed to solve a specific problem within a specific domain: just have a look at the tooling landscape maintained by the Cloud Native Computing Foundation. Specialist tools and domains require specialist knowledge and skills that are often hard to find.

Another challenging area is security. For example, instead of focussing on (physical) infrastructure, the focus of security professionals needs to shift to the security of processes and tools that generate the infrastructure. This is hard because of the complexity of the tooling landscape, and because it is a shift in mindset that will take time to embed in an organisation.

Finally, given how easily new environments can be spun up in the cloud, there is scope for chaos to creep in. Housekeeping processes and monitoring will be required (both as-code!) to keep cloud consumption in check.

Nevertheless, the lessons from Applications-as-Code should provide a useful frame of reference to help along the journey of adopting Everything-as-Code.
