Stop “Testing” Infrastructure as Code!

Robert Glenn
Published in DevOops Discourse
5 min read · Oct 30, 2022

The views expressed here are my own and in no way represent those of my employer, Accenture. Moreover, they do not necessarily represent my behavior given contractual or cultural considerations. This blog is purely intellectual; the ideals expressed here may require strategic consideration. Proceed with caution.

Or at least stop calling whatever it is you’re actually doing a test. A test (unit, integration, etc.) suggests that you are testing the code by giving it known inputs that produce expected outputs.
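To be concrete about what I mean: a test feeds known inputs to your code and asserts expected outputs. A minimal (entirely hypothetical) unit test, in Python:

```python
# test_pricing.py -- a plain unit test: known inputs, expected outputs.
# apply_discount is a made-up function; nothing here comes from a real codebase.

def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_twenty_percent():
    # Known input -> expected output. That is a test.
    assert apply_discount(100.0, 20) == 80.0

def test_apply_discount_zero_percent():
    assert apply_discount(59.99, 0) == 59.99
```

Run it with pytest and it either passes or it doesn't; no live environment is involved.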

Testing is extremely useful in software development, and I truly hope the code behind every framework (API, CLI, IaC tool) is tested by its development team. Either way, we should NOT test THEIR code (unless we are developing FOR the project, e.g. as an OSS contributor).

Undeniably, validation of your inventory is essential. List operations can be too slow for time-sensitive work, but validating your infrastructure need not happen in real time. You’ll wait an extra couple of seconds to get an accurate report. Focus on improving the reporting. Put it on a cron to check regularly if you want that level of assurance.
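If it helps, here’s the kind of thing I mean, as a minimal sketch: a scheduled report, not a pipeline gate. It assumes AWS with boto3 and an expected_buckets.txt file kept alongside your IaC; both the file and the bucket-level granularity are illustrative.

```python
# inventory_report.py -- run from cron, e.g.: 0 * * * * /usr/bin/python3 inventory_report.py
# Compares live S3 buckets against an expected-inventory file (expected_buckets.txt
# is a hypothetical artifact you would keep alongside your IaC).
import boto3

def live_bucket_names() -> set[str]:
    # List operations are slow-ish; that's fine, this is a report, not a gate.
    response = boto3.client("s3").list_buckets()
    return {bucket["Name"] for bucket in response["Buckets"]}

def expected_bucket_names(path: str = "expected_buckets.txt") -> set[str]:
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}

if __name__ == "__main__":
    live, expected = live_bucket_names(), expected_bucket_names()
    print("Unexpected buckets:", sorted(live - expected) or "none")
    print("Missing buckets:   ", sorted(expected - live) or "none")
```

Calling this a report rather than a test already sets the right expectation: it tells you what’s there, and a human (or an alert) decides what to do about it.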

There’s obvious value in regular checks against known configuration if drift is possible (unfortunately, it’s fairly likely with many popular frameworks), but frankly, there’s a major smell if you expect your infrastructure to drift from your configuration often. That implies manual updates. Stop manually touching things in your non-sandbox environments. Leave it alone! Learn in a sandbox. Ideally, your infrastructure should never drift from its configuration because of manual changes, or you shouldn’t care when it does. Either you’re working in a sandbox environment to get your hands dirty, or it’s a pristine, safe environment in which to run custom workloads[1].

However you construct it, drift detection should not be part of a pipeline used to apply architecture configuration; it should instead be part of your live infrastructure observability solution. It is extremely powerful, but it is unnecessary in a properly constructed IaC pipeline; instead, include policy checks of your infrastructure code. When we find a “defect” in software and resolve it, we write a test to keep it from recurring; similarly, when we find a “defect” in IaC, we should construct a policy that will disallow the configuration in the future.
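Here’s a minimal sketch of the kind of policy check I mean, assuming Terraform and a plan exported with terraform show -json. The specific rule (no public-read S3 ACLs) and the attribute layout are illustrative and depend on your provider version; in practice you’d likely reach for a purpose-built policy tool (OPA/conftest, Sentinel, Checkov, etc.), but the shape is the same: encode the defect as a rule that fails the pipeline.

```python
# policy_check.py -- fail the pipeline if the plan contains a disallowed configuration.
# Assumes a plan exported with: terraform show -json tfplan > plan.json
# The rule below is illustrative only; adapt it to whatever "defect" you just fixed.
import json
import sys

def violations(plan: dict) -> list[str]:
    problems = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if rc.get("type") == "aws_s3_bucket" and after.get("acl") == "public-read":
            problems.append(f"{rc['address']}: public-read ACL is not allowed")
    return problems

if __name__ == "__main__":
    plan_path = sys.argv[1] if len(sys.argv) > 1 else "plan.json"
    with open(plan_path) as handle:
        found = violations(json.load(handle))
    for problem in found:
        print("POLICY VIOLATION:", problem)
    sys.exit(1 if found else 0)
```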

When we call these validations “tests”, there’s a temptation to use them to almost “test drive” (TDD) the infrastructure: to learn and design as one goes and treat cloud infrastructure like an Agile project. This is a terrifying prospect. Do not treat infrastructure like custom software. Your Prod environment is not Instagram: it will NOT have a wildly different purpose years after its inception. It will always have to adhere to NIST/CIS/GDPR/HIPAA/etc. It will always be expected to serve your custom workloads effectively and securely. Don’t be afraid to fully define great chunks of your architecture at once before ever beginning the development of your IaC. Do be afraid to leave great chunks of your architecture a mystery beforehand.

There’s also a strange notion that one can/should “performance test” one’s infrastructure. What would you even do if you suspect your resources are less performant than you have configured them to be? File a bug, I guess, but how do you determine this with any certainty? Use some ‘Hello World’-type service and risk an oversimplified use case that misses the corner cases your custom code contains? Build representative use cases to evaluate solutions? It all seems like a lot of overhead for little upfront value.

Certainly, we must evaluate the performance of our software, but some environments are intentionally less performant (e.g. a dev or integration environment). Obviously, one would not run anything but the most basic of performance tests there, so why run them as part of the infrastructure pipeline at all? Conduct these tests on a regular schedule or as needed with your actual application code and with real/representative data and simulated activity against a production-like environment[2].
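For what it’s worth, even a crude version of this belongs with the application, not the infrastructure pipeline. A deliberately oversimplified sketch, just to show the shape (the URL, request count, and latency budget are placeholders; a real suite would use a proper load tool and representative data):

```python
# smoke_perf.py -- simulate a little traffic against a production-like environment
# and report latency. Endpoint, volume, and budget below are placeholders.
import statistics
import time
import urllib.request

TARGET = "https://staging.example.com/health"  # hypothetical production-like endpoint
REQUESTS = 50
P95_BUDGET_SECONDS = 0.5

def timed_request(url: str) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=5) as response:
        response.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    latencies = [timed_request(TARGET) for _ in range(REQUESTS)]
    p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th-percentile cut point
    print(f"median={statistics.median(latencies):.3f}s p95={p95:.3f}s")
    if p95 > P95_BUDGET_SECONDS:
        raise SystemExit("p95 latency over budget")
```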

Additionally, don’t worry about the performance of your pipelines themselves. OK, sure, if the pipelines take dozens of minutes or more, you probably need to address that, but if you are running a multi-region, complex, multi-tiered environment setup and your pipelines run in < 5 minutes, don’t spend excessive time and resources trying to optimize further (use your best judgment here; there are absolutely pipelines that should run in under a couple of minutes, where a 5-minute run means a timeout or something else gone wrong; #ymmv).

The tech industry has plenty of idiosyncrasies, and perhaps railing against overloaded terminology is a fool’s errand. I personally believe communication is among the weakest skills across our industry, especially when reporting to less technical colleagues. I’m not suggesting a perfect and universal ontology would solve all such problems, but I do believe our careless habit of piling homonym upon homonym does us no service. Accurate, precise terminology (with multiple qualifiers, if necessary, begging a clever acronym) can only improve the conversation, which should better clarify the expectations, and thus the requirements, around developing IaC. Clearer requirements alone do not beget a better product, but they will never beget a worse one.

At the risk of sounding like an afterschool special, we all have the power to use better vocabulary. We are highly skilled, and none of us were born with these skills; we worked hard to develop them. I believe I have shown we need not even coin a new term in this case; we simply need to use the proper terms for the concepts we want to communicate, and use them consistently. Detect and correct any drift of your infrastructure, employ policies to avoid misconfigurations, and leave the tests for your custom software.

Footnotes:

[1] Perhaps your “dev” environment is something of a sandbox, but by the time you are running custom code, you really should minimize manual configuration, except maybe to force-stop/terminate services or resources that are out of control. I recommend introducing “break glass” roles that can be temporarily assigned and provide elevated (but not full admin level) authorization in such cases.
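For illustration, assuming AWS, temporarily assuming such a role might look like the sketch below; the role ARN is a placeholder, and the real work is defining the role, scoping its permissions, and controlling who may assume it.

```python
# break_glass.py -- assume an elevated (but not full-admin) role for a short, audited window.
# The role ARN is a placeholder; the role itself and its permissions are the real design work.
import boto3

BREAK_GLASS_ROLE_ARN = "arn:aws:iam::123456789012:role/break-glass-operator"  # placeholder

def break_glass_session(reason: str) -> boto3.Session:
    credentials = boto3.client("sts").assume_role(
        RoleArn=BREAK_GLASS_ROLE_ARN,
        RoleSessionName=f"break-glass-{reason}"[:64],  # shows up in CloudTrail
        DurationSeconds=900,  # keep the window short; 15 minutes is the STS minimum
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )

# Example: force-stop a runaway instance, then let the credentials expire.
# break_glass_session("runaway-ec2").client("ec2").stop_instances(
#     InstanceIds=["i-0abc1234567890def"], Force=True)
```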

[2] Baselining is…non-trivial to say the least. But why would one ever baseline against Dev? Or against Hello World? It is an appropriate and worthy endeavor to introduce a battery of performance tests as a prerequisite to e.g. graduating custom code from test to production, but unless you’re doing some gaudy blue-green style environment graduation, it is essentially meaningless in automated IaC.
