Infrastructure as Code — does it benefit regular Software Architects/Engineers?

Whether it is Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS), most companies are using cloud technologies in production, or at least experimenting with them. The popularity of AWS, Azure and others has many roots, among them ease of use, scalability and attractive pricing models (e.g. pay as you go). One of the main trends in the DevOps movement is the rise of the concept of Infrastructure as Code (IAC).
There are a lot of tools implementing this concept (Terraform, Chef, Ansible, Puppet, Salt, CloudFormation, …). They integrate very well with CI/CD pipelines, frameworks and tooling. At some stage, a DevOps team usually adopts an IAC tool to standardize, automate and provision the different components of a software stack: networking (VPCs, gateways, …), security, clusters (Hadoop, Kafka, Redshift, …), databases, and servers for microservices/applications (the requirement being that a server/resource can be provisioned by calling an API). The range of services covered is usually broad. An example can be found here.
One of the interesting questions related to this is if, and how, IAC can benefit classical software architects or software engineers, people who build applications, microservices, databases and data lakes without being day-to-day DevOps engineers. Do I need to be a Platform Engineer to benefit from IAC, or could it benefit me too, adding value to other skills like programming and design, making me a better developer/architect in general? Does it make sense to master IAC? Which benefits can it have? Will it reduce technical debt in my current project? Will it help with the next RFP? How can I, as a software architect/developer, benefit from IAC?
Infrastructure as Code (IAC) — What is it? High level introduction
According to Wikipedia, "Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. The IT infrastructure managed by this comprises both physical equipment such as bare-metal servers as well as virtual machines and associated configuration resources. The definitions may be in a version control system. It can use either scripts or declarative definitions, rather than manual processes, but the term is more often used to promote declarative approaches." (https://en.wikipedia.org/wiki/Infrastructure_as_code)
The most fundamental idea here is to treat your infrastructure (IaaS/PaaS/SaaS) as you would treat software code. Network components, servers, cryptographic devices, specialized appliances: every component of a technical infrastructure is treated as a software-defined object. To illustrate, in the following example we create the definition of an AWS network for our project, with a public subnet and a NAT gateway that will later be used by resources created in the private subnet of this network (which would also be defined in this file). We can execute this file against any AWS account. This would have two consequences:
- These resources would be created in an AWS account (based on the provided credentials)
- Terraform would store the state of those resources. If we amended this file and executed it again, only the delta would be deployed (some resources deleted, some added and some updated, according to how their definitions were altered)
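A minimal Terraform sketch of such a network definition might look as follows. The resource names, region and CIDR ranges here are illustrative assumptions, not taken from the original example:

```hcl
# Illustrative sketch: a VPC with a public subnet and a NAT gateway.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_vpc" "project" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.project.id
  cidr_block = "10.0.1.0/24"
}

# The NAT gateway needs an Elastic IP.
resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}
```

Running `terraform apply` on this file would create the resources; running it again after editing the file would apply only the delta, as described above.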

The above-mentioned tools (Terraform, …) provide high-level programming/configuration languages that internally leverage the cloud service APIs to instantiate, destroy and update resources. With the adoption of cloud platform providers this is already a standard, but it is important to realize what the concept implies: suddenly, best practices from classical software development projects can also be applied to infrastructure. Everything we have learned so far about writing code and CI/CD pipelines applies here too (including lint tools, style checkers, etc.).
Standard IDEs used for Java and other programming languages can be applied to IAC projects. The diagram below shows an example of an IAC project using IntelliJ with the Terraform plugin (syntax highlighting, syntax checking, code formatting, …).

Infrastructure as Code (IAC) — drivers
There are multiple drivers for IAC, to name a few:
- Cloud adoption — let's take a look at AWS. Over 170 services, from networking to high-end solutions like a data lake (Lake Formation); every service is a basic architectural building block with a defined API that can be used/instantiated to create end-to-end software architectures. We can build complete solutions by leveraging and connecting those basic building blocks. With the cloud we can instantiate in minutes software that, with the traditional approach, would take much longer just to get access to the required hardware (servers, …).
- DevOps practice — wherever platforms need to be managed, DevOps is present. Kubernetes/OpenShift, Hadoop, S/4HANA, Kafka clusters: all are complex platforms that require DevOps teams and environments that can be leveraged to experiment, test and develop, in a highly repeatable and stable manner of course.
- Growing complexity of tech stacks — in every project, besides providing the (micro)services, we need to integrate security, logging, monitoring, auditing, etc. To control and evolve the stack, we need a method that can bring us iteratively from the current setup to the target architecture.
- Increased need for environments (due to faster build/release cycles) — developing new features in parallel and wanting to test them in isolation? E.g. to verify that a new feature improves overall system performance, we may need a copy of the system deployed on multiple nodes with CPU/RAM similar to the production environment.
- Increased need for software quality — how do we verify that adding a new node to the current cluster will not degrade performance/stability?
- Security/compliance — how do we guarantee and audit the current technical architecture?
- The need to strengthen the ability to audit/review provisioned solutions in terms of high availability, security and performance — how do we verify that the provisioned solution adheres to best practices? In software development we have multiple tools that verify code quality, either automatically or through collaboration/review practices.
IAC — basic scope and applicability
To set a common understanding, IAC applicability can be split into two basic areas: infrastructure provisioning (e.g. Terraform, CloudFormation, …) and configuration management (Ansible, Puppet, Chef, …). Both are explained in the diagram below.
The first is about provisioning running services based on service-level configuration (e.g. providing EMR Hadoop cluster parameters); the latter is about customizing those services (usually compute services like EC2) to meet the requirements of a given application stack. E.g. after we have provisioned an AWS EMR cluster (infrastructure provisioning), we may want to set up users/groups and configure them to work with Active Directory (configuration management). We could use Terraform/CloudFormation for the former and Puppet/Chef/Ansible for the latter. For the last mile of solution delivery, the deployment of the services/applications themselves, IAC is applicable only to a degree; there are different and better-suited tools like Kubernetes, Docker, Helm, plain JAR files, etc. to accomplish this. IAC tools are best applied to IaaS, PaaS and SaaS.
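To make the provisioning side concrete, here is a hedged Terraform sketch of the EMR cluster mentioned above. All names, versions and instance types are illustrative assumptions; the subsequent user/group and Active Directory setup would then be handled by a configuration management tool and is not shown:

```hcl
# Illustrative sketch: provisioning an EMR cluster (infrastructure provisioning).
resource "aws_emr_cluster" "analytics" {
  name          = "analytics-poc"
  release_label = "emr-5.29.0"          # EMR release to run
  applications  = ["Spark", "Hadoop"]   # services installed on the cluster
  service_role  = "EMR_DefaultRole"

  ec2_attributes {
    subnet_id        = var.subnet_id    # assumed to be supplied elsewhere
    instance_profile = "EMR_EC2_DefaultRole"
  }

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 2
  }
}
```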

IAC — benefits it will bring to my non-DevOps skillset
So the question is: is it worth investing time in IAC? Let's see where we could leverage it, focusing only on the aspects related to the work of regular software architects (Java, web, big data, …) and developers (Scala, Java, web, …).
The first advantage we get is the ability to quickly build solutions/POCs, to experiment. In which situations can we use it? E.g. to show an existing customer that we can quickly provision an end-to-end solution/proof of concept (POC) and, if needed, copy it x times with the guarantee that the end result will be the same after each deployment. Leveraging cloud and IAC we could build a solution collaboratively: some team members working on networking and security, some on application components, some on serverless/managed services. It would cost only for the time the services are provisioned and running; no need to buy new servers. We could of course accomplish the same by hand, e.g. by clicking through a web interface (e.g. the AWS Management Console), but that has many disadvantages:
- It is slow
- It can’t be repeated
- It is error prone
- It is “manual”, we don’t want to work this way
Another reason is the ability to create environments on demand. Environments are in constant demand, and developers often need to test in isolation. We could link a Terraform project with a Jenkins pipeline so that each time a commit or pull request happens on, say, a dev or feature branch, a new environment gets created: private to that commit/PR, versioned (because the IAC code is stored in Git), reviewed and ready to be reused.
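One way to sketch this is to parameterize the environment name, so a CI job can stamp out an isolated copy per branch or pull request. The variable and resource names below are hypothetical:

```hcl
# Illustrative sketch: one environment per branch/PR via a name variable.
variable "env_name" {
  description = "Unique environment identifier, e.g. the branch or PR id"
  type        = string
}

resource "aws_vpc" "env" {
  cidr_block = "10.0.0.0/16"
  tags = {
    Name = "myapp-${var.env_name}"   # e.g. myapp-pr-123
  }
}
```

The pipeline could then isolate the state per environment with Terraform workspaces, e.g. `terraform workspace new pr-123` followed by `terraform apply -var="env_name=pr-123"`.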
Having the components of our technical architecture described in software adds a lot in terms of quality and security. What can we do with our infrastructure if it is stored in Git, defined as code? We can:
- Do a code review of it: collaborate, comment, tag and improve. We can apply the tools we know from standard software development projects (Crucible, SonarQube, Git, Jenkins) to our environments.
- Run it in a pipeline and verify that the environment is sound (between the time it was released and today it may have stopped building because external dependencies changed; at least we will be notified).
- If we are in a company that constantly provisions new environments, build a library of components that can be reused between projects, so that each project is built on top of proven components (e.g. private network / public network / network with NAT gateway, database with or without high availability, Kafka cluster with or without Kerberos, DNS server, …).
- When asked what our dev/QA/prod environment looks like, point directly to the code.
- Audit and security — we may hand the code over to compliance teams/security experts and ask for a review and hints for improvement.
- As architects/developers we know that it's good to reuse instead of reinventing each time.
- Speed and innovation — provisioning a new environment can take minutes (create an environment by creating a new instance of the software). If we have running software leveraging e.g. Spark 2.2, we could prepare a version (branch) of the environment with Apache Spark 2.3 or 2.4 binaries, run some tests and verify the benefits.
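The component-library idea above can be sketched as projects consuming in-house Terraform modules. The module paths, inputs and outputs here are hypothetical, standing in for whatever the company's library would actually expose:

```hcl
# Illustrative sketch: building a project from proven, reusable modules.
module "network" {
  source     = "./modules/network-with-nat"  # hypothetical in-house module
  cidr_block = "10.1.0.0/16"
}

module "database" {
  source            = "./modules/postgres"   # hypothetical in-house module
  high_availability = true
  subnet_ids        = module.network.private_subnet_ids
}
```

Each project then composes proven building blocks instead of redefining networks and databases from scratch.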
Another aspect is learning itself. Most of the tools have community and/or official modules covering a broad range of services (https://registry.terraform.io/). If we need e.g. Active Directory to verify/test how it integrates with the other components of our solution, we can deploy it just by referencing its module (e.g. https://registry.terraform.io/modules/neillturner/microsoftad/aws/0.2.0). No need to build it from scratch. We can spend our time more efficiently, focusing on the integration itself.
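Referencing such a registry module is a one-block affair. The inputs the module actually requires (domain names, passwords, etc.) depend on the module itself and are deliberately not guessed at here:

```hcl
# Illustrative sketch: consuming the public registry module mentioned above.
module "active_directory" {
  source  = "neillturner/microsoftad/aws"
  version = "0.2.0"
  # module-specific inputs go here (see the module's registry page)
}
```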
Further reading
To start with, a very good article that focuses more on the IAC tools and how they differ: https://blog.gruntwork.io/why-we-use-terraform-and-not-chef-puppet-ansible-saltstack-or-cloudformation-7989dad2865c
