When to use which Infrastructure-as-code tool

Ramnath Nayak · Published in cloudnativeinfra · 11 min read · May 3, 2019

Infrastructure-as-code (IaC) is the concept of writing code to represent your infrastructure requirements and using an IaC tool to apply those changes to your cloud/on-prem environment.

In this article, I hope to give you an overview of the pros and cons of each IaC tool so that you can get a feel for the best use case for each one.

If you want a quick refresher on IaC in general, why use it and how to successfully apply the idea, have a read of this first.

The problems IaC tools can solve

Broadly speaking, there are three different needs that these tools try to address:

  1. Create/change/destroy infrastructure resources such as compute, storage, networking components or platform services like database, Kubernetes cluster etc.
  2. Deploy/update applications on top of the infrastructure.
  3. Manage the configurations used by the applications.

There is a spectrum of solutions available out there to address a variety of needs. This naturally leads to two questions — what tools are out there and what is the best use case for each one?

Here is a visual representation of the leading tools and what they do, arranged on a spectrum representing the problem space they address. The ones on the left side of the spectrum focus on creating and managing infrastructure resources. The ones on the right side focus more on configuration management.

Read on to learn more about each tool.

[Figure] The spectrum of leading IaC tools available today; width represents area of coverage.

Packer

Packer is a unique tool that generates VM images (not running VMs) based on steps you provide. This helps generate images with all applications, libraries and configurations baked into the image. This may not sound like much, but is a powerful feature when you start building immutable infrastructure.

Packer concepts:

Builder: The platform (for e.g., a cloud or hypervisor) where Packer spins up a temporary instance to build the image you want.

Provisioner: A tool that deploys the application/configuration on top of your image (Ansible, Chef, Puppet, Shell etc).

Post Processor: (Optional) Re-packages the build output or uploads artefacts (for e.g., to an image registry).

When Packer is run, it uses the Builder you specified to create an instance in your cloud account. Then the provisioning step runs and installs software. When the Provisioner finishes, the instance is terminated and the image saved on the cloud (or can be pushed to Vagrant to build a box).
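As a sketch of how the Builder and Provisioner fit together, a minimal Packer template (HCL2 syntax) might look like the following. The AMI ID, region and package are placeholders, not values from this article:

```hcl
# Hypothetical Packer template (HCL2); source AMI and region are placeholders.
source "amazon-ebs" "web" {
  ami_name      = "web-baked-example"
  instance_type = "t3.micro"
  region        = "eu-west-1"
  source_ami    = "ami-0123456789abcdef0" # placeholder base image
  ssh_username  = "ec2-user"
}

build {
  sources = ["source.amazon-ebs.web"]

  # Provisioner step: install software into the temporary instance
  # before it is snapshotted into an image.
  provisioner "shell" {
    inline = ["sudo yum install -y nginx"]
  }
}
```

Running `packer build` against this template launches the temporary instance, runs the shell provisioner, then terminates the instance and saves the image.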

Best used for:

Baking compute images.

Vagrant

Vagrant builds VMs using a workflow very similar to building Docker containers. You specify the base image (called a Box) in a Vagrantfile along with the steps to configure the VM.

With a simple command, vagrant up, you can get Vagrant to provision a VM with everything set up as per the specification. There is a significant collection of community-built boxes available through the Vagrant website.
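A minimal Vagrantfile sketch looks like this; the box name and the provisioning steps are illustrative:

```ruby
# Hypothetical Vagrantfile: base box and provisioning are placeholders.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/focal64"   # the base image ("Box")
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Provisioner step: runs on first `vagrant up`
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y nginx
  SHELL
end
```

`vagrant up` builds the VM from this specification; `vagrant destroy` tears it down so you can rebuild a clean environment at any time.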

While Vagrant does have Provisioners that allow you to deploy on clouds, the main use case for Vagrant remains developer VMs within Type-II hypervisors like VirtualBox (as opposed to deploying infrastructure on clouds). This is because Vagrant focuses solely on spinning up VMs; there is limited support for other resource types.

Best used for:

Creating pre-configured developer VMs within VirtualBox.

Terraform

Terraform is the only tool in this list that focuses solely on creating, destroying and managing infrastructure components.

You use the Hashicorp Configuration Language (HCL) to describe the infrastructure resources you need. This DSL is simple enough to be mastered quickly, as statements are mostly key-value pairs.
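To illustrate the key-value style of HCL, here is a minimal, hypothetical configuration; the provider, AMI ID and names are placeholders:

```hcl
# Hypothetical Terraform configuration; all values are placeholders.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder image ID
  instance_type = "t3.micro"

  tags = {
    Name = "app-server"
  }
}
```

Each `resource` block declares the desired end state; Terraform works out the API calls needed to get there.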

Terraform concepts:

Provider: A cloud provider’s plugin to provision on that cloud.

Provisioner: Used to execute scripts on a local or remote machine as part of resource creation or destruction. Provisioners can be used to bootstrap a resource, clean-up before destroy, run configuration management (e.g., Chef, Shell command, etc).

Modules: Reusable, higher-level abstractions that group related resources together (for e.g., a compute instance with block volumes attached), often published by cloud providers and the community.

There is no need for a client or a server to be installed on the infrastructure you manage. You just download the Terraform executable on your laptop or bastion server, enable the Provider (plugin) for the cloud provider of your choice and let Terraform do the work. The Terraform executable connects directly to your cloud and invokes the necessary APIs to manage the infrastructure.

Terraform runs in two distinct phases:

Plan phase: Terraform simulates a run and lists the actions it would perform without actually performing them. This is very useful to make sure the change you are about to make is as intended.

Apply phase: Terraform actually applies the change.
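Assuming the Terraform executable is on your path and your provider credentials are configured, the typical workflow is:

```shell
terraform init    # download the Provider plugins for this configuration
terraform plan    # simulate the run and list the proposed actions
terraform apply   # perform the changes (prompts for confirmation)
```

Reviewing the `plan` output before every `apply` is the main safety net Terraform gives you.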

Terraform maintains a state file that stores the state of resources it has created so far, and uses this state file (instead of what is really present in the cloud) to decide what changes need to be applied. This introduces additional complexity and makes Terraform unforgiving if you make any changes outside of it after provisioning. The reconciliation loop is a three-way tango between what you defined, what is in the state file and what is really present in your cloud. Always ensure the state file reflects reality, or things can get messy.

Oracle Cloud Infrastructure (OCI) offers Resource Manager, which runs Terraform as a managed service.

Best used for:

Managing infrastructure resources (no support for applications and config management).

Cloud-Init

Cloud-init is the most unsung and underrated IaC tool today. It is the only fully open source product in this list and is primarily used by cloud providers to initialise various services on VM hosts during their lifecycle.

For us consumers of cloud VMs, the main use case for cloud-init is to execute one-time scripts and commands during the first boot after provisioning. Its cloud-config modules can install particular software and set up directories, users, groups, package repos, additional SSH keys and other OS attributes. There is also a runcmd module to run any generic shell command.

You typically provide the commands in YAML format in the user-data field of VM provisioning form on your cloud provider’s console.

[Figure] An example on Oracle Cloud Infrastructure of setting the time zone for a VM during provisioning.
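A sketch of what such a cloud-config snippet might look like; `timezone` and `runcmd` are real cloud-init modules, but the values below are placeholders:

```yaml
#cloud-config
# Set the timezone on first boot (cloud-config "timezone" module)
timezone: Europe/London

# runcmd: arbitrary shell commands, executed once on first boot
runcmd:
  - mkdir -p /opt/myapp   # hypothetical application directory
```

The `#cloud-config` header on the first line is what tells cloud-init to parse the user data as a cloud-config document.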

You can also pass cloud-config scripts in the user_data field in Terraform, for e.g., if that is what you are using to provision instead of the cloud's console.

For a complete list of cloud-init modules, refer to the documentation.

One of the issues with tools like Chef/Puppet/Salt is that they need an agent to be deployed on the VM before they can start managing those servers. Cloud-init can come to the rescue in these scenarios, as there are modules in cloud-init to deploy the agents for Chef/Puppet/Salt.

A typical workflow for this combination of tools would look like this: Use Terraform to create the infrastructure, Cloud-init to deploy the Chef/Puppet/Salt agent and then invoke a Chef/Puppet/Salt script to deploy and configure the application.

Another useful cloud-init feature is the #include command to invoke a hosted shell script. For e.g., #include https://get.docker.com installs Docker on your new VM from the cloud console without you having to log in or use any other IaC tool.

Best used for:

One-time commands and scripts to be run after spinning up compute.

Ansible

Ansible has quickly grown to be my favourite IaC tool. Unlike Chef, Puppet, and others, Ansible does not need a client to be deployed and can connect to servers and run commands over SSH. This also makes it a good general-purpose tool for building infrastructure as well as deploying and configuring applications on top of them. If you just want to use one lightweight tool, odds are Ansible is the one you need.

Ansible can also be used as a command-line tool for ad-hoc querying and reporting against your infrastructure estate (for e.g., finding how many servers in your estate have a specific version of a software installed, and then using that info to deploy an update).
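Such ad-hoc queries look like the following; the `webservers` group, the `hosts` inventory file and the package name are assumptions for illustration:

```shell
# Query: report the installed version of a package across a host group
ansible webservers -i hosts -m shell -a "rpm -q openssl"

# Act on the result: push an update to the same group (become root with -b)
ansible webservers -i hosts -b -m yum -a "name=openssl state=latest"
```

The same `-m module -a arguments` pattern works with any of Ansible's modules, which is what makes it handy for one-off estate-wide tasks.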

The default architecture of Ansible is to run in push mode, but it also has a pull mode if you want to pull the config from the managed server.

A few Ansible concepts:

Module: Commands that do something (for e.g., create a compute instance, ensure a directory is present on it, upgrade a package, etc.). There are more than 2,000 modules available, and they are idempotent; you can use the shell/command modules to run any commands for which a module does not exist (these are not idempotent).

Playbook: A collection of modules that you write to create infrastructure, deploy an application or ensure a config state. Playbooks use YAML and Jinja2 templating; Python and shell commands are also accepted.

Role: Brings modularisation, templating and reusability to playbooks.
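A minimal, hypothetical playbook tying these concepts together; the `webservers` group and the nginx package are illustrative choices:

```yaml
# Hypothetical playbook: install and start nginx on the "webservers" group.
- name: Configure web servers
  hosts: webservers
  become: true
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
```

Run it with `ansible-playbook -i hosts site.yml`; because the modules are idempotent, re-running it on an already-configured server changes nothing.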

Ansible Tower supports invoking Ansible playbooks from a GUI to manage your infrastructure estate. Rundeck is a third-party alternative for building self-service workflows, with deep integration with Ansible to invoke playbooks. Ansible Galaxy is a large collection of community-built roles and playbooks.

Best used for:

Ad-hoc analysis, as well as a general-purpose, push-based, agentless IaC tool.

Chef

Chef is a popular tool that is widely used for configuration management, but has a relatively complex architecture when compared to the other lightweight tools.

Chef concepts:

Workstation: This is where you install the Chef Development Kit and do your development.

Cookbook: This is your code describing the desired state, in a DSL derived from Ruby.

Recipe: The basic configuration element, a cookbook is made up of one or more recipes (you have probably noticed the clever naming convention by now!).

Server: The server acts as a centralised repository for metadata. Your cookbooks developed on the workstation are uploaded to the server.

Nodes: These are the targets (compute/device/container/cloud) that you manage with Chef. They pull the configuration specifications from the server in an act of ‘convergence’.

Knife: The command-line interface used to interact with the nodes and the server. If you do choose to use Chef to spin up infrastructure, the knife plugin for your cloud is what you'll use to provision servers. Knife can then bootstrap each new server into a node so Chef can manage the apps and configs on top of it.
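A minimal, hypothetical recipe in Chef's Ruby DSL; nginx and the file contents are illustrative:

```ruby
# Hypothetical Chef recipe: declare the desired state of a web server.
package 'nginx' do
  action :install
end

service 'nginx' do
  action [:enable, :start]
end

# Manage a file's content and permissions declaratively
file '/var/www/html/index.html' do
  content '<h1>Hello from Chef</h1>'
  mode '0644'
end
```

Each block is a resource describing desired state; the chef-client on the node converges the machine towards it on every run.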

The Chef Supermarket has a large collection of community contributed cookbooks.

Best used for:

Deploying and configuring applications using a pull-based approach.

Puppet

Puppet is another popular tool for configuration management. Puppet is built on the client/server model and needs agents to be deployed on the target machines before Puppet can start managing them.

Puppet can backup config files so that you can restore them later if needed.

Another interesting feature is its frequent polling to check that the config is in the desired state (every 30 minutes by default). Unlike other tools that do the job when triggered, Puppet is always on, preventing configuration drift. This means that when you make changes (in response to production events, for example) you need to make them through your Puppet code as well, or the next agent run will overwrite your manual changes.

Puppet concepts:

Resource: These are the fundamental building block configurations you can manage. (For e.g., a file that needs to be present or a service that should be running or a compute instance on a cloud).

Class: A set of related resources grouped together that describes a service or an application (for e.g., a database server).

Manifest: This is your code describing the desired state of the resources, in a DSL called Puppet Configuration Language. You describe the resources and their attributes in key-value pair format.

Catalog: Compiled version of manifest that is distributed by the server to the agent on each node.

Module: Reusable collection of manifests and other resources (for e.g., to install an app) for sharing with other users. Puppet Forge is a large community built collection of modules.
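A minimal, hypothetical manifest showing resources grouped into a class, in Puppet's key-value style; nginx is an illustrative choice:

```puppet
# Hypothetical Puppet manifest: a class grouping related resources.
class webserver {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'],  # ordering: install before starting
  }
}
```

The agent on each node pulls the compiled catalog for manifests like this and converges towards it on every polling cycle.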

Best used for:

Deploying and configuring applications using a pull-based approach.

SaltStack

Salt uses the client/server model and is designed to scale. Salt abstracts the underlying platform, so you generally do not need to write platform-specific code.

Salt is built using Python, supports Jinja templating, and you provide the code and config in YAML files, all features it shares with Ansible. Salt also has SSH support, which enables an agentless mode.

Salt also has a scheduler that allows you to specify how often to run your code on the managed servers.

Salt concepts:

Master: The server process.

Minion: The agent running on each managed server.

State module: Describes the resource and the state it should be in.

Formula: The code you write with state modules, in YAML.

Package Manager: Used to package the state, file templates, and other files used by your formula into a single file. After a formula package is created, it is copied to the Repository System where it is made available to Salt masters.

Repo System: Hosts the package for distribution to masters.
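A minimal, hypothetical Salt state file (SLS, in YAML) using the `pkg` and `service` state modules; nginx is an illustrative choice:

```yaml
# Hypothetical Salt state: ensure nginx is installed and running.
nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx   # ordering: package before service
```

Applying it with `salt '*' state.apply` pushes the state to all minions in parallel, which is where Salt's speed at scale shows.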

Salt Cloud is a utility to allow Salt to spin up and manage resources on the cloud. It is possible to set up Salt to be highly available with a multi-master set up.

Best used for:

General-purpose deployment and configuration of applications; quick, parallel deployments.

CFEngine

CFEngine is the oldest configuration management tool available today and is the inspiration behind some of the other tools like Puppet.

CFEngine is the only tool in this list that does configuration management only, without any infrastructure management capabilities.

It claims to have a distributed architecture, but it needs a policy hub server to be set up, and each managed server running CFEngine must be registered with it.

CFEngine concepts:

The academic origins of CFEngine start to show when you unpick what its creator calls Promise Theory, which is the driving force behind CFEngine.

Instead of issuing 'commands', CFEngine gets every part of the system to make a 'promise' to achieve a known state for a resource. If the resource state is achieved, the promise is considered 'kept'. If not (and this is where CFEngine differs from the other tools), the promise can end up in one of several states such as repaired, denied, failed or timed out.

Also central to CFEngine is the philosophy of Convergent Configuration, which means the system strives iteratively to get closer to the desired state even if it cannot get there straight away. The agent runs every five minutes by default.

Policy: The code you write in a DSL that is a sophisticated C-like language with datatypes and subroutines.

Control Body: A header of a policy that contains the sequence in which the bundles must be invoked.

Bundle: A container for promises.
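A minimal, hypothetical policy sketch showing the control body and a bundle of file promises; the file path is illustrative:

```cfengine3
# Hypothetical CFEngine policy: one bundle containing a single file promise.
body common control
{
  bundlesequence => { "ensure_motd" };
}

bundle agent ensure_motd
{
  files:
    # Promise: this file exists; the agent repairs it if missing.
    "/etc/motd"
      create => "true";
}
```

On each five-minute run the agent evaluates every promise and records whether it was kept, repaired or not kept, which is the Convergent Configuration idea in practice.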

Best used for:

Fast configuration management.

Similarities and differences

Market share

Chef and Puppet used to dominate this space, but Ansible is gaining a lot of traction, possibly because the community edition is feature-rich and you only need to sign up for a licence if you need Ansible Tower.

Architecture

Chef, Puppet and Salt now offer agentless modes for standalone instances, but their default stance for multi-server deployments seems to be an agent-based pull model.

Platform abstraction

Chef and Ansible tend to require you to write platform-specific code (for e.g., knowing whether package installation uses apt on Debian or yum on Red Hat derivatives, or which tool starts a service), whereas Puppet and Salt in particular tend to provide a higher-level abstraction that resolves the underlying platform for you.

Documentation

Salt documentation coverage is not as good as it could be, but the rest have really good coverage.

Licensing

All of these tools have a community edition offering a subset of what the enterprise license offers.

Cloud specific tooling

I have deliberately steered clear of cloud specific tooling such as CloudFormation on AWS, Azure Resource Manager or GCP Deployment manager as these are cloud specific.

Choosing cloud-agnostic tooling like the tools listed here can help you build a solution that does not lock you into a specific cloud and can also manage resources across clouds and/or on-prem.

Ramnath Nayak
Outbound Product Manager at Oracle Cloud Infrastructure