Ansible Best Practices: Part 1

Published in Polar Squad · 10 min read · Jun 9, 2020

There are a lot of DevOps tools out there. Selecting one for the job is not always easy, and using it effectively from the start is even harder.

Maybe you are investigating Ansible because you are moving to a new platform and need to manage all its resources. Or you have a heap of handcrafted scripts to manage your infrastructure, and maintaining them is becoming painful. The first steps with a new tool can be daunting, and all the possibilities may seem overwhelming.

This post and the following one strive to answer the questions: “When should I select Ansible?” and “How do I use Ansible effectively?”. Perhaps also: “Why is Norway false?”.

Google search: “DevOps Engineer”. Did you mean: “YAML file engineer” — From: r/ProgrammerHumor

Read on for some help in selecting the correct tool for the job. Catch some Ansible best practices along with specific tips and tricks.

What’s Ansible?

Ansible is an automation tool used for configuration management and orchestration of infrastructure, systems, and applications.

It began its life as the project of a single person, then grew into a company, and is now sponsored by Red Hat. Backed by an open source community, it is mature and widely used.

Ansible, like its well-known comrades Chef, Puppet, and SaltStack, is used to describe a system’s desired state as code, which in this case means YAML with a sprinkle of Jinja2 templating on top. All the listed tools strive for idempotency, which in short means that they make changes only when necessary. There’s some difference in whether this is achieved declaratively or imperatively, but the result is the same.

It has an agentless architecture, so it’s effortless to get started with. There are no dependencies to handle on the managed hosts except Python. Write a playbook in YAML, and you’re off. The direction is push, rather than pull. By default, it connects to the target hosts with SSH, or with WinRM in the case of Windows boxes. If you are used to writing that install.sh or install.ps1, you will feel right at home changing that to install.yaml and getting all the goodies that come with the Ansible automation framework.

Ansible can also be used to connect to various APIs for provisioning cloud servers or to set up Kubernetes clusters. As such, it can be used as a complementary tool to Terraform or as a replacement for it. Nothing is stopping you from creating Kubernetes YAML manifests infused with Jinja2 templating and deploying them with Ansible rather than Helm, either.
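As a rough sketch of the Kubernetes idea, a manifest can be written inline in a task and templated with Jinja2. This assumes the kubernetes.core collection and its Python dependencies are installed and that a kubeconfig is available on the control machine. The app_name and app_image variables are made up for illustration.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical values, not from the article.
    app_name: demo
    app_image: nginx:1.25
  tasks:
    - name: Deploy a Jinja2-templated Deployment manifest
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: "{{ app_name }}"
            namespace: default
          spec:
            replicas: 2
            selector:
              matchLabels:
                app: "{{ app_name }}"
            template:
              metadata:
                labels:
                  app: "{{ app_name }}"
              spec:
                containers:
                  - name: "{{ app_name }}"
                    image: "{{ app_image }}"
```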

But as the landscape of open source tools is vast, the question is: should you?

What is it good at?

As stated earlier, it’s effortless to get started with. Distributed as a single Python package, Ansible comes with everything but the kitchen sink.

This is especially true before version 2.10, which is under development at the time of writing. The core of Ansible ships with modules such as package and win_service, which manage the resources their names promise. But there are also some more specific modules. Go ahead and run Terraform plans with Ansible too. All this with a single pip install ansible. After 2.10 lands, the user experience might not change that much, but in the future, the modules will live outside of the core in collections and can be managed separately.

Ansible is also widely used for managing network equipment. From routers to switches, the agentless architecture makes Ansible a good choice for automating the somewhat tedious tasks of assigning VLANs in Cisco equipment or interfaces on Juniper’s Junos. In these cases, the connection made to the devices differs from the default SSH. Thanks to the plugin architecture, protocols like NETCONF can be leveraged to manage all the things.

Written in Python, Ansible is easily extendable. So if the bundled modules do not include the one you need, you can write some of your own or extend the included Jinja2 filters with a plugin. Most often the ready-made modules are enough and the desired system state is achieved with a combination of tasks.
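To give an idea of what managing collections separately might look like, here is a minimal sketch of a requirements file. The choice of collections is only an example, not something this post prescribes.

```yaml
# requirements.yml
# Install with: ansible-galaxy collection install -r requirements.yml
collections:
  # Example picks only; list whatever your playbooks actually use.
  - name: community.general
  - name: ansible.windows
```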
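A VLAN assignment on a Cisco IOS switch could look roughly like the sketch below. It assumes the cisco.ios and ansible.netcommon collections are installed. The switches group, the VLAN number, and the VLAN name are hypothetical.

```yaml
- hosts: switches            # hypothetical inventory group
  gather_facts: false
  vars:
    # Use the network CLI connection plugin instead of the default SSH behaviour.
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.ios.ios
  tasks:
    - name: Ensure VLAN 10 exists
      cisco.ios.ios_vlans:
        config:
          - vlan_id: 10      # made-up VLAN for illustration
            name: office
            state: active
        state: merged        # merge into the existing device configuration
```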

Ansible is also easy on the eyes. Tasks to run are written in YAML, which is turning out to be the de facto configuration language of many other modern tools as well. It brings some gotchas along with it, but mostly a list of tasks…

- hosts: localhost
  # Use sudo to become root.
  become: true
  tasks:
    - name: Install Apache
      package:
        name: apache2
        state: present
    - name: Start and enable Apache
      service:
        name: apache2
        state: started
        enabled: true

…is easily understood by the non-sysadmin as well.

It is easy to do easy things with Ansible, and the more difficult tasks are not too hard either, as Ansible allows for loops and other constructs that take the YAML a step towards an actual programming language where anything is possible. Ansible’s syntax lies somewhere between imperative and declarative, mostly closer to the latter on the axis of programming paradigms. You focus on what you want, instead of how. The purists will identify the imperative aspects as well. If all else fails, add some Python-like expressions in a Jinja2 templating block to achieve even the most complex task.

In addition to writing playbooks for running tasks, Ansible has a concept called ad hoc commands, which can be used to make changes to running systems without writing any YAML. For example, you can restart all the Apache instances on all the hosts in the webservers group. It is kind of like SSH’ing into all the hosts and running systemctl restart apache2. This and the syntax make Ansible very suitable for situations where the operator is very used to managing the systems but needs some robot hands to help with managing a fleet.

If you strive to achieve immutable infrastructure, like that which usually is associated with running containers, Ansible can be paired with Packer. Used together, Ansible manages the state of the images to be deployed. An Ansible-centric way of building containers is available too, as the pure-Dockerfile syntax might start to annoy you at some point.

By the way, the short playbook shown earlier is entirely functional! You can put it in a file named playbook.yml and run the command ansible-playbook playbook.yml after installing Ansible itself. No inventory is needed, since Ansible provides localhost implicitly; add -K if sudo asks for a password. Without any changes, it will install and start Apache on a Debian-based Linux box. Straightforward to start with, isn’t it?
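As an illustration of those constructs, a loop combined with a condition might look like the sketch below. The package list is made up for the example.

```yaml
- hosts: localhost
  become: true
  tasks:
    - name: Install a handful of packages, one per loop iteration
      package:
        name: "{{ item }}"
        state: present
      loop:
        - apache2
        - curl
        - git
      # Only run on Debian-family systems, where these package names apply.
      when: ansible_os_family == "Debian"
```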
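For comparison, here is what the Apache restart could look like both ways. The ad hoc form is shown as a comment, and the play beneath it is a sketch of the equivalent YAML you get to skip writing. The webservers group is assumed to exist in your inventory.

```yaml
# Ad hoc, no YAML needed:
#   ansible webservers --become -m service -a "name=apache2 state=restarted"
#
# The same thing written as a playbook:
- hosts: webservers
  become: true
  tasks:
    - name: Restart Apache
      service:
        name: apache2
        state: restarted
```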

What are the downsides?

Now, you could have a lengthy discussion with me and I could convince you that there are none. But for the sake of realism^wkeeping my colleagues happy, there are some things that might bug you with Ansible.

Jinja2 has its peculiarities, one of which is the treatment of everything as a string by default, as it is intended for rendering text files at its core. Be prepared to do some dancing with integers. Some progress has been made from the dark days in the form of native type support, but you still need to turn it on with the jinja2_native setting.
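The usual dance step is an explicit cast with the int filter. A minimal sketch, with a hypothetical worker_count variable that arrives as a string:

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Hypothetical variable; quoted, so it is a string rather than a number.
    worker_count: "4"
  tasks:
    - name: Compare as a number by casting explicitly
      debug:
        msg: "More than two workers requested"
      # Without "| int" this would compare a string with an integer.
      when: worker_count | int > 2
```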

YAML. Well, YAML is nice and all, but it is misleadingly simple. In reality, it is fairly complex and you can, for example, define a multiline string in a gazillion ways. Enough to warrant a dedicated website: yaml-multiline.info. Booleans can also be expressed in many ways, so you might see yes instead of true in some places. Both are syntactically correct, but you should stick to one for clarity. Just remember that Norway is false too, as an unquoted NO is interpreted as a boolean. Remember to indent each line properly as well. The wrong amount of indentation leads to an error in most cases or misinterpretation in the rest.

Task execution is linear. A batch of hosts is executed concurrently for each task, but tasks follow each other in a strict order. This may be changed by choosing a different strategy, but again it’s a configuration option most likely left at the default, and you need to know the implications of changing it.
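A few of these gotchas in one place. The comments describe how a YAML 1.1 parser, such as the PyYAML library that Ansible uses, reads each value. The keys are made up for the example.

```yaml
countries:
  - SE        # string
  - FI        # string
  - NO        # boolean false, not the string "NO"
  - "NO"      # string, because it is quoted
enabled_a: yes     # boolean true
enabled_b: true    # boolean true, the clearer spelling
literal_block: |   # newlines are kept as they are
  first line
  second line
folded_block: >    # newlines are folded into spaces
  first line
  second line
```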
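Switching the strategy is a one-line change in the play. The sketch below uses the free strategy, which lets each host run through the tasks as fast as it can instead of waiting for the others after every task. The play contents are only illustrative.

```yaml
- hosts: webservers
  become: true
  # The default is "linear"; with "free" hosts no longer move in lockstep.
  strategy: free
  tasks:
    - name: Install Apache
      package:
        name: apache2
        state: present
    - name: Start and enable Apache
      service:
        name: apache2
        state: started
        enabled: true
```

The trade-off is that you can no longer assume every host has finished one task before any host starts the next.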

Ansible is also stateless by design, so compared to Terraform’s state handling, it needs to go and check each and every resource, each and every time it encounters a task you have created for managing the state of said resource. Getting a nice diff of all the changes in one go, as you can in Terraform or Helm, isn’t possible. Combined with the default linear strategy, be prepared to wait for your 1000th task to make its change after the first 999 have been marked with an ok. There are some ways to limit this waiting time, but until it reaches the end, Ansible has no way of knowing that only the last task has a modification and the rest don’t.

There are also no real dependencies between resources like in Puppet and Terraform. You can get the output of tasks via register and use it as an input elsewhere, and you can notify other resources via handlers during changes, but you can’t get a graph out of all the resources in your playbooks and roles.

Ansible works great when you have a limited number of administrators running the commands, all of whom are, access control-wise, on the “root access” side of the permission spectrum. It has no locking mechanism unless you build one, so nothing is stopping you from running conflicting changes from multiple places at the same time. Version control is highly recommended with the playbooks, but handling branches might need some work on the user’s part due to this. When you reach a certain scale, you may start to wish for a server like in Chef or Puppet, whether for access control, for centralized management, or because you have users with varying degrees of permissions.
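One common way to limit the waiting, and only one of several, is to tag tasks and run a subset of them. A sketch with made-up tag names:

```yaml
# Run only the tagged part:   ansible-playbook playbook.yml --tags apache
# Preview changes instead:    ansible-playbook playbook.yml --check --diff
- hosts: webservers
  become: true
  tasks:
    - name: Install Apache
      package:
        name: apache2
        state: present
      tags:
        - apache
    - name: Install MariaDB
      package:
        name: mariadb-server
        state: present
      tags:
        - database
```

The --check --diff preview still visits every selected task on every host, so it does not replace a stored state. It only shows what would change.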
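The two mechanisms that do exist, register and handlers, look roughly like this. The template file name is hypothetical.

```yaml
- hosts: webservers
  become: true
  tasks:
    - name: Deploy the Apache configuration
      template:
        src: apache2.conf.j2            # hypothetical template file
        dest: /etc/apache2/apache2.conf
      # Queue the handler only if this task reports a change.
      notify: Restart Apache

    - name: Check which Apache version is installed
      command: apache2 -v
      register: apache_version
      changed_when: false               # a read-only command never changes anything

    - name: Use the registered output in a later task
      debug:
        var: apache_version.stdout_lines

  handlers:
    - name: Restart Apache
      service:
        name: apache2
        state: restarted
```

The link between the tasks is by name and by order in the file, which is why no dependency graph falls out of it.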

One might also say that an agentless push architecture has its limits, but let’s agree that the other direction has issues as well, and they are suitable for different types of use cases. Amidst all the pros and cons, another question pops up: when should Ansible be selected as the tool of choice?

When should I select Ansible over the alternatives?

Ansible excels at managing Unix-y servers and infrastructure: writing roles and managing the installation, configuration, and startup of services like Apache, Elasticsearch or MariaDB. It also has a variety of networking-related modules, so Ansible is at home in the datacenter, in places where the environment is a bit more static compared to the ephemeral nature of containers.

Ad hoc commands and easily approachable syntax make it a good tool for operators. Fast to get started with, the old scripts and thingamajigs can be converted into YAML with less work than learning how to write Ruby for Chef, Terraforming everything or moving to a new way of working with containers and Kubernetes.

Ansible has no UI, nor a server, unless you count Ansible Tower from Red Hat. Thus it serves users versed in the command line better than people used to clicking things. Speaking of Tower, it lets you visualize and centralize Ansible usage. It also provides you with an API and access control. It also comes with a price tag. As an alternative, AWX is the upstream open source version of Tower.

While it’s possible to use Ansible to talk with cloud platform APIs for creating the servers and then use a dynamic inventory module to connect to them as hosts, this is not where Ansible is at its best. At a larger scale, you may start to appreciate Terraform’s state handling, and its mapping of configuration to real-world resources.

When working with public clouds such as AWS, Azure or GCP, the consensus seems to be to go with Terraform for provisioning, quite possibly due to the state handling reason listed above. But Terraform does not excel in managing the contents of the servers, so perhaps perform your continued configuration management with Ansible. This does not work well with autoscaling and other orchestration actions, however. So a better option would be to create the images with Ansible and then shoot them to the cloud with Terraform. After that, make any autoscaling and failure recovery happen with the immutable images. On the operations side of things, Ansible will be handy due to the ad hoc commands and quick-to-write playbooks.
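For reference, a dynamic inventory for AWS can be as small as the sketch below. It assumes the amazon.aws collection and boto3 are installed and that AWS credentials are available in the environment. The region and the Role tag are made-up examples.

```yaml
# inventory.aws_ec2.yml (the file name has to end in aws_ec2.yml or aws_ec2.yaml)
# Inspect the result with: ansible-inventory -i inventory.aws_ec2.yml --graph
plugin: amazon.aws.aws_ec2
regions:
  - eu-north-1          # example region
keyed_groups:
  # Build groups such as "role_webserver" from a hypothetical "Role" tag.
  - key: tags.Role
    prefix: role
```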

The cloud providers have their own ways of doing things as well. I’ve heard good things about Amazon’s CDK but have never actually used it. As it is actual code-code, rather than Ansible’s kind-of-code, it depends on your background and personal preferences as much as the capabilities of the software. And even with CDK, you end up with declarative CloudFormation in-between. Azure has its Resource Manager, and Google named theirs the Deployment Manager. But the cloud-specific technologies are just that: cloud-specific. When selecting tooling, I see a lot of value in sticking to things that have a broader scope.

In OpenStack for instance, I’d personally go with Ansible over writing Heat templates when provisioning the basic resources. The tool is more generic, and I think it’s better to be an Ansible expert than an OpenStack Heat expert. The Heat templates also tend to get messier and overly complicated over time as complexity increases. But there are some limitations to what the current OpenStack modules support, so the need for managing the newest OpenStack components might limit your choice of tooling. After the move to collections, the module development seems to be picking up though. Speaking of OpenStack, OpenStack-Ansible is a nice project for deploying OpenStack itself.

Managing Windows servers is supported, but the Windows modules are kept separate from Linux variants. They are also far fewer in number. Moving from Ubuntu to Red Hat is less painful, requiring fewer changes in the YAML than moving from Ubuntu to Windows. The Ansible controller also has to be Unix-based, and while it’s theoretically possible to use WSL for this, it is neither supported nor recommended for production use.
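To illustrate how small the Ubuntu to Red Hat change can be, the Apache example could pick the package and service name from the gathered ansible_os_family fact. This is a sketch. The apache_name variable is made up, and a Windows version would need different modules, such as win_service, altogether.

```yaml
- hosts: webservers
  become: true
  vars:
    # Apache is called "httpd" on the Red Hat family and "apache2" on Debian-based systems.
    apache_name: "{{ 'httpd' if ansible_os_family == 'RedHat' else 'apache2' }}"
  tasks:
    - name: Install Apache
      package:
        name: "{{ apache_name }}"
        state: present
    - name: Start and enable Apache
      service:
        name: "{{ apache_name }}"
        state: started
        enabled: true
```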

But the fact is, you are not limited to just one tool! Ansible works great combined with Terraform. You can use it to create some Kubernetes clusters either on-premises or in the cloud and then use Helm to manage the deployments themselves. You can plug Ansible into Kubernetes more deeply too, with the Operator SDK. You can use it to build container images as well as virtual machine images to achieve immutability in production environments. And, over the various alternatives, it really shines with managing networking equipment so you could plug it into NetBox.

I hope this serves as a good introduction to Ansible and has piqued your interest. Whether you are already knowledgeable about it or in the process of selecting your tools, take a look at the next part in this blog series for the basic concepts, some actual usage, and best practices.

Written by Miika Kankare