An Ansible role is not ‘a fancy play’

A decoupling approach to role design

Published in OpsOps, Nov 25, 2019

There is a widely held idea in Ansible circles that using roles is a sign of better code. If you write everything in roles, it’s as if you were writing good libraries: reusable and well isolated. Writing roles is good, and plays exist only to list the roles to execute. Plays with tasks are considered lesser beings, transient and unworthy. I exaggerate a bit, but the idea is this: if you build something solid and long-lasting, use roles. It is supported by the notion of ‘reusable roles’ in the Ansible docs, and by an overall obsession with role-specific behavior.

I have started to doubt this idea. Is it always so? Maybe roles aren’t that good in some situations?

What is ‘good’? For the sake of this article, I look at the amount of coupling in the system. Coupling is bad, so less coupling is good. Coupling in the context of Ansible is a logical (behavioral) link between different pieces of code. Some coupling is inevitable; some can be avoided.

Before getting into details, look at this example of code that is ‘bad’ because of coupling. Let’s say you have two independent (unrelated) roles. One role configures a cron job for a daily backup, and another role installs haproxy and relies on the first (cronjob) role to create ‘/etc/ssl/private/foobar.key’.

(dramatic pause)
Why?? Cronjob? SSL key? For haproxy? Please, no.

It’s an artificial example, but it should illustrate why coupling is bad. If the cronjob role ever changed the private key location, it would unexpectedly break haproxy. Unwanted coupling of epic scale is at your dis-service.

That was an example of unwanted and bad coupling. Now, let’s look at roles in comparison with plays to find the sources of bad coupling there.

What’s the key difference between a role and a play?

The question sounds silly, but the focus here is on the word ‘key’. What is the most important difference between a role and a play? The answer is the host list, the hosts directive of the play. A role cannot have one, and any attempt to emulate it (e.g. delegate_to with with_items) is strictly inferior to a play: it gets no proper parallelization, no batches (serial: n), no play-level error handling, no --limit option, etc.

A play has deep integration with hosts and groups, whilst a role has fancy variable layering (some would say ugly and complex).
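To make the difference concrete, here is a minimal sketch of what only a play can express. The group name `web`, the role name `haproxy` and the numbers are placeholders, not taken from a real project.

```yaml
# site.yml (hypothetical playbook)
- name: Configure load balancers in batches
  hosts: web                 # the host list belongs to the play
  serial: 2                  # work on two hosts at a time
  max_fail_percentage: 25    # stop if more than 25% of a batch fails
  roles:
    - role: haproxy          # the role itself knows nothing about 'web'
```

Running it as `ansible-playbook site.yml --limit web1` (with `web1` being some host in that group) narrows the same play to one host, with no changes to the role.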

Choosing a role vs. a play for a task

I argue that, in general, it’s malpractice (the opposite of ‘best practice’) to couple a role to hosts and groups.

Examples are:

Gathering IPs of hosts to write an iptables/ACL rule.
Creating something on all hosts in a specific group because the role needs it.
Checking whether a host is in a specific group, or checking for the existence of groups/hosts.
Accessing hostvars of another host based on a specific group membership or a specific (‘hardcoded’) host name.
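As an illustration of the first and last items, a role task coupled to the inventory might look like the sketch below. The role name `firewall` and the group name `db` are hypothetical; the pattern is shown here as something to avoid.

```yaml
# roles/firewall/tasks/main.yml (hypothetical role, shown as an anti-pattern)
- name: Allow traffic from database hosts
  ansible.builtin.iptables:
    chain: INPUT
    # Reads facts of *other* hosts, so they must have been gathered already
    source: "{{ hostvars[item].ansible_default_ipv4.address }}"
    jump: ACCEPT
  # Hard-coded group name: the role now depends on the inventory layout
  loop: "{{ groups['db'] }}"
```

The task only works if a `db` group exists and facts for its hosts were gathered earlier in the same run, which the role cannot guarantee on its own.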

I understand that my claim is big and that it contradicts some well-established big roles out there, but I want clarity and maintainability in my work. That forces me to pursue this thought and disregard existing practice if needed.

No inter-host operations in roles

Every time a role performs any kind of operation on groups/hosts (I gave examples in the previous section), the role becomes hardwired to a specific host configuration and/or a specific execution flow. Here are some specific cases of this coupling:

Relying on the presence of a specific group in the inventory. Any role with such a requirement is not reusable in the true meaning of ‘reusable’ (plug-n-go in the next project).
Requiring a specific number of hosts in a group. It’s often written as hostvars[groups.db[2]].ansible_default_ipv4.address. What if the db group has fewer than three hosts? This argument is valid even if the group name is not hard-coded.
Executing code on hosts in some group or on a specific hostname (except for localhost). It’s a special case of the more generic ‘presence of a host in the inventory’. As usual, it creates coupling between the inventory and the role; moreover, it breaks some Ansible use cases (like the use of --limit).
Expecting specific variables to be available for some (other) host.
Binding the role to a specific group. A role should work as expected if it is reassigned from one group to another (which is the same as renaming the group).

As you can see, all those cases point to the same issue: an expectation that the inventory has specific groups, a specific number of hosts in a group, or a specific variable for another host. This kind of coupling makes roles hard to move and reuse.

Additionally, and more importantly, it makes it hard to change other code outside of that role. We don’t know what kind of changes we are permitted to make to the inventory without analyzing the role code (in reality this translates to ‘analyzing the code of all roles in the project’). The more such roles there are in the project, the harder this analysis becomes. The Mother of Nightmares starts when you have a few such roles and you decide to change one of them, and those changes contradict the expectations of other roles. Roles usually have more code than playbooks, and they have many more considerations, so debugging and refactoring become much harder.

The solution?

There are two partial solutions to this problem. One moves the interaction from the role into the playbook; the other helps to cope with the cases where this interaction is inevitable.

The first solution is simple: if you have code which uses hosts in different groups, use different plays. Each play has its own ‘hosts’ statement, so interaction with the inventory is straightforward, clean and standardized (the key factors for quick and painless refactoring/updating). Using different ‘hosts’ for different plays lets you use ‘native’ variables (without involving hostvars).
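A rough sketch of this split, assuming hypothetical groups `db` and `app` and a hypothetical role `app`: each play names its own hosts, and tasks read the current host's facts directly.

```yaml
# site.yml (hypothetical playbook)
- name: Prepare the database servers
  hosts: db
  become: true
  tasks:
    - name: Accept PostgreSQL connections on the host's own address
      ansible.builtin.iptables:
        chain: INPUT
        protocol: tcp
        # 'Native' fact of the current host, no hostvars lookup needed
        destination: "{{ ansible_default_ipv4.address }}"
        destination_port: "5432"
        jump: ACCEPT

- name: Configure the application servers
  hosts: app
  roles:
    - role: app
```

Everything that depends on the inventory sits in the two `hosts` lines, so regrouping hosts means editing the playbook only.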

If only I could say ‘use plays instead of roles’. There are some cases when roles need to work with a few hosts. Any hypervisor delegation or ‘cluster API’ is a good counter-example.

In those situations, when role-host interaction is justified, there is a simple trick to make things easier: use variables to pass hostnames/groups into roles. I would dare to say, ‘never use a group name, pass a host list from a group’.

The idea is simple: every time you need delegation, pass a variable with the name of the host you want to delegate the task to. For example, if you need to ask an API server for some data, you can pass a variable like ‘api_access_host’ with the name of the host to use.
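A minimal sketch of this trick follows. The role name `cluster_member`, the groups `members` and `api`, and the `cluster-cli status` command are all placeholders; only `api_access_host` comes from the text above. First, the role side:

```yaml
# roles/cluster_member/tasks/main.yml (hypothetical role)
- name: Ask the API host for cluster data
  ansible.builtin.command: cluster-cli status   # placeholder command
  delegate_to: "{{ api_access_host }}"          # host name is supplied by the caller
  register: cluster_status
  changed_when: false
```

And the play that decides which host that is:

```yaml
# site.yml (hypothetical playbook)
- name: Join cluster members
  hosts: members
  roles:
    - role: cluster_member
      vars:
        # The group name appears only here, never inside the role
        api_access_host: "{{ groups['api'] | first }}"
```

With this layout, renaming the group or pointing the role at another host is a one-line change outside the role.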

If you need some data from many hosts, query those hosts in a separate play and pass that data into the role as variables (with references to ‘hostvars’).
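One way this could look, assuming hypothetical groups `db` and `app`, a hypothetical role `firewall`, and a role variable named `firewall_allowed_sources` that the role loops over:

```yaml
# site.yml (hypothetical playbook)
- name: Gather facts from the database hosts
  hosts: db
  gather_facts: true

- name: Configure the firewall on the application hosts
  hosts: app
  become: true
  roles:
    - role: firewall
      vars:
        # The play, not the role, decides which hosts the addresses come from
        firewall_allowed_sources: "{{ groups['db'] | map('extract', hostvars, ['ansible_default_ipv4', 'address']) | list }}"
```

Inside the role, a task can then iterate over `firewall_allowed_sources` without knowing that a `db` group exists.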

The single exception here is localhost. It’s always there, and you can relatively safely delegate to it or use it as project-wide storage for facts.
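As a sketch of the ‘storage for facts’ idea, assuming a hypothetical group `app`, a hypothetical role `deploy` and made-up variable names:

```yaml
# site.yml (hypothetical playbook)
- name: Keep project-wide values on localhost
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Record a release identifier once per run
      ansible.builtin.set_fact:
        release_id: "{{ lookup('pipe', 'date +%Y%m%d%H%M%S') }}"

- name: Use the stored value on the real hosts
  hosts: app
  roles:
    - role: deploy
      vars:
        # Read the fact back from localhost and hand it to the role
        deploy_release_id: "{{ hostvars['localhost'].release_id }}"
```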

Afterword

Everything said here is just a preliminary idea. I need to apply it in my projects for some time before deciding whether it’s viable or not.