Non-reusable roles

George Shuklin
Published in OpsOps
5 min read · Nov 6, 2020

I’d like to write down my thoughts about the ‘role’ abstraction in Ansible. There is a rather pushy opinion out there that ‘a role is the way to reuse and share code’, and I’d like to refute it.

Reusable roles

I’ll start by retelling the statement I’m going to dispute.

A role is an autonomous piece of code which, if properly written, can be used freely in different playbooks and projects without further modification. This leads to a form of code reuse called ‘reusable roles’. A role is developed as a kind of ‘library’, with self-contained documentation, tests, dependencies on other roles, etc.

If you look at the way Galaxy was designed and at Molecule’s inclination towards role testing, you’ll see that there is a lot of spoken and unspoken pressure to write generic, universal roles, where a role is a complete solution to a problem for as many different systems and scenarios as possible. That includes simultaneous support for yum and apt, different distros, etc.

If we have a role to configure a database management system (e.g. PostgreSQL or MySQL), it should support everything: adding users, managing databases, schemas, permissions, connectors, protocols, engines, clustering, whatever the DBMS supports. And it should do so for the whole range of operating systems, database versions and distribution methods (installing the DBMS from a distro package, installing it from an alternative repository provided by the DBMS vendor, even compiling it if necessary). You plug `role: mydbms` into a play, and you have your MyDBMS running.

I’d like to argue that this is a dystopian nightmare due to the lack of code isolation in Ansible.

Price of reusability

There are a few important properties of any Ansible code that we need to discuss.

  1. There is no variable isolation in Ansible. If one role sets a variable, other roles will see it. Depending on where it is defined, it can shadow another role’s variables or be shadowed by them. This creates pathological coupling between unrelated pieces of code (or inventory). The more variables there are in the code, the higher the chance of a collision. The same goes for handlers. Extreme naming hygiene is required to avoid collisions, and there is no mechanism to prevent or detect them.
  2. Every task takes significant time, and even skipped tasks are very slow (compared to Python code). A tasklist with five tasks will run noticeably faster than a tasklist with 300 tasks. The fewer tasks, the better.
  3. Advanced calculations in Jinja are error-prone. There is no mechanism for type safety and no testing framework for Jinja code. That means that the more complex the code inside Jinja expressions (including expressions for loop and when), the more brittle the code becomes. Even if it works in some scenarios, spooky changes in unrelated variables may cause unpredictable results, and the resulting unexpected decisions may cause delayed damage, which complicates debugging.
  4. Type conversion between Jinja and Ansible happens through a hard-to-predict set of heuristics and almost always includes ‘stringization’ of the expression. That means a dictionary or a list may not be accepted as a dictionary or a list but be treated as a string instead.
  5. There is no unit-test framework (and I can’t see any possibility of implementing one) for the computational part of the code. Technically, every expression in ‘{{ }}’ is a function, but none of them can have unit tests. The only proposed way to check them is some kind of integration test that runs the role inside a playbook. Integration tests are expensive and slow (especially if dynamic provisioning is used). In practice, that means that most Jinja code lacks good test coverage, with missed edge cases and no invariant checks.
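
The first point can be illustrated with a minimal sketch. The role names `app_a` and `app_b`, the variable `config_dir` and the file layout are all hypothetical, and each YAML document below stands for a separate file. A value stored with `set_fact` belongs to the host rather than to the role that set it, and it takes precedence over role defaults, so the second role should silently pick up the first role’s value.

```yaml
---
# roles/app_a/tasks/main.yml (hypothetical role)
- name: Remember where app_a keeps its configuration
  ansible.builtin.set_fact:
    config_dir: /etc/app_a
---
# roles/app_b/defaults/main.yml (hypothetical role)
# The author of app_b expects this value to be used by app_b.
config_dir: /etc/app_b
---
# roles/app_b/tasks/main.yml
- name: Show the directory app_b is about to use
  ansible.builtin.debug:
    var: config_dir
---
# site.yml
- hosts: all
  gather_facts: false
  roles:
    - app_a
    - app_b
```

With this layout the debug task in `app_b` should report `/etc/app_a`, not the role’s own default. Nothing warns about the overlap; prefixing every variable with the role name (for example `app_b_config_dir`) is the kind of naming hygiene that avoids it.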

All that led me to the strong belief that any role should be as small as possible. It should make as few computational decisions as possible (no complex Jinja), and it should keep as much invariant as possible. The best role contains a simple, plain tasklist with no (or almost no) branching (that includes when and loops over potentially empty lists) and no true loops (include_* with loop). It should ‘think’ as little as possible and ‘do’ as little as possible.
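
A role in this spirit might look like the sketch below. It is an illustration, not a role from a real project: the role name `postgres_debian`, the template name and the variable `postgres_debian_config_path` are assumptions, and the role deliberately supports only Debian-family hosts with the distribution package.

```yaml
---
# roles/postgres_debian/tasks/main.yml (hypothetical role)
# A flat tasklist: no when, no loops, no includes, no computed values.
- name: Install PostgreSQL from the distribution repository
  ansible.builtin.apt:
    name: postgresql
    state: present
    update_cache: true

- name: Deploy the server configuration
  ansible.builtin.template:
    src: postgresql.conf.j2
    # Assumed to come from the inventory; prefixed with the role name
    # to reduce the chance of a collision with other roles.
    dest: "{{ postgres_debian_config_path }}"
    owner: postgres
    group: postgres
    mode: "0644"

- name: Make sure the service is enabled and running
  ansible.builtin.service:
    name: postgresql
    state: started
    enabled: true
```

The sketch leaves out restarting the service after a configuration change. The point is the shape: every task always runs, and the only Jinja expression is a plain variable lookup.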

Such a role loses reusability. It supports only one or a few distributions and only a limited number of options and, basically, is just a fancy way to store a piece of the playbook it is used in. In exchange, it reduces code maintenance to a trivial level.

As a consequence, such a role loses any special need for testing. If it’s just a ‘fancy piece of a playbook’, it should be tested as part of that playbook. That means the playbook tests fully cover everything the role does (and if there are no branches, that’s 100% code coverage, isn’t it?).

Reusable code

Nevertheless, there is a strong desire for code reuse. If a company has a few projects, there is a high chance that those projects share a lot of common code.

Ansible has started to work on this in the form of collections (which are promised to include playbooks). Nevertheless, it’s still a work in progress.

There is no commonly accepted, working way to reuse code. Large projects (e.g. kubespray and ceph-ansible) are distributed as whole git repos with playbooks and modules in them.

In our company we use git vendor to distribute pieces of code (of the same kind as ceph-ansible), and we call them facilities. The interface between a facility and a project is defined by inventory requirements, the list of exposed variables is kept minimal, and facilities are tested extensively on all supported operating systems and configurations. They have the same variable-sharing problem as roles, but it can be partly alleviated by running vendored playbooks in a separate ansible-playbook run, which gives at least partial segregation. Inventory, host vars and group vars are still an issue, but at least other playbooks and roles are no longer a worry.
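
One way such inventory requirements could be made explicit is a small pre-flight play shipped with the facility. This is only a sketch of the idea, not a description of how our facilities actually do it: the file name, the group `dbservers` and the variable `facility_db_version` are hypothetical.

```yaml
---
# facility/preflight.yml (hypothetical file inside a vendored facility)
- name: Check that the project inventory satisfies the facility's requirements
  hosts: all
  gather_facts: false
  tasks:
    - name: Required group and variable are present
      ansible.builtin.assert:
        that:
          # The project must provide at least one host in this group.
          - groups['dbservers'] | default([]) | length > 0
          # The only variable this facility expects from the inventory.
          - facility_db_version is defined
        fail_msg: >-
          The inventory must define a non-empty 'dbservers' group
          and the variable facility_db_version.
```

Failing early on a missing group or variable keeps the contract between the facility and the project in one visible place instead of spreading it across the tasks.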

Another branch of reusability is ‘custom modules’. Often it’s better to write a module instead of a role or a playbook. Such a module can use the whole power of pytest, and ansible-test provides a rudimentary framework for module integration testing. Moreover, every variable inside a module is strictly private and cannot be messed up in some unexpected way by a careless inventory or play change.

Afterword

I totally agree that roles would have been a good way to share code if they had good isolation and a place to write code in a rich programming language. Unfortunately, the current ‘computing in Jinja’ and ‘share everything’ approach to role variables is not the way to share code.

George Shuklin

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux. My hobby is Rust.