Small Ansible manifesto

George Shuklin
Published in OpsOps
3 min read · Jan 18, 2024

This is my Reddit comment about using one role from another in Ansible. After writing it, I realized it goes deeper than the specific question people asked, so I’m re-posting it as a separate article.

I see attempts to chain roles again and again in Ansible code. I’d say it’s the most under-delivered feature of Ansible, with most shortcomings due to the global namespace for variables, the lack of proper scoping, and opportunistic typing in Jinja.

Some problems can be offset by very strict mental discipline:

  • Strict prefixing of variables for roles. E.g., not docker_timeout, but foo_docker_timeout or foobar_docker_timeout. It leads to a proliferation of explicit variable passing into roles (e.g., you have something like ‘foo_docker_timeout: "{{ docker_timeout }}"’ for every next foo role in every play). With 10+ parameters, it quickly becomes a maintainable boilerplate swamp (which means you can maintain it, but it’s a lot of typing for simple problems).
  • Strict handler handling, e.g., calling handlers by role prefix (you can specify the role prefix when notifying a handler, so that same-named handlers from different roles do not collapse into one).
  • Extreme care should be taken with reasoning about precedence and placement of variables. One wrong parameter and you get unexpected shadowing in an unrelated place.
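
A minimal sketch of the first two points, assuming a hypothetical role foo, a hypothetical inventory variable docker_timeout, and ansible-core 2.15 or newer for the role-prefixed notify form. File names, paths, and task contents are illustrative only:

```yaml
# site.yml (hypothetical play): the play maps the shared value
# onto the role's own prefixed variable.
- name: Configure foo
  hosts: all
  roles:
    - role: foo
      vars:
        foo_docker_timeout: "{{ docker_timeout }}"
---
# roles/foo/tasks/main.yml (hypothetical)
- name: Write foo settings
  ansible.builtin.copy:
    dest: /etc/foo/settings.conf
    content: "docker_timeout={{ foo_docker_timeout }}\n"
  # "role_name : handler_name" targets the handler defined by this role,
  # not a same-named handler from elsewhere in the play.
  notify: "foo : restart foo"
---
# roles/foo/handlers/main.yml (hypothetical)
- name: restart foo
  ansible.builtin.service:
    name: foo
    state: restarted
```

The prefixed name keeps the timeout of foo out of the way of every other role, at the price of one pass-through line per parameter per play.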

Other problems just don’t have a nice solution:

  • If role 2 depends on side effects of role 1, there is no clear place for handler flushing. It’s not role 1’s problem (it did what it wanted and provided handlers), it’s not role 2’s problem (it expects the service from role 1 to be up and running), and it’s not the play’s problem (it should not inspect role internals).
  • If role 1 privately uses role 2, which uses role 3, role 3 can still be influenced by non-default values configured by role 1, and that is unexpected for role 2. One verbose and hard solution is to leave no default untouched when calling role 3 from role 2, but that is defensive programming and it reduces productivity by a lot.
  • Multiple uses of the same role still cause handlers to merge, which can be fatal when combined with ‘when’ conditions. Consider this: role 1 imports role 2, role 3 imports role 2 conditionally with ‘when’, and that condition is false. The handler that role 1 relies on is now overwritten by the copy carrying ‘when: false’, so role 1 ends up with role 2’s handler not executed because of a decision made in role 3.
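
The defensive option of leaving no default untouched could look roughly like the sketch below. The role names and the role3_* variables are hypothetical; the point is only that every parameter of the inner role is pinned at the call site:

```yaml
# roles/role2/tasks/main.yml (hypothetical)
- name: Run role3 with every parameter pinned explicitly
  ansible.builtin.include_role:
    name: role3
  vars:
    # Nothing is left to role3's defaults, so a stray value set
    # higher up cannot silently change what role2 gets.
    role3_listen_port: 8080
    role3_user: app
    role3_tls_enabled: false
```

Values passed this way override role defaults and inventory variables, but extra vars given with -e still win, so even this is not a complete shield. It also shows the cost: every new default in role 3 has to be repeated in every role that calls it.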

Those small dents compound, and the larger the codebase is, the harder it becomes to debug. Therefore, one of the sane approaches is to keep the code simple and avoid code reuse between roles.

  • You have independent roles with contracts between roles bounded by side effects only (no var/handler interaction at all).
  • The play is joining roles together and is responsible for flushing if needed (usually it’s just splitting the play in two, no need for explicit flushes).
  • The play is passing role parameters and dealing with inventories (e.g., there are contracts between play and role, between play and inventory).
  • For simplicity, it’s allowed to have a contract between inventory and role (e.g., to tune stuff which the play does not know about).
  • You never look at groups in roles (this is a work area for plays).
  • You never set_fact role variables outside of that role or play.
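
A rough sketch of that layout, assuming two hypothetical roles (database and webapp), a hypothetical inventory group backend, and a hypothetical inventory variable db_port. Handlers notified in a play run by the end of that play, so splitting the playbook in two guarantees the first role’s handlers have fired before the second role starts:

```yaml
# site.yml (hypothetical)
- name: Bring up the database
  hosts: backend            # groups are handled here, not inside roles
  roles:
    - role: database
      vars:
        database_port: "{{ db_port }}"
  # Any handlers notified by the database role run before this play ends.

- name: Configure the application against the running database
  hosts: backend
  roles:
    - role: webapp
      vars:
        webapp_db_port: "{{ db_port }}"
```

The two roles share nothing but the side effect (a running database on a known port). The play owns the wiring: it reads the inventory, picks the hosts, and hands each role its own prefixed parameters.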

Ansible is limited, and knowing its limits as a programming tool is essential to get the best from it. If you push Ansible into areas of normal programming, you get the worst of it, and it drags everything down.

TL;DR: DO NOT PROGRAM WITH ANSIBLE. DO NOT MAKE DECISIONS WITH ANSIBLE.

Use Ansible to make side effects in accordance with play and inventory.

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux. My hobby is Rust.