Handling handlers for Ansible

George Shuklin
Nov 21, 2017

Theory: handlers are independent and idempotent. They may be called in any order, at any time after configuration is done.

Reality: order matters. A lot. If you restart one service before another, it may cause other services to fail. Or it will fail itself, because it depends on other services that have not been restarted yet. Moreover, if a restart failed, you want to retry it on the next Ansible run even if there is nothing to change in the configuration.

How to solve the handler order problem (a simple way)

Ansible runs handlers in the order they are defined, not in the order they were notified. That means that if you have a single play, you can define the order in the ‘handlers’ part of the play. If you use include_role with multiple handler definitions, it quickly becomes extremely complex and brittle, and I advise you not to rely on handler order in this case.
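
As a minimal sketch of the single-play case (the service names 'database' and 'webapp' and the config path are made up for illustration): the task notifies the two handlers in the 'wrong' order, but they still run in the order they appear under 'handlers'.

```yaml
---
- hosts: test
  tasks:
    # Hypothetical task; any task that reports 'changed' will do.
    - name: update shared config
      copy:
        content: "example\n"
        dest: /etc/example.conf
      notify:
        - restart webapp      # notified first...
        - restart database
  handlers:
    # ...but handlers run in definition order: database, then webapp.
    - name: restart database
      service:
        name: database
        state: restarted
    - name: restart webapp
      service:
        name: webapp
        state: restarted
```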

Another possible solution is to split each ‘dependent’ part into a separate play. The order of plays is well-defined, and if you can do this, it will work as expected. The downside is slightly longer execution time (because facts are gathered twice).
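
A rough sketch of that split, again with hypothetical service names and paths: the second play starts only after the first play has finished, including its handlers, so 'database' is always restarted before anything in the 'webapp' play runs.

```yaml
---
# Play 1: the service that others depend on (hypothetical name).
- hosts: test
  tasks:
    - name: configure database
      copy:
        content: "example\n"
        dest: /etc/database.conf
      notify: restart database
  handlers:
    - name: restart database
      service:
        name: database
        state: restarted

# Play 2: runs after play 1 is complete, handlers included.
- hosts: test
  tasks:
    - name: configure webapp
      copy:
        content: "example\n"
        dest: /etc/webapp.conf
      notify: restart webapp
  handlers:
    - name: restart webapp
      service:
        name: webapp
        state: restarted
```

If the second play does not need fresh facts, setting 'gather_facts: false' on it is one way to avoid paying for fact gathering twice.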

In many cases it’s impossible to split services. Moreover, you want to be sure that handlers are executed at a precise moment in a precise order, or maybe even with certain conditions, retries, and serialization.

By the way, Ansible handlers lack a serialization feature. If you want to restart your cluster ‘one service at a time’, you should not use handlers for this, or you will cause your cluster some tough moments.

How to do handlers right

(it’s my personal novel idea, so the word ‘right’ is very opinionated here).

We will use declarative syntax for handlers.

    ---
    - hosts: test
      serial: 1
      tasks:
        # placeholder: a real task must report 'changed' for notify to fire
        - name: 'do something'
          debug: msg='important job'
          notify:
            - restart my service
        # run the pending handlers now, so the flag below is already set
        - name: flush handlers
          meta: flush_handlers
        - name: Restart my service
          service: name=myservice state=restarted
          register: restart_status
          until: restart_status is succeeded
          retries: 30
          delay: 5
          when: myservice_restart is defined
      handlers:
        # the handler only records that a restart is wanted
        - name: restart my service
          set_fact:
            myservice_restart: True

In handlers we do not restart anything. We just register that we want to restart the service. Using handlers saves us a lot of time compared to ‘register: service_restart_pending’ on every action. Handlers may also listen for different notifications, some handlers may change variables, etc.
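
To illustrate the point about listening, here is a small sketch that uses the handler keyword 'listen' so that one flag-setting handler reacts to several notification topics. The second topic name and the task are assumptions made up for the example.

```yaml
---
- hosts: test
  tasks:
    # Hypothetical task; any task that reports 'changed' will do.
    - name: update myservice config
      copy:
        content: "example\n"
        dest: /etc/myservice.conf
      notify: myservice config changed
  handlers:
    # One handler, several topics: either notification sets the same flag.
    - name: mark myservice for restart
      set_fact:
        myservice_restart: true
      listen:
        - restart my service
        - myservice config changed   # hypothetical second topic
```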

After we have done all the work on the service, we forcefully flush handlers (meta: flush_handlers). At this moment we collapse all notifications into solid actions. In our case it’s just ‘set_fact’. After that we check if a restart is pending (when).

And now we can handle restart properly:

  1. We use serial: 1 to ensure that we restart only one service on one server at a time.
  2. We will not restart the instance on the next server if the previous one failed to restart (the until condition on restart_status). (yay! clusters are happy!)
  3. We will try to restart the service a few times. If we have a flaky service which may not start on the first try, we will try up to 30 times before giving up (retries), and we will wait 5 seconds between tries (delay).

Surviving failures

The previous example was great, but it has a problem. If our playbook finishes all configuration and then fails while restarting, some services won’t be restarted. And if we run our playbook again, it will say ‘changed=0’ and our cluster will be left with some services in an old state.

To allow our play to ‘continue’ service restarts we need to change the way we keep information. I use file flags for this. Moreover, we will trim away ‘flush_handlers’ by using post_tasks. It’s not always possible (that’s why the first example contains flushes).

    ---
    - hosts: test
      serial: 1
      tasks:
        # placeholder: a real task must report 'changed' for notify to fire
        - name: 'do something'
          debug: msg='important job'
          notify:
            - restart my service
      post_tasks:
        - name: check if restart pending
          stat: path=/var/run/myservice_restart.pending
          register: myservice_restart
        - name: Restart my service
          block:
            - service: name=myservice state=restarted
              register: restart_status
              until: restart_status is succeeded
              retries: 30
              delay: 5
            # reached only if the restart above succeeded
            - file: path=/var/run/myservice_restart.pending state=absent
          when: myservice_restart.stat.exists
      handlers:
        # the handler only leaves a flag file behind
        - name: restart my service
          file: path=/var/run/myservice_restart.pending state=touch

Notes:

  1. We avoided flush_handlers by using post_tasks. All handlers notified in ‘tasks’ are flushed at the end of the ‘tasks’ section.
  2. We use ‘block’ to combine the restart and the removal of the flag. If the service restart fails, the flag won’t be removed and we can repeat our try. Moreover, we have a single ‘when’ for both ‘service’ and ‘file’ (so if no restart is pending, the file action will be skipped).
  3. Because we use a file flag for the restart, we need to stat it. So we check '.stat.exists'.
  4. We create the flag in the /var/run directory. This directory is very special: it is emptied on server restart. Which is exactly what we want: if the server has been restarted, our service was restarted too, so why would we want to restart it again?

The same trick can be applied to reload. Moreover, we can run a proper confcheck before attempting to restart, and that check will be serialized in the restart block as well.
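
A possible shape for the reload variant with a confcheck, as a sketch only: the flag file name, the task that notifies, and the 'myservice --check-config' command are all assumptions, so substitute whatever check your service actually provides. If the check fails, the block stops, nothing is reloaded, and the flag stays in place for the next run.

```yaml
---
- hosts: test
  serial: 1
  tasks:
    # Hypothetical task; any task that reports 'changed' will do.
    - name: update myservice config
      copy:
        content: "example\n"
        dest: /etc/myservice.conf
      notify: reload my service
  post_tasks:
    - name: check if reload pending
      stat:
        path: /var/run/myservice_reload.pending
      register: myservice_reload
    - name: Reload my service
      when: myservice_reload.stat.exists
      block:
        # Hypothetical check command; a non-zero exit code aborts the block.
        - name: validate config before touching the running service
          command: myservice --check-config
          changed_when: false
        - name: reload the service
          service:
            name: myservice
            state: reloaded
        # Reached only if both the check and the reload succeeded.
        - name: remove the flag
          file:
            path: /var/run/myservice_reload.pending
            state: absent
  handlers:
    - name: reload my service
      file:
        path: /var/run/myservice_reload.pending
        state: touch
```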

Conclusion

If your handlers aren’t idempotent and independent, you may want to move all the real machinery away from handlers into the tasks (post_tasks) of your play, leaving handlers only the job of creating an idempotent, independent and semi-persistent flag. That flag will then be checked at a proper moment in the main play, where there are many more ways to do things right, in the right order, with the right serialization (retries, delays, guards, you name it).

George Shuklin

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux. My hobby is Rust.