We have a few alerts for the most critical services which explicitly check that a service is not in the failed state. This requires us to use the systemd collector for node exporter, and it gathers a lot of series.
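A minimal sketch of what such an alert might look like as a Prometheus rule (the unit name and timing are made-up examples; this assumes node_exporter runs with the systemd collector enabled):

```yaml
# Hypothetical alert: fire if a critical unit is reported as failed.
# node_systemd_unit_state comes from node_exporter's systemd collector.
- alert: CriticalServiceFailed
  expr: node_systemd_unit_state{name="postgresql.service", state="failed"} == 1
  for: 2m
  labels:
    severity: critical
```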
I just spent a few days debugging an absolutely simple problem.
My servers weren't able to communicate after I applied nftables rules with a deny policy.
The rule was very simple:
iifname eth0 ip6 saddr @good ip6 daddr…
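For context, a minimal ruleset of roughly this shape might look like the following (the table layout, set contents, and addresses are made up for illustration; only the quoted rule fragment is from the original setup):

```
# Hypothetical sketch: a named set plus an input chain whose default
# policy drops everything not explicitly accepted.
table inet filter {
    set good {
        type ipv6_addr
        elements = { 2001:db8::1 }
    }
    chain input {
        type filter hook input priority 0; policy drop;
        iifname "eth0" ip6 saddr @good accept
    }
}
```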
ChatGPT is starting to scare me (in a good sense). It really knows git, and it knows it better than I do. This solution came entirely from it; I had no idea there was a fast way.
This is my Reddit comment about using one role from another in Ansible. As I wrote it, I realized it went deeper than the specific question asked, so I'm re-posting it as a separate article.
I'd missed this completely. Linux exposes an oom_kill counter in /proc/vmstat. It has been there since 2.6.36.
It is read by node exporter as the node_vmstat_oom_kill metric.
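You can read the counter directly without any exporter; a one-liner like this prints the number of OOM kills since boot (on kernels 2.6.36 and newer):

```shell
# Extract the kernel-wide oom_kill counter from /proc/vmstat.
awk '$1 == "oom_kill" { print $2 }' /proc/vmstat
```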
You may think I'm reciting a Unix textbook from the 1990s. Oh, you can run programs in parallel. How nice! What an amazing UNIX you have here!
Nope. This is a real problem I was struggling to solve.
Why would you need this? Because something else is managing the IP address on it, for example Pacemaker.
If you run Ansible via just (as I do), you may want to see a notification about completion at the end of a long playbook.
I wrote an article about how to do it for a plain Ansible run.
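One possible sketch of such a recipe in a justfile (the recipe name, inventory path, and playbook name are hypothetical; this assumes a desktop session where notify-send is available):

```just
# Hypothetical recipe: run the playbook, then pop a desktop notification
# whether it succeeded or failed.
site:
    ansible-playbook -i inventory site.yml \
        && notify-send "Ansible" "site.yml finished OK" \
        || notify-send -u critical "Ansible" "site.yml FAILED"
```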
Today I'm investigating an odd circular dependency between systemd units in openstack-ansible. One of the units under investigation implements a really interesting way to run database checks, so I decided to dig deeper to understand the technique.