When Puppet’s away, the sysadmin will play — part 1

Rémi Ferrand
Sep 3, 2018

Thanks to Fabien Wernli for reading a draft of this post and offering some suggestions.

Author note: As usual, my posts are based on my personal experience in my day-to-day work life. I do not pretend to present perfect solutions in my posts, nor to always be right. If you find something that’s missing or a mistake that I’ve made, I’ll be glad to fix my posts based on your feedback. Thanks.

How to disable your Puppet Agent properly

Context

You connect to the node hosting your service <insert service name here> to try a simple manual modification for the next software upgrade.

You know that Puppet is running and managing the host and your service. You have some experience with Puppet and those one-shot manual modifications, so the first thing you do is stop the Puppet daemon.

This morning, at the coffee break, you told your coworker Billy not to start Puppet on this very host. Perfect, you’re now ready to go and you can modify any service configuration file you want; nothing will waste your time or stand in your way.

OK, you’ve been working on your configuration for <insert huge time interval> and everything is working as you expect. Last thing you need to do is port those modifications to your Puppet code and you can go home. But before that, you really need a coffee and you really deserve it after all you’ve done.

You’ve left your desk for only <insert tiny time interval> and you’re back to your workstation with a good coffee, ready to write some Puppet code. You just check the modifications one last time to prepare your Puppet code.

Wait … WHAT?! Everything you’ve done in the last few hours has been reverted to what it was before you stopped Puppet. You check and indeed, the Puppet Agent is running again on the host you were working on. How is that even possible?!

Ah-ha ! You can see that Billy, your coworker, is connected to this node.

“Hey Billy, did you restart Puppet on the host I was working on ? I told you not to this morning !” you ask

“Whoops, forgot that sorry ! The node wasn’t reporting to Puppet-DB for more than three hours and an alert was raised. Why ? Did I break something ?” your coworker answers

“You just destroyed my last three hours of work dude…” you say


Sure, the story above is a little exaggerated (and some would say that the client filebucket exists). But I’m sure almost everyone who has worked with the Puppet Agent has had a similar experience.

There are many reasons (good and bad ones) why you sometimes want to temporarily disable the Puppet Agent from running (troubleshooting, testing brand new configuration settings, …).

There are also many reasons why simply executing systemctl stop puppet on your system is not enough:

  • a coworker restarts the Puppet Agent (“Damn you Billy !”)
  • the monitoring system automatically restarts the Puppet Agent
  • the host reboots but you forgot to disable the Puppet service. At startup, the Puppet Agent will start and erase all your modifications.
  • name your reason here

It would be way better if your Puppet Agent could be aware that no modifications should be made to your systems.

Puppetlabs’ solution to disable Puppet runs

Once again, Puppetlabs has done awesome work on the Puppet Agent.

The Agent includes this functionality right out of the box.

As the official Puppet documentation says:

Whether you’re troubleshooting errors, working in a maintenance window, or developing in a sandbox environment, you may need to temporarily disable the Puppet agent from running.

Disable the Agent

$ sudo puppet agent --disable "Rémi: working on the next killer feature manually"

A message is logged

Sep 01 17:45:40 dev-host puppet-agent[760]: Disabling Puppet.

Behind the scenes, this command creates the file /var/lib/puppet/state/agent_disabled.lock (or a similar path, depending on your configuration):

$ jq < /var/lib/puppet/state/agent_disabled.lock
{
  "disabled_message": "Rémi: Working on the next killer feature"
}
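
Since the disabled state is just a small JSON file, a script can check it before touching the agent. Here is a minimal sketch, assuming the lock file path shown above (on your system, `puppet config print agent_disabled_lockfile` should report the actual one). The function name `agent_disabled_reason` is made up for this example, and the sed extraction is deliberately naive: it is not a real JSON parser.

```shell
# Hypothetical helper: print the reason stored in the agent's disable lock file.
# The default path is the one shown in this post; it may differ on your hosts.
agent_disabled_reason() {
    lockfile="${1:-/var/lib/puppet/state/agent_disabled.lock}"

    # No lock file: the agent is not administratively disabled.
    [ -f "$lockfile" ] || return 1

    # Naive extraction of the "disabled_message" value.
    # Good enough for simple messages; escaped quotes are left as-is.
    sed -n 's/.*"disabled_message"[[:space:]]*:[[:space:]]*"\(.*\)".*/\1/p' "$lockfile"
}
```

For example, `agent_disabled_reason || echo "agent is not disabled"` prints either the stored reason or the fallback text.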

When you (or someone, or something) try to run the Puppet Agent, here’s what shows up:

# puppet agent --test
Notice: Skipping run of Puppet configuration client; administratively disabled (Reason: 'Rémi: Working on the next killer feature');
Use 'puppet agent --enable' to re-enable.

Enable the Agent

$ sudo puppet agent --enable

And you’re good to go: the lock file is removed and the Agent will run as before.

Caveats

This feature is nice, but it has one nasty behavior.

Let’s say you disable the Puppet Agent but you have a 2pm deadline for the tests to be ready.

# puppet agent --disable "Puppet agent can be restarted at 2PM"
# puppet agent --test
Notice: Skipping run of Puppet configuration client; administratively disabled (Reason: 'Puppet agent can be restarted at 2PM');
Use 'puppet agent --enable' to re-enable.

As usual, at 2pm you realize that you need more time to work on your feature, 5pm will be enough time for you to finish. So you just extend the deadline and update the message you set earlier:

# puppet agent --disable "Puppet agent can be restarted at 5PM"
# puppet agent --test
Notice: Skipping run of Puppet configuration client; administratively disabled (Reason: 'Puppet agent can be restarted at 2PM');
Use 'puppet agent --enable' to re-enable.

Wait ! … the message has not been updated.

Some would say that’s a bug; maybe it’s expected behavior.
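
Since --enable removes the lock file and --disable creates it, one possible workaround is to re-enable the Agent and disable it again with the new message. The sketch below does just that. `update_disable_message` and the `PUPPET_BIN` variable are hypothetical names introduced for this example (the variable only exists so the command can be swapped out). Be aware that the Agent is briefly enabled between the two commands, so a run could start in that window.

```shell
# Hypothetical wrapper: replace the disable message by re-enabling first.
# PUPPET_BIN is an assumption of this sketch; it defaults to "puppet".
update_disable_message() {
    puppet_bin="${PUPPET_BIN:-puppet}"

    # Refuse to run without a message, so the reason is never lost.
    if [ -z "${1:-}" ]; then
        echo "usage: update_disable_message MESSAGE" >&2
        return 2
    fi

    # Remove the existing lock, then create a new one with the new message.
    "$puppet_bin" agent --enable && "$puppet_bin" agent --disable "$1"
}
```

Usage, as root: update_disable_message "Puppet agent can be restarted at 5PM"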

What about the Puppet-DB facts / reports

When the Puppet Agent is disabled on a host, the Puppet-DB no longer seems to receive that node’s reports or facts.

A quick way to check that behavior is:

1. List your node facts

$ puppet-query 'facts { certname = "dev-host.example.org" }' | wc -l
1481

2. Delete your node facts directly from Puppet-DB

$ psql -c "delete from factsets where certname='dev-host.example.org'"
$ psql -c "delete from catalogs where certname='dev-host.example.org'"

3. Check that the Puppet-DB does not hold any fact for your host

$ puppet-query 'facts { certname = "dev-host.example.org" }' | wc -l
0

4. Try to run the Puppet Agent on the node where the Agent has been disabled and check your Puppet-DB’s facts again

$ puppet-query 'facts { certname = "dev-host.example.org" }' | wc -l
0

This seems normal, as the Puppet Agent does not run: it simply exits after printing the message we saw earlier.

This can, however, be a problem in some environments where hosts can’t just disappear from Puppet-DB, for whatever reason.

For instance, at IN2P3 Computing Centre, our CMDB relies on the Puppet-DB to display some information (facts, …) and this mechanism has already raised alerts when a host disappeared from the Puppet-DB.

This is why we had to develop our own solution that allows us to:

  • disable Puppet Agent runs on a node
  • allow Puppet Agent runs in white-listed Puppet environments
  • keep Puppet-DB data updated
  • list in a simple manner all the nodes where Puppet Agent has been disabled

I will discuss how we implemented this solution in an upcoming post.

Digging into the Puppet Agent code

For those who are curious about where all that resides in the Puppet Agent code, you can find it here:

Conclusion

Temporarily disabling the Puppet Agent is a very nice feature to have in your Puppet toolkit. It can save you some precious time in emergency situations when you just need to fix things any way you can.

As stated above, we’re not using this Puppet Agent feature at IN2P3 Computing Centre. We have developed our own solution to achieve the same goal and keep our Puppet-DB data up-to-date.

This solution will be described soon in another post, so stay tuned for more!

