The challenging delayed reboot

Tomer Salakoff
Cloud Mind

--

After much searching of the interwebs for a baked solution of some sort, on Stack Overflow or the like, and to my utter dismay finding none, I have come up with a way to schedule a delayed restart of a Linux machine running RedHat/CentOS flavors.

Let's start with the why:

There are some tools in our industry whose focus is clicking on our behalf…

Where previously you would have an engineer sitting in front of a terminal running a set of scripts and commands, today you have automated systems managing these processes for us. One such system is Octopus Deploy, which we use heavily to manage configuration and deployments across multiple mixed and distributed environments.

One of its many features is the Runbook, named after the good old manual tome that engineers used in days gone by (and in many cases still use today), usually on the way to automating the procedure. The secret is in the name: a runbook is a set of steps, clear and easy to follow at a running pace, at 3am, when support calls and alarm bells are ringing. Of course, runbooks come in many shapes and sizes, some just for everyday procedures such as creating servers or running rudimentary maintenance tasks. And this is where I found myself recently: the challenge of updating our Linux machines opened an opportunity to use an Octopus Runbook to run the update procedure for us on any Linux machine in a given list of environments.

The schema of the runbook should look familiar to you:

  • Check that the servers are available and ready for update
  • Trigger the updates at a selected level. You may only want security updates this time.
  • Check if a restart is required by the updates installed
  • If needed, restart
  • Check the hosts are back to functional performance
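
The "selected level" step could be wired up roughly as follows. This is only a sketch: the level names (security, bugfix, all) and the function name yum_flags_for_level are made up for illustration, and in Octopus the level would presumably come from a runbook variable rather than a hard-coded value. The command is printed rather than executed, so the sketch is safe to try.

```shell
#!/bin/bash
# Hypothetical helper: map a chosen update level to yum flags.
yum_flags_for_level() {
    case "$1" in
        security) echo "--security" ;;   # security errata only
        bugfix)   echo "--bugfix" ;;     # bug-fix errata only
        all)      echo "" ;;             # everything available
        *)        echo "unknown update level: $1" >&2; return 1 ;;
    esac
}

level="security"   # assumption: would come from an Octopus variable
flags=$(yum_flags_for_level "$level") || exit 1
# Dry run: print the command instead of running it.
echo "Would run: sudo yum update -y $flags"
```

An unknown level fails loudly instead of silently falling back to a full update, which seems the safer default for a runbook.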

*Note: This is not best practice in all of its glory… it is a simple way to ensure that any and all security updates can be installed easily at any time. It does not take into account differences in the timing of lifecycle stages between test and production, nor the promotion of specific updates through the lifecycle. Showing best practices for system updates is not the point of this article.

We can start with the update command, and add yum-utils to the mix to help us decide when to restart. Something like this:

#!/bin/bash
echo "Starting Linux Security Update procedure"
sudo yum install -y yum-utils
sudo yum update --security -y
echo "Updates completed..."
# needs-restarting is a funky little tool from yum-utils
# its output however does not provide a nice workable value
# so let's convert it to boolean
rebootRequired=$(needs-restarting -r)
if [[ $rebootRequired == *"Reboot is required"* ]]; then
    rebootRequired=1
else
    rebootRequired=0
fi
echo "RebootRequired: $rebootRequired"
# I can't be bothered at this point to loop over each service and restart it
# so each service that needs restarting is just another vote for a reboot
servicesToRestart=( $(needs-restarting -s) )
rebootForServices=${#servicesToRestart[@]}
echo "Number of services to restart: $rebootForServices"
if [ "$rebootForServices" -gt "0" ]; then
    echo "Services: "
    echo "${servicesToRestart[@]}"
fi
rebootRequired=$((rebootRequired + rebootForServices))
# And the final tally is... *drum roll*
echo "Tallied votes for reboot: $rebootRequired"
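
As an aside, needs-restarting -r also reports its verdict through its exit status (1 when a reboot is required, 0 when it is not), so the string match can be avoided. Here is a minimal sketch of that variant. The real tool is replaced by a stub function so it can be tried on a machine without yum-utils; the stub fake_needs_restarting and the helper reboot_votes are illustrative names, not part of the script above.

```shell
#!/bin/bash
# Stub for demonstration only: mimics a host where a reboot is
# required (-r) and two services need restarting (-s).
fake_needs_restarting() {
    case "$1" in
        -r) echo "Reboot is required to fully utilize these updates."; return 1 ;;
        -s) printf '%s\n' "network.service" "sshd.service" ;;
    esac
}

# Hypothetical helper: one vote for the reboot hint plus one per service.
# $1 is the command to query (the real needs-restarting, or a stub).
reboot_votes() {
    local cmd=$1 votes=0 status services=()
    "$cmd" -r >/dev/null 2>&1
    status=$?
    # Only exit status 1 is treated as "reboot required".
    if [ "$status" -eq 1 ]; then
        votes=1
    fi
    # mapfile reads one service per line into the array.
    mapfile -t services < <("$cmd" -s)
    echo $(( votes + ${#services[@]} ))
}

echo "Tallied votes for reboot: $(reboot_votes fake_needs_restarting)"
```

With the stub above the tally is 3: one vote for the reboot hint and one for each of the two services. On a real host the call would be reboot_votes needs-restarting.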

Ok, so the votes are in and the final say goes to the rebooters… what next? Should be easy… right?

if [ "$rebootRequired" -gt "0" ] ; then
    set_octopusvariable "systemRestarted" "True"
    sleep 5
    shutdown -r
fi

However…

Looking into the Octopus Deploy documentation, you will find a little footnote that mentions the following (and I paraphrase): a step cannot request a reboot of the server, as this will cause the tentacle process to stop and the step will fail…

So, what to do? We could try to schedule a reboot for a little later… say, in 10 seconds. Ok, nice. That should be easy. Something like this:

shutdown -r $(date --date="10 sec")

You would expect this to work, but more testing reveals that the issue remains: the command is unable to finish before the Octopus process is killed by the restart, so the run fails.

So we go one step further and schedule a plain run of the shutdown command for 10 seconds in the future. We can do so using /usr/bin/at:

time=$(date --date="10 sec" "+%Y%m%d%H%M.%S")
echo "sudo /sbin/shutdown -r" | sudo /usr/bin/at -t "$time" 2>&1

  • First we generate the time string for 10 seconds in the future. (Note the format required by /usr/bin/at -t)
  • Then we echo the required command for shutdown (don’t forget the sudo)
  • And pass it through the pipe to the timed at command
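
To sanity-check the timestamp before handing it to at, something like the following can help. It assumes GNU date (as shipped on RedHat/CentOS). The function names at_timestamp_in and is_at_timestamp are illustrative only, and the final step is a dry run that prints the pipeline, so nothing is actually scheduled.

```shell
#!/bin/bash
# Hypothetical helper: timestamp N seconds from now in the
# [[CC]YY]MMDDhhmm[.ss] form that `at -t` expects (GNU date assumed).
at_timestamp_in() {
    date --date="$1 sec" "+%Y%m%d%H%M.%S"
}

# Hypothetical helper: check the shape of the string, e.g. 202104071221.00
is_at_timestamp() {
    [[ $1 =~ ^[0-9]{12}\.[0-9]{2}$ ]]
}

when=$(at_timestamp_in 10)
if is_at_timestamp "$when"; then
    # Dry run: print the pipeline instead of scheduling a real reboot.
    echo "Would run: echo 'sudo /sbin/shutdown -r' | sudo /usr/bin/at -t $when"
else
    echo "Unexpected timestamp format: $when" >&2
fi
```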

The resulting code:

#!/bin/bash
echo "Starting Linux Security Update procedure"
sudo yum install -y yum-utils
sudo yum update --security -y
echo "Updates completed..."
# Convert the needs-restarting verdict into a 0/1 vote
rebootRequired=$(needs-restarting -r)
if [[ $rebootRequired == *"Reboot is required"* ]]; then
    rebootRequired=1
else
    rebootRequired=0
fi
echo "RebootRequired: $rebootRequired"
# Each service that needs restarting is another vote for a reboot
servicesToRestart=( $(needs-restarting -s) )
rebootForServices=${#servicesToRestart[@]}
echo "Number of services to restart: $rebootForServices"
if [ "$rebootForServices" -gt "0" ]; then
    echo "Services: "
    echo "${servicesToRestart[@]}"
fi
rebootRequired=$((rebootRequired + rebootForServices))
echo "Tallied votes for reboot: $rebootRequired"
if [ "$rebootRequired" -gt "0" ] ; then
    set_octopusvariable "systemRestarted" "True"
    sleep 5
    echo "restarting server in 10s..."
    # Hand the shutdown to at so this step can finish before the reboot
    time=$(date --date="10 sec" "+%Y%m%d%H%M.%S")
    echo "sudo /sbin/shutdown -r" | sudo /usr/bin/at -t "$time" 2>&1
fi

And the resulting output for success:

Updates completed... 
RebootRequired: 1
Number of services to restart: 6
Services:
network.service
....
....
Tallied votes for reboot: 7
restarting server in 10s...
job 1 at Wed Apr 7 12:21:00 2021
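
The last line of that output comes from at itself and carries the job number. If the runbook should record it (for example to inspect the queue with atq or to cancel the reboot with atrm), a small sketch along these lines could extract it. The function name at_job_id is hypothetical, and the sample line is simply the one shown above.

```shell
#!/bin/bash
# Hypothetical helper: pull the job number out of at's confirmation line.
# Reads at's output on stdin and prints the number that follows "job ".
at_job_id() {
    sed -n 's/^job \([0-9][0-9]*\) at .*/\1/p'
}

sample="job 1 at Wed Apr 7 12:21:00 2021"
jobId=$(echo "$sample" | at_job_id)
echo "Scheduled reboot as at job: $jobId"
```

Lines that do not match the "job N at ..." shape are ignored, so any warnings that at prints alongside the confirmation do not end up in the job id.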

This solution allowed me to trigger the reboot from Octopus Deploy while allowing the Octopus process to finish gracefully. I find myself wondering where else this could be useful.
