What’s new in BOSH — September Edition

The Deployment Lifecycle in Detail

As the PM of the BOSH OpenStack CPI I get to play around with BOSH a lot and have fun with its quirks and subtleties. The learning curve is quite interesting, as pointed out by smarter people than me before, so I’d like to compile a few things that hopefully can make your life easier.

This is a post in a series. If you just found this, I recommend also reading some of the previous posts.

Deployment lifecycle hooks

There hasn’t been a major release of BOSH for quite some time now, so let’s look at a topic that impacts everyone deploying with BOSH: the deployment lifecycle contains some hooks to execute your own code. Some of them are better known than others, and all of them have their specialties.

If you want to follow along with the examples I’m giving here, have a look at the lifecycle-demo boshrelease on GitHub. We’ll look at the big picture of the deployment lifecycle, the individual scripts, and a few examples of how those scripts are used in production.

The deployment lifecycle looks like this:

$ bosh deploy
[if this is an update]
[for each VM that already exists] (according to `update` section)
[for each Job in parallel]
* drain
* monit stop
[end for]
[end for]
[end if]
[for each VM in deployment] (according to `update` section)
[for each Job in parallel]
* pre-start
* monit start
[wait for monit status == running]
* post-start
[end for]
[end for]
[wait for all VMs == 'running']
[for all VMs in parallel]
* post-deploy
[end for]
deploy finished

So let’s look at each hook individually. All of them need to sit in `/var/vcap/jobs/<job-name>/bin/` and they’re not allowed to have a file extension. So their names would be `drain`, `pre-start`, `post-start`, and `post-deploy`.

The drain hook

Drain is executed before calling `monit stop` on a Job, e.g. when a VM is updated or removed. That means the job is still running when its drain script is run. A prominent example is a DEA or Diego rep evacuating all application instances running on it before shutting down.

As this example shows, drain scripts can run long, asynchronous tasks. There is no time limit on how long draining may take. Therefore, the Director needs to check regularly whether the Job has finished draining. That means your script has to be idempotent: it shouldn’t fail if it is called more than once.

To indicate whether the Job has finished draining or needs more time, drain scripts need to conform to special exit code and output conventions, as stated by the documentation on bosh.io:

  • If no error occurred, always end with exit code 0
  • Print an integer to stdout (and nothing else): 0 or a positive number is the number of seconds the Director waits before stopping the processes; a negative number makes the Director wait that many seconds and then call the drain script again (dynamic drain)

When all drain scripts for all Jobs on a VM have been executed successfully, the Jobs are stopped with `monit stop`.
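
Here’s a minimal sketch of a dynamic drain script, assuming a hypothetical job called `worker` that writes a pidfile under `/var/vcap/sys/run/worker/`:

#!/bin/bash
# drain (sketch): BOSH may call this script repeatedly, so it has to be
# idempotent.

PIDFILE=/var/vcap/sys/run/worker/worker.pid

if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
  # Ask the process to finish its outstanding work (hypothetical signal
  # handler; sending the signal twice is harmless).
  kill -USR1 "$(cat "$PIDFILE")"
  # Not done yet: the Director should call this script again in 5 seconds.
  echo "-5"
else
  # The process is gone, draining is finished.
  echo 0
fi

exit 0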

The pre-start hook

Pre-start is executed while the Job is stopped. In a pre-start script you can, for example, do one-off tasks that need to happen before the Job starts up. Many of the small releases that make up cf-release contain a script to set up log directories and set some kernel parameters with sysctl, see e.g. the nats-release.

The pre-start scripts run in parallel for all Jobs on a VM. When the scripts for all Jobs on a VM have been executed successfully, the Jobs are started with `monit start`.
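
A minimal sketch of such a pre-start script, again assuming a hypothetical job called `worker` (the directories and the sysctl value are illustrative):

#!/bin/bash
# pre-start (sketch): one-off setup that runs while the job is stopped.

set -e

LOG_DIR=/var/vcap/sys/log/worker
RUN_DIR=/var/vcap/sys/run/worker

mkdir -p "$LOG_DIR" "$RUN_DIR"
chown -R vcap:vcap "$LOG_DIR" "$RUN_DIR"

# Tune kernel parameters before the process comes up.
sysctl -w net.core.somaxconn=1024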

The post-start hook

Post-start is executed after a Job has been started successfully by monit. In a post-start script you can do one-off things that need to be done after your Job has started. The grafana-boshrelease creates its datasources in a post-start script.

The post-start scripts run in parallel for all Jobs on a VM. When the post-start scripts for all Jobs on a VM have been executed successfully, the VM is considered to have started successfully.
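
For illustration, a post-start script could verify that the Job actually answers requests before the VM counts as started (the health endpoint and port are assumptions):

#!/bin/bash
# post-start (sketch): wait until the service answers on its (assumed)
# health endpoint, and fail the deploy if it never does.

for i in $(seq 1 30); do
  if curl -fs http://127.0.0.1:8080/health >/dev/null; then
    exit 0
  fi
  sleep 2
done

echo "service did not become healthy in time" >&2
exit 1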

The post-deploy hook

Post-deploy is executed after all Jobs on all VMs in a deployment have started successfully. Note that execution of `post-deploy` is hidden behind a feature flag, which is false by default. Set `enable_post_deploy: true` in your Director deployment manifest to enable it.
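
The corresponding snippet in the Director’s deployment manifest could look like this (assuming the flag sits under the `director` job’s properties, as in the upstream bosh release):

properties:
  director:
    enable_post_deploy: true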

I don’t know of any prominent examples making use of a post-deploy script, but here is a thought: Why not use it as an automated way to test your deployment instead of running an errand afterwards? Another possible use might be an initialization that needs your entire deployment to be available. Please don’t use it to modify the installed packages after they’ve been copied to the VM; that’s what your packaging script is for.
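
Sticking with the testing idea, a post-deploy script could run a simple smoke test once the whole deployment is up (the endpoint is purely hypothetical):

#!/bin/bash
# post-deploy (sketch): all Jobs on all VMs are running at this point,
# so cross-node checks are safe here.

set -e

curl -fs http://api.my-deployment.internal:8080/v1/status >/dev/null
echo "deployment smoke test passed"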

If you have a release making good use of a post-deploy, I’d be interested to hear your use-case!

When all post-deploy scripts are finished, the deployment is considered successful.

FAQ

Additionally, here are some often-heard questions and answers when it comes to the lifecycle hooks:

Something went wrong, where do I find my logs?

All logs are in `/var/vcap/sys/log/<job-name>/<script-name>.std[out|err].log`. They are owned by root, therefore you cannot simply get them using the regular BOSH commands (follow this CLI bug for updates on this). You’d have to get them either by doing a `bosh ssh` onto the machine or by executing a sudo command on the machine using `bosh ssh -c`.
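
For example, assuming the Go CLI (v2) syntax and hypothetical deployment and instance names:

$ bosh -d my-deployment ssh worker/0 -c 'sudo cat /var/vcap/sys/log/worker/pre-start.stdout.log'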

I want my script to be executed exactly once for an Instance Group

Previously, many people used to check for `spec.index == 0`. However, `index` has since been deprecated in favor of `id`. So the recommendation is to use `<% if spec.bootstrap %>` to have a script executed only once, on the first instance that starts in a group.
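
In a script template, that could look like the following sketch (the initialization command is hypothetical):

#!/bin/bash
# post-start.erb (sketch): only the bootstrap instance of the group runs
# the one-off initialization.
<% if spec.bootstrap %>
/var/vcap/jobs/my-job/bin/initialize-cluster
<% end %>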

I want my script to access some information about the Job

There are some job-specific properties available to all scripts, accessible via ERB. One example is `<%= spec.name %>` for a job’s name. See bosh.io for a list of all properties.

I need IP addresses of other jobs in my post-deploy script

If you’re e.g. doing some cluster initialization that requires all nodes to be started already, you can do this in your post-deploy script. To reach the other jobs via their IPs, you can use BOSH links and have your scripts do something like this:

<% master = link('master') %>
# Comma-separated list of the addresses of all master instances, resolved
# via the 'master' link when the template is rendered.
master_nodes="<%= master.instances.map { |instance| instance.address }.join(',') %>"
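
For the `link('master')` call to work, the consuming job also has to declare the link in its spec file, e.g. (the link type is arbitrary and hypothetical here):

consumes:
- name: master
  type: master-address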