Do You Need Distributed Cron? Yes. Yes You Do.


Thanks to the slowing of Moore’s law, the rise of public cloud, and other factors, many of our major systems now run distributed. It’s hard to run single-instance, highly available applications in the cloud, because cloud native apps are meant to be reliable systems made from unreliable parts.

But what do you do when you need to run a cron job and don’t want to rely on setting it up on just one, single, “make it be up all the time nine9s” instance? There are probably two main options:

  1. Just run more cron jobs, i.e., set up the same cron job on many servers (maybe three) and stagger them somehow. At Vurt we have seen systems with a job that is set up to run at a random time once per day.
  2. Use a highly available distributed system that can ensure a particular job runs once when it’s supposed to, some sort of distributed cron.

We like option #2, but #1, if it meets your needs, would be simpler.

Enter HashiCorp Nomad and Periodic Jobs

HashiCorp makes all kinds of useful open source applications, from Vagrant to Vault to Nomad.

Nomad is a job execution system; HashiCorp calls it a scheduler.

Nomad is a tool for managing a cluster of machines and running applications on them. Nomad abstracts away machines and the location of applications, and instead enables users to declare what they want to run and Nomad handles where they should run and how to run them.

Nomad can do a lot of things. But one of the things we at Vurt find most interesting is “periodic jobs.”

The configuration for a periodic job even looks like a cron job. The example below, from the Nomad docs, says: run the job “docs” every 15 minutes, and don’t let one run overlap the next.

job "docs" {
  periodic {
    cron             = "*/15 * * * * *"
    prohibit_overlap = true
  }
}
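
The fragment above only shows the schedule. A minimal sketch of a complete periodic job might look like the following. It assumes a datacenter named dc1, clients with the raw_exec driver enabled, and a hypothetical script at /usr/local/bin/build-docs.sh; none of those names come from the Nomad docs example. The periodic block is only valid on batch-style jobs, hence the type setting.

```hcl
job "docs" {
  datacenters = ["dc1"] # assumption: adjust to your cluster
  type        = "batch" # periodic requires a batch-style job

  periodic {
    cron             = "*/15 * * * * *" # every 15 minutes, as in the docs example
    prohibit_overlap = true             # skip a launch while the previous run is still going
  }

  group "docs" {
    task "build" {
      driver = "raw_exec" # assumption: raw_exec is enabled on the client nodes

      config {
        command = "/usr/local/bin/build-docs.sh" # hypothetical script path
      }
    }
  }
}
```

Assuming the file is saved as docs.nomad, running `nomad job run docs.nomad` registers the job, and `nomad job periodic force docs` launches a run immediately, which is handy for testing without waiting for the schedule.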

High Availability

Put simply, Nomad has a core set of servers, usually three or more, and any number of client nodes, each of which can have its own specific resources.

When a job is specified, constraints can be set to ensure that it runs only on nodes that satisfy them. For example, if we had created three MySQL backup Nomad client nodes (just containers), each with a script deployed that can back up your MySQL server(s) to a remote location, then a Nomad job can be set up to execute that script on one of those nodes at a particular time of day, hourly, or on whatever schedule you need.
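
As a rough sketch of that MySQL backup scenario, the job below uses a constraint on the node class. It assumes the three backup clients were started with node_class = "mysql-backup" in their client configuration, that raw_exec is enabled on them, and that the backup script lives at the hypothetical path /opt/backup/mysql-backup.sh. The job name, datacenter, and 02:00 schedule are illustrative only.

```hcl
job "mysql-backup" {
  datacenters = ["dc1"] # assumption: adjust to your cluster
  type        = "batch"

  # Only place this job on clients whose node_class is "mysql-backup".
  constraint {
    attribute = "${node.class}"
    value     = "mysql-backup"
  }

  periodic {
    cron             = "0 2 * * *" # once a day at 02:00
    prohibit_overlap = true
    time_zone        = "UTC"
  }

  group "backup" {
    task "dump" {
      driver = "raw_exec" # assumption: raw_exec is enabled on the backup clients

      config {
        command = "/opt/backup/mysql-backup.sh" # hypothetical script path
      }
    }
  }
}
```

Because the constraint names a class of nodes rather than a specific host, the scheduler is free to pick any eligible node in that class for each run.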

If one of those client nodes becomes unavailable, the job will simply run on one that is still available. So if you have three nodes and lose one, your job will still run on one of the remaining nodes. And thus we have a cloud native cron system.

Vurt Can Help

At Vurt we have experience running Nomad as a distributed cron system. Certainly there are other solutions (such as OpenStack Mistral), and we can help with those too, but we are fans of HashiCorp and, of course, Nomad. If your applications and systems would benefit from a highly available distributed cron system, please email us at vurt@vurtcloud.com or reach us via our contact page.
