A Simple Pattern for Jobs and Crons on AWS

I dunno maybe don’t use a big complicated self-managed thing for this

Dan McKinley
Skyliner
Mar 28, 2017 · 4 min read

Web platforms are oriented around requests. But every nontrivial web application needs a way to run code out of band. Maybe you need to process uploads, or maybe you need to send a membership newsletter on a schedule.

The asynchrony problem is more open-ended than serving requests. You could set up a contraption of unbounded complexity for this, and believe me, people do. I have seen many iterations of this in my time, each terrifying in its own way.

When we set out to build Skyliner, we tried to go relatively low-fi and native on this.

We should stick to message passing patterns that are simple to debug and easy to scale. And we should deny ourselves features that put those attributes at risk. The implementation I’ll explain today is functional, Spartan, and horizontally scalable.

Things we won’t be doing

Let me derail the post long enough to call out some important things that we are not trying to achieve.

Running jobs exactly once

We aren’t trying to do this because it is impossible. Jobs need to be safe to run more than once, since repeat invocations are theoretically unavoidable.

Gracefully shutting down workers

This is another thing we can’t guarantee, and therefore we will not try to do it at all.

Making special worker/cron boxes

We’re running our code on virtual machines, which can be destroyed at any moment. We’ll embrace this and make all of our application servers pull double duty as task workers. This is better than relying on a special cron node, which will work fine until it doesn’t.

Locking and throttling

You’ll need to ensure that your rate of new jobs and crons does not exceed the rate at which your system can complete the work. We won’t be tackling this today. But our solution is distributed, which will make it harder for someone to dig a concealed pit and fill it with lockfiles.

Synchronous remote jobs, automatic parameter serialization, etc.

These are dubious features that will most likely bite you.

Just use the tools AWS gives you

Amazon comes with a highly-available, scalable, reliable, managed message queue service: SQS. We’ll use this to communicate with our workers. We need to know a few basic things about how SQS works:

  • Messages are kept on the queue until a worker deletes them.
  • After being read by a worker, messages are invisible to other workers until a visibility timeout has elapsed.

So we can use those features to ensure that jobs are handled at least once. Our workers will read messages, handle them, and then delete them. If they crash in the middle of doing this, another worker will automatically retry.

Now let’s come up with a dumb little JSON wire format for invoking jobs:
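
Something like this does the trick. The field names here (method, args) are just an illustration, not a fixed schema:

```json
{
  "method": "send_welcome_email",
  "args": {"user_id": 1234}
}
```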

Just glue your workers to your webs

I think we should have a few design goals with respect to writing and delivering the code that will run in these workers.

  • We should share code between web requests and jobs.
  • Since we’re sharing code, deploying the website should also just update all of the workers.

The simplest way to achieve this (if your language and runtime will cooperate with it) is to just implement workers as one or more daemon threads in the webserver process.

If that’s unworkable, then you can write a simple consumer program that loads your webserver code, and have supervisord run it inside of the same Docker container as your web app.
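
To illustrate the thread approach, here’s a minimal sketch of what web-process startup might do. consume_forever is the consumer loop we’ll write in the next section, and the thread count is arbitrary:

```python
import threading

def start_workers(num_threads=2):
    # Run the SQS consumer loop (written in the next section) in daemon
    # threads alongside the web server. Daemon threads die with the process;
    # any message they were holding simply reappears after its visibility
    # timeout, so nothing is lost.
    for _ in range(num_threads):
        threading.Thread(target=consume_forever, daemon=True).start()
```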

A pretty boring architecture diagram of what we’re going to build here.

Write a simple SQS consumer

This is a workable job server:
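
Here’s a sketch in Python with boto3. The queue URL and the example handler are made up, but the shape is what matters:

```python
# consumer.py: a sketch of the job server, assuming the made-up wire
# format above and a made-up queue URL.
import json
import sys

import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical
sqs = boto3.client("sqs")


def send_welcome_email(user_id):
    # An example handler; real ones would share code with the web app.
    print("sending welcome email to user %s" % user_id)


def handle(body):
    # Dispatch to a function in this module matching the passed method name.
    func = getattr(sys.modules[__name__], body["method"], None)
    if callable(func):
        func(**body.get("args", {}))


def consume_forever():
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            handle(json.loads(msg["Body"]))
            # Delete only after handling succeeds; if we crash first, the
            # message becomes visible again and another worker retries it.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    consume_forever()
```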

This just dispatches to a function in its own module if it can find one with the passed method name.

Put ad-hoc jobs on the queue

Scheduling a job from our webserver is easy:
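
Something along these lines, again with a made-up queue URL and job name:

```python
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical


def enqueue(method, **args):
    # Wrap the job in the little JSON wire format and drop it on the queue.
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"method": method, "args": args}))


# e.g. from a signup request handler:
enqueue("send_welcome_email", user_id=1234)
```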

Just glue your crons to your workers

Ok, so we now have a working job server in less than three dozen lines of code. Now how do we run crons? Easy — CloudWatch Events allow us to add messages to our queue using a cron expression:

Configuring a CloudWatch Event to send a message to our job queue on a cron schedule.
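
If you’d rather script it than click through the console, a rough boto3 equivalent looks like this. The rule name, schedule, and queue ARN are placeholders, and the queue needs a policy that lets CloudWatch Events send messages to it:

```python
import json

import boto3

events = boto3.client("events")

# A made-up daily newsletter cron: fire at 12:00 UTC every day.
events.put_rule(
    Name="send-membership-newsletter",
    ScheduleExpression="cron(0 12 * * ? *)",
)
events.put_targets(
    Rule="send-membership-newsletter",
    Targets=[{
        "Id": "job-queue",
        "Arn": "arn:aws:sqs:us-east-1:123456789012:jobs",  # hypothetical queue ARN
        # Constant message body, in the same wire format the workers expect.
        "Input": json.dumps({"method": "send_membership_newsletter", "args": {}}),
    }],
)
```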

We can configure these as needed, and write job handlers that will be called like crons.

That’s it!

You probably want to add a bit more logging, and you definitely want to monitor the length of the queue. But this is pretty functional, and Amazon is managing all of the really hard parts for us. So we can avoid hiring more ops employees for a while longer.
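
Queue length is easy to check from SQS itself (or via the corresponding CloudWatch metrics). A quick sketch, with the same made-up queue URL:

```python
import boto3

sqs = boto3.client("sqs")
attrs = sqs.get_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/jobs",  # hypothetical
    AttributeNames=["ApproximateNumberOfMessages",
                    "ApproximateNumberOfMessagesNotVisible"],
)["Attributes"]
print(attrs)  # alarm if these numbers keep growing
```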

This approach works great with Skyliner, a continuous delivery platform for AWS that is free for personal use. This is, incidentally, how we implement jobs and crons ourselves.

For a limited time, we’ll help you switch to Skyliner from Heroku for free.
