Build Scalable Webhooks with a Queue and Workers Setup

Get asynchronous with webhooks + RabbitMQ

--

Webhooks are an excellent way of moving data between applications, but if they’re added without consideration for scaling, they can easily become a performance problem. In a previous article, we saw a Guestbook application which, instead of sending webhooks synchronously, added them to a queue for later processing. Today we’ll look at how that processing actually works.

The basic setup is that there is some data that we want to send to a webhook endpoint. In this case, the data is a username, a comment, and a timestamp, but this data could be anything you want. The endpoint could be a listening chatbot, a webhook to trigger our CI service, or one of our own applications that is designed to react to the data. A message is sent and stored on a message queue, containing the data and an array of webhooks. In this example, the data is JSON-encoded to make it easy to store and parse, and the example queue is RabbitMQ, although the approach here would work with any queue.
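For illustration, one queued message might look something like this (the field names and endpoint URLs here are placeholders rather than the Guestbook’s exact format):

    {
      "comment": {
        "name": "Ada",
        "comment": "Lovely guestbook!",
        "timestamp": "2016-08-30T10:00:00Z"
      },
      "webhooks": [
        "https://hooks.example.com/chatbot",
        "https://ci.example.com/trigger-build"
      ]
    }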

Divide and Conquer

We know what data we’re expecting (comment, plus an array of webhooks), and the next step is to plan how to write a “worker” — a script that will know how to take that data and process it. In this case, the process looks something like:

  • for each webhook URL, send a POST request to that endpoint

If there are a lot of URLs to notify for a particular message, it can take time to process. If a problem occurs partway through the script, it could be tricky to know exactly what has or hasn’t already been processed. To tackle this problem, we’ll have one worker that simply splits up the work into many individual messages: one message per URL that needs requesting. The setup will look something like this:
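Sketched roughly:

    guestbook app
        |  one message: comment data plus an array of webhook URLs
        v
    [comments queue]
        |
        v
    comment worker (splits each message into one message per URL)
        |
        v
    [notifications queue]
        |
        v
    notification workers (one POST request per message, sent to that URL)
        |
        v
    webhook endpoints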

This “comment worker” will process the messages from the comments queue, and for each URL in the webhooks array, it will create a new message on the “notifications” queue containing one URL and the data to send to it.

One Worker Consumes and Creates Messages

The first worker doesn’t send webhooks at all; it simply processes the data and prepares it for another worker to quickly send the hook and check the response. Here’s the code that does it:
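In sketch form, it looks something like this (the queue names and overall flow come from the post; the config handling, variable names, and use of amqplib’s callback API are simplifying assumptions, and the full version lives in src/workers in the repo):

    // commentWorker (sketch): consume the "comments" queue and create one
    // "notifications" message per webhook URL in each incoming message.
    var amqp = require('amqplib/callback_api');

    // Simplified config: the real app reads its connection details for the
    // guestbook-messages RabbitMQ from its environment (localhost or Bluemix).
    var rabbitUrl = process.env.RABBITMQ_URL || 'amqp://localhost';

    amqp.connect(rabbitUrl, function (err, conn) {
      if (err) throw err;
      conn.createChannel(function (err, ch) {
        if (err) throw err;
        // connect to the queue we want to consume
        ch.assertQueue('comments');
        console.log('comment worker waiting for messages');

        ch.consume('comments', function (msg) {
          var data = JSON.parse(msg.content.toString());
          // connect to the second queue (this also creates it if it doesn't exist)
          ch.assertQueue('notifications');
          // fan out: one new message per webhook URL, each carrying the comment data
          data.webhooks.forEach(function (url) {
            var notification = JSON.stringify({ url: url, comment: data.comment });
            ch.sendToQueue('notifications', Buffer.from(notification));
          });
          ch.ack(msg);
        });
      });
    });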

Let’s walk through what’s happening here: first we grab some requirements and set up our config. My app runs either on localhost (a local development VM) or on Bluemix, so the code here grabs the config I need to connect to the guestbook-messages RabbitMQ instance where the messages are. Next, we set up the connection, connect to the "comments" queue that we want to consume, and output a message to the console.

Things get interesting when we consume our first message. We connect to a second queue (connecting also creates a queue if one doesn't exist already), and then work through the webhooks collection in the data to create the new messages, adding them to this second "notifications" queue. For example, if the first message on the “comments” queue contains this data:
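(Illustrative values; the comment fields and RequestBin IDs are placeholders:)

    {
      "comment": {
        "name": "Ada",
        "comment": "Lovely guestbook!",
        "timestamp": "2016-08-30T10:00:00Z"
      },
      "webhooks": [
        "https://requestb.in/xxxxxx01",
        "https://requestb.in/xxxxxx02"
      ]
    }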

Then this worker will create two new messages on the “notifications” queue:
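(Again with placeholder values; one message per URL:)

    {
      "url": "https://requestb.in/xxxxxx01",
      "comment": {
        "name": "Ada",
        "comment": "Lovely guestbook!",
        "timestamp": "2016-08-30T10:00:00Z"
      }
    }

    {
      "url": "https://requestb.in/xxxxxx02",
      "comment": {
        "name": "Ada",
        "comment": "Lovely guestbook!",
        "timestamp": "2016-08-30T10:00:00Z"
      }
    }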

The URLs in these examples are RequestBin examples. RequestBin lets you create a unique endpoint that you can send any data to, and then visit it in a web browser to inspect what was sent. It’s very handy when debugging! In this case, it gives me an easy way to see what my script is sending.

Another Worker Handles the Webhooks

The second worker script in this project does the actual webhook sending. Let’s dive straight into the code:
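Again in sketch form (the queue name and the use of the request library come from the post; the config handling and the check-then-requeue behaviour on failure are assumptions):

    // notificationWorker (sketch): consume the "notifications" queue and POST
    // the comment data to the webhook URL carried in each message.
    var amqp = require('amqplib/callback_api');
    var request = require('request');

    var rabbitUrl = process.env.RABBITMQ_URL || 'amqp://localhost';

    amqp.connect(rabbitUrl, function (err, conn) {
      if (err) throw err;
      conn.createChannel(function (err, ch) {
        if (err) throw err;
        ch.assertQueue('notifications');
        console.log('notification worker waiting for messages');

        ch.consume('notifications', function (msg) {
          var data = JSON.parse(msg.content.toString());
          // re-encode the comment part as the JSON body of the request
          request({
            method: 'POST',
            url: data.url,
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(data.comment)
          }, function (err, response) {
            // check the response before acknowledging the message
            if (!err && response.statusCode >= 200 && response.statusCode < 300) {
              ch.ack(msg);
            } else {
              ch.nack(msg); // nack requeues by default, so another worker can retry
            }
          });
        });
      });
    });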

The overall shape of this worker script is pretty similar to the previous one: picking up environment variables and connecting to RabbitMQ. The interesting part is the consume callback: the worker parses the JSON data from the message string, then re-encodes the comment part as JSON. We use the request library to build a request with the URL from the message and the comment data as the body of the request, and then send it off.

Checking one of the RequestBin endpoints we created earlier, here’s what I see:

The request arrived safely, and right at the bottom of the RequestBin page you can see the data that arrived with it: our webhook!

Webhooks and Queues

Using webhooks with a queue in your own applications can be a good way of distributing load and keeping UIs responsive, because the actual sending happens asynchronously. In my example, I deploy just one or two commentWorker scripts to handle splitting the data into per-request messages, but I deploy more of the notificationWorker scripts, so that even if a message arrives with a large array of URLs, it can be serviced quickly by several workers sharing the load.
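On a Cloud Foundry-based platform such as Bluemix, scaling the notification workers out could look something like this (the application name here is a placeholder, not necessarily how the repo names it):

    cf scale notification-worker -i 4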

All the code in this project is available on GitHub: https://github.com/ibm-cds-labs/guestbook. You’ll find the worker code from this post in src/workers. Check the README for deployment instructions.

As always, I'm interested to hear if you find this post useful, or if you have other advice or experiences to share. Let me know in the comments!

--

Lorna Mitchell
Center for Open Source Data and AI Technologies

Polyglot programmer, technology addict, open source fanatic and incurable blogger (see http://lornajane.net)