How to Build a Webhook Delivery System

The Startup CTO
Published in The Startup
6 min read · Jan 22, 2021

Webhooks are a critical component of our operations automation platform at LogicLoop. Having built the webhook delivery system at both LogicLoop and Stripe, our team has plenty of learnings to share. This article is for developers who have been tasked with building a webhook producer/dispatcher. If you’re working on an API product or B2B SaaS, chances are you have received requests from customers to provide webhook events they can act on.

Sending webhooks can be harder than you think…

At this point, you’re probably wondering how hard it can be. Isn’t it just a matter of sending out a couple of HTTP POST requests? Our team has found that it ends up taking more engineering time and resources than you might expect at first blush. Let us take you through the journey so you don’t have to relearn some of the lessons that we did…

How to send webhook events

First, you will need to design your events schema and think about what type of events and what data you want to deliver. Do you want to emit an event every time an update occurs in your system or only for certain unique events your customers need to be notified of in particular?
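To make the schema question concrete, here is a minimal sketch of an event envelope. The field names (`id`, `type`, `created`, `api_version`, `data`) and the `evt_` prefix are our own illustrative assumptions, not a standard, so adapt them to your product.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class WebhookEvent:
    # Hypothetical envelope: every field name here is illustrative.
    type: str   # e.g. "invoice.paid"
    data: dict  # event-specific payload
    id: str = field(default_factory=lambda: f"evt_{uuid.uuid4().hex}")
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    api_version: str = "2021-01-22"  # assumed version tag for schema changes

    def to_json(self) -> str:
        """Serialize the envelope into the body of the POST request."""
        return json.dumps(asdict(self), sort_keys=True)


event = WebhookEvent(type="invoice.paid", data={"invoice_id": "in_123"})
body = event.to_json()
```

Keeping a stable envelope with a free-form `data` field lets you add new event types without changing how consumers parse the outer structure.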

A simple setup

Once you have an event schema designed, it’s time to handle the delivery side of things. Simple webhook delivery systems do in fact start with a single, in-line HTTP POST request to a designated endpoint. You might also create a database table for all endpoints that are associated with various customers and another database table for all of your webhook events and logs. This is usually sufficient for side projects.
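As a rough sketch of that simple setup, the code below uses SQLite and the standard library. The table names, column names, and the `deliver_inline` helper are hypothetical and only meant to show the shape of the approach.

```python
import json
import sqlite3
import urllib.error
import urllib.request

# Hypothetical schema: one table of customer endpoints, one delivery log.
SCHEMA = """
CREATE TABLE endpoints (
    id INTEGER PRIMARY KEY,
    customer_id TEXT NOT NULL,
    url TEXT NOT NULL
);
CREATE TABLE deliveries (
    id INTEGER PRIMARY KEY,
    endpoint_id INTEGER NOT NULL REFERENCES endpoints(id),
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    status_code INTEGER,  -- NULL when the request never completed
    attempted_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def http_post(url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code


def deliver_inline(db, customer_id, event_type, payload, post=http_post):
    """Send one event to every endpoint of a customer and log each attempt."""
    body = json.dumps({"type": event_type, "data": payload}).encode()
    rows = db.execute(
        "SELECT id, url FROM endpoints WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    for endpoint_id, url in rows:
        try:
            status = post(url, body)
        except OSError:  # DNS failure, refused connection, timeout
            status = None
        db.execute(
            "INSERT INTO deliveries (endpoint_id, event_type, payload, status_code)"
            " VALUES (?, ?, ?, ?)",
            (endpoint_id, event_type, body.decode(), status),
        )
    db.commit()
```

Note that the caller waits on every POST in turn, which is exactly the limitation that pushes you toward a queue once real customers arrive.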

Asynchronous queues are the way to go

Once customers start using your system, you will quickly realize that embedding multiple external API calls directly into your code is bad practice, especially when the behavior of these external endpoints is so variable: some consumers follow best practices and return a 200 response quickly, while others can hang for a long time. At this point, you will want to switch to an event-based queueing system. When an event fires, you push it onto the queue, and an asynchronous worker later takes the message off the queue and processes it in the background. This ensures that your main code paths remain independent and do not get blocked on slow third-party endpoints.

For something lightweight, you can look into Sidekiq if you’re using Ruby on Rails. For something more robust and heavyweight, you can consider RabbitMQ, Kafka, or Amazon SQS. You will want to consider which infrastructural option your team has the most experience operating.
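The pattern itself is independent of the queueing product. Here is a minimal in-process sketch using Python's standard `queue` and `threading` modules as a stand-in for Sidekiq, RabbitMQ, Kafka, or SQS. The `emit` and `send` names are hypothetical, and `send` is a placeholder for the real HTTP POST.

```python
import queue
import threading

events = queue.Queue()
delivered = []  # stand-in for a delivery log


def send(event):
    # Placeholder for the real HTTP POST, which may be slow or fail.
    delivered.append(event)


def worker():
    """Pull events off the queue and deliver them in the background."""
    while True:
        event = events.get()
        if event is None:  # sentinel: shut the worker down
            events.task_done()
            break
        try:
            send(event)
        finally:
            events.task_done()


def emit(event):
    """Called from the request path: enqueue and return immediately."""
    events.put(event)


thread = threading.Thread(target=worker, daemon=True)
thread.start()

emit({"type": "invoice.created", "id": "evt_1"})
emit({"type": "invoice.paid", "id": "evt_2"})
events.join()     # wait for the backlog to drain (demo only)
events.put(None)  # ask the worker to stop
thread.join()
```

An in-memory queue loses events when the process dies, so treat this as an illustration of the hand-off only. A durable broker is what gives you the same pattern with persistence.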

Retries and naughty endpoints

Next, you will quickly realize that your customers’ endpoints actually fail more often than you’d like (“temporary downtime”), and you will get requests to manually fix or retry webhook events that failed. To save yourself the operational toil, you can build retries directly into your queueing logic so that failed events are automatically scheduled for a series of retries. You can either come up with a simple policy, such as retrying failed events once per day for 3 days, or incorporate a more sophisticated exponential backoff schedule. Some customers might even want to be able to configure their own retry schedules. The most important part is making sure you are transparent with your customers about your retry policy.
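For the exponential backoff option, one possible schedule looks like the sketch below. The defaults (a one-minute base, a one-day cap, 10% jitter) are assumptions picked for illustration, not recommendations.

```python
import random


def retry_delays(max_attempts=5, base=60.0, cap=86_400.0, jitter=0.1, rng=random):
    """Return the wait in seconds before each retry attempt."""
    delays = []
    for attempt in range(max_attempts):
        # Double the wait on every attempt, up to the cap.
        delay = min(cap, base * 2 ** attempt)
        # Jitter is applied after the cap, so a delay may exceed it slightly.
        # It spreads retries out so failed events do not all fire at once.
        delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays
```

With jitter turned off, `retry_delays(jitter=0)` yields waits of 60, 120, 240, 480, and 960 seconds, which is easy to publish in your docs as the retry policy.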

There will also be endpoints that fail indefinitely. Usually this will be because your consumers misconfigured their endpoints or had an endpoint that they later deleted. There is no point spending your computing resources continuously retrying these naughty endpoints so you will want to set some threshold where you completely disable these endpoints and/or notify your consumers that the endpoints have failed.
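One simple way to implement such a threshold is to count consecutive failures per endpoint. In this sketch, the `EndpointHealth` class and the threshold of 10 are hypothetical choices.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 10  # assumed cutoff; tune it to your retry policy


@dataclass
class EndpointHealth:
    url: str
    consecutive_failures: int = 0
    disabled: bool = False

    def record(self, success: bool) -> bool:
        """Update counters; return True if this call disabled the endpoint."""
        if success:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if not self.disabled and self.consecutive_failures >= FAILURE_THRESHOLD:
            self.disabled = True
            return True  # the caller should notify the customer here
        return False
```

Returning `True` exactly once gives the caller a natural hook for sending a single "your endpoint was disabled" notification rather than one per failure.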

UI Portal for configuration

At this point, your customers will also likely want a dashboard where they can:

  • Add, edit, and delete endpoints
  • Configure which events they want to receive for which endpoints
  • Specify their preferences for timeouts and retries
  • View a log of all the events delivered both successfully and unsuccessfully

[Figure: A sample UI configuration dashboard for Stripe’s webhooks]
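Behind such a dashboard sits a per-endpoint configuration record. Here is a sketch of what it might hold and how event subscriptions could be matched. The `EndpointConfig` fields and the wildcard pattern syntax are our assumptions for illustration.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class EndpointConfig:
    # Hypothetical per-endpoint settings a customer edits in the dashboard.
    url: str
    event_types: list = field(default_factory=lambda: ["*"])  # glob patterns
    timeout_seconds: float = 5.0
    max_retries: int = 3
    enabled: bool = True

    def wants(self, event_type: str) -> bool:
        """True if this endpoint is enabled and subscribed to the event."""
        return self.enabled and any(
            fnmatchcase(event_type, pattern) for pattern in self.event_types
        )


def endpoints_for(event_type, configs):
    """Pick the endpoints that should receive a given event type."""
    return [c for c in configs if c.wants(event_type)]
```

For example, an endpoint configured with `event_types=["invoice.*"]` would receive `invoice.paid` but not `customer.created`.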

Security

Ah yes, security. While standard APIs enforce security by forcing requests to include a secret API key, enforcing security for webhooks is more challenging because you are sending data out to unverified 3rd party endpoints rather than receiving requests inbound. Here are some strategies to enforce security for webhooks:

  • Use webhook signatures: sign webhooks with a secret key by including an extra signature field in your headers so your customers can verify that the data is indeed coming from you and not an impersonator. This can be a simple plain-text shared secret, or an HMAC-based signature for greater security.
  • Webhook signatures by themselves don’t protect against replay attacks, where an attacker intercepts a valid request and resends it. To protect against this, you can include a timestamp in the signed content so an attacker can’t alter the timestamp without invalidating the signature, and your customers can reject requests whose timestamp is too old.
  • Provide IP whitelisting: give your customers a list of the valid IP addresses your events will come from.
  • Support HTTPS endpoints only.
  • Do not send sensitive data in your webhook. Instead of sending your customers all the data they need to know, you can send them just the event type and an abridged version of the payload. Your customers can then query your API to retrieve the rest of the data.
  • Encrypt the data that you send.
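To illustrate the signature and timestamp strategies above, here is a minimal HMAC-SHA256 sketch. The `"<timestamp>.<body>"` message layout and the 300-second tolerance are assumptions chosen for the example, so document whatever format you actually pick so customers can reproduce it.

```python
import hashlib
import hmac
import time


def sign(secret: bytes, body: bytes, timestamp: int) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex encoded."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, body, timestamp, signature, tolerance=300, now=None):
    """Consumer-side check: reject stale timestamps and bad signatures."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, body, timestamp)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)
```

The producer sends the timestamp and signature alongside the body (for example in request headers), and the consumer recomputes the signature over the raw bytes it received. Because the timestamp is part of the signed message, changing it invalidates the signature.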

Based on the nature of your business, your team will have different levels of paranoia when it comes to webhook security. It’s up to you to decide where you would like to land in the tradeoff between developer experience and airtight security.

Testing webhooks

Another common customer request is a way to test your webhook events before going live. You will want to provide a way to trigger a test event in your system.

Some users will want your webhook to hit their unpublished, locally running, experimental endpoints, but sending a POST request to a locally running endpoint is not straightforward because it is not reachable from the public internet. A tunneling service like ngrok will help them.

Otherwise, your users will just have to test with their public servers, or can use a service like Postman or Request Bin.

First-class webhook providers like Stripe have built the ability to test locally directly into their CLI.

Wait there’s more…

Ok, we’ve taken you through the major considerations but we would be remiss not to mention a few others:

  • Avoiding duplicates: certain events may trigger real downstream consequences, so you don’t want to deliver an event such as “a customer’s invoice was paid” twice. One way to protect against this is to include a unique idempotency key with each event, so if your consumer sees the same key twice, they know to ignore the second as a duplicate.
  • In-order delivery: unfortunately, the nature of webhooks makes it hard to guarantee in-order delivery. For example, let’s say an invoice was created, triggering an invoice.created event, and one second later the invoice was paid, triggering an invoice.paid event. Now let’s say the invoice.created event failed and was re-queued for retry 1 hour later, but the invoice.paid event was delivered successfully right away. The consumer then receives invoice.paid before invoice.created. As a result, consumers must build their systems to be tolerant of out-of-order delivery.
  • Don’t DDOS your customers: if there’s a lot of activity on your customers’ platforms, you might end up generating a ton of events for them. Not every company’s endpoints are set up to handle a rapid, high volume of requests, so you will want to enforce your own outgoing rate limit to avoid DDOSing them.

“We once DDOS-ed a popular ride share service on new year’s day by inundating them with a ton of webhook events due to increased activity on their platform. Oops.” — Engineer at Twilio
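One common way to enforce an outgoing rate limit is a token bucket per endpoint. The sketch below is a generic version of that technique, with hypothetical parameter names and an injectable clock so it can be tested without sleeping.

```python
import time


class TokenBucket:
    """Allow `rate` sends per second with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; refill based on elapsed time."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller should re-queue the event with a short delay
```

A worker would call `try_acquire()` on the bucket for the target endpoint before each POST and push the event back onto the queue when it returns `False`.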

  • Internal monitoring: make sure you have the right logs and analytics around SLAs, delivery failures, retries, event volumes, etc., so you’re ready to act when an anomaly occurs in your system or you need to investigate an incident.
  • Documentation: finally, make sure you have proper documentation for your events, as your customers will build systems that rely on them, and unexpected changes to your event schema can cause incidents in their systems.
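To show what the duplicate and ordering points above ask of a consumer, here is a toy handler that ignores repeated idempotency keys and refuses to move an invoice backwards in its lifecycle. The event fields (`idempotency_key`, `invoice_id`) and the `RANK` ordering are assumptions made for this example.

```python
class InvoiceConsumer:
    """Toy consumer that tolerates duplicate and out-of-order events."""

    # Assumed lifecycle order: a higher rank is a later state.
    RANK = {"invoice.created": 1, "invoice.paid": 2}

    def __init__(self):
        self.seen_keys = set()  # in production: a table with a unique index
        self.status = {}        # invoice_id -> latest known event type

    def handle(self, event: dict) -> str:
        key = event["idempotency_key"]
        if key in self.seen_keys:
            return "duplicate"
        self.seen_keys.add(key)

        invoice_id = event["invoice_id"]
        current = self.status.get(invoice_id)
        if current and self.RANK[current] >= self.RANK[event["type"]]:
            return "stale"  # a later state already arrived; do not regress
        self.status[invoice_id] = event["type"]
        return "applied"
```

In the invoice example, `invoice.paid` arrives first and is applied. The retried `invoice.created` an hour later is recognized as stale, and a second copy of `invoice.paid` is dropped as a duplicate.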

A good developer experience around webhooks will make your customers’ lives easier and help them champion your product. For years, customers have chosen to integrate with Stripe over other payment processors simply due to how much thought Stripe puts into various aspects of its developer experience, like webhooks.

If you’re considering building your own webhook system, we hope this was helpful. If you’d like to see some webhooks in action, check out LogicLoop and trigger some webhooks to alert upon and automate your business data.

Written by The Startup CTO

👋 Hi, I’m Cofounder & CTO of www.logicloop.com — trigger alerts and automations on top of your data. Follow along for all things startups and engineering.
