How (and Why!) to Build Killer Bulk APIs — Part 1
Looking into boosting your application with bulk APIs? Working on a feature that requires bulk actions and can’t find any standards or best practices? You’re in the right place!
This is the first of two posts; in this Part 1 we will cover fundamental bulk API concepts and design approaches, while Part 2 will focus on architecture and implementation.
In this post, we will:
- Run through what bulk APIs are
- See how they can bring value to you as an API provider and to the clients consuming your API
- Explore some design approaches while giving concrete examples of how some of the biggest API providers are doing it
Whether you’re taking your first steps in writing APIs or you’re an experienced API guru who writes REST controllers in your sleep, this post (and the one that follows) is for you. The same goes if you’re a developer, an architect, a tech lead or any other R&D stakeholder (even a product manager!) looking into adding bulk capabilities to your product. We’ll introduce some useful bulk API design tips and best practices we learned at CyberArk, which will help you enhance and scale up your application’s API layer regardless of your technology stack.
What is a Bulk API?
Just so we’re aligned, the basic idea of a bulk API is applying multiple operations using a single HTTP request/response round-trip. This concept is sometimes referred to as ‘batch’ (more on that later). In this post, we’ll be focusing on write-oriented operations (creating, updating, deleting) in the context of bulk, since reading multiple resources in a single request is usually supported out of the box.
Got It… Why Do I Need It?
Bulk APIs can be of value in several use-cases:
- Bulk UI operations: sometimes your application’s UI displays a grid with multiple items being modified at the same time. Since this list of items can be very large, it makes sense to make a single request to the server when hitting the ‘Save’ button, instead of initiating a bunch of individual requests, overwhelming the server and dealing with the responses one by one.
- Offline sync: your mobile users’ devices might lose connectivity at times (when switching from Wi-Fi to a cellular network or when taking the subway) while users expect continuous UX when working on something. A bulk API can be useful by gathering all the things the user did while offline and sending them to the server when the network is back up.
- Performance: an API request has the overhead of the HTTP round-trip. In an API-intensive application, this overhead might have a performance impact on the client, the network and the server. Instead of sending 100 requests with 100 HTTP request/response cycles, with a bulk API we’re doing it once.
- Transactional behavior: a bulk API allows your client to bundle a group of actions into a single transaction that can succeed or fail together as a whole.
- Rate limiters: it is very common for SaaS API providers to apply some kind of rate limiting strategy (usually using a Web Application Firewall or an API Gateway) to protect their infrastructure and prevent it from being stressed by illegitimate traffic that might undermine its stability. This is generally a good practice when it comes to API architecture, but it might also throttle legitimate users and prevent them from using your application. Bulk APIs, which you might choose to expose only to specific consumers such as B2B customers or other integration partners, decrease the number of requests and make it far less likely that those consumers hit the rate limits.
- API robustness: if your API is heavily used in integrations and in customer automation processes, adding bulk capabilities to your API layer will benefit both your infrastructure and your customers, by reducing the number of calls and by making integrations simpler and less error-prone.
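To make the transactional-behavior point above a bit more concrete, here is a minimal Python sketch of "succeed or fail together". Everything in it is an assumption for illustration: the in-memory customers dictionary, the set_level and apply_all_or_nothing helpers and the copy-then-swap strategy. A real backend would more likely rely on a database transaction.

```python
import copy


def set_level(customer_id, level):
    """Build one sub-operation: set the level of an existing customer."""
    def operation(customers):
        if customer_id not in customers:
            raise KeyError(f"unknown customer {customer_id}")
        customers[customer_id]["level"] = level
    return operation


def apply_all_or_nothing(customers, operations):
    """Run every operation on a copy; keep the copy only if none of them fails."""
    working_copy = copy.deepcopy(customers)
    try:
        for operation in operations:
            operation(working_copy)
    except KeyError as error:
        # One failure discards the whole bulk: the original data is returned untouched.
        return customers, f"rolled back: {error}"
    return working_copy, "committed"


customers = {1: {"name": "Josef", "level": "Bronze"}}

# The second operation targets a customer that does not exist, so nothing is applied.
after_failure, status_failure = apply_all_or_nothing(
    customers, [set_level(1, "Gold"), set_level(9, "Gold")]
)

# A bulk in which every operation is valid is applied as a whole.
after_success, status_success = apply_all_or_nothing(customers, [set_level(1, "Gold")])
```

Whether a bulk should be all-or-nothing or allow partial success is a product decision; this sketch only shows the all-or-nothing variant.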
How to Design Bulk APIs?
OK, so you’re convinced (or thinking about it…) that you need a bulk API, great!
So let’s go and write an API endpoint that receives 100 objects in a single request instead of one, and processes them one at a time. Well, sometimes it’s that straightforward. But while working on one of CyberArk’s major features that required bulk APIs, I dug into this a bit more and found that there’s more to it.
There aren’t well-defined common standards for designing bulk APIs, nor popular frameworks that can help with implementing them. Not everyone is doing bulk (at least not exposing it on their public API): monster SaaS providers like Slack, Twitter, Pinterest, Spotify or even AWS don’t offer it as a general capability across their APIs (although individual AWS services such as SQS and DynamoDB do have their own batch operations). But some providers do, and we’ll take a look at their examples.
Let’s look at 4 design approaches for bulk APIs. The examples you’ll see are mostly relevant to APIs designed according to the RESTful architectural style, and refer to resources and operations. Even if your API does not tightly conform to these principles, you can still get value and inspiration for your implementation.
We’ll be assuming that you already have an API in place for serving individual requests, and the business logic to process them — for the purpose of the examples, we’ll be using a simple customer/level model, where each customer has a name and level. Let’s go!
#1 — The No-Brainer (resource specific, operation specific)
With this approach, we’re designing an API for a specific HTTP operation (e.g. POST) on a specific resource endpoint. In the request body it receives a collection of resources (objects) of a specific type (e.g. a collection of customers), each of which can be seen as a ‘sub-request’.
This is what a request might look like:
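As a minimal sketch in Python, a bulk create of three customers could be built like this. The endpoint and the name/level fields come from this post; the customer names, the level values and the choice of a plain JSON array as the body are assumptions for illustration.

```python
import json

# Hypothetical bulk request for the resource-specific, operation-specific design.
# Each item in "body" is one sub-request: a customer to create.
bulk_create_customers = {
    "method": "POST",
    "url": "/api/customer_bulk",
    "body": [
        {"name": "Dana", "level": "Gold"},      # invented sample data
        {"name": "Omar", "level": "Silver"},    # invented sample data
        {"name": "Josef", "level": "Bronze"},   # invented sample data
    ],
}

# The body is what would actually be serialized and sent over the wire.
payload = json.dumps(bulk_create_customers["body"], indent=2)
print(payload)
```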
The API endpoint /api/customer_bulk is specific to bulk operations. You should avoid using the same endpoint that handles single requests (e.g. /api/customers) for your bulk API; with a dedicated endpoint you’ll have more control over your API. Also, overloading the single-resource endpoint with a second, collection-shaped request arguably goes against common RESTful design conventions.
In this example, we’re creating 3 new customers at once. The implementation of this API controller will typically reuse the existing single-customer API logic, and the response will follow the same pattern.
This is how Zendesk (one of the biggest customer service SaaS platforms out there) and Salesforce are doing bulk.
Being a straightforward and simple design, this one is quicker to implement but less flexible, and will probably require additional development for each new bulk API (although there is some room for code reuse).
#2 — The Multi-Tasker (resource specific, operation dynamic)
In this design, we’re building an API that can apply any type of operation on a specific resource endpoint. The idea is to receive a bulk of sub-requests, each containing the operation and some data about the resource, and to invoke the appropriate API controller you already have in place in the backend (serving individual requests) according to the operation type.
This is what a request will look like:
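One possible shape, sketched in Python: every sub-request names its own operation. The field names (operation, id, data), the operation values, the reuse of the /api/customer_bulk endpoint and the sample customers are all assumptions for illustration.

```python
import json

# Hypothetical bulk request for the resource-specific, operation-dynamic design.
# All sub-requests target customers, but each one declares its own operation.
bulk_customer_request = {
    "method": "POST",
    "url": "/api/customer_bulk",
    "body": [
        {"operation": "create", "data": {"name": "Dana", "level": "Gold"}},
        {"operation": "update", "id": 2, "data": {"level": "Silver"}},
        {"operation": "delete", "id": 7},
    ],
}

# Collect the distinct operations, e.g. to check permissions up front.
requested_operations = sorted({sub["operation"] for sub in bulk_customer_request["body"]})
print(json.dumps(bulk_customer_request["body"], indent=2))
```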
This design will cover any bulk operation on the resource, but validations might be more complex. You might need to resolve conflicts between dependent resources, or between sub-requests that target the same ID in the same request. Whether this is relevant depends on your use case and on whether your server-side implementation guarantees some processing order (more on that in Part 2).
Authorization should also be considered, as the user making the request might not have the appropriate permissions for all types of operations.
This approach is a good balance between complexity and flexibility.
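The dispatching and authorization concerns described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three handler functions stand in for whatever single-request logic you already have, and the operation names, result shape and allowed_operations parameter are hypothetical.

```python
# Stand-ins for the existing single-customer handlers (hypothetical).
def create_customer(sub_request):
    return {"status": 201, "body": sub_request["data"]}


def update_customer(sub_request):
    return {"status": 200, "body": {"id": sub_request["id"], **sub_request["data"]}}


def delete_customer(sub_request):
    return {"status": 204, "body": None}


HANDLERS = {
    "create": create_customer,
    "update": update_customer,
    "delete": delete_customer,
}


def handle_customer_bulk(sub_requests, allowed_operations):
    """Route each sub-request to its handler and report one result per sub-request."""
    results = []
    for sub_request in sub_requests:
        operation = sub_request.get("operation")
        handler = HANDLERS.get(operation)
        if handler is None:
            results.append({"status": 400, "error": f"unknown operation: {operation!r}"})
        elif operation not in allowed_operations:
            # The caller may be permitted to create but not to delete, for example.
            results.append({"status": 403, "error": f"operation not permitted: {operation}"})
        else:
            results.append(handler(sub_request))
    return results


results = handle_customer_bulk(
    [
        {"operation": "create", "data": {"name": "Dana", "level": "Gold"}},
        {"operation": "delete", "id": 7},
        {"operation": "merge"},
    ],
    allowed_operations={"create", "update"},
)
statuses = [result["status"] for result in results]
```

Here each sub-request succeeds or fails on its own; rejecting the whole bulk when any permission check fails would be an equally valid choice.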
#3 — The Killer (resource dynamic, operation dynamic)
With this approach, we’re implementing an API that can be applied to any type of operation on any resource. As in the previous approach, the idea is to use the same API controller logic being used for single requests, according to the resource and operation.
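A request in this style could be sketched in Python as follows. The sub-request field names (method, url, body), the choice of PUT for replacing and PATCH for updating, and the data given to customer 5 are assumptions for illustration.

```python
import json

# Hypothetical bulk request for the fully dynamic design.
# Each sub-request carries its own HTTP method and relative resource URL.
bulk_request = {
    "method": "POST",
    "url": "/api/bulk",
    "body": [
        {"method": "POST", "url": "/api/levels", "body": {"name": "Bronze"}},
        {"method": "POST", "url": "/api/customers", "body": {"name": "Josef", "level": "Bronze"}},
        {"method": "PUT", "url": "/api/customers/5", "body": {"name": "Dana", "level": "Gold"}},
        {"method": "PATCH", "url": "/api/customers/2", "body": {"level": "Bronze"}},
    ],
}

# List the distinct resource endpoints touched by this single bulk request.
touched_urls = sorted({sub["url"] for sub in bulk_request["body"]})
print(json.dumps(bulk_request["body"], indent=2))
```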
Since this design is fully dynamic (covers any bulk operation on any resource), you probably noticed the request is sent to a generic /api/bulk endpoint, which is aimed at handling any bulk request. The resource endpoint relative URL for each sub-request is explicitly stated in the body.
In this example, we’re creating a new ‘Bronze’ level using the /api/levels resource endpoint, creating a new customer named Josef in that level, replacing customerId=5, and updating customerId=2 to the Bronze level.
This is how Google Cloud implemented their Compute bulk API (you’ll find the same pattern in the Google Drive API). This makes sense, since the usage of Google’s Compute Engine service involves many automations and provisioning of multiple environments at once.
Google calls this ‘batch’ and has defined a very flexible scheme for using it: the request body contains sub-requests, each being a complete HTTP request (the response follows the same pattern, with each sub-response carrying its own HTTP status code). This approach also allows sending different authorization tokens and using different content types for different sub-requests.
Facebook and Microsoft 365 are taking similar approaches to bulk.
This method is extremely powerful if your API consumer needs to perform composite operations that span different business domains, or perform complex migrations using your API. On the other hand, you’ll have to do more validation, handle conflicts between resources and verify permissions on different operations.
In short, this is by far the most flexible design, but probably also the most complex to implement.
#4 — The Killer’s Little Brother (resource dynamic, operation specific)
This one is a simplified variant of the fully dynamic approach #3 that limits each bulk request to a single operation. Doing so reduces the need to resolve conflicts between dependent sub-requests.
These examples show a bulk POST request and a bulk DELETE request:
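A rough Python sketch of both requests follows. It assumes a generic /api/bulk endpoint where the HTTP method of the bulk request itself selects the operation for every sub-request; the field names and sample data are invented for illustration. Sending a body with DELETE is also an assumption to verify against your stack, since some HTTP clients and intermediaries handle it inconsistently.

```python
import json

# Hypothetical bulk POST: every sub-request is a create, on any resource.
bulk_post = {
    "method": "POST",
    "url": "/api/bulk",
    "body": [
        {"url": "/api/levels", "body": {"name": "Platinum"}},
        {"url": "/api/customers", "body": {"name": "Dana", "level": "Gold"}},
    ],
}

# Hypothetical bulk DELETE: every sub-request only needs the resource URL.
bulk_delete = {
    "method": "DELETE",
    "url": "/api/bulk",
    "body": [
        {"url": "/api/customers/5"},
        {"url": "/api/customers/2"},
    ],
}

for request in (bulk_post, bulk_delete):
    # The operation is stated once per bulk request, never per sub-request.
    print(request["method"], request["url"], json.dumps(request["body"]))
```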
Takeaway
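We’ve seen 4 design approaches for bulk APIs. Let’s compare their pros and cons:
- #1 (resource specific, operation specific): the simplest and quickest to implement, but the least flexible; each new bulk API will probably require additional development.
- #2 (resource specific, operation dynamic): covers any operation on a resource, at the cost of more complex validation, conflict handling and authorization; a good balance between complexity and flexibility.
- #3 (resource dynamic, operation dynamic): by far the most flexible, and probably the most complex to implement.
- #4 (resource dynamic, operation specific): a simplified variant of #3 that reduces the need to resolve conflicts between dependent sub-requests.
In terms of flexibility vs. implementation complexity, #1 sits at the low end of both and #3 at the high end, with #2 and #4 in between.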
So before you run off to option #1, challenge yourself and carefully consider your application’s future needs and the flexibility you would want to provide in its API. By choosing the simplest solution, you might find yourself implementing it over and over again for each new feature or API requiring bulk functionality, eventually reaching the cumulative cost you would have invested if you had chosen a more complex but flexible solution in the first place.
Summing Up So Far
Bulk APIs have many benefits for both your infrastructure and your API consumers. We have seen that there are a few ways to design bulk APIs, each with its trade-offs. You should choose the one that fits your product’s requirements and your application’s constraints.
In Part 2 of How (and Why!) to Build Killer Bulk APIs, we will discuss some of the challenges in developing bulk APIs and show how to address them, sharing architecture considerations and implementation best practices.