How (and Why!) to Build Killer Bulk APIs — Part 2

Arik Shaikevitz
CyberArk Engineering
8 min read · Jul 20, 2020

In this post, we follow up on How (and Why!) to Build Killer Bulk APIs — Part 1, in which we explored some design approaches to bulk APIs and drew inspiration from real-world examples of how big API providers implement them.

With all the advantages they bring, bulk APIs come with some challenges. In this post, we will address these challenges and discuss architecture considerations and implementation best practices for developing bulk APIs, including:

  • How to serve a bulk API request
  • Validations
  • Bulk responses
  • Handling conflicts
  • Performance issues
  • Building an asynchronous API architecture

If you’re into enhancing your application with bulk APIs — by the end of this post, you’ll be all set to start building them.

First Things First — Serving the Bulk API Request

We will assume that you already have an API and some business logic in place for serving individual requests. Depending on your existing architecture, there are several approaches you might consider:

  1. Single controller: in this architecture, the bulk API controller directly utilizes existing business logic on the server. This is most suitable for tightly coupled monoliths with little to no modularity.
  2. Sub-controllers: here, the bulk API controller initiates requests to existing single-request API controllers and builds a bulk response from the individual responses. This is also relevant for monoliths, but ones that are internally broken into separate services. Depending on the API framework you’re using, you should consider loopback (routing the request back to the same host) and the security implications of this approach.
  3. API routing: in this architecture, the bulk API controller breaks the request into individual requests, which are routed to corresponding API endpoints over the network — relevant for decoupled micro-services architectures, using a token for authorizing sessions. In this case, the bulk API controller follows an API Gateway pattern, acting as a proxy that forwards requests and gathers the responses into a composite bulk response.
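As an illustration, the fan-out idea behind these approaches can be sketched as a controller that dispatches each sub-request to a handler and composes a single bulk response. This is a minimal sketch; the route table, field names, and handlers are all hypothetical:

```python
# Hypothetical bulk controller: splits the bulk request into sub-requests,
# dispatches each one to its handler, and composes a single bulk response.

def handle_bulk(bulk_request, route_table):
    """bulk_request: {"requests": [{"requestId", "method", "path", "body"}, ...]}"""
    responses = []
    for sub in bulk_request["requests"]:
        handler = route_table.get((sub["method"], sub["path"]))
        if handler is None:
            responses.append({"requestId": sub["requestId"], "status": 404})
            continue
        try:
            result = handler(sub.get("body"))
            responses.append({"requestId": sub["requestId"], "status": 200, "body": result})
        except Exception as exc:  # applicative error -> reported per sub-response
            responses.append({"requestId": sub["requestId"], "status": 500, "error": str(exc)})
    return {"responses": responses}

# Example route table mapping (method, path) to existing single-request logic.
route_table = {("POST", "/api/users"): lambda body: {"id": 1, **(body or {})}}
```

In the sub-controllers variant, the handlers would be your existing single-request controllers; in the API-routing variant, each handler would be an HTTP call to the corresponding service endpoint.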

Validating Requests

When validating the bulk request, you will typically apply the same validations you have in place for individual requests. That said, you should pay attention to the following:

  • Don’t fail the entire bulk request if a few sub-requests fail validation. It is better to continue processing all sub-requests and return a partial-success response. Apply basic validations that are relevant to the entire request on a front-end layer (input validation, authorization, bulk size if limited) and ‘postpone’ the rest of the applicative validations to a more downstream business-logic layer. This is especially true if you choose to implement an asynchronous API (we’ll discuss it a bit later).
  • Avoid repeating unnecessary validations. For example: if you’re validating a value against a dictionary or another service’s API, don’t validate it 100 times if it appears 100 times in the request. Instead, group such values into a distinct list and validate each one once — or at least cache the dictionary or the values that have already been validated.
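The deduplication idea can be sketched as a small caching wrapper around an expensive validation call (the validator and its behavior here are hypothetical):

```python
# Validate each distinct value only once: a value that appears 100 times in
# the bulk request triggers a single real validation call.

def validate_bulk_values(values, validate_one):
    """values: values collected from sub-requests; validate_one: expensive check."""
    cache = {}
    results = []
    for v in values:
        if v not in cache:
            cache[v] = validate_one(v)  # e.g. dictionary lookup or remote API call
        results.append(cache[v])
    return results

calls = []
def expensive_check(v):
    calls.append(v)            # track how many real validations happen
    return v.startswith("u")   # stand-in for the actual validation rule

ok = validate_bulk_values(["u1", "u2", "u1", "u1"], expensive_check)
```

Here four values produce only two real validation calls; with real bulk sizes the savings grow accordingly.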

Handling Responses

When it comes to the bulk API response, the controller handling it will typically gather the individual sub-requests’ responses and return an aggregated response to the client. Here are a few best practices you should consider when handling the response:

  • Matching requests and responses: it is common to return the responses in the order in which their corresponding sub-requests have been received.
    Google supports an optional Content-ID HTTP header on each sub-request; the matching sub-response is returned with the same header and value for client correlation. You can apply the same technique by adding a unique identifier to each sub-request.
  • Return links instead of resources: When creating new resources using non-bulk APIs (POST), it is common to return the newly created object in the response. If your bulk API could potentially return very large responses, you might consider returning an absolute GET URL pointing to the newly created resource (instead of the resource object itself). This technique is sometimes referred to as HATEOAS (Hypermedia As The Engine Of Application State) in REST API architecture standards.
  • Error strategy: if one or more sub-requests fail to execute (partial success), returning an HTTP status code that indicates an error would be confusing for the user. As long as no unexpected server error occurred (5XX) and the bulk request as a whole passed validation (4XX), the response should be 200 OK. Sub-requests that failed should carry an applicative error at the individual sub-response level.

Here’s an example of a request and a response:

request/response example

As you can see, requests and responses are correlated using a matching requestId. The first two requests succeeded and the third one failed — yet the overall HTTP status code is 200 OK. The newly created resource in the second response is represented as a link to a GET URL pointing to it.
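Composing such a partial-success response can be sketched as follows. This is a minimal sketch modeled on the example above; the field names, link format, and status tuple are hypothetical:

```python
# Compose a bulk response: overall 200 OK, per-sub-response statuses, and a
# GET link instead of the full resource body for newly created objects.

def compose_bulk_response(sub_results):
    """sub_results: list of (request_id, status, resource_id_or_error) tuples."""
    responses = []
    for request_id, status, payload in sub_results:
        if status == 201:  # created: return a link rather than the resource itself
            responses.append({"requestId": request_id, "status": 201,
                              "link": f"/api/resources/{payload}"})
        elif status >= 400:
            responses.append({"requestId": request_id, "status": status,
                              "error": payload})
        else:
            responses.append({"requestId": request_id, "status": status})
    return 200, {"responses": responses}  # overall HTTP status stays 200 OK

http_status, body = compose_bulk_response(
    [("1", 200, None), ("2", 201, "42"), ("3", 409, "duplicate name")])
```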

Conflicts and Ordering

As discussed in the previous post, what if the bulk request contains sub-requests for resources that are related to each other, or that target the same ID? You might need to resolve these conflicts to prevent an arbitrary processing order and unexpected behavior. There’s a trade-off here between processing sub-requests in parallel on the server using multiple threads, and the ability to guarantee order and avoid conflicts.

Facebook addressed this issue by pipelining an output of a sub-request to the input of another sub-request, allowing the API user to explicitly specify a dependency between operations:

Facebook conflict handling example

In the example above, the second sub-request depends on the list of friend IDs returned from the first sub-request.

Microsoft solved this by using a dependsOn property that explicitly states dependencies between sub-requests, which in turn determines the order in which they will be executed:

Microsoft conflict handling example

In the example above, sub-requests 1 and 3 will be processed first, then sub-request 2 and finally sub-request 4.

If you choose to return HTTP status codes for each sub-request’s response, then when a sub-request fails, any sub-request that depends on it should return an HTTP 424 Failed Dependency status code.
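Resolving a dependsOn-style ordering and propagating 424 Failed Dependency can be sketched like this. It is a minimal sketch (the request structure and status codes as integers are assumptions) and it assumes the dependency graph has no cycles:

```python
# Execute sub-requests in dependency order; if a sub-request fails, every
# sub-request that (transitively) depends on it gets 424 Failed Dependency.
# Assumes the dependsOn graph is acyclic.

def execute_with_dependencies(subs, run):
    """subs: {id: {"dependsOn": [ids]}}; run(id) -> True on success."""
    status = {}

    def execute(sid):
        if sid in status:                     # already resolved
            return status[sid]
        for dep in subs[sid].get("dependsOn", []):
            if execute(dep) != 200:
                status[sid] = 424             # a dependency failed
                return 424
        status[sid] = 200 if run(sid) else 500
        return status[sid]

    for sid in subs:
        execute(sid)
    return status

# Mirrors the Microsoft example: 2 depends on 1 and 3, 4 depends on 2.
subs = {"1": {}, "3": {}, "2": {"dependsOn": ["1", "3"]}, "4": {"dependsOn": ["2"]}}
result = execute_with_dependencies(subs, run=lambda sid: sid != "3")
```

With sub-request 3 failing, both 2 and 4 come back as 424 without ever being executed.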

What About Performance?

Since bulk requests involve more processing on the server than a single request, serving them can take some time (and might eventually time out…). Remember that the overall latency of a bulk request depends on the size of the bulk and on the operation that takes the longest (the ‘weakest link’).

To help mitigate this, you should consider applying a configurable limit to the number of sub-requests allowed in a single bulk request, and immediately reject a request that exceeds it with a 413 Payload Too Large response.

You can also limit the number of concurrent bulk API requests, either by using rate limiters or by managing this state yourself (according to your backend’s capacity to handle them in parallel) — and reject requests with a 429 Too Many Requests once that limit is reached.
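Both limits can be sketched as a simple admission check (the limit values and function name are hypothetical, and real shared state would need synchronization):

```python
# Reject over-sized bulk requests (413 Payload Too Large) and throttle
# concurrent bulk requests (429 Too Many Requests).

MAX_BULK_SIZE = 100          # max sub-requests allowed per bulk request
MAX_CONCURRENT_BULKS = 5     # max bulk requests processed in parallel
active_bulks = 0             # would be shared, synchronized state in a real server

def admit_bulk_request(sub_request_count):
    global active_bulks
    if sub_request_count > MAX_BULK_SIZE:
        return 413  # Payload Too Large
    if active_bulks >= MAX_CONCURRENT_BULKS:
        return 429  # Too Many Requests
    active_bulks += 1   # released again when processing completes
    return 202  # admitted for processing
```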

Asynchronous APIs

If it fits your requirements and UX, you can reduce the load on your server and build a robust, scalable bulk API by making it fully asynchronous. Here are some of the benefits of this architecture approach:

  • It eliminates the need to block the client’s thread while processing the request.
  • It reduces the memory footprint on your server, not having to maintain heavy request threads in the case of many concurrent API calls.

Building an asynchronous bulk API is achieved by exposing two endpoints, so that each bulk request consists of two steps:

  1. A bulk request is sent to your bulk API endpoint (e.g. /api/bulk). Once you have validated the request input, return a 202 Accepted response immediately along with a bulk request identifier, and process the request on the server.
  2. The API consumer can use this identifier to poll for a response on a second API endpoint (e.g. /api/bulk_result). This API endpoint will return a 102 Processing temporary response while the request is being processed, and return a full response (200 OK) once it has finished.
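The two-step flow can be sketched with an in-memory store standing in for the database. The endpoint paths come from the example above; everything else (field names, status handling) is a hypothetical sketch, with processing run inline rather than on a real background worker:

```python
import uuid

# In-memory stand-in for the bulk-requests table; a real system would
# persist this state in a database.
bulk_store = {}

def post_bulk(sub_requests):
    """Step 1: /api/bulk -- validate, store, return 202 Accepted + identifier."""
    bulk_id = str(uuid.uuid4())
    bulk_store[bulk_id] = {"status": "processing", "requests": sub_requests,
                           "result": None}
    # In a real server, processing is handed off to a background worker here.
    return 202, bulk_id

def process_bulk(bulk_id, run):
    """The background work: execute sub-requests, then mark the bulk done."""
    entry = bulk_store[bulk_id]
    responses = [{"requestId": i, "status": 200 if run(s) else 500}
                 for i, s in enumerate(entry["requests"])]
    entry.update(status="done", result={"responses": responses})

def get_bulk_result(bulk_id):
    """Step 2: /api/bulk_result -- 102 Processing until done, then 200 OK."""
    entry = bulk_store[bulk_id]
    if entry["status"] == "processing":
        return 102, None
    return 200, entry["result"]
```

The client polls get_bulk_result with the identifier from step 1 until the 102 Processing responses turn into a 200 OK with the aggregated result.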

This asynchronous request high-level flow is described in the following diagram:

async request flow

When choosing to implement an asynchronous API — if you’re running a distributed system with multiple instances of your application for high availability — you probably want to manage the bulk request processing in a database (each bulk request typically represented by a table row, uniquely identified by the bulk request identifier from step 1). Having this state persisted will give you better control over the process, including:

  • Aggregating the response and serving it for step 2 (client polling for the response)
  • Recovering from an instance failure
  • Enforcing a limit on the maximum concurrent bulk requests
  • Troubleshooting

Make sure you’re not applying too many writes while processing the request; too much I/O can hurt performance. Choose important ‘checkpoints’ at which to update the database (e.g. every N sub-requests).
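Checkpointing every N sub-requests instead of after every single one can be sketched as follows (N, the function name, and the update callback are hypothetical):

```python
# Persist progress only at checkpoints -- every N sub-requests plus a final
# write -- rather than issuing one database write per sub-request.

def process_with_checkpoints(sub_requests, update_db, n=10):
    writes = 0
    for i, sub in enumerate(sub_requests, start=1):
        # ... process the sub-request here ...
        if i % n == 0 or i == len(sub_requests):
            update_db(processed=i)  # one write covering the whole last batch
            writes += 1
    return writes
```

For 25 sub-requests with N=10, this issues 3 database writes (after sub-requests 10, 20, and 25) instead of 25.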

Using a queuing system (e.g. RabbitMQ) to manage the asynchronous request flow is also an option: a request queue for incoming requests, and a response queue from which the client will receive the response.
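The queue-based flow can be sketched with Python’s standard-library queue standing in for a real broker such as RabbitMQ (all names here are hypothetical):

```python
import queue

# A request queue feeds a worker; the worker publishes the bulk response to
# a response queue that the client consumes from. queue.Queue stands in for
# the broker's queues.
request_queue = queue.Queue()
response_queue = queue.Queue()

def client_submit(bulk_id, sub_requests):
    """Client side: publish the bulk request to the request queue."""
    request_queue.put({"bulkId": bulk_id, "requests": sub_requests})

def worker_consume_one():
    """Worker side: process one bulk request and publish its response."""
    msg = request_queue.get()
    responses = [{"requestId": i, "status": 200}
                 for i, _ in enumerate(msg["requests"])]
    response_queue.put({"bulkId": msg["bulkId"], "responses": responses})

client_submit("b1", ["create user", "delete group"])
worker_consume_one()
result = response_queue.get()
```

With a real broker, the queues are durable and the workers scale horizontally, which also gives you back-pressure on incoming bulk requests for free.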

Wrapping Up

There are many ways to design and implement bulk APIs. Choosing how comes down to your requirements, specific use cases and current architecture constraints.

Bulk APIs have many benefits to both your infrastructure and your API consumers. Make sure you consider some, or all, of the factors described in this post — and adopt the best practices most relevant for your application and those that will bring the most value to your users.

Good Luck!


System Architect @ CyberArk. Enthusiastic about crafting technology solutions that serve awesome products.