Optimizing your server by limiting request overheads

Success is a double-edged sword: increased request volume and more edge-case requests stress your server. Many servers have failed because they could not scale to the higher request volume, or because edge-case requests thrashed memory or CPU. There are three legs to managing this success: make each server more efficient, scale your servers horizontally, and limit the impact of each request. In this article I’ll cover the main approaches for limiting the impact of requests.

Reducing Request Overheads — in one slide …

Request Overheads

The business processing required to satisfy a request is the essential work that needs to be performed. Everything else is overhead. Typically the largest overhead comes from marshaling and unmarshaling data and sending it over the network, and the volume of data returned drives all of these costs. So limiting the data returned is the primary tuning lever for minimizing overheads. The following techniques are the most common ways to do this.

Technique 1: API Design

Carefully consider which data items need to be included in the response. If some clients are likely to need more data than others, create different APIs, or one API with different parameters, and make sure the client clearly bears the cost of the request that returns more data (higher latency, higher memory use, etc).

In the absence of any explicit cost, some clients will request larger data volumes as a lazy future-proofing technique, because later changes in the client are then easily handled by using data that is already local rather than by changing the request or adding a new one. You need to be defensive about server resources: it’s better for the client to receive less data from a working server than no data from a failed server.
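As a minimal sketch of this idea, the example below offers a cheap summary by default and makes the client ask explicitly for anything larger. The record, the field names, and the `get_order` function are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record; the field names are illustrative only.
ORDER = {
    "id": 42,
    "status": "shipped",
    "total": 99.5,
    "line_items": [{"sku": f"SKU-{i}", "qty": 1} for i in range(200)],
    "audit_log": ["created", "paid", "packed", "shipped"],
}

# The small, cheap response is what a client gets unless it asks for more.
SUMMARY_FIELDS = ("id", "status", "total")


def get_order(order, fields=SUMMARY_FIELDS):
    """Return only the requested fields; the cheap summary is the default."""
    unknown = set(fields) - set(order)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {name: order[name] for name in fields}


summary = json.dumps(get_order(ORDER))
full = json.dumps(get_order(ORDER, fields=tuple(ORDER)))
print(len(summary), len(full))  # serialized size of each response
```

Because the large response has to be requested by name, its cost is visible in the client code, and the server could also apply stricter limits to it.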

Technique 2: Data Shrinking

Minimize the size of the data transferred. Consider compression, minification, and binary formats (e.g. HTTP/2, which uses binary framing), in whatever combination is reasonable. A shared dictionary on both sides of the transfer lets you send short codes rather than repeated constants (e.g. A stands for “fieldname1”, B for “fieldname2”, etc).

But you have to balance the size improvement against the readability of the raw message for debugging, so don’t apply these without thinking it through. I’ve seen projects that created a tool to convert the data from the minimized format to a readable format for debugging, but this adds developer friction and therefore development cost.
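To make the trade-off concrete, here is a rough sketch that combines a shared field dictionary, minified JSON, and gzip compression using only the Python standard library. The `FIELD_CODES` mapping and the `shrink` and `expand` helpers are assumptions made up for this example; `expand` plays the role of the debugging conversion tool described above.

```python
import gzip
import json

# Hypothetical shared dictionary, known to both client and server.
FIELD_CODES = {"fieldname1": "A", "fieldname2": "B"}
FIELD_NAMES = {code: name for name, code in FIELD_CODES.items()}


def shrink(records):
    """Replace field names with codes, minify the JSON, then gzip it."""
    coded = [{FIELD_CODES[k]: v for k, v in rec.items()} for rec in records]
    minified = json.dumps(coded, separators=(",", ":"))
    return gzip.compress(minified.encode("utf-8"))


def expand(payload):
    """Reverse shrink(): gunzip, parse, and restore readable field names."""
    coded = json.loads(gzip.decompress(payload).decode("utf-8"))
    return [{FIELD_NAMES[k]: v for k, v in rec.items()} for rec in coded]


records = [{"fieldname1": i, "fieldname2": "value"} for i in range(1000)]
raw = json.dumps(records).encode("utf-8")
small = shrink(records)
print(len(raw), len(small))  # bytes before and after shrinking
```

The shrunken payload is no longer readable on the wire, which is exactly the debugging cost to weigh before adopting it.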

Technique 3: Pagination and/or Lazy Population

The most common technique for returning smaller amounts of a large dataset is to paginate the result. You deliver a chunk (a page) of data to the client at a time, and when the client looks at a data item that would be in the next chunk, it fetches that chunk from the server at that point rather than having it already present. This technique generalizes to lazily populating data (fields, properties, etc) as it is requested.

You need to strike a balance between the number of additional requests this generates and the amount of data transferred. If you transfer every little field in its own request, this will be inefficient both for the server (many more requests to handle) and for the client (high latencies when looking at lots of data), so usually you chunk multiple small fields together. You want to transfer a “page” of data each time, where that page might be a combination of several fields or an actual page of data.
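A minimal sketch of offset-based pagination follows, assuming the dataset is an in-memory list. The `get_page` and `iter_all` names, the `next_offset` field, and the default sizes are hypothetical; a real API would fetch each page over the network.

```python
def get_page(items, offset=0, limit=50, max_limit=200):
    """Server side: return one page of items plus the next offset, if any."""
    limit = max(1, min(limit, max_limit))  # the server caps the page size
    page = items[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(items) else None
    return {"items": page, "next_offset": next_offset}


def iter_all(items, limit=50):
    """Client side: fetch pages lazily, only as iteration reaches them."""
    offset = 0
    while offset is not None:
        response = get_page(items, offset=offset, limit=limit)
        yield from response["items"]
        offset = response["next_offset"]


DATA = list(range(120))
first = get_page(DATA, limit=50)
print(len(first["items"]), first["next_offset"])
```

The `max_limit` cap is the defensive part: a client cannot turn a paginated API back into a full download by asking for one enormous page.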

Technique 4: Streaming

The implementation that takes the most effort is to return your data as a stream rather than as a chunk. This requires fundamental changes on both the server and the client, but can be very resource efficient. It’s especially well suited to data that can be consumed as a stream (e.g. video or audio, where you use the data as it comes in and then discard it rather than retaining it), but can be appropriate in many other situations.
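The shape of a streaming exchange can be sketched with Python generators, as below. This is only an in-process illustration under stated assumptions: the record shape, the newline-delimited JSON encoding, and the `stream_records` and `consume` names are all hypothetical, and a real system would send the chunks over a network connection.

```python
import json


def stream_records(count):
    """Server side: yield one encoded record at a time (newline-delimited JSON)."""
    for i in range(count):
        record = {"id": i, "value": i * i}  # hypothetical record shape
        yield (json.dumps(record) + "\n").encode("utf-8")


def consume(chunks):
    """Client side: use each record as it arrives, then let it go."""
    total = 0
    for chunk in chunks:
        record = json.loads(chunk)
        total += record["value"]  # only the running total is retained
    return total


print(consume(stream_records(1000)))
```

Neither side ever holds the whole dataset: the server builds one record at a time and the client keeps only its running total, so memory use stays flat however many records are sent.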


Do consider these techniques every time you create or modify your APIs (internal as well as external, including those used by your client wrappers) and your system will be much more maintainable as you scale it up.