API versioning and evolution with proxies

The technological choices made at the beginning of a project or a company can very likely end up proving to be either inadequate or outright wrong a few years down the road.

If there’s one thing I’ve internalized more than anything else in my years working at startups, it’s that a complete rewrite of a service or functionality (called a “golden rewrite” by one of my ex-colleagues) must always be the very last resort, only to be undertaken when all other options have been exhausted. Evolving a codebase and its infrastructure can prove challenging, not to mention time-consuming, when not undertaken in an iterative fashion. Iterative refactoring often entails getting comfortable with living with a certain degree of imperfection, as well as knowing when and how to cut corners without sacrificing the overall quality or reliability of the product.

This can be particularly challenging when it comes to public-facing APIs that must maintain backwards compatibility at all costs. As often as not, teams support multiple versions of an API by adopting some form of versioning scheme (most commonly baked into the URL or the HTTP headers, though in some cases it’s also dynamically assigned) and maintaining multiple codepaths, endpoints and/or shims for different implementations of the underlying business logic. This approach leads to a somewhat bloated codebase, not least because of the number of regression tests required to guarantee that prior versions of the API aren’t broken by new features.
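To make the two common schemes concrete, here is a minimal Python sketch of resolving the API version from either the URL or a request header. The header name `X-API-Version` and the default version are assumptions chosen for illustration; they are not taken from the system described in this post.

```python
import re

# Hypothetical header name and default; neither comes from the original system.
VERSION_HEADER = "X-API-Version"
DEFAULT_VERSION = "v1"

# Matches a leading version segment such as /v2 or /v3/accounts.
_PATH_VERSION = re.compile(r"^/(v\d+)(/.*)?$")


def resolve_version(path, headers):
    """Return (version, path_without_version_prefix).

    A version in the URL wins; otherwise fall back to the header,
    and finally to the default version.
    """
    match = _PATH_VERSION.match(path)
    if match:
        return match.group(1), match.group(2) or "/"
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(VERSION_HEADER.lower(), DEFAULT_VERSION), path
```

Every distinct value this function can return is a codepath that has to be kept alive and regression tested, which is where the bloat comes from.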

A tale of two API servers

At a previous company I worked at, our API server had gone through a bit of an evolution over the course of ~6 years. Initially, we had a public-facing API (A) that primarily acted as an auth proxy, which in turn used a custom protocol to talk to an internal API (B) that did the bulk of the heavy lifting.

It’s important to note here that the API servers in question were not the core product of the company but were what I call Tier 2 services. Tier 2 services were those that didn’t impact our core offering, and while they had to be up and running, they didn’t have to be particularly performant and could enjoy a relaxed SLA. In this case, servers A and B together formed the APIs of the customer-facing account management dashboard.

We also had nginx terminating SSL at the edge, but it has been omitted from all the illustrations. Both API servers A and B were written in Python 2.7 but were not leveraging any of the asynchronous Python frameworks like Twisted or gevent, which meant that each server process handled only one request at a time. This, per se, wasn’t a problem for a long time since, to reiterate, these were only ever Tier 2 services of the company.

However, with time came the need for certain additional functionality, the implementation of which was not deemed viable in the blocking server. We decided to write a brand new non-blocking server (let’s call this B2) to implement this functionality. Most of the functionality that was required to be implemented in this manner in server B2 was purely internal and not customer facing, which led us to decide not to expose this server to the internet and to only access it via internal clients (I1 and I2) that had a fairly straightforward way of authenticating and authorizing with server B2.

At this point, our service interaction looked like the following:

This worked well enough for a while, but with time it became obvious that a lot of the functionality being implemented in server B2 had significant overlap with the functionality in server B.

Furthermore and unsurprisingly, server B2 proved to be so much more performant than its synchronous counterpart B that it soon became clear to us that all new functionality was probably best implemented in server B2. Server A was still deemed necessary since it continued to serve as an auth proxy for external requests. Thus, at this point we had all user requests first hit server A which in turn would then talk to either B or B2 to service the request.

A few months down the road it was decided to consolidate all user facing endpoints into the non-blocking server B2 and completely deprecate server B. It was also decided that we wanted to expose a purely RESTful interface in B2 primarily because an engineer on the team had a strong preference for consistent RESTful APIs whereas at the time all of the servers had a decidedly RPC-esque interface. While the jury is still very much out on whether this was the right decision or not, at the time it sounded very reasonable indeed.

Thus work got underway to:

— port over all the user facing RPC based endpoints from server B into RESTful endpoints in server B2 (we decided to call this /v3)
— convert all internal facing RPC-esque endpoints in B2 to /v3 style RESTful endpoints
— do away with server A entirely (which over time had become more than just an auth proxy, what with some portions of business logic bleeding into it) by having the /v3 endpoints in server B2 be directly responsible for authz and authn
— and have the dashboard talk directly to the (now public and RESTful) /v3 endpoints in server B2

During the time it took us to port one endpoint after another from the old setup (involving servers A and B) to server B2, the architecture looked like the following, with some API calls completely bypassing server A while others were fronted by server A and serviced by either servers B or B2.

Since the dashboard was for the most part the only client of servers A and B, we enjoyed the luxury that the new version of the API, /v3, could afford to be completely backwards incompatible. Even so, deprecating services is far harder than even the most conservative of estimates might suggest, especially when the services being deprecated lack sufficient test coverage and documentation. When confronted with the reality of Hyrum’s Law in practice, surprises are aplenty, as are discoveries about hitherto unsuspected and exotic use cases of the API.

For one thing, it turned out there was an endpoint in server B that was, in fact, a part of our public documentation, which meant a large number of user agents (and not just the dashboard) were accessing this endpoint. Over the years, this endpoint (let’s call it /v2/foo) had become an especially high-traffic endpoint. It wasn’t unusual for it to be hit with hundreds of concurrent connections per process, the processing of which took long enough for the kernel’s accept queue to get saturated and for the load balancer to start timing out all requests to service A, leading to a partial outage of the dashboard (unless more processes were spun up to absorb the spike). This was arguably the most pressing endpoint to be moved over to the non-blocking server B2, which was well poised to handle a large number of concurrent requests.

In order to deprecate service A without requiring our customers to make any change to their existing code, we needed to come up with a way to mimic service A’s public interface. This could’ve been implemented in service B2, but that would’ve violated one of the design principles, viz., service B2 had to expose a clean, RESTful API. The interface service A exposed to its clients was somewhat SOAPy and the idea of sullying service B2 by introducing a /v3 endpoint that didn’t conform to REST didn’t sit too well with anyone on the team.

In addition to the problems with /v2/foo, it also turned out that a select few customers had been given access to certain other endpoints in service B, which also needed to be supported going forward.

Seeing API Versioning as a Routing Problem

There are several ways we could’ve solved this problem, but we took the position that maintaining backwards compatible APIs could be fundamentally seen as a routing problem. From the users’ perspective, what mattered was that they could continue sending us requests and receiving responses in the format they were used to. How we internally handled these API calls was an implementation detail our users needn’t know or care about.

We went down the route of placing a pure proxy in front of server B2 which transparently proxied requests to server B2 except those select requests which required the behavior and the interface of service A to be mirrored. The proxy was called Sentinel and was written in LuaJIT which offered a runtime that was orders of magnitude more performant than Python’s.

Where the primary design goal of server A was to act as an auth proxy, the primary design objective of Sentinel was two-pronged:

— to be as pure a proxy as possible (with little to no business logic)
— to shield the /v3 endpoints in B2 from having to understand the RPC protocol used by its predecessors

Sentinel could understand the RPC protocol used by service A and massage the incoming requests into a RESTful format, which it then dispatched to service B2 for the actual processing. Sentinel could also interpret the response sent back by B2 and dress it up in the format the clients were used to handling, paving the way for us to decommission service A for good.
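The translation step can be sketched roughly as follows. Sentinel itself was written in LuaJIT; this is Python purely for illustration. The post does not describe service A's wire format, so the JSON envelope (`method`, `params`, `result`, `error`), the method names and the route table below are all hypothetical.

```python
import json

# Hypothetical mapping from legacy RPC method names to /v3 REST routes.
RPC_TO_REST = {
    "foo.get": ("GET", "/v3/foo/{id}"),
    "foo.create": ("POST", "/v3/foo"),
}


def rpc_to_rest(rpc_body):
    """Translate a legacy RPC envelope into (http_method, path, json_body)."""
    call = json.loads(rpc_body)
    http_method, template = RPC_TO_REST[call["method"]]
    params = dict(call.get("params", {}))
    path = template
    if "{id}" in template:
        # The resource id moves from the RPC params into the URL.
        path = template.replace("{id}", str(params.pop("id")))
    # Remaining params become the request body for non-GET requests.
    body = json.dumps(params) if http_method != "GET" and params else None
    return http_method, path, body


def rest_to_rpc(status, rest_body):
    """Wrap a REST response in the envelope legacy clients expect."""
    payload = json.loads(rest_body) if rest_body else None
    if 200 <= status < 300:
        return json.dumps({"result": payload, "error": None})
    return json.dumps({"result": None, "error": {"code": status, "detail": payload}})
```

Keeping both directions of the translation in the proxy is what lets the backend stay purely RESTful: B2 never sees the legacy envelope, and legacy clients never see a bare REST response.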

After service A was finally decommissioned

Additionally, Sentinel allowed for further refactoring of the API when called for, such as with the one problem endpoint /v2/foo in the erstwhile server B that saw a disproportionate amount of traffic compared to all other endpoints. While this was originally built into service B2 as a /v3 endpoint (called /v3/foo, with Sentinel doing the /v2/foo RPC <-> /v3/foo REST translation), it became clear that even a non-blocking Python server behind a LuaJIT proxy was inadequate given the traffic and latency requirements of this endpoint, unless we were fine with spinning up additional instances of service B2 from time to time (which we did, for a bit, but throwing more resources at a problem caused by the shortcomings of a programming language eventually proves untenable for a variety of reasons).

The way we tackled this problem was by standing up a new server (let’s call this server P) that only handled requests to the endpoint /v2/foo. Sentinel could route all requests to endpoint /v2/foo to server P, which was written from the ground up to absorb and handle spikes in requests without requiring the users to change the way they used the API.
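Seen as a routing problem, this boils down to a small, ordered table of rules. The Python sketch below illustrates the idea only; it is not Sentinel's actual code (which was LuaJIT). The upstream addresses are made up, and whether requests bound for server P needed any translation is not stated in this post, so the `False` flag on that rule is an assumption.

```python
# Hypothetical upstream addresses; the post does not name hosts or ports.
UPSTREAMS = {
    "B2": "http://b2.internal:8000",
    "P": "http://p.internal:9000",
}

# Ordered (prefix, upstream, needs_rpc_translation) rules; first match wins,
# so the most specific prefix has to come first.
ROUTES = [
    ("/v2/foo", "P", False),   # the one high-traffic endpoint
    ("/v2/", "B2", True),      # remaining legacy RPC endpoints
    ("/v3/", "B2", False),     # RESTful endpoints pass straight through
]


def route(path):
    """Return (upstream_url, needs_translation), or None if nothing matches."""
    for prefix, upstream, translate in ROUTES:
        base = prefix.rstrip("/")
        # Match whole path segments only, so /v2/foobar does not hit /v2/foo.
        if path == base or path.startswith(base + "/"):
            return UPSTREAMS[upstream], translate
    return None
```

Moving an endpoint to a different backend then becomes a one-line change to the table, invisible to clients.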

Dynamically driving maintenance page updates

A proxy can help with more than simply routing requests to different backends. For one thing, all of the CORS responsibilities were off-loaded to Sentinel. For another, one of the more interesting features of Sentinel was how it automatically drove our maintenance page updates.

While zero-downtime infrastructure changes are ideally preferable (be it database schema changes, failing over to a replica and so forth), they’re not always feasible. Services being down for scheduled maintenance is commonplace. In our case, we could afford to take the API down for maintenance since it wasn’t the core product of the company.

When the API was under maintenance, the dashboard displayed a message with information about the status of maintenance progress, including when the service was expected to be back. Before we had Sentinel, this was done by redeploying a special version of the dashboard at the start of the maintenance window and deploying the normal dashboard back again when the maintenance was complete. While this wasn’t a task that involved significant toil, it did require that our frontend engineers be on hand for any backend maintenance.

With Sentinel, we were able to automate the whole process away by designing an API for such dynamic updates. The dashboard was modified to continually poll an endpoint /v3/health which returned a JSON response along the lines of:

"maint_mode": true,

along with the response header Retry-After, which indicated how long the client was expected to wait before following up with another request. When the API was under maintenance, a call to the /v3/health endpoint would return an HTTP 503 status code, with the Retry-After response header set to a value telling the dashboard how long it had to wait before retrying the request. When not in maintenance mode, a call to this endpoint would return a 200 and the dashboard was free to access all of the API.
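The behaviour of the /v3/health endpoint can be sketched in a few lines of Python. This is an illustration rather than Sentinel's actual handler; it assumes the maintenance state is a dict with `maint_mode` and `retry_after` keys, and that `retry_after` is expressed in seconds.

```python
import json


def health_response(state):
    """Build (status, headers, body) for /v3/health from the maintenance state."""
    headers = {"Content-Type": "application/json"}
    if state.get("maint_mode"):
        # Assumption: retry_after is a number of seconds, or None if unknown.
        if state.get("retry_after") is not None:
            headers["Retry-After"] = str(int(state["retry_after"]))
        return 503, headers, json.dumps({"maint_mode": True})
    return 200, headers, json.dumps({"maint_mode": False})
```

For example, `health_response({"maint_mode": True, "retry_after": 600})` yields a 503 with `Retry-After: 600`, while a state with `maint_mode` set to false yields a plain 200.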

Retry-After isn’t something I’ve seen commonly used for this particular purpose, even though signalling how long a service will be unavailable is one of its documented use cases:

The Retry-After response HTTP header indicates how long the user agent should wait before making a follow-up request. There are three main cases this header is used:
— When sent with a 503 (Service Unavailable) response, this indicates how long the service is expected to be unavailable
— When sent with a 429 (Too Many Requests) response, this indicates how long to wait before making a new request
— When sent with a redirect response, such as 301 (Moved Permanently), this indicates the minimum time that the user agent is asked to wait before issuing the redirected request
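On the client side, the header value can be either a number of seconds or an HTTP date, so a polling client has to handle both. Here is a minimal Python sketch of that parsing. The function name and the 30-second fallback are assumptions for illustration; our dashboard's actual polling code is not shown in this post.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallback used when the header is missing or unparseable.
DEFAULT_DELAY = 30.0


def retry_delay(header_value, now=None):
    """Return how many seconds to wait, given a Retry-After header value."""
    if header_value is None:
        return DEFAULT_DELAY
    value = header_value.strip()
    # Form 1: delay in seconds, e.g. "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_DELAY
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A polling loop would call this after each 503 and sleep for the returned duration before hitting /v3/health again.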

What is of interest here is that the /v3/health endpoint was implemented in Sentinel and not in the API itself.

We used Consul for service discovery and for storing small amounts of configuration data in the Consul K/V store. Consul watches make it possible to monitor specific keys for updates and invoke a handler every time there’s a change. Another option is to make use of blocking queries to long poll a Consul endpoint for updates. Being able to set a watch on a key is supported by most Chubby-inspired systems, be it ZooKeeper, etcd or Consul. The key in Consul was set to something along the lines of:

"maint_mode": false,
"retry_after": nil

Upon bootstrap, Sentinel read the value of the key from Consul and stored it in an in-memory data structure, in addition to setting a watch on the key. All responses to the /v3/health endpoint resulted in this in-memory data structure (a Lua table, in this specific case) being consulted before the response was formulated. Any update to the key in Consul could only be triggered by an infrastructure engineer on the team, and a watch would trigger almost immediately after the key was updated, resulting in Sentinel picking up the change and updating its in-memory configuration. All subsequent requests to /v3/health would reflect the update.
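The blocking-query variant of this loop can be sketched in Python against Consul's HTTP K/V API. This illustrates the pattern and is not Sentinel's code: the key name `sentinel/maintenance` is hypothetical, the value is assumed to be stored as a JSON object, and error handling (a missing key, a dropped connection) is left out for brevity.

```python
import base64
import json
import urllib.request

CONSUL_URL = "http://127.0.0.1:8500"  # default address of a local Consul agent
KEY = "sentinel/maintenance"           # hypothetical key name

# In-memory copy that the /v3/health handler consults on every request.
state = {"maint_mode": False, "retry_after": None}


def fetch(index):
    """Issue one blocking query; returns (new_index, decoded_value)."""
    url = f"{CONSUL_URL}/v1/kv/{KEY}?index={index}&wait=5m"
    # The client timeout is a little longer than the server-side wait.
    with urllib.request.urlopen(url, timeout=330) as resp:
        new_index = int(resp.headers["X-Consul-Index"])
        entry = json.load(resp)[0]
    # Consul returns K/V values base64-encoded.
    return new_index, json.loads(base64.b64decode(entry["Value"]))


def apply_update(index, new_index, value):
    """Update the in-memory state; returns the index to use for the next query."""
    if new_index < index:
        return 0  # index went backwards: start over from scratch
    if new_index > index:
        state.update(value)
    return new_index


def watch(fetch_fn=fetch):
    """Long-poll forever, keeping the in-memory state in sync with Consul."""
    index = 0
    while True:
        new_index, value = fetch_fn(index)
        index = apply_update(index, new_index, value)
```

Because the request handler only ever reads the in-memory copy, a slow or unavailable Consul never sits on the request path; the proxy just keeps serving the last value it saw.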

Putting the dashboard in maintenance mode became as straightforward as an infrastructure engineer running a script to set a key in Consul. Since only external API calls were routed through Sentinel, we could selectively take the API down for maintenance for external users while still letting internal clients access it. This was vastly preferable to the previous approach, which entailed stopping all instances of the API and rendering it unavailable to all clients (internal as well as external).

Having this kind of “kill switch” in a centralized location can have additional benefits. Fred Hébert, in his review of this post, went on to suggest that:

the healthcheck endpoint can have cool trick, where for example, you can have one per service (or API version), and then a more global one. If you can choose whether the global one uses an OR or an AND comparison with sub-healthpoints, you can create a kind of healthcheck API that can carry on the meaning of partial failures upstream. You can use it as a circuit breaker for parts of your infrastructure. If this mechanism can be detected from the API nodes (with still having the master kill switch in Consul), and hooking that service discovery to the proxy itself, you can kind of imagine it as letting proxies automatically detect network faults take themselves out of rotation, and join back in, and so on (error page handling is harder to reconciliate there)

The overarching idea here really is about the importance of being able to feed dynamic configuration updates to APIs for signaling partial failure. It’s my belief that feeding such updates to a separate process (be it a proxy or some sort of daemon running on a host) is easier to reason about and operate than building this logic into the core service whose configuration is being tweaked.
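Fred's suggestion of combining per-service health checks with a global kill switch can be sketched as follows. This is a hypothetical Python illustration of the AND/OR aggregation he describes, not something we built; the function and parameter names are made up.

```python
def aggregate_health(sub_checks, mode="all", kill_switch=False):
    """Combine per-service health results into one global verdict.

    sub_checks maps a service (or API version) name to True/False.
    mode="all" is the AND comparison: healthy only if every sub-check passes.
    mode="any" is the OR comparison: healthy if at least one sub-check passes.
    kill_switch is the master override (for example, the key in Consul).
    """
    if kill_switch:
        return False
    combine = {"all": all, "any": any}[mode]
    return combine(sub_checks.values())
```

With `{"v2": True, "v3": False}`, the AND mode reports the whole API as unhealthy while the OR mode reports it as (partially) available, which is the kind of partial-failure signal that can be carried upstream.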

Being able to drive dynamic configuration updates either by pushing the update to a process (and triggering a reload with a SIGHUP) or by having the process long poll for changes (in the manner described above) has myriad benefits and practical applications, including driving deployments. Increment had an interesting article on how Slack approached deployments, and it was via dynamically pushing deploys to all hosts.


This post described the introduction of a hand-rolled proxy into an existing infrastructure to make API migrations and deprecations a lot easier. The same could’ve been achieved with off-the-shelf proxies like HAProxy or nginx as well, with a tool like Consul-Template redrawing the configuration of HAProxy. Rewriting URLs in nginx (albeit fraught with peril) is a practice commonly used industry-wide.

The goal here isn’t to encourage people to roll their own proxy or adopt a service mesh architecture. As often as not, infrastructure, APIs and services (as everything else) exist in a state of mess and chaos as business requirements and usage patterns evolve.

Refactoring of existing systems involves making tradeoffs and compromises. Proxies are a powerful tool in the evolution of infrastructure. Used well, they can help with the modernizing of “legacy” services in an iterative way without resorting to the sort of “golden rewrites” that seem like a good idea at the beginning but always (in my experience) end up being anything but.