From Documents to Resources: Struggling to Write a RESTful API with Hypermedia

Web developers strive to create useful APIs, but there’s more going on than they may know.

In this essay I try to grapple with the question of whether APIs — specifically REST over HTTP — should be specified exactly in advance, or explored by the client dynamically. I take the scenic route through some big ideas about computer networks themselves. If you want just the facts, I recommend this guide.

The first person to conceive of a network of linked documents was Vannevar Bush. In 1945, he knew that WWII was winding down and sought to direct scientific inquiry in a productive way. He envisioned the Memex, a desk-sized machine that kept “trails” of related documents. These trails could be expanded and shared in a collaborative knowledge network. To Bush, the act of linking documents together was as important as authoring them in the first place.

There are a few big differences between the Memex and the modern web. First, our links connect not a whole document but a segment of text to another document. Second, links are one-way; if you are linked to, you have no obligation to link back. Finally, Bush’s documents were already human-readable; only the metadata for accessing them was encoded for the machine. Once retrieved, the photograph or microfiche would not need to be “rendered” like HTML, only magnified for the human eye.

The closest thing we have to the Memex today is Wikipedia. When navigating and consuming information, it’s incredibly useful to have links that go directly between documents, rather than just a table of contents or an index like a paper book’s. So when we develop APIs, we want a similar capability to find related information.

Martin Fowler has an essay on the levels of REST (which he attributes to Leonard Richardson). Hypermedia — standards-body language for links — is the third and final level. That is, it’s possible to have a partially RESTful API without links, so long as endpoints are divided into resources (level one) and addressed with HTTP verbs (level two).
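To make the levels concrete, here’s a sketch of the same delete operation as it might look at each level. The endpoints and payloads are hypothetical, invented for illustration rather than taken from Fowler’s essay:

```typescript
// Level 0: one opaque endpoint; the "verb" lives in the body.
await fetch("https://api.example.com/endpoint", {
  method: "POST",
  body: JSON.stringify({ action: "deleteArticle", id: 42 }),
});

// Level 1: resources get their own URLs, but POST does everything.
await fetch("https://api.example.com/articles/42/delete", { method: "POST" });

// Level 2: HTTP verbs carry the intent; status codes carry the outcome.
await fetch("https://api.example.com/articles/42", { method: "DELETE" });

// Level 3: the client doesn't build the URL at all -- it follows a
// link the server supplied in an earlier response (hypermedia).
```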

Those verbs exist because we’re not always reading from an API the way we read Wikipedia. Instead, we’re often doing as much writing as we are reading. Once we’re POSTing new documents, it’s very unclear what data the server expects and in what format, even if the API has provided us the link URL. Even a DELETE request, though simple to construct, will (hopefully!) require the client to authenticate — but how? Forbidden access is only one of the many ways these requests can fail, which REST tries to capture with HTTP status codes. (In my opinion, response status codes are just as important to REST level two as request verbs.)
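Here’s a minimal sketch of what even a “simple” DELETE involves, assuming a hypothetical endpoint and bearer-token authentication. Note that none of this is discoverable from the link alone:

```typescript
// Hypothetical endpoint and token: the link alone tells us neither
// that a bearer token is expected nor which failures to plan for.
const response = await fetch("https://api.example.com/articles/42", {
  method: "DELETE",
  headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
});

switch (response.status) {
  case 204: // deleted; no body to parse
    break;
  case 401: // we failed to authenticate
  case 403: // we authenticated, but aren't allowed to delete this
    throw new Error("not authorized");
  case 404: // the article was already gone (or never existed)
    break;
  default:
    throw new Error(`unexpected status ${response.status}`);
}
```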

There is another and more fundamental difference between Wikipedia and an API: the documents aren’t consumed by a human being directly, if at all. When Wikipedia serves a page, it has a good idea of what the HTML is going to look like. When an API serves JSON (or XML), it has no idea whether the document is being displayed as plain text to a developer or being fed into an application — whether desktop, web, or mobile. Most of the time, though, navigating through JSON documents is done indirectly through a user interface. Incoming data becomes text and tables; links or buttons send requests back to the server.

Because JSON can be nested recursively within its own arrays and objects, every JSON document structures its data differently. And precisely because it’s general enough to drive whatever UI you want, it can be hard to determine the meaning of different fields without documentation. This is doubly true for a program acting autonomously instead of a human being. All of this assumes the client and server can even agree on JSON itself — to say nothing of HTTP. Which brings us to Bret Victor:
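As a hypothetical illustration, both of the documents below could describe the same article, and nothing in the JSON itself tells a client which fields mean what:

```typescript
// Both shapes are invented for illustration; they carry the same
// information, but a client hard-coded against one breaks on the other.
const shapeA = {
  article: { id: 42, title: "From Documents to Resources", author: 9 },
};

const shapeB = {
  data: [{ id: "42", title: "From Documents to Resources" }],
  authors: { "42": { id: "9", name: "Joe" } },
};
```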

So you’ve got these two programs — [which] don’t know anything about each other — written in totally different times, and now they need to be able to communicate. So how are they going to do that? Well, there’s only one real answer to that that scales, that’s actually going to work, which is they have to figure out how to talk to each other. Right? They need to negotiate with each other. They have to probe each other. They have to dynamically figure out a common language so they can exchange information…
What won’t work, what would be a total disaster, is [an] API — this notion that you have a human programmer that writes against a fixed interface that’s exposed by some remote program.

I think most of his stuff is mind-blowing, but I’m going to say that Bret is dead wrong here. There wasn’t some could-have-gone-either-way choice made at ARPA to rely on protocols and standards rather than interpretation and guesswork. Computers are really bad at dealing with ambiguous communication, but they’re great at doing the same thing repeatedly. Hence we have TCP and HTTP and other acronyms that end in P, standing for protocol. Smart engineers figure out how every machine is going to communicate and then they all adhere to that (malicious actors aside). Imagine the alternative: every time someone wants to load another web page, you’d have to fire up a machine learning algorithm worthy of an autonomous vehicle.

But is this still true at the application layer, where protocols are called APIs for whatever reason? It seems that our APIs resemble protocols and contracts more than interpretive dance. But we do want them to be flexible and expressive, in addition to being reliable and predictable.

(A brief aside about non-RESTful APIs: GraphQL and Protobuf both require the shape of the data to be defined in advance and known to all parties. Falcor seems to provide a JSON object whose structure is enforced only by convention. So, apparently, we can nail down every detail or nothing at all.)

A good RESTful API walks this tightrope using links. If the client knows where to make an initial request, the server can send back hypermedia that aids the client in making the next request. This idea is sometimes called HATEOAS: Hypermedia As The Engine Of Application State.
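A minimal sketch of that flow, assuming a hypothetical entry point that returns its links in a top-level links object:

```typescript
// The entry point is the only URL the client knows in advance.
const entry = await fetch("https://api.example.com/").then((r) => r.json());

// Every subsequent URL comes from the server, not from the client's code.
const articles = await fetch(entry.links.articles).then((r) => r.json());
const firstPage = await fetch(articles.links.first).then((r) => r.json());
```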

In order for this to work, client and server must agree on things like HTTP, JSON, and even where in the JSON to find the links. There’s a standard called JSON API that helps with that last part. It’s like a protocol for JSON, defining the structure of request and response bodies. It also specifies HTTP verbs and statuses for particular situations. It keeps the data’s attributes separate from links to related resources, and even lets servers supply a link for a relationship itself. Sending DELETE to such a URL would indicate that Joe is no longer the author of this article, even though both Joe and the article continue to exist.
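Here is a sketch of what such a document might look like. The article and author are invented, but the structure follows the JSON API specification:

```typescript
const doc = {
  data: {
    type: "articles",
    id: "42",
    attributes: { title: "From Documents to Resources" }, // the data itself...
    relationships: {
      author: {
        links: {
          // ...kept separate from the links. Requests against this URL
          // manipulate the relationship itself, leaving both Joe and
          // the article intact.
          self: "https://api.example.com/articles/42/relationships/author",
          related: "https://api.example.com/articles/42/author",
        },
        data: { type: "people", id: "9" },
      },
    },
  },
};
```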

Such cases aside, I think JSON API’s hypermedia support really excels for GET requests. There are places for pagination links, links to related resources (e.g. all of Joe’s articles), and other miscellaneous links (for example, a link to GET just this article after finding it in the index). It has a field at the top level for resources related to the one you’re primarily retrieving, addressing the common complaint that an API oriented around resources requires multiple requests to get all the required data.
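For example (again with invented resources), a collection response can carry pagination links at the top level and embed related resources under included, so one request suffices:

```typescript
const collection = {
  links: {
    self: "https://api.example.com/articles?page[number]=2",
    prev: "https://api.example.com/articles?page[number]=1",
    next: "https://api.example.com/articles?page[number]=3",
  },
  data: [
    {
      type: "articles",
      id: "42",
      attributes: { title: "From Documents to Resources" },
      links: { self: "https://api.example.com/articles/42" },
      relationships: {
        author: { data: { type: "people", id: "9" } },
      },
    },
  ],
  // Related resources ride along, so the client doesn't need a
  // second round trip to learn who person 9 is.
  included: [{ type: "people", id: "9", attributes: { name: "Joe" } }],
};
```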

But I think JSON API’s support for write operations is lacking. There is no standard place for including the HTTP verb(s) a link accepts, so it’s pure guesswork whether DELETE-ing a relationship will be accepted by the server, or what would be returned if you were to GET it. It’s also impossible to know what a POST request should look like, especially if you’re creating the first resource of its kind. Finally, there’s no overarching index page of the API, so it’s hard to know where to start or what operations are available.

The other technology in this space is the Open API Specification, but I’m going to use the old name, Swagger. Where JSON API is a way to structure what’s sent over the wire, Swagger is a schema for the API that all clients are expected to obtain prior to making any requests (although it’s possible for the server to host its own Swagger file). This schema indicates what routes are available, what parameters they expect (and some simple validations like numeric ranges or regexes), and what responses (status codes and bodies) could be returned. It is a complete and precise machine-readable description of your API.

Unlike JSON API, Swagger is very unopinionated about the structure of your API. It makes no effort to ensure that the API is consistent (beyond the ability to reuse definitions). Its goal is to be able to express any web API, not just a RESTful one, and the upcoming version 3 is even more generic. In particular, its definitions are separated by path, not by resource, so /users and /users/{id} are considered separate.
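A minimal sketch of a Swagger (2.0) schema for those two paths, written here as a TypeScript literal since the schema itself is plain JSON (the routes and fields are hypothetical):

```typescript
const swaggerDoc = {
  swagger: "2.0",
  info: { title: "Example API", version: "1.0.0" },
  paths: {
    // /users and /users/{id} are unrelated entries as far as Swagger
    // is concerned, even though they describe the same resource.
    "/users": {
      get: {
        responses: { "200": { description: "A list of users" } },
      },
    },
    "/users/{id}": {
      get: {
        parameters: [
          // Simple validations live on the parameter itself.
          { name: "id", in: "path", required: true, type: "integer", minimum: 1 },
        ],
        responses: {
          "200": { description: "A single user" },
          "404": { description: "No such user" },
        },
      },
    },
  },
};
```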

Swagger seems to defeat the purpose of hypermedia, and even of JSON API. If we know what URLs are available and for what operations, we can just request them directly (plucking an id out of a response if necessary) rather than following links sent by the server at runtime.
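In other words, a schema-driven client looks like this hypothetical sketch, where the URL template comes from the Swagger file rather than from any response:

```typescript
// The client "knows" the route shapes ahead of time from the schema...
const list = await fetch("https://api.example.com/users").then((r) => r.json());

// ...so it plucks out an id and builds the next URL itself, instead
// of following a link the server chose to include.
const id = list[0].id;
const user = await fetch(`https://api.example.com/users/${id}`).then((r) =>
  r.json()
);
```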

It seems incredibly difficult to write a program that will consume an API of unknown structure and have it do something useful (which often means a user interface). Swagger is a valuable tool for the programmer to reference, perhaps even to generate code. But I would not want to write a program that consumes a Swagger schema and then has to build a website around that API. That’s too high-level of a goal to give to a computer.

Although JSON API is useful for structuring an API, it falls short of HATEOAS. There are no standards for APIs to describe which operations (other than GET) they support, or which parameters they require. Nor is there a standard way of describing all resources available in a single entry point. The closest thing we have is the HTTP OPTIONS method.
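OPTIONS can at least report which verbs a URL supports via the Allow header, though it says nothing about payloads, as in this sketch against a hypothetical endpoint:

```typescript
const response = await fetch("https://api.example.com/articles/42", {
  method: "OPTIONS",
});

// e.g. "GET, PATCH, DELETE" -- the verbs, but not what the payloads
// should look like or how requests must be authenticated.
console.log(response.headers.get("allow"));
```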

But even if it were possible, I’m not sure it would be useful. Let’s stop pretending that sending hypermedia will make APIs as easy to consume as Wikipedia. You aren’t there to interactively guide the program as to what to do with each of the links it gets back, and you can’t leave the program to figure that out itself. You need to make assumptions about the shape of the responses. And if you’re writing data to the server, you also need to make assumptions about the shape of the data to send and what validations the server will perform.

I want hypermedia APIs to work, I really do. They have the potential to make our programs more resilient and our programming less imperative. But from my current vantage point, it appears that the best way to describe an API is with a single, standard reference like Swagger. Justify assumptions by contract, not by dynamic exploration.