The Definitive Guide for building REST APIs

Or: how to not write APIs that stink

Your API, running wild

The “I” in “API” stands for “Interface”, and the great issue of our century in the programming world seems to be the lack of good interfaces. We have good hardware, good programming languages, networks fast enough for processes on different machines (even on different continents) to talk to each other, and good protocols at hand. But interface design seems to advance like a slug: up two meters, sleep, down one meter, awake, up two meters, sleep, down one meter…

While protocols play the role of the language (and of a kind of medium, especially at the “physical” levels of network protocols, for instance), defining how a component should “speak” to another (sorry, Saint Dijkstra, I anthropomorphize openly), interfaces define what a component should say and what to expect as an answer.

Now, in real life there are plenty of situations in which we should choose an adequate “interface” for our own communication. A bug report, for example, could be as bad as:

Man, the files is messed up…

Or as good as:

Trying to download a file, I ran into this issue: instead of receiving a 302 status_code with an adequate “Location” header, I'm getting a 401 error. Look:
$ http GET "http://localhost:8000/v1/files/42?_download=1" "Authorization: token $TOKEN"
HTTP/1.1 401 Unauthorized
Connection: keep-alive
Content-Type: text/html; charset=utf-8
Date: Fri, 05 May 2017 21:57:37 GMT
Server: gunicorn/19.6.0
Transfer-Encoding: chunked
Vary: Cookie
Via: 1.1 vegur
X-Frame-Options: SAMEORIGIN
The authorization token is valid, I just checked.
If I try to access the endpoint without _download=1 or with _download=0 everything works as it should (200 status_code).

Notice that the protocol is right in both cases: the English language. The medium can be right, too: an issue on GitHub or another bug tracking tool. But the interface is horrible in the first example. It's the trigger for the “bad bug report dance”, whose second step is “what kind of problem are you experiencing?”. That's why development teams choose more than a protocol (English language?) and a medium (bug card on Jira?): they also choose the right way of reporting a bug, which is a kind of interface. If you define that a bug report should follow the form of the second example, a report like the first one would be rejected, since it does not comply with the chosen criteria.

With that in mind, let's talk about interfaces between computer programs, the APIs.

Why APIs stink

1- Because they are “almost” REST. Emphasis on “almost”.

Almost!

Using HTTP and answering with JSON (and/or XML) does not make your API a REST API. SOAP, the heyday of bizarreness in the history of computer communication, uses (or can use) HTTP and serializes data as XML, but… right? That's no reason to call SOAP “REST”.

The issue here is falling into the middle ground: you could be using a simple and universally recognizable standard, or any other standard, but no: your own “homemade standard” is an “almost”. It looks like REST, but it's not REST. And, since it doesn't even have a name, you simply don't mention it in your API's documentation.

Or, what's even worse: you do say your API is REST, but, in truth, it's not.

1.1- Ignorance about HTTP

Quite often these almost-REST APIs are the result of a legitimate but frustrated attempt at being REST. And the reason for that, as I understand it, lies in not trusting the HTTP protocol, which in general comes purely from ignorance about it.

My suggestion: read the RFC.

(If you're already familiar with the RFCs, notice that RFCs 7230 through 7235 make 2616 obsolete. And, yes, there are some important differences between them. ;-)

1.2- Ignorance about HTTP methods

Methods are colloquially called “verbs” and not without a reason. They are described, in general, by verbs, like GET, POST or DELETE. There are exceptions, of course, like HEAD and OPTIONS, and that's why “verb” is only a nickname, not a rule.

See the RFC 7231:

GET

Basically, to extract data from a specific document or list the documents from a resource.

Let's say we're implementing a pipe vending system and we have a resource named “Pipe”. The corresponding endpoint is identified by the URL https://<domain>/v1/pipes/ . Thus, I can list the Pipes in the system with:

GET /v1/pipes/

Or I can extract information about a specific Pipe with:

GET /v1/pipes/:pipe_id

1.2.1- GET returning 404 on listings

Wait… What?

This discussion is a bit complicated, since the HTTP protocol doesn't say much about the topic. But what it does say is enough:

The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

Let's say we don't have any registered Pipe and we send this request to the system:

GET /v1/pipes/

Which code should the server return? Obviously: 200 (OK). After all, (1) has the server found a representation for the target resource? Yes! It's a list with zero elements! And (2) is the server unwilling to disclose whether such a representation exists? There's no reason for that (unless the client has no permission or something like that).

Now, if I try to access a nonexistent resource, then it makes sense for what would supposedly be a listing to return 404 (Not Found):

GET /v1/cars/

There's no “Cars” in the system! Thus, the 404 code makes sense: such endpoint doesn't even exist!
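The difference between the two cases can be sketched in a few lines of Python. This is only an illustration, not a real server: the `RESOURCES` dictionary and the `list_documents` function are hypothetical names standing in for whatever your framework and database provide.

```python
# Hypothetical in-memory stand-in for the database.
# "pipes" is a known resource that simply has no documents yet.
RESOURCES = {"pipes": []}


def list_documents(path):
    """Return (status_code, body) for a GET on a listing URL like /v1/pipes/."""
    parts = [p for p in path.split("/") if p]
    # A listing URL is expected to look like ["v1", "<resource>"].
    if len(parts) != 2 or parts[0] != "v1" or parts[1] not in RESOURCES:
        # The endpoint itself does not exist: this is a real 404.
        return 404, {"detail": "Not found"}
    # The resource exists; an empty list is a perfectly valid representation.
    return 200, list(RESOURCES[parts[1]])


print(list_documents("/v1/pipes/"))
print(list_documents("/v1/cars/"))
```

An empty listing and a missing endpoint go through different branches, so the client can tell "nothing here yet" from "you got the URL wrong".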

POST

To create a new document. As in:

POST /v1/pipes/
{"name": "100mm", "color": "white", "objective": "Pass shit through it. Literally."}

1.2.2- Resources and actions being confused

Action scene, very resourceful

A common characteristic of APIs that stink is the confusion they make between resources, their endpoints and the responsibility for taking actions. It is reminiscent of a typical problem of SOAP over HTTP: the server responds with status 200 (OK) saying “an error has occurred”.

What?

The same happens when you, in your Pipe store, create an endpoint like this to create Cards that go inside Pipes:

POST /pipes/:id/create_card.json

C'mon! POST already means “create”. So why do we need the create_ part on the endpoint URL? Besides that, what kind of resource is this? Resources are all about data, because the actions are already covered by HTTP. Right?

And more: why the hell is the Pipes endpoint creating a Card?

Remember: resources are all about data. If you put a verb (like “create”) in a URL, something wrong is going on.

The right way of creating a Card would be:

POST /cards
{"pipe": <pipe-id>, "name": "Example name", ...}
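On the server side, the handler behind that endpoint could look roughly like the sketch below. Everything here is an assumption made for illustration: the `CARDS` store, the `create_card` name and the choice of answering 400 when the `pipe` field is missing.

```python
import itertools

# Hypothetical in-memory store; a real API would use a database.
CARDS = {}
_next_id = itertools.count(1)


def create_card(payload):
    """Handle POST /cards: the Pipe is just a field of the Card document."""
    if "pipe" not in payload:
        # Assumed validation rule for this sketch: a Card must name its Pipe.
        return 400, {"detail": "'pipe' is required"}
    card_id = next(_next_id)  # the server, not the client, picks the ID
    CARDS[card_id] = dict(payload, id=card_id)
    return 201, CARDS[card_id]
```

Calling `create_card({"pipe": 7, "name": "Example name"})` would answer with 201 and the stored document. Notice there is no verb anywhere: the method says “create”, the URL says “Card”, and the relationship with the Pipe travels in the body.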

PUT

Use it when you want to alter an existing document. Curiously, the server is supposed to replace the document with the one you're sending, so you must send a complete document.

If you refer to a nonexistent identifier, the server is supposed to create a new document with the data and identifier you provided.

1.2.3- PUT as create

Now, I know there are some sects of programmers that prefer to use the PUT verb to create new documents. And, look, you're not forbidden from implementing your verbs any way you want in your server. If you want to create a new cult, like “To DELETE is to Create”, it's just a matter of a few lines of code and you'll have a server that uses the verb DELETE to create new objects.

But your API is going to stink.

The HTTP protocol is crystal clear:

  • When using POST, the URL must represent a resource, not a document.
  • When using PUT, the URL must represent a document, not a resource.

Thus:

POST /v1/pipes/1 = wrong
POST /v1/pipes/ = right
PUT /v1/pipes/ = wrong
PUT /v1/pipes/1 = right!

So you could even use PUT to create new documents, as long as it makes sense, in your proposed architecture, for the client to be the one defining the documents' identifiers! If you're using sequential numeric IDs, for example, creating with PUT can become far too complicated.

Personally, I believe there are very few cases in which using PUT to create documents is a good idea.
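To make the “replace or create” behavior of PUT concrete, here is a minimal sketch. The `PIPES` store and the `put_pipe` function are hypothetical; the status codes follow the usual reading of RFC 7231 (201 when the document was created, 200 when an existing one was replaced).

```python
# Hypothetical in-memory store keyed by a client-chosen identifier.
PIPES = {}


def put_pipe(pipe_id, document):
    """Handle PUT /v1/pipes/<pipe_id>: replace the whole document, or create it."""
    created = pipe_id not in PIPES
    # Full replacement: nothing from the previous state is merged in.
    PIPES[pipe_id] = dict(document)
    return (201 if created else 200), PIPES[pipe_id]
```

A second PUT to the same identifier with `{"name": "100mm"}` leaves a document with only that field: whatever the client did not send is gone. That is exactly why PUT requires a complete document, and why it only works for creation when the client is able to choose the identifier.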

PATCH

PATCH is PUT's younger brother and answers the obvious question “why the hell should I send the entire document if all I want is to update one or two fields?”.

Using the PATCH verb you can modify a limited number of a document's fields, which can be very convenient.

1.2.4- PATCH implemented the wrong way

I haven't the slightest idea where this myth was born, but there's a bunch of developers out there who believe the PATCH verb should be used to alter “one and only one” field per request. That makes no sense, not least because the usual notion of “fields” (developers tend to think in terms of JSON or XML) isn't even mentioned in the method's definition. What the definition does say is that the body of a PATCH request must tell the server how the document should be altered. No implementation details are given at all.

Remember: HTTP is a protocol. It doesn't descend to the implementation level, since you can use HTTP to implement practically anything. A PATCH request could alter a binary file or even change the pitch of a note in a song. That would make the concept of “fields” ridiculous.
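For JSON documents, one common way of “telling the server how the document should be altered” is JSON Merge Patch (RFC 7396). It is only one possible patch format, chosen here as an assumption for the sake of the example; the `merge_patch` function below is a small sketch of its algorithm.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (in the style of RFC 7396); return a new document."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in the patch means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


pipe = {"name": "100mm", "color": "white", "objective": "Drainage"}
# One request, several changes: two fields updated, one removed.
print(merge_patch(pipe, {"name": "120mm", "color": "black", "objective": None}))
```

A single PATCH body changes as many fields as the client wants, which is the whole point of the method.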

2- Because they don't understand that simple is more robust

Robustness is the child of transparency and simplicity
(Eric Raymond, The Art of Unix Programming)

2.1- Trying to look beautiful, they welcome parasites

Let's say we have a printer shop. So, we have both Printer and Cartridge resources. Let's say the second one is completely dependent on the first: one cartridge is compatible with only one Printer model. It may look almost obvious to some developers that the Cartridges should be accessed through this URL:

/v1/printers/:printer_id/cartridges

But this is awful. Look: the next logical step is to try to access an individual Cartridge. So, you'll end up with this:

/v1/printers/:printer_id/cartridges/:cartridge_id

Everything seems to be fine, at least until you get a Cartridge ID from another source that does not include the Printer information. Then you see that :printer_id is parasitic information: it is there in the URL only to get in your way.

So, avoid “nesting” resource URLs. Each resource should have its own specific URL, so we can access documents simply by knowing their identifiers, without the need for any other information.

With the URL above, if you have only the ID of a Cartridge, you won't get direct access to it and will be forced to scan through the Printers.

A better schema would be:

/v1/printers/:printer_id
/v1/cartridges/:cartridge_id

2.2- They try to solve everything on the URL and even think it's beautiful

Following the schema above, how would we list only the Cartridges related to a certain Printer?

Very simple: with a filter whose parameters are passed as query parameters on the GET:

GET /v1/cartridges/?printer_id=:printer_id
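A server-side sketch of that filter, using only the Python standard library, could be the following. The `CARTRIDGES` sample data and the `list_cartridges` function are hypothetical and exist only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical sample data; each Cartridge knows which Printer it belongs to.
CARTRIDGES = [
    {"id": 1, "printer_id": 10},
    {"id": 2, "printer_id": 11},
    {"id": 3, "printer_id": 10},
]


def list_cartridges(url):
    """Handle GET /v1/cartridges/ with an optional ?printer_id= filter."""
    query = parse_qs(urlsplit(url).query)
    items = CARTRIDGES
    if "printer_id" in query:
        wanted = query["printer_id"][0]
        items = [c for c in items if str(c["printer_id"]) == wanted]
    # A filter that matches nothing is still a successful, empty listing.
    return 200, items


print(list_cartridges("/v1/cartridges/?printer_id=10"))
```

The same endpoint serves the full listing and the filtered one, and a Cartridge is still reachable by its own ID at /v1/cartridges/:cartridge_id.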

But the URL of the cited pipe shop implements something that exemplifies the issue about trying to make everything happen on the URL:

POST /pipes/:id/create_card.json

That .json at the end looks very practical, but the same functionality could be implemented with an HTTP header on the request, like Content-Type (or, in the case of a GET, Accept ).

“But, but…”, you may be babbling, seeing how much more practical it is to use a .json or .xml at the end of the URL.

Okay. But think about this word for a second:

Consistency

What about when we want the list of documents in a specified format?

GET /cards/.json (???)

Is this an appropriate alternative?

Ah, I know! Let's twist the API a little more:

GET /cards/list_cards.json

What about that? Even worse, huh? Yep. That's the reason we use headers to define the “media type” of the requests, at least when the API doesn't make it clear that only one media type is used.

See:

GET /cards
Accept: application/json
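On the server, choosing the output format from the Accept header can be sketched as below. This is a deliberately simplified illustration: the `render` function and the `SERIALIZERS` table are hypothetical, JSON as the fallback for wildcards is an assumption, and q-values (the client's preference weights) are ignored.

```python
import json
from xml.etree import ElementTree


def to_json(cards):
    return json.dumps(cards)


def to_xml(cards):
    root = ElementTree.Element("cards")
    for card in cards:
        # Each card becomes an element whose attributes are the card's fields.
        ElementTree.SubElement(root, "card", {k: str(v) for k, v in card.items()})
    return ElementTree.tostring(root, encoding="unicode")


SERIALIZERS = {"application/json": to_json, "application/xml": to_xml}


def render(cards, accept="*/*"):
    """Return (status, media_type, body) for GET /cards given an Accept header."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in ("*/*", "application/*"):
            media_type = "application/json"  # assumed default for this sketch
        if media_type in SERIALIZERS:
            return 200, media_type, SERIALIZERS[media_type](cards)
    # Nothing the client accepts can be produced: 406 (Not Acceptable).
    return 406, None, ""
```

The URL stays /cards for every format, listings included, so there is no need for /cards/.json or /cards/list_cards.json.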

3- Because they don't understand that the line it is drawn, the curse it is cast, the slow one now will later be fast as the present now will later be past

The pipe store (the real one) is an example of what not to do: its URLs are not “versioned”. There's no reference to the version of the API being used.

Now think about the fear felt by the developers of hundreds of companies that are already integrating their systems when they see that not only does the API have no version information, but the product is sold as “beta”, too.

Always version your API. Remember: after a development team at another company spends a whole year building a product that integrates with your API, they'll get furious (with reason) if, suddenly, the endpoints stop working or the responses come back different than before. Once you publish your API, it can't be “unpublished” anymore.

A simple versioning example:

/v1/pipes/

Or:

/v<version>/<the-rest-of-the-path>/

And don't exaggerate: things change, but to change a whole API takes some time. Simply use integer numbers.
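As a rough sketch, pulling the integer version out of the path takes a single regular expression. The `split_version` function is a hypothetical helper, and rejecting anything that is not a plain positive integer (like v1.2) is an assumption that follows the advice above.

```python
import re

# Matches /v<positive integer>/<the-rest-of-the-path>
VERSIONED_PATH = re.compile(r"^/v(?P<version>[1-9][0-9]*)(?P<rest>/.*)$")


def split_version(path):
    """Return (version, rest_of_path), or None if the path is not versioned."""
    match = VERSIONED_PATH.match(path)
    if match is None:
        return None
    return int(match.group("version")), match.group("rest")


print(split_version("/v1/pipes/"))
print(split_version("/pipes/"))
```

With the version isolated like this, a future /v2/ can be routed to new handlers while /v1/ keeps answering exactly as it always did.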

4- Because the developer does not use, himself, other APIs

Weird the programmer already was, actually.

It may seem incredible, but I dare say that's the case in most development teams around the world, even in big corporations. APIs stink simply because the developers do not have experience or even “intimacy” with third party APIs. Some mistakes that are obvious to me weren't obvious to those developers at the time. In many cases the developers had never done any “serious” integration job before developing their own API!

And that's a hard problem to tackle, since hiring less experienced programmers is, in general, the way most companies find to fit the budget. It's the wrong solution, but we must remember that good developers are hard to find so, in many cases, that is the only option such companies have.

If you are a developer without much experience using third party APIs, reading this article is already a good step. Look for good reading material about this subject, get away from Microsoft ideas about it and don't waste your time on useless discussions like “you must use a / at the end of the URLs” versus “you should not use a / at the end of the URLs”.

If you are looking for developers to build an API for your application, try to find someone experienced in that. Even if the costs are higher and the process takes longer, it's worth the investment.

4.1- Pagination based on page number

From lack of experience, many developers choose to create paginated listings (which is good), but base their pagination system on the blogs they read (you have already noticed the /page/2 in the URLs) instead of basing it on the DBMSs (with limit and offset).

If your API paginates based on page number, you're already beginning with a difficult choice: “what should be the default page size?”. It's difficult because it's really hard to find a sound basis for choosing such a number, and it's nonsense because the right thing to do is to let the client choose the page size.

Beyond that, whenever a client wants to implement its own pagination system (for the frontend, for instance), you force it to juggle ways of adapting its own page size (15, for example) to the server's page size (100, for example).

So, when you finally decide to let the client choose the page size, you'll realize everything could have been simpler if you had just used limit and offset from the beginning.
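Here is a minimal sketch of limit/offset pagination, together with the one-line conversion a client needs to keep thinking in “pages” of whatever size it likes. The function names, the response shape and the `max_limit` cap are all assumptions made for this example.

```python
def paginate(items, limit=20, offset=0, max_limit=100):
    """Slice a listing the way a DBMS would with LIMIT and OFFSET."""
    # Assumed safety cap so a client cannot ask for the whole table at once.
    limit = max(1, min(int(limit), max_limit))
    offset = max(0, int(offset))
    return {
        "count": len(items),
        "limit": limit,
        "offset": offset,
        "results": items[offset:offset + limit],
    }


def page_to_offset(page_number, page_size):
    """Client side: translate 'page N of size S' into limit and offset."""
    return {"limit": page_size, "offset": (page_number - 1) * page_size}


pipes = list(range(40))  # stand-in for 40 Pipe documents
# A frontend that shows 15 items per page asks for its third page.
print(paginate(pipes, **page_to_offset(3, 15))["results"])
```

The client's page size never has to match anything on the server: it simply sends the limit and offset it needs.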

4.2- “Errors” that are not

This kind of falls under the “not knowing the HTTP protocol” issue, but it is also a sign of inexperience: if the developer has never worked hard at integrating with third party APIs, he won't pay enough attention to the right HTTP status codes.

For example: let's say that in that pipe shop system there's a restriction: the Pipe's “name” and “color” must be unique together. That is: there will never be two or more Pipes whose “name” and “color” are the same. In this situation, what code should the server return if the client tries to create a new Pipe with the same “name” and “color” as a previously saved one?

400 and something? No, because the client hasn't done anything wrong! The request was made to the right URL and with valid payload and headers. So it can't be 4xx.

Besides, 400 errors in general mean “correct your request before trying again”. But what should the client correct?

500 and something? No, because refusing to create a duplicate in the system is a feature, not a bug. So, no, you cannot say there was an error on the server side.

Besides, 500 errors in general mean “wait a moment and try again”. But no matter how many times the client tries again, the result is going to be exactly the same.

The solution is very simple: return a 200 (OK) code instead of 201 (Created), with the Content-Location header pointing to the previously existing entry. See https://tools.ietf.org/html/rfc7231#section-3.1.4.2 . It adds clarity to the communication.
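As a sketch, a creation handler that follows this idea could look like the code below. The `PIPES` store, the `_BY_NAME_COLOR` index and the `create_pipe` function are hypothetical; in a real system the uniqueness check would usually be a database constraint.

```python
# Hypothetical in-memory store and uniqueness index.
PIPES = {}            # id -> document
_BY_NAME_COLOR = {}   # (name, color) -> id


def create_pipe(payload):
    """Handle POST /v1/pipes/ and return (status, headers, body)."""
    key = (payload["name"], payload["color"])
    if key in _BY_NAME_COLOR:
        # Not an error: point the client to the entry that already exists.
        existing_id = _BY_NAME_COLOR[key]
        headers = {"Content-Location": f"/v1/pipes/{existing_id}"}
        return 200, headers, PIPES[existing_id]
    new_id = len(PIPES) + 1
    PIPES[new_id] = dict(payload, id=new_id)
    _BY_NAME_COLOR[key] = new_id
    return 201, {"Location": f"/v1/pipes/{new_id}"}, PIPES[new_id]
```

Sending the same “name” and “color” twice gives 201 the first time and 200 the second, with both answers carrying the same document, so the client always ends up knowing where its Pipe lives.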

The point here is not how simple the solution is, but the fact that an inexperienced programmer would never stop to think about this “issue”. When the database reports a conflict, he would simply pass the problem along with a wrong status code.

And his API would stink.

How to create an API that does not stink

Here's the summary of what was said:

  • Version your URLs;
  • Be consistent;
  • Avoid nesting resources;
  • Do not allow verbs like “create”, “update” or “delete” in the path of your URLs: that's what HTTP is for;
  • Use POST to create and PUT and PATCH to update;
  • Filter with query parameters;
  • Select the media type (like JSON or XML) in headers, not in the path;
  • Use the right HTTP return status codes;
  • Do not return 404 for listings with zero items;
  • Allow PATCH to change N ≥1 fields;
  • Paginate with limit and offset;
  • Be careful with 4xx and 5xx status codes: do not point fingers where there is no one to blame.

Make your API be beautiful

Maybe this “guide” is not exactly definitive, since there's always a new way of making an API stink. But, you know, at least I tried…