This is the second in a series of three posts on REST APIs.
• Part 1: Introduction and planning
• Part 2: Schema suggestions, common mistakes and deprecation
• Part 3: Documentation tips and moving beyond the basics
While your API code might be super-clean and shiny, your consumers will in all likelihood never get to see it. What they will get tired of seeing, though, is your API’s schema.
You should attempt to keep your schema clean, using clear, self-explanatory names and a structure that is as close to self-documenting as possible. Moreover, with a few simple rules, you should be able to future-proof it so that you can keep adding more information to existing responses without having to deprecate your major version early.
1. Schema implementation suggestions
The most common schema-related mistakes that I have encountered over the years come from a lack of strictness in the API development environment. When designing your API, using a proper framework for your programming language goes a long way to help you do things The Right Way.
Take your time to study the available options and bear in mind that while most web frameworks provide tools to help you implement a REST API, some will leave much of the work up to you. Don’t reinvent the wheel.
Make sure that your internal data layer is strictly modelled using an ORM for your database or another data model infrastructure. Your responses should be comprised of nested data business models that are strictly defined in your code. Do not rely on results from database queries mapped to plain dictionaries generated at runtime, or other similar approaches. Keeping a strict model layer is what you should be doing in all your apps anyway.
Your data models should always be filtered before being exposed to the API responses to avoid leaking sensitive information. Any changes to fields getting exposed to the API should be assessed for compatibility with the current API version and added to the API’s changelog.
2. Good schema practices
a) Consistent naming
Your naming should be meaningful and consistent across your entire API, in URIs as well as request and response fields. Ambiguities in naming cause much stress to other developers, as well as yourself when you return to your code after a while.
Remember that your URIs and schema should communicate their purpose to your consumers as clearly as possible without them having to constantly browse your documentation. Don’t be reluctant to use verbose names for difficult concepts as long as you don’t go overboard.
b) Strict data types
Your schema must always be strictly-typed even if your programming language of choice is not. Number fields should always contain numbers, string fields should always contain strings and so on. You should never mix different data types in the same field. Your values may fluctuate but your data types should not.
It’s a very common mistake in APIs created in loosely-typed environments to see a field containing the numeric value 42 in one response and the string value “42” in another. This practice is inconsistent and difficult to parse safely in all clients. In a loosely-typed schema, clients must make some very dangerous decisions regarding their leniency when parsing each individual field.
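For illustration (the field names here are made up), if one response from an endpoint contains:

```json
{ "id": 42, "name": "Arthur" }
```

then no other response from that endpoint should ever contain "id": "42" with the value as a string. Pick one data type per field and keep it forever.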
Obviously, this does not only apply to primitive data types (numbers, strings, booleans etc) but to JSON objects and arrays as well. Don’t return an object of type Chair in a field that normally contains an object of type Table, or an array of Cars in a field that normally contains an array of a different type.
Even though the above seems like really basic knowledge, you’d be surprised how many good developers get it wrong, falling victim to elementary weaknesses in their development environment.
A strong model layer for your implementation helps you avoid such embarrassing mistakes.
c) Don’t omit fields
When there is no value available for a certain field, do not omit that field entirely. Use either null, an empty string, an empty array, or zero, depending on the data type and the semantics of the missing value.
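For instance, a sketch of a response with several “empty” values, using hypothetical field names:

```json
{
  "name": "Arthur",
  "nickname": null,
  "tags": [],
  "login_count": 0
}
```

Every consumer sees the same four fields in every response, whether or not values happen to be available.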
Don’t make it hard for people to understand your schema by having them make 10 requests to get all the fields when one request should be enough. Perhaps they can look it up in the documentation, but why not make it easier for them and yourself and just make the schema easily understandable and close to self-documenting? Remember, documentation usually gets stale much faster than code does and code is what generates your schema.
Again, this type of mistake is much more easily avoidable if you use a strong model layer in your implementation.
d) Don’t abuse JSON objects
I’ve seen this more times than I can remember both in API requests and responses. This also stems from bad practices commonly associated with loosely-typed languages.
Let’s assume you have a people object with unique ids as keys and sub-objects as values.
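Such an abused object might look something like this (the ids and the nested fields are illustrative):

```json
{
  "people": {
    "1234": { "name": "Arthur", "age": 42 },
    "5678": { "name": "Ford", "age": 200 }
  }
}
```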
The people object’s fields change every time its nested contents change. In this call, it has fields named after ids such as 5678, but nobody knows what its fields will be named in the next call.
This is a terrible practice and makes for inconsistent and generally terrible code when parsing in any strictly-typed language. Each JSON object in your API should always have an immutable, strictly-defined set of fields between requests.
The above is an excellent use-case for arrays. Just return an array and include the id inside each array element.
Here’s what the people object should look like:
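A sketch of the array-based version, with illustrative fields:

```json
{
  "people": [
    { "id": "1234", "name": "Arthur", "age": 42 },
    { "id": "5678", "name": "Ford", "age": 200 }
  ]
}
```

The people field now has a stable, strictly-defined shape: it is always an array of identical objects, no matter which ids happen to be present.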
Usually this type of schema abuse occurs when you try to make lookups easier in your backend code by using a dictionary with unique-id keys. You have to keep in mind that this is a case where an internal implementation detail is leaking to your users — a phenomenon that you should avoid in all aspects of software development.
You’re probably getting tired of reading this, but such issues can also be avoided with a strong model infrastructure.
e) Don’t abuse JSON arrays
So, you’ve followed the previous advice and changed some of your objects to arrays. Good job!
Now you have to make sure that your arrays contain only one type of object. Don’t mix apples and oranges! Remember, not all clients use loosely-typed containers for their data, and parsing lists of heterogeneous resources is not only inconsistent and annoying, but also unsafe!
When you absolutely cannot avoid returning different kinds of entities in the same array, try to return a list of super-objects that are abstract enough to describe the attributes of all the types of objects you need to return.
In the apples and oranges example, maybe you should instead be returning objects of type Fruit. A Fruit can contain all the attributes for both Apple and Orange objects, as well as a type field that specifies exactly which type of fruit each object refers to.
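A sketch of such a Fruit super-object (the attribute names are invented for the example):

```json
{
  "fruits": [
    { "type": "apple", "color": "green", "weight_grams": 150 },
    { "type": "orange", "color": "orange", "weight_grams": 180 }
  ]
}
```

Every element of the array has exactly the same fields, so clients can parse the list with a single, strictly-typed model.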
If the attributes of the items you return are completely disparate for each returned type, but you still have to return them in the same list, you might have to resort to “extreme” measures such as container objects. This is not a very elegant solution but there are some edge cases which make it necessary.
Here’s an example of container objects:
Your endpoint returns a list of flying vehicles that a person owns, as well as some basic characteristics for each one of them. Flying vehicles can either be airplanes or hot-air balloons and those are quite different from one another. Semantically, it makes little sense to add attributes such as wingspan, number of engines or horsepower to a balloon, the same way it makes little sense to add attributes such as basket, balloon material and balloon shape to an airplane.
Adding all those attribute fields to a single object type would be redundant. Instead you can keep your nice and clean airplane and balloon objects and store them inside simple container objects.
In this case, a container object of type Vehicle (essentially a supertype) would have a type field (with a value of either “airplane” or “balloon”), an airplane field, and a balloon field. When returning an Airplane object, you set the container object to the “airplane” type and then you populate its airplane field with the Airplane object and the balloon field with a null value (remember to always return all fields, even when they’re empty).
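Here is a sketch of what such a response might look like (the attribute names inside the nested objects are illustrative):

```json
{
  "vehicles": [
    {
      "type": "airplane",
      "airplane": { "wingspan": 34.1, "engines": 2, "horsepower": 2500 },
      "balloon": null
    },
    {
      "type": "balloon",
      "airplane": null,
      "balloon": { "basket": "wicker", "balloon_material": "nylon", "balloon_shape": "teardrop" }
    }
  ]
}
```

The array itself stays homogeneous: every element is a Vehicle with exactly the same three fields.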
Again, this is a design that you should avoid if possible, but if you really have to return completely different objects in the same collection, container objects are a good way to maintain a strictly-typed schema in a (sort of) clean manner.
f) Don’t rely on plain hard-coded error messages
I hate to disappoint you, but no matter how fun and witty your error messages are, they will rarely reach your end-users’ eyes. It’s not that other developers don’t appreciate your writing skills, it’s just that you never know how a client will have to present an error. It might be a popup with many lines of text, a short toast message, a red border around a text-field or even a sound that will convey the meaning of the error in an entirely different way.
Furthermore, you must always return your errors in a succinct, machine-readable way to make them easy to parse and distinguish from one another by your client apps. You should return proper HTTP status codes and include additional information specific to the error (such as internal codes and additional information) in a strictly-defined error object in the response body.
Here’s an example:
Your consumer requests a certain order for a certain customer via an endpoint that identifies both the customer and the order.
If the customer or the order is not found, the response should have a 404 Not found status code, but is that enough? The client won’t be able to distinguish exactly what caused the error, because it does not know what exactly was not found. Was it the customer, or the order?
That’s where the error object in your response body comes in handy.
A code field inside your error object makes things more specific for clients. Moreover, making it a string (e.g. “customer_not_found”) instead of a number makes things much easier for developers, who don’t have to browse through a table of numbers and error descriptions in your API’s documentation.
A message field could better explain the reason for the error to developers, so they have a better idea of how to handle it and where to look for additional information. Ideally, your errors should be localized based on the Accept-Language header of your clients’ requests. Who knows, maybe that way end-users will get to read your masterpieces at some point.
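Putting it together, a 404 response body might look something like this (the code value follows the example above; the message wording is up to you):

```json
{
  "error": {
    "code": "customer_not_found",
    "message": "No customer exists with the requested id."
  }
}
```

A client can now branch on the code string to tell a missing customer apart from a missing order, while the message remains purely informational.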
g) Don’t use numeric enums
As mentioned again and again, you should strive for an easily-readable and self-documenting schema. Don’t use numbers for enumeration cases. Use simple strings.
You have a type field in your Animal object? Don’t use numbers such as 5 as its values. Strings such as “elephant” are much easier to read by humans and make almost no difference to a machine that knows how to compare strings.
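For example (the surrounding field is made up):

```json
{ "name": "Dumbo", "type": "elephant" }
```

One look at the response tells you everything, with no enum table from the documentation required.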
People usually do this when they have numeric enums being used internally in the backend, but that is (again) an implementation detail that should not leak out to API consumers.
I’ve also heard excuses such as increased bandwidth consumption of the strings approach, but there are other, better ways to deal with that issue. Your schema should be verbose enough to be easy to understand with one look and you should be reducing your bandwidth consumption with Gzip, which makes a world of difference compared to saving a few bytes by using numeric enums.
h) Don’t return non-encapsulated JSON arrays
What exactly is encapsulation (or JSON envelopes) in this context? Simply explained, it means to wrap (or envelop) your response data in a JSON object and return that in a data (or other, similarly-named) field in the root of your response body.
Some people seem to think that this is a good practice for all responses, because it allows you to add metadata fields in the future (such as error or pagination information) without tampering with the main response object. Although that might require a bit more code when parsing, it does make the API schema a bit cleaner.
Even if you don’t want to do this for all responses, I believe it’s extremely useful (even necessary) when you are returning a collection of objects. In such cases you should never leave an array as the root container of your response!
The main justification for the above is that if your root container is a JSON array, your schema changes radically when the response needs to return an error, which will inevitably be a JSON object. That makes parsing more complex without providing any actual benefits.
Furthermore, even if you disregard the above, an array makes early deprecation of your API even more probable, because it can never be changed or amended in any way without deprecating your schema. On the other hand, using an object as the root response container allows you to add as many fields as you want later on, without causing deprecation. Heck, you could even return a different, updated type of object in a new array, as long as you make sure to keep the old array around.
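A sketch of an enveloped collection response, with a hypothetical pagination object as added metadata:

```json
{
  "data": [
    { "id": "1234", "name": "Arthur" }
  ],
  "pagination": { "page": 1, "total_pages": 10 }
}
```

The pagination field could have been added after the initial release without breaking any client, which would not have been possible with a bare array at the root.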
i) Use Unix timestamps or ISO-8601 dates
My personal preference has always been to represent dates as Unix timestamps in responses, because they are relatively short and very easy to parse. However, unless you are a machine, they are hard to read as actual dates without conversion. ISO-8601 dates, on the other hand, are better from a readability aspect, but are a bit more difficult to parse (although not much).
Any other string format besides those two should be avoided at all costs, as it may create ambiguities at parse-time. I know you can specify your own datetime format in your documentation and clients can parse dates based on that, but remember: your API should be as easy to understand as possible without much external help.
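For reference, here is the same instant (midnight UTC, January 1st 2019) in both acceptable formats, under invented field names:

```json
{
  "created_at_unix": 1546300800,
  "created_at_iso": "2019-01-01T00:00:00Z"
}
```

Either one parses unambiguously in any client; pick one format and use it everywhere.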
j) Avoid huge flat objects
If your object A is not a user, and yet contains fields such as user_pet and so on, maybe it’s time to just use an encapsulated User object inside object A.
Since (database as well as API) schemas have a tendency to become more complex over time, it’s good to try to normalize them and keep them as clean as possible from the beginning.
If you feel that nesting the entire related object is overkill, you can just filter out fields that are not relevant to the current response. If the nested object is important to your consumers in some other context, it will most probably be provided by a dedicated resource of your API. In that case your consumers can retrieve it in its entirety if you make sure to provide a hypermedia link (when using HATEOAS) or its primary key.
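A sketch of the nested approach for the user_pet example above (all other fields are invented):

```json
{
  "id": "42",
  "user": {
    "name": "Arthur",
    "pet": "dog"
  }
}
```

Instead of user_name, user_pet and friends piling up at the root, the user-related fields live inside a single User object that can grow independently.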
k) Use JSON booleans for (duh…) boolean values
This does not make a huge difference, but why use 1 and 0 when JSON already has a boolean type? It’s 2019 and we now have actual true and false keywords for that…
Besides, using 1 and 0 implies that there’s also a 3 to be expected, or even a minus one!
If you did that on purpose, thinking you might need more values in the future, maybe you should be using a string enum instead of numbers and ditch the true/false approach anyway.
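In other words (is_active is an invented field name), prefer:

```json
{ "is_active": true }
```

over "is_active": 1 or "is_active": 0.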
l) Use objects for fields that could need more information in the future
Always do your best to future-proof your API in any way that you can. Try to anticipate attributes that will most likely require additional information in the future, so that you can increase the lifespan of your major version.
Here’s an example:
Let’s say that you have an is_available Boolean for each one of your Book objects. While that is enough to let us know that a book is unavailable when it’s false, it does not tell us why it’s unavailable or when it will become available again. In the future, if you want to add that information, you will have to pollute the root of the Book object by adding the two extra fields.
A cleaner approach would have been to use an availability field that stores an Availability object, which at first contains only the is_available field, but can be amended to contain additional info about the book’s availability (such as the reason the book is unavailable and a timestamp of when it will become available again) without deprecating your schema (more on that in the next section).
3. Schema deprecation
If you’re using a versioning approach that allows for small incremental changes to your API while guaranteeing structural stability, you should be very careful every time you change things.
Certain changes to your schema mean instant deprecation of your current major version. You should avoid them unless absolutely necessary. Although they might be obvious, I find them useful as a checklist of things to keep constant when updating your schema.
Please keep in mind that the changes listed below are not the only causes for schema deprecation (which can often be caused by details in your specific design) — just the most common ones.
a) Don’t remove fields or change field names
You can never be sure how your consumers use your information. No matter how insignificant or redundant a field might seem, if you made the mistake of shipping it to production, then you’re stuck with it until the next major version. Changing a field’s name is obviously the same as removing it.
b) Don’t change data types
Data types must remain strict not only across responses, but across minor versions as well. This is another bad practice related to loosely-typed development environments.
You cannot change the data type of a field without deprecating the current version. If you made the mistake of passing a number as a string in your response when you first shipped, then you should always be returning it as a string until the next major version.
c) Don’t edit existing enumeration cases
Returning numbers (remember — you shouldn’t) or strings as enumeration cases is quite common.
For example, you might be using string cases such as “motrcycle” for a field describing a vehicle’s type.
If you noticed the typo in “motrcycle”, I understand how frustrated you must be feeling, but you cannot fix it! Your consumers are depending on the mistyped version and will be sad to see it go. You can do it in the next major version though (and don’t forget to add it to the changelog).
d) Don’t change error codes for existing endpoints
Nope. Your client apps depend on the old ones. They may, or may not know what to do with the new ones, but in any case, you can’t be certain.
This was the second in a series of three posts on REST APIs.
You can read the last one here:
Part 3: Documentation tips and moving beyond the basics