How should your API handle data dependencies?

Ted Spence
Published in CodeX
6 min read · Oct 3, 2022

When two fields in a data structure are linked, usability can be tricky

So your team has designed an API that takes, as input, an address. You begin by internationalizing the address into country-neutral field names such as country, region, postalCode, city, and address.

Pop quiz: Which of these fields are required?

For country codes, you can mark the field as required and use an ISO 3166 library to validate API calls. But there’s more: In the US, the region field is also required, as it contains the two-letter state code. Elsewhere in the world the region field is less important and can be left blank.

This is a field-level dependency: The status of the region field depends on the value supplied in the country field. Let’s talk about how your API can express this dependency to your developers.

Are your fields fully separate, or do they depend on each other? (Geograph)

Document your field-level dependencies

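Most API definition languages, such as my favorite, OpenAPI, support flags that designate a field as required. You can easily mark the country field as required: a parameter takes required: true, and a property inside a schema object is listed by name in that object’s required array.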

This doesn’t help us for the region field. The required flag is just a boolean true or false value, and we can’t express a dependency on the country value. We do however have an available description that we can use to explain the situation.

Let’s mark the country field as required, and add documentation and validation to explain to users that region is only required when country = 'US'. Here’s what such an OpenAPI specification might look like:
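A minimal sketch of such a definition, written as a TypeScript object literal that could be serialized into the JSON form of an OpenAPI schema. The field names follow the list above; the description wording and the constant name `addressSchema` are assumptions for illustration only.

```typescript
// Hypothetical schema fragment for the address input. Serializing this
// object with JSON.stringify would give the JSON form of an OpenAPI schema.
const addressSchema = {
  type: "object",
  // In a schema object, required fields are listed by name.
  // Only country is required in all circumstances.
  required: ["country"],
  properties: {
    country: {
      type: "string",
      description: "ISO 3166 two-letter country code, for example 'US'. Always required.",
    },
    region: {
      type: "string",
      // The dependency cannot be expressed with the required list alone,
      // so it is spelled out in the description.
      description:
        "State, province, or other region. Required when country = 'US', " +
        "where it must be the two-letter state code. Optional otherwise.",
    },
    postalCode: { type: "string" },
    city: { type: "string" },
    address: { type: "string" },
  },
};
```

The description is the only place where the rule is visible to someone reading the specification, which is why the wording matters.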

But what happens when a developer doesn’t pay attention and accidentally sends country = 'US' but region = null? Do they get a basic 400 error that says “Invalid input”?

It helps here to provide a robust and well-written field validation error that matches the documentation we wrote on the field:
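As a sketch, a server-side check for this rule might return a message that restates the field’s documentation. The names `AddressInput` and `checkRegion`, and the message text, are hypothetical.

```typescript
// Hypothetical input shape; field names follow the article.
interface AddressInput {
  country: string;
  region?: string | null;
}

// Returns a descriptive message when the rule is violated, or null when
// the input satisfies it. The message deliberately repeats the documentation.
function checkRegion(input: AddressInput): string | null {
  const regionMissing = input.region == null || input.region.trim() === "";
  if (input.country === "US" && regionMissing) {
    return (
      "The field 'region' is required when country = 'US'. " +
      "Supply the two-letter state code, for example 'WA'."
    );
  }
  return null;
}

// A US address without a region is rejected with the full explanation.
const usWithoutRegion = checkRegion({ country: "US", region: null });
// A non-US address without a region passes this particular check.
const franceWithoutRegion = checkRegion({ country: "FR" });
```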

This may feel repetitive. The error message we wrote for this condition is virtually the same as the description we wrote earlier. It may feel tempting to just give a generic error and tell developers to “go read the documentation”, but that approach isn’t going to win any friends.

When we write the same documentation twice, we increase the chances that developers will find the information they need. Not every developer wants to scour through documentation beforehand. Some developers like to experiment, to hack things together, or to copy already working code. Those developers will benefit from the clarity that a well-written error message provides.

Here are a few of the other issues that we should address:

  • Your error structure should support multiple validation errors in a single response. It’s frustrating for developers to think there is only one error, then to retry an API call after fixing that one error, only to see another one appear.
  • An API error should contain a readable message that is suitable for display to an end user. If your developer customers build web applications on top of this API, they should be able to show your error messages as-is, without rewriting them first.
  • I am a strong proponent of help URLs, especially ones that lead to a wiki page or other documentation site. This would allow you to refine your documentation and provide support to developers who are confused by an error without having to ship a new API release.
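The three points above could be combined into one error shape. This is only a sketch: the property names, the `docs.example.com` URLs, and the postal code rule for US addresses are assumptions added so that one request can fail in two ways.

```typescript
// Hypothetical error shapes; the property names are assumptions.
interface FieldError {
  field: string;   // which input field failed
  message: string; // readable enough to show to an end user
  helpUrl: string; // documentation page that can change without an API release
}

interface ApiErrorResponse {
  statusCode: number;
  errors: FieldError[]; // every problem found, not just the first
}

interface AddressRequest {
  country?: string | null;
  region?: string | null;
  postalCode?: string | null;
}

// Placeholder documentation site, not a real URL.
const HELP_BASE = "https://docs.example.com/errors";

function isBlank(value: string | null | undefined): boolean {
  return value == null || value.trim() === "";
}

// Collects all validation failures before responding.
function validateAddressRequest(req: AddressRequest): ApiErrorResponse | null {
  const errors: FieldError[] = [];
  if (isBlank(req.country)) {
    errors.push({
      field: "country",
      message: "Please provide a country code such as 'US'.",
      helpUrl: HELP_BASE + "/country-required",
    });
  }
  if (req.country === "US") {
    if (isBlank(req.region)) {
      errors.push({
        field: "region",
        message: "Please provide a two-letter state code. It is required when country = 'US'.",
        helpUrl: HELP_BASE + "/region-required-for-us",
      });
    }
    // Assumed rule, for illustration only.
    if (isBlank(req.postalCode)) {
      errors.push({
        field: "postalCode",
        message: "Please provide a postal code. It is required when country = 'US'.",
        helpUrl: HELP_BASE + "/postal-code-required-for-us",
      });
    }
  }
  return errors.length > 0 ? { statusCode: 400, errors: errors } : null;
}

// One call reports both missing fields at once.
const usErrors = validateAddressRequest({ country: "US" });
```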

The basic approach of “required means mandatory in all circumstances” is easy to understand and to explain. Since country is always required and region is only sometimes required, marking only country as required is accurate, and as long as our developers read the description or the error message they won’t have any trouble.

Are there other options?

Use union types to represent complex interdependencies

One approach we could take is to design our API to allow multiple possible input objects. In object-oriented languages, this is known as polymorphism; in TypeScript it can also be known as union types; and in OpenAPI documentation it is known as “anyOf”. Here’s how it works.

  • You define a single API endpoint.
  • The input for the API can be one of a few different possibilities.
  • Each possibility defines different fields that are required.

In this world, you could have two structures: a United-States specific address and an internationalized address. Here’s what it might look like:
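A sketch of the two structures, again written as a TypeScript object literal that mirrors the JSON form of an OpenAPI components section. The schema names `InternationalAddress` and `AddressBody` are assumptions.

```typescript
// Hypothetical OpenAPI components, written as a TypeScript object literal.
const addressComponents = {
  schemas: {
    UnitedStatesAddress: {
      type: "object",
      // Both fields are mandatory for this shape.
      required: ["country", "region"],
      properties: {
        country: { type: "string" },
        region: { type: "string", description: "Two-letter state code." },
        postalCode: { type: "string" },
        city: { type: "string" },
        address: { type: "string" },
      },
    },
    InternationalAddress: {
      type: "object",
      // Region may be left blank here.
      required: ["country"],
      properties: {
        country: { type: "string" },
        region: { type: "string" },
        postalCode: { type: "string" },
        city: { type: "string" },
        address: { type: "string" },
      },
    },
    // The request body accepts either shape.
    AddressBody: {
      anyOf: [
        { $ref: "#/components/schemas/UnitedStatesAddress" },
        { $ref: "#/components/schemas/InternationalAddress" },
      ],
    },
  },
};
```

In both schemas country is a plain string, so nothing in this sketch connects a particular country value to a particular shape.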

Hmm. There’s a problem here. Even though you have two different address objects, it’s not clear that setting country to the value of US commits you to using the UnitedStatesAddress object!

In TypeScript, with union types, you can demonstrate this dependency in a clear fashion:
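A minimal sketch of such a union. The `CountryCode` list is abbreviated and hypothetical; a real SDK would need the full ISO 3166 list, because TypeScript cannot express “any string except US”.

```typescript
// Abbreviated, hypothetical list of country codes.
type CountryCode = "US" | "CA" | "FR" | "JP";

// Shared optional fields; names follow the article.
interface AddressBase {
  postalCode?: string;
  city?: string;
  address?: string;
}

// The literal type "US" ties the country value to this shape,
// so region becomes mandatory at compile time.
interface UnitedStatesAddress extends AddressBase {
  country: "US";
  region: string;
}

// Every other country code; region stays optional.
interface InternationalAddress extends AddressBase {
  country: Exclude<CountryCode, "US">;
  region?: string;
}

type PostalAddress = UnitedStatesAddress | InternationalAddress;

const seattle: PostalAddress = { country: "US", region: "WA", city: "Seattle" };
const paris: PostalAddress = { country: "FR", city: "Paris" };

// The next line would not compile, because region is missing for "US":
// const broken: PostalAddress = { country: "US", city: "Seattle" };

function describeRegion(a: PostalAddress): string {
  // Checking the country value narrows the type inside each branch.
  if (a.country === "US") {
    return "State: " + a.region;
  }
  return a.region ? "Region: " + a.region : "No region supplied";
}
```

Here the compiler, rather than a description, enforces the link between country and region.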

But will all your developer customers use TypeScript as well? Although many languages support union types, this feature doesn’t exist everywhere. A C# or Java developer can’t simply import your union types and see them represented natively with the same fidelity that you expect.

The union types approach faces a few challenges:

  • Uneven support for OOP. Some union types are congruent with polymorphism and object-oriented programming; others are not.
  • Dynamically typed languages. You’ll have to create accommodations for developer customers working in dynamically typed languages such as Python or Ruby, where these constraints are typically not checked until the code runs.
  • Dependencies across APIs rather than within structures. You may have slightly different rules for your data structures when you call Create() or when you call Update(). This can generate a lot of complexity if you enforce strict object typing for each.

Overall I find union types to be a great toolkit for improving the ease-of-use of your SDKs in languages that support union types; but I still recommend that your API should implement validation-and-documentation so you have baseline usability for developers unable to take advantage of them.

Complex dependency and validation rules

As you continue working with your API, you may begin to discover even more complex validation rules. Some of these rules might be able to be expressed as unions, but they grow increasingly challenging, especially when they cross different problem domains.

  • A field in your API might only be required when a specific configuration flag has been set on your account.
  • A transaction might need to have a foreign key to a valid office where the shipment will be delivered.
  • A shipping label for one vendor (say UPS or FedEx) might require a tracking number, whereas a different vendor (say the postal service) might not support one.

In theory you could keep developing union types or class hierarchies to reflect these details — but is that really the complexity you want in your API? How many different schemas would you need to represent them?
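For comparison, rules like the three listed above are short to state as runtime checks that return descriptive errors. In this sketch the carrier names come from the example above, and every identifier (`ShipmentInput`, `ValidationContext`, `requireCostCenter`, `knownOfficeIds`, `costCenter`) is hypothetical.

```typescript
interface ShipmentInput {
  carrier: "UPS" | "FedEx" | "PostOffice";
  trackingNumber?: string;
  officeId: number;
  costCenter?: string;
}

// Facts that live outside the request itself.
interface ValidationContext {
  requireCostCenter: boolean; // account-level configuration flag
  knownOfficeIds: number[];   // stand-in for a database lookup
}

// Returns every problem found; an empty array means the input is valid.
function validateShipment(input: ShipmentInput, ctx: ValidationContext): string[] {
  const problems: string[] = [];

  // Rule 1: a field required only when an account flag is set.
  if (ctx.requireCostCenter && !input.costCenter) {
    problems.push("costCenter is required because this account has cost center tracking enabled.");
  }

  // Rule 2: a foreign key that must point at an existing office.
  if (ctx.knownOfficeIds.indexOf(input.officeId) === -1) {
    problems.push("officeId " + input.officeId + " does not match an existing office.");
  }

  // Rule 3: tracking numbers depend on the carrier.
  const needsTracking = input.carrier === "UPS" || input.carrier === "FedEx";
  if (needsTracking && !input.trackingNumber) {
    problems.push("trackingNumber is required when carrier = '" + input.carrier + "'.");
  }
  if (!needsTracking && input.trackingNumber) {
    problems.push("trackingNumber is not supported when carrier = 'PostOffice'.");
  }

  return problems;
}
```

Each rule is one conditional and one sentence, with no additional schema per combination of carrier, account setting, and office.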

Pick the approach that provides the biggest wins

The biggest challenge with data dependencies is that there are simply too many of them, with too many different types of issues. If we make our schema definition language sufficiently complex to represent all possible dependencies, we still can’t guarantee that all developers would follow it.

On the other hand, the “Document-And-Validate” approach is:

  • Usable across programming languages. Virtually all programming languages have an IDE that can display hover-doc comments on fields. JSON error messages are easy to read and parse.
  • Available up front and after the fact. If your developer customers like to read documentation before writing code, great! But if they just like to try API calls to see what errors they get, they’ll find the same well-written guidance in your error messages.
  • Simple. Don’t underestimate the value of simplicity! Clever tricks impose a burden on you, who have to make them reliable, and on your developer customers, who have to grok a complex solution.

I continue to encourage API designers to pick the simplest approach that results in a usable product. Clever data schemas may help, but you can’t go wrong with good documentation and a good error message.

Ted Spence teaches at Bellevue College and lives in West Seattle. If you’re interested in software engineering and business analysis, I’d love to hear from you on Twitter or LinkedIn.
