On API Evolution: Globally Consistent Interfaces

Maintaining the interface contract is a two-sided affair

Before you dive into this article, it’s worth reading Phil Sturgeon’s enthusiastic piece on API Evolution for REST/HTTP APIs if you haven’t already.


One of the points Phil touches on is related to handling input when your data model changes.

When folks POST or PATCH to your API, if they send a name you can convert it, or if they send first_name and last_name it'll get picked up fine on the serializer. Job done.

This is deserialization (parsing input). There are a few standard processes in place over HTTP that handle deserialization for us, but if you want your API to evolve, it’s something that you will seriously consider taking under your control.

I‘ve only seen a few examples of folks trying to do this in the wild. Phil himself has in the past brushed aside requests for this sort of functionality to be included in Fractal, while he was still involved with the project.

He told me:

“I regret previously saying a serializer should not get involved with deserialization. I think a shared contract for both is hugely beneficial”

In talking to Phil (on the APIs You Won’t Hate Slack), I learned about Roar, a Ruby gem that helps with complex deserialization scenarios.

It doesn’t seem like there’s much discussion around this in the PHP community. Am I just unaware of it?

We’re missing a serious deserializer package in PHP.

Please, correct me if I’m wrong.


If your back-end data model is changing to prefer name over first_name and last_name, you’re going to be changing a bunch of things along the way: validation, documentation, tests.

Without the transformation layer that a deserializer can provide between input data and your data model, changes to your back-end will probably need to be rolled up into a major release of your API. You’ll almost certainly end up supporting 2 (or more) versions for a while.

A sensible versioning protocol seemingly makes this easy. You can imagine a developer justifying this plan with:

“Well, it’s v3 so, you know, everyone should expect stuff to break!”

That’s a fair point and sometimes it’s completely unavoidable to cause backwards incompatible changes. But what if you could avoid this by evolving your API?

What if you could maintain a consistent, consumer-facing input interface regardless of how your back-end changes?

If you could do that:

  • You’d probably be able to reduce the internal effort required to support any data structure changes.
  • You may even be able to make the data structure changes without the trouble of versioning your API… Evolution!
  • Or deploying a separate installation/introduce new feature flags.
  • Or nudging/pushing/forcing your consumers to upgrade.
  • Or fielding a bevy of support tickets from all those consumers whose apps have ‘suddenly and unexpectedly’ broken, despite your best efforts to forewarn them.

In theory, it could save a lot of precious time.

It definitely feels possible to come up with a generic approach that could work in a lot of cases.


Granted, this is quite idealistic so I’m going to caveat my points with the fact that I don’t know what every situation is like. In more complex scenarios a generic solution may not work. But that doesn’t mean you shouldn’t implement something.

If you’re already doing something like this, I’d love to hear about it!

If you’re not yet doing it, I heartily recommend that you consider it.

It can mean fewer changes happening at the same time and less room for error. I’m certain it will save a lot of developer time and cognitive energy. Trust me, that’s good for everyone.

One Approach

Our approach is quite straightforward and not extracted into its own service layer yet: our base model class contains some methods for parsing and validating request input data that all of our models inherit.

Individual models then simply define an array that maps input attributes (received from consumers as JSON) to the underlying model attributes for storage.

Something like this for example:

class Model extends BaseModel
{
protected $deserialize = [
'id' => 'user_id',
'name' => 'full_name',
'email' => 'email_address',
];
}

The external interface is represented by the array keys (on the left) and the database columns are the values (on the right).

This feels nice and simple to me. It’s also pretty clear. For a new developer, it’s easy to see what’s going on, even if they had to guess.

It’s tightly coupled to the models in our case, but there’s no reason why this couldn’t be extracted for portability.

Consistency FTW!

Explicit deserialization has a number of benefits, but the most important one in my view is consistency.

Consider: serialized resources in endpoint responses give an insight into the data model. To consumers, perhaps subconsciously, it suggests a contract that says ‘this is what our underlying data model looks like.’

Without a similar deserialization of input parameters, this ‘contract’ may appear broken:

“Wait… I get id on output, but I have to use user_id on input!?? Weird.”

That’s more than just untidy. It’s dissonant, confusing and reduces predictability.

It’s not impossible to get around this using good documentation, of course.

Developers can survive even a wildly unpredictable API if it has approachable, precise, accessible and up-to-date documentation. PHP is itself, unfortunately, a prime example of this!

And documenting is easier than ever with JSON Schema. It even enables you to automate client app validation and keep your tests up to date.

But if your interfaces are the same, developers won’t need to inspect the docs for trivial differences between input and output.

And imagine: with a suitable approach, your deserialization rules could feed automatically into your JSON Schema!

Why not make the effort to make everyone’s lives that little bit more pleasant?

Results

In our API, our data model is now almost completely decoupled from our external interfaces, giving us a great deal of flexibility.

  • If we have to make what would sometimes be a breaking change to our data model, we can shield our consumers from it by changing a single item in an array.
  • Our validation rules, input parameters and output attributes are all aligned so that there is one consistent and predictable structure for each resource.
  • We have a flexible and simple approach for automating a big part of our documentation, validation and tests.

This may not catch all of the possible breaking changes that might ever come up for us, such as renaming or removing a whole endpoint, for example.

But those kinds of changes are generally fewer and further between and require enough consideration that they can be done carefully with plenty of warning to consumers.

Who knows how much time this will save? Way more than it took to implement, that’s for sure.

Conclusion

Aiming for API Evolution is idealistic, but attainable. With a little effort up front it’s possible to smooth more of the edges of our interfaces to make it less challenging for everyone when we have to make changes under the hood.

Deserialization is another bow in the quiver. So do it already!

Hopefully you can see that if PHP had a good deserializer package, we could enable Evolution for even more APIs.

If you’re keen to build a deserializer package for PHP, get in touch and let’s see if we can build something together!


Thanks to Phil for continuing to share his insights and for reviewing this post.

He’s a vocal proponent of good API design who has talked and written a lot about it over the past few years, a respected authority on the subject and someone I’ve learned a lot from.

You should definitely be following him if you’re interested in building pleasant APIs.