User-Driven APIs

Published in

YipitData Engineering

Jun 13, 2019

Motivation

At YipitData, we’ve built tools and libraries with APIs, and until recently our users were primarily other YipitData engineers. As we’ve developed Readypipe, our user base has expanded beyond our engineering team. Readypipe offers APIs, both code-based (Python) and web-based, for easier large-scale web scraping. This broader user base has given us the opportunity to think about how to design our APIs so that all of our users can successfully solve their problems.

Very brief intro to APIs

As a newer programmer, I was intimidated and confused by the term “API” — I’d see it used everywhere to describe seemingly unrelated things. It seemed way too general to be useful!

API stands for Application Programming Interface, and, unfortunately, that’s the most accurate way to describe it. I’ve found that a more useful description is “how your code interacts with other code.”

I’ve also found examples to be helpful in understanding what that actually means. There are two common types of APIs I’ve encountered: web APIs and library (or code, tool, module, whatever you want to call it) APIs. This post focuses on library APIs, but examples of web APIs seem to be more common, and there may be other types I’m not familiar with.

An excellent example of a web API is the Stripe API, which allows you, as a user of Stripe, to interact with things like your balance, payments, and customers. Stripe’s documentation includes an example of how to retrieve your balance.

A notable example of a library API is Flask, a simple, flexible web framework for Python. We use Flask (and Django) a lot at YipitData, and admire its simplicity and excellent documentation.

You might use an API manually — for example, in your terminal to make a web request and look at the output — but more likely, you’ll use an API to integrate some third-party functionality into your own code.

If you write code at all, you’re using an API: each language’s built-in functions and standard library form an API that your own code builds on.

Ultimately, if you are designing or maintaining an API that is being used by anyone (including you), it’s a product, and the users of that product will be more successful if you design it with them in mind.

When designing an API, ask these questions first

Who are your users?

It might just be you, or a few of your coworkers, or it could be a large open-source project with lots of users. Readypipe’s users are internal engineers and analysts, as well as engineers, analysts, and others at other companies doing web scraping. The variety of new Readypipe users forced us to rethink how we develop new APIs, and how to adapt some existing APIs to new users.

Does your API solve one or more of the users’ problems?

If your API doesn’t solve your users’ problems, they have no reason to use it. Understanding their problems, and making sure you solve them, is the core of designing any product, including APIs.

Do they understand how to use your API to solve their problems?

Designing a user-driven API is about both solving users’ problems and making sure they understand how to use the API to solve them. Fundamentally, it’s about having empathy for the user. If they struggle to use your API because, for example, the naming is inconsistent, they won’t be able to solve their problems successfully!

Then, try a few of these approaches

Start with the Readme

Our CTO shared a blog post with our team that introduced me to Readme-driven development. I’ve found it to be a simple and powerful approach for designing APIs without writing any code.

The idea is to write the Readme, the basic documentation of how to use your API, before writing any code. This fits well with test-driven development (TDD) — writing tests before functional code — in that both approaches separate the implementation details from the more important aspects of your code, i.e., how to use it (Readme) and whether it works (TDD).
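In Python, one way to make Readme-style usage examples double as tests is the standard library’s doctest module, which runs the examples in a docstring and compares their output. A minimal sketch, using a hypothetical chunk helper (not part of Pipeapp or Readypipe): the usage examples are written first, as they would appear in a Readme, and the implementation comes afterwards.

```python
import doctest
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


# Hypothetical helper for illustration; not part of Pipeapp or Readypipe.
def chunk(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Split items into lists of at most `size` elements.

    Usage, written before the implementation:

    >>> list(chunk([1, 2, 3, 4, 5], 2))
    [[1, 2], [3, 4], [5]]
    >>> list(chunk([], 3))
    []
    >>> list(chunk([1], 0))
    Traceback (most recent call last):
        ...
    ValueError: size must be at least 1
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, shorter batch
        yield batch


if __name__ == "__main__":
    # Run the docstring examples as tests; prints nothing when they all pass.
    doctest.testmod()
```

If the implementation later drifts from the documented usage, the doctest run fails, so the "Readme" and the tests stay in sync.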

We’ve been fortunate to have the time to take this approach with a few of our tools that had existing APIs. For example, I work on an internal library, Pipeapp, which is a queueing framework that is part of Readypipe. It’s designed to make it simple to run frequent, small tasks on a series of queues. It had an existing API that served our engineers well for a few years.

Then, when we started building Readypipe, we wanted to expose that API to more types of users. We decided to take the opportunity to reconsider the API, and approached it as if we were designing the Readme for the first time. Ultimately we came up with clearer names for functions, an interface with less boilerplate code, and better, higher-level error messages.
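As an illustration of what a higher-level error message can look like, here is a hypothetical sketch (QueueRegistry and UnknownQueueError are made-up names, not Pipeapp’s API) that replaces a bare KeyError with an error explaining what went wrong and how to fix it.

```python
from typing import Any, Dict, List


class UnknownQueueError(LookupError):
    """Raised when an item is scheduled to a queue that was never registered."""


class QueueRegistry:
    """Hypothetical in-memory registry of named queues."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[Any]] = {}

    def register(self, name: str) -> None:
        self._queues.setdefault(name, [])

    def schedule(self, queue_name: str, item: Any) -> None:
        try:
            queue = self._queues[queue_name]
        except KeyError:
            known = ", ".join(sorted(self._queues)) or "none"
            # Replace the low-level KeyError with a message that says what
            # went wrong and how to fix it; `from None` hides the KeyError.
            raise UnknownQueueError(
                f"Cannot schedule to queue {queue_name!r}: it is not registered. "
                f"Registered queues: {known}. Call register({queue_name!r}) first."
            ) from None
        queue.append(item)
```

With this, scheduling to a misspelled queue name such as "prodcut_pages" fails with a message that names the registered queues and the fix, instead of a bare KeyError: 'prodcut_pages'.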

You might not have time to evaluate and change an existing API, but I’ve found this approach valuable even when it’s applied only to new code (as with TDD). If you’re really lucky, you can get feedback from your users on your Readme before writing any code! For us, this has saved time and energy that might otherwise have gone into implementing a less usable interface.

Ask “why” to understand the users’ core problems

Before building an API, talk to potential users to understand the problems that might be solved by your API. We’ve found that asking “why”, at least a few times, helps us understand the true problems we can solve with the API. As a bonus, if they’re generous enough with their time to talk to you about it, they’ll likely be great test users to validate that your API actually solves their problems!

Before improving an existing API, talk to existing users to understand whether it already solves some of their problems, and whether they know how to use it. Anything they find confusing to use is a good candidate for improvement. Again, keep asking “why” until you understand the root cause of their confusion.

A mistake I’ve made in maintaining Pipeapp, our queueing library, was failing to understand users’ core problems. A memorable example involved users who wanted a better way to load their scraping queues by querying data they had stored previously. Without talking to the users, I assumed the problem was that they had to write too much code to perform the whole operation. With the help of a team member, I added a single function to solve that problem, which looked like this when used in a task:

It took the results of the query and scheduled them to the queue.
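For illustration only, here is a hypothetical sketch of that kind of all-in-one helper; the name load_queue_from_query, the SQLite connection, and the plain list standing in for a queue are assumptions, not Pipeapp’s actual code. Because querying and scheduling are fused into one call, each variation users later asked for (extra fields, transformed rows) would need yet another parameter.

```python
import sqlite3
from typing import Any, List, Tuple


def load_queue_from_query(conn: sqlite3.Connection, sql: str, queue: List[Tuple[Any, ...]]) -> int:
    """Hypothetical all-in-one helper: run `sql` and put every row on `queue`.

    Querying and scheduling are fused: there is no hook to add fields to
    each item or transform rows before they are scheduled.
    """
    rows = conn.execute(sql).fetchall()
    queue.extend(rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (url TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?)",
        [("https://example.com/a",), ("https://example.com/b",)],
    )
    queue: List[Tuple[Any, ...]] = []
    print(load_queue_from_query(conn, "SELECT url FROM products", queue))  # 2
```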

When we released it, users asked for variations: can I add extra data to each item on the queue? What if I want to modify the data from the query? What if I need to query from a different database (something that happened as we migrated from one DB to another)?

I was frustrated. I thought we’d added the feature everyone asked for! But I had failed to recognize their underlying problem: there were multiple functions for querying, multiple ways to modify the results, and multiple ways to schedule the items, and users didn’t know which of them were reliable, which were performant, and which used less memory.

I realized they didn’t want a new, magical function to solve their problem. Instead, they wanted a good pattern to solve separate problems: how to query, how to modify items, and how to schedule the results.

Ultimately, we did add a function, query, that solved the querying problem, and we already had a function, schedule_many, that solved the scheduling problem. Then we suggested a pattern:

This is much more flexible than our original solution — users can modify the data, query from different databases (query can be configured to work with different databases), schedule to multiple queues, etc., by starting with this pattern. It solved their problems, and they understood how to use it.
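A sketch of that query, modify, schedule pattern under stated assumptions: query and schedule_many below are hypothetical in-memory stand-ins (SQLite rows returned as dicts, plain lists as queues), not Pipeapp’s actual signatures. Because each step is separate, changing the query, the modifications, or the target queues doesn’t require a new library function.

```python
import sqlite3
from typing import Any, Dict, Iterable, Iterator, List

Item = Dict[str, Any]


def query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> Iterator[Item]:
    """Hypothetical stand-in for `query`: yield each row as a dict."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def schedule_many(queue: List[Item], items: Iterable[Item]) -> int:
    """Hypothetical stand-in for `schedule_many`: add items to a queue, return the count."""
    before = len(queue)
    queue.extend(items)
    return len(queue) - before


def load_queues(conn: sqlite3.Connection, fresh: List[Item], retries: List[Item]) -> None:
    # 1. Query: the task chooses the connection and the SQL.
    rows = query(conn, "SELECT url, attempts FROM products ORDER BY url")
    # 2. Modify: ordinary Python, e.g. attach extra data to each item.
    items = [{**row, "source": "products"} for row in rows]
    # 3. Schedule: route items to whichever queues make sense.
    schedule_many(fresh, (item for item in items if item["attempts"] == 0))
    schedule_many(retries, (item for item in items if item["attempts"] > 0))
```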

Deprecate unused features

The other half of understanding the users’ problems is understanding what features are not solving their problems.

For us, unused features have had a real cost: too many features added confusion; less popular features were less likely to be tested, maintained, or documented as well as they should have been; and the lack of support for those features eroded confidence in the core feature set.

Deprecating features has been a difficult process; who wants fewer features? The expansion of our user base beyond our engineering team strengthened the case for deprecation, since too many features now carry a larger cost. In hindsight, though, I think my mistake was not recognizing sooner the ongoing cost these features imposed on our engineering team.
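A common Python way to give users notice before removing a feature is to emit a DeprecationWarning for at least one release. A minimal sketch, with a hypothetical legacy schedule_all function standing in for an unpopular feature and a small decorator that points callers at its replacement:

```python
import functools
import warnings
from typing import Any, Callable, List


def deprecated(replacement: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory: warn on every call and point callers at `replacement`."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in the next "
                f"major version; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement="schedule_many")
def schedule_all(queue: List[Any], items: List[Any]) -> None:
    """Hypothetical legacy function, kept only until the next major version."""
    queue.extend(items)
```

Python hides DeprecationWarning by default unless it is triggered by code in __main__, so users typically see it when running their tests or with python -W default.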

Knowing which features are being used has been a difficult problem for our APIs. For a web API, it can be easier, since you can monitor the requests made to each endpoint. For a code API, which runs inside your users’ own code, the same approach is much more difficult.

Internally, we’ve started to search our codebases for usages of particular methods or options to better understand how often features are used, but it’s hard to do at scale and the results are imprecise. Ultimately, talking to our users has given us the best information about what’s being used and why, and has given us more confidence to deprecate features, even ones that are still in use.
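A minimal sketch of that kind of codebase search, using Python’s standard ast module rather than plain text search: it counts calls to the method names you pass in, plus the keyword options passed to those calls, across the .py files under a directory. It shares the imprecision described above: it matches by name only, so an unrelated method with the same name is counted, and aliased or dynamically built calls are missed.

```python
import ast
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_calls(source: str, names: Iterable[str]) -> Counter:
    """Count calls to the given function/method names, plus the keywords passed to them."""
    wanted = set(names)
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if isinstance(node.func, ast.Name):          # e.g. schedule_many(...)
            called = node.func.id
        elif isinstance(node.func, ast.Attribute):   # e.g. app.schedule_many(...)
            called = node.func.attr
        else:
            continue
        if called not in wanted:
            continue
        counts[called] += 1
        for keyword in node.keywords:
            if keyword.arg is not None:  # skip **kwargs unpacking
                counts[f"{called}({keyword.arg}=...)"] += 1
    return counts


def count_calls_in_tree(root: str, names: Iterable[str]) -> Counter:
    """Sum count_calls over every .py file under root, skipping unparsable files."""
    names = list(names)
    total: Counter = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            total += count_calls(path.read_text(encoding="utf-8"), names)
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue
    return total
```

For example, count_calls_in_tree(".", ["schedule_many", "query"]).most_common() lists the most frequently used names and options first.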

Be explicit about what should be stable

For code APIs, this is where an important distinction comes in: public versus private APIs.

Briefly, public APIs are meant to be used outside of the API itself, and should remain stable, at least within a major version of your library or web API¹. Private APIs are meant to be used to implement your API, but shouldn’t be relied upon by users of the API, and may not remain stable from version to version.
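A sketch of the rule footnote [1] refers to, assuming plain MAJOR.MINOR.PATCH version strings with no pre-release or build suffixes: only a major version change may break the public API, except that semantic versioning treats major version zero as initial development, where anything may change.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain "MAJOR.MINOR.PATCH" string; pre-release/build tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_may_break_public_api(current: str, candidate: str) -> bool:
    """Could upgrading from `current` to `candidate` break callers of the public API?

    Illustrative helper; assumes `candidate` is not older than `current`.
    """
    cur, new = parse_version(current), parse_version(candidate)
    if cur[0] == 0:
        # Major version zero is initial development: anything may change.
        return cur != new
    return new[0] != cur[0]
```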

In some languages, like Python, the distinction between public and private is by convention: non-public variables and methods have a leading underscore, e.g., _do_something, but users can still access those methods outside of the containing class, object, etc.² In others, like Java, the distinction is explicit in the code — you declare a method as private or public using keywords — and enforced by the language, meaning users cannot access the private methods outside of the containing class.
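A small illustration of the Python convention, using a hypothetical UrlHelper class: nothing stops outside code from calling the underscore-prefixed method, although linters such as pylint flag it (protected-access), and from module import * skips underscore-prefixed module-level names when no __all__ is defined.

```python
class UrlHelper:
    """Hypothetical class illustrating Python's public/private naming convention."""

    def canonical_url(self, url: str) -> str:
        """Public: documented, and meant to stay stable within a major version."""
        return self._strip_trailing_slash(url.strip())

    def _strip_trailing_slash(self, url: str) -> str:
        """Private by convention: may change or disappear in any release."""
        return url.rstrip("/")


helper = UrlHelper()
print(helper.canonical_url(" https://example.com/ "))          # https://example.com
print(helper._strip_trailing_slash("https://example.com/"))    # https://example.com (works, but unsupported)
```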

Because we use Python, relying on convention to mark methods, functions, classes, etc. as private led to some miscommunication while we were maintaining this library for internal engineers:

We weren’t always clear about what should be stable: although the Python convention is that public methods stay stable within a major version while private methods may change at any time, we didn’t always make it clear to internal engineers that we followed that convention.
We didn’t always decide what should be stable: we changed the behavior of public methods within major versions a few times, which broke internal projects.
We didn’t always understand when our users needed a particular feature: we had cases where users relied on a method meant to be private because a stable public alternative wasn’t offered.

We learned these lessons more than once while internal engineers were our only users. As our user base has expanded, we’ve used documentation to be explicit about which features are public and stable. Documentation has also kept us, as the library developers, honest about maintaining API stability within a major version.
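Alongside documentation, a snapshot test can flag accidental changes to the public surface before they ship. A sketch under assumptions: a hypothetical Queue class, "public" meaning "no leading underscore", and a hand-written snapshot of the documented signatures. Any intentional change then requires updating the snapshot, which is a natural moment to ask whether it needs a major version bump.

```python
import inspect
from typing import Any, Dict, List, Tuple


class Queue:
    """Hypothetical library class whose public API is documented as stable."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, Any]] = []

    def schedule(self, item, priority: int = 0) -> None:
        self._items.append((priority, item))

    def schedule_many(self, items) -> int:
        count = 0
        for item in items:
            self.schedule(item)
            count += 1
        return count

    def _drain(self) -> List[Tuple[int, Any]]:  # private: free to change
        items, self._items = self._items, []
        return items


def public_signatures(cls: type) -> Dict[str, str]:
    """Map each public (no leading underscore) method name to its signature."""
    return {
        name: str(inspect.signature(member))
        for name, member in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    }


# Hand-maintained snapshot of the documented public API for this major version.
EXPECTED_PUBLIC_API = {
    "schedule": "(self, item, priority: int = 0) -> None",
    "schedule_many": "(self, items) -> int",
}


def test_public_api_is_unchanged() -> None:
    assert public_signatures(Queue) == EXPECTED_PUBLIC_API
```

A test runner such as pytest picks up test_public_api_is_unchanged automatically; private helpers like _drain can change freely without touching the snapshot.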

Conclusion

When designing or changing an API, it’s critical to remember that the API is a product, and its development should primarily be driven by solving its users’ problems. Many of the biggest mistakes I’ve made in developing APIs stem from either not knowing who the users were, not understanding whether the API solved their problems, or not realizing the users didn’t know how to use the API. I’ve found that bringing a product mindset to APIs has helped our users become more successful, and I hope these suggestions are helpful for you too!

Acknowledgments

Thanks to Hugo Lopes Tavares and Andrew Gross for their thorough and thoughtful reviews and suggestions.

[1] According to semantic versioning.

[2] Python does offer a stronger form of privacy: attributes prefixed with two underscores are name-mangled to include the class name, which makes them harder to access outside the containing class, but they are still technically accessible. We tend not to use them and don’t see them used very often in other libraries.
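A short illustration of footnote [2], with a hypothetical Task class: a double-underscore attribute is stored under a mangled name of the form _ClassName__name, so it no longer exists under its original name, yet it remains reachable.

```python
class Task:
    """Hypothetical class contrasting one and two leading underscores."""

    def __init__(self) -> None:
        self._retries = 3          # one underscore: private by convention only
        self.__token = "abc123"    # two underscores: stored as _Task__token


task = Task()
print(task._retries)               # 3
print(hasattr(task, "__token"))    # False: no attribute has that exact name
print(task._Task__token)           # abc123: still reachable via the mangled name
```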