Scaling When Tied to an External API

Feb 10, 2016

The growing prevalence of APIs has been both a blessing and a curse. On the one hand, we can build digital products that are rich with data from all corners of the Internet. We can connect platforms that wouldn’t otherwise work together, create services that analyze different data sources, and populate our databases with authentic seed data, all from information we didn’t collect ourselves. On the other hand, we find ourselves building products that are at the mercy of those APIs: their stability, their performance, and whatever changes their own product and development teams deem necessary.

For some products, success, and even viability, depends critically on one or more APIs. Epion Health, for example, provides an app that acts as a third-party interface to a major EMR (Electronic Medical Record) system. As a web-based product, nearly every page load of the app requires communication through the EMR’s API. In many cases, there’s no way to circumvent an API request because patient information is being updated in two different places at the same time. The patient is submitting personal information through the app while the front desk staff may be updating patient insurance or other information directly in the EMR.

In situations with constant API connectivity and multiple points of simultaneous editing, product teams are presented with three primary categories of challenges: uptime, data integrity, and performance.

Uptime

Problem:
The API goes down.

The API will go down. It’s going to happen, and it’s completely out of our control. Sometimes even scheduled maintenance coincides with our peak traffic.

Solution:
Monitor API uptime, and manage expectations.

From a business point of view, a “disaster plan” should exist for managing clients and setting expectations. Prepare messaging and email templates, and make sure phone numbers are easily accessible. Prioritize the largest clients first, followed by those who are more “touchy” and difficult to deal with. Build a status page that clients can reference for general issues.

Giving your team a heads up also helps. A company should know about problems before clients start complaining.

From an engineering perspective, there’s a simple solution: monitor the APIs you depend on. Set up a periodic task that regularly pings each API and raises a controlled error when one does not reply. Add status monitors to a dashboard and configure alerts to send emails and/or SMS messages to those who need to be informed when downtime is detected.
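As a rough illustration of the periodic task described above, here is a minimal Python sketch. Everything named in it (`PingResult`, `http_probe`, `ping`, `check_all`, and the `alert` callback) is hypothetical and not taken from any particular product. It assumes a probe is simply a function that raises when the API does not reply.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PingResult:
    name: str
    ok: bool
    latency_ms: float
    error: Optional[str] = None


def http_probe(url: str, timeout: float = 5.0) -> None:
    """Raise if the endpoint times out or answers with an HTTP error status."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)


def ping(name: str, probe: Callable[[], None]) -> PingResult:
    """Run one probe and turn any failure into a controlled PingResult."""
    started = time.monotonic()
    try:
        probe()
        error = None
    except Exception as exc:  # timeouts, DNS failures, HTTP errors, ...
        error = f"{type(exc).__name__}: {exc}"
    latency_ms = (time.monotonic() - started) * 1000
    return PingResult(name, error is None, latency_ms, error)


def check_all(probes: dict[str, Callable[[], None]],
              alert: Callable[[PingResult], None]) -> list[PingResult]:
    """Ping every API once and call alert() for each one that failed."""
    results = [ping(name, probe) for name, probe in probes.items()]
    for result in results:
        if not result.ok:
            alert(result)
    return results
```

A scheduler (cron, a task queue, or a plain loop with `time.sleep`) could call `check_all` every minute or so, with each probe wrapping `http_probe` for one endpoint and `alert` wired to whatever email or SMS service the team already uses.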

Bonus: Save those pings to a database and start building a record of each API’s uptime. This data can be analyzed to identify peak/non-peak times and particular problem areas throughout the day, week, etc. If trends can be identified, share them with the API maintainer. They might be happy to have data-driven insights rather than always being told that “your API is slow.” Also, if the ping table is getting too large, these records can be summarized into a single record that saves the important statistics: total pings, failed pings, and relevant error messages.
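One way to picture the ping table and its summary record is the following sketch, which uses SQLite from the Python standard library. The table names, column names, and the choice to summarize per day are all assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pings (
    api TEXT NOT NULL,
    checked_at TEXT NOT NULL,   -- ISO 8601 timestamp
    ok INTEGER NOT NULL,        -- 1 = replied, 0 = failed
    error TEXT
);
CREATE TABLE IF NOT EXISTS ping_summaries (
    api TEXT NOT NULL,
    day TEXT NOT NULL,
    total_pings INTEGER NOT NULL,
    failed_pings INTEGER NOT NULL,
    errors TEXT,                -- distinct error messages, comma separated
    PRIMARY KEY (api, day)
);
"""


def record_ping(db, api, checked_at, ok, error=None):
    """Store one raw ping result."""
    db.execute("INSERT INTO pings VALUES (?, ?, ?, ?)",
               (api, checked_at, int(ok), error))


def summarize_day(db, day):
    """Collapse one completed day's raw pings into one summary row per API."""
    with db:  # one transaction: write the summaries, then drop the raw rows
        db.execute(
            """
            INSERT INTO ping_summaries
            SELECT api, ?, COUNT(*), SUM(ok = 0), GROUP_CONCAT(DISTINCT error)
            FROM pings
            WHERE substr(checked_at, 1, 10) = ?
            GROUP BY api
            """,
            (day, day),
        )
        db.execute("DELETE FROM pings WHERE substr(checked_at, 1, 10) = ?",
                   (day,))
```

Running `summarize_day` only for days that have already ended keeps the raw rows available for recent troubleshooting while older data shrinks to one row per API per day.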

Data Integrity

Problem:
Multiple people are editing the same data at the same time.

In a situation like a doctor’s waiting room, the patient is filling out forms while the front desk staff is amending the patient’s record with updated insurance information and medical history. There are bound to be issues with timing. How do you ascertain which information is more “correct”?

Solution:
Provide piecemeal data-driven forms and synchronize frequently.

Perform studies into the typical workflow of the different editing parties, and use that information to play with the timing and ordering of the forms that customers interact with. If there’s a long form, break it up into smaller pieces. Each page refresh provides another opportunity to synchronize data and ensure the app is presenting the most up-to-date information.
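To make the idea of small forms with a sync on every page concrete, here is a minimal sketch. The step names, field names, and the `fetch_record` and `push_update` callbacks are hypothetical stand-ins for whatever the external API actually exposes.

```python
from typing import Callable

Record = dict[str, str]

# Hypothetical grouping of one long intake form into short steps.
FORM_STEPS = [
    ("contact", ["phone", "email"]),
    ("insurance", ["insurer", "policy_number"]),
    ("history", ["allergies", "medications"]),
]


def render_step(step_index: int, fetch_record: Callable[[], Record]) -> Record:
    """Re-read the record from the API and return only this step's fields."""
    _, fields = FORM_STEPS[step_index]
    latest = fetch_record()  # fresh read on every page load
    return {field: latest.get(field, "") for field in fields}


def submit_step(step_index: int, submitted: Record,
                push_update: Callable[[Record], None]) -> Record:
    """Send only the fields that belong to this step, ignoring anything else."""
    _, fields = FORM_STEPS[step_index]
    update = {field: submitted[field] for field in fields if field in submitted}
    push_update(update)
    return update
```

Because each step writes only its own fields and re-reads the record before the next step is shown, an edit made elsewhere in the meantime is picked up on the following page instead of being overwritten by stale form data.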

A more obvious data-driven solution is to rely on the API’s “last modified” timestamp, but it is best avoided because it presents an entirely new set of challenges. For example, given a long form with many data points, chances are that individual fields will not have their own timestamps, and this path can lead down a rabbit hole of value comparisons.

Performance

Problem:
The API is generally slow and some calls are significantly slower than others.

Page load and responsiveness are important aspects of a user’s experience. Frustration grows incrementally with each delay as we become less and less patient with technology. There are countless articles describing how to “improve site speed” and why it’s so important. Amazon, for example, is notoriously strict on page load times because latency could potentially cost it billions.

Solution:
There isn’t one answer here, but there are some routes to explore.

When the latency of each API request is added to an app request that is already optimized and performant, it becomes challenging to identify areas where latency can be decreased. Here are a couple of ideas:

Caching Strategies

Identify the more static pieces of your API workflow as candidates for caching. Use a cache like Memcached or Redis to store API responses between requests, and focus live API requests on the data pertinent to the next page load. Don’t treat an API like a database; it’s far too slow and unreliable for that.

Note: Sometimes caching can inadvertently hide problems deep in a codebase, so be sure underlying performance issues aren’t misidentified or neglected.
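The caching idea can be sketched with a small time-to-live cache. The class below is an in-memory stand-in written for illustration, not the Memcached or Redis client API. The names `TTLCache`, `get_or_fetch`, and `invalidate` are hypothetical.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory stand-in for Memcached or Redis: entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, ttl: float, fetch: Callable[[], Any]) -> Any:
        """Return the cached value, calling fetch() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: no API request
        value = fetch()                  # miss or expired: call the API
        self._entries[key] = (now + ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, for example right after the app writes to the API."""
        self._entries.pop(key, None)
```

Fairly static data, such as a list of departments or providers, might get a TTL of minutes or hours, while data that several parties edit at once is usually better fetched fresh.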

Queuing Jobs

A workflow often consists of pushing and pulling data between requests. Consider the data being posted to the API: how timely is this information, and is it critical to the next request? If the answer is “yes,” then this won’t help. Otherwise, an avenue to explore is creating job workers that perform tasks outside the normal request flow, such as bulk data transfers, file uploads, and email generation.
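A job worker of this kind can be sketched with nothing but the Python standard library. A production system would more likely use a dedicated job queue with persistence and retries. The `worker` and `start_worker` names and the `None` shutdown sentinel are assumptions for this example.

```python
import queue
import threading


def worker(jobs: queue.Queue, failures: list) -> None:
    """Run queued jobs one at a time until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return               # sentinel: shut the worker down
            job()                    # e.g. post data to the API, send an email
        except Exception as exc:
            failures.append(exc)     # a real worker would log and retry
        finally:
            jobs.task_done()


def start_worker(jobs: queue.Queue, failures: list) -> threading.Thread:
    """Start a background thread that drains the queue."""
    thread = threading.Thread(target=worker, args=(jobs, failures), daemon=True)
    thread.start()
    return thread
```

The request handler only calls `jobs.put(...)` and returns right away, so a slow or failing API call delays the background worker rather than the user’s next page load.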

Final Thoughts

Systems will continue to improve their feature sets and performance. In most cases, this is completely out of our control. External APIs can introduce unexpected downtime, slowdowns, and errors into our own systems. What we can do is be prepared: handle exceptions, monitor uptime, and manage expectations when problems occur.