API Testing & Monitoring at FanAI

Karim Varela
Published in FanAI Engineering
Feb 5, 2020

This is the story of how we’ve stood up API testing and monitoring at FanAI. We don’t yet have a dedicated test engineer, so the responsibility falls on me as CTO, on the backend team as developers of the API, and on the frontend and data teams as consumers of the API. We are still in our testing and monitoring infancy, so I’m sure our approach will mature, but I’d love to share what we’ve done so far.

Here’s what we’ll cover:

  1. Unit and functional tests
  2. Postman
  3. Stackdriver
  4. Beyond

Unit and functional tests are table stakes

Of course, developers should always write unit, integration, and/or functional tests for every API they build. That’s par for the course. Those tests should run on every pull request so we never merge code that fails them. Our backend is almost all Python, so we use pytest for this and shoot for at least 80% code coverage.
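To make that concrete, here is a minimal sketch of the kind of test we mean. The handler `get_fan_count`, its in-memory `store`, and the field names are all hypothetical stand-ins for illustration, not our actual API code.

```python
# Hypothetical handler and tests; names are illustrative, not FanAI's real code.
def get_fan_count(store: dict, team_id: str) -> dict:
    """Return a response-like dict with a status code and a JSON-style body."""
    if team_id not in store:
        return {"status": 404, "body": {"error": "team not found"}}
    return {"status": 200, "body": {"team_id": team_id, "fan_count": store[team_id]}}


def test_known_team_returns_count():
    response = get_fan_count({"lakers": 1200}, "lakers")
    assert response["status"] == 200
    assert response["body"] == {"team_id": "lakers", "fan_count": 1200}


def test_unknown_team_returns_404():
    response = get_fan_count({}, "nobody")
    assert response["status"] == 404
    assert "error" in response["body"]
```

pytest collects any function whose name starts with `test_`. Assuming the pytest-cov plugin is installed, running `pytest --cov --cov-fail-under=80` is one way to make the run fail when coverage drops below the 80% target.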

Postman is a great tool for E2E tests

Postman is great because it lets you define, save, and share requests with your team. That way, you can easily build up a test suite that your test team (or in our case me, the frontend team, and the data team) can update and maintain.

Postman also lets you inspect the response with JavaScript test scripts, for example to check that certain fields exist or that the response code is what you expect.

Figure: Basic Postman API tests
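Postman test scripts are written in JavaScript, but the checks themselves are simple. As a rough illustration, here is the same idea sketched in Python to match our backend: confirm the status code, then confirm the expected fields exist. The function name `check_response` and the sample field names are assumptions made up for this example.

```python
import json


def check_response(status_code: int, body_text: str, required_fields: list) -> list:
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    if status_code != 200:
        failures.append(f"expected status 200, got {status_code}")
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return failures + ["body is not valid JSON"]
    if not isinstance(body, dict):
        return failures + ["body is not a JSON object"]
    for field in required_fields:
        if field not in body:
            failures.append(f"missing field: {field}")
    return failures


# Hypothetical response: healthy status, but one expected field is absent.
print(check_response(200, '{"team_id": "lakers"}', ["team_id", "fan_count"]))
# ['missing field: fan_count']
```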

Running requests by hand works great for manual testing, but Postman also lets you create monitors that run your API tests on a schedule, as frequently as every 5 minutes.

Figure: Setting up a Postman monitor

The more often you run monitors, the more they cost, so we currently run our API tests every hour. Our uptime requirements are not super stringent at the moment, and we have other uptime checks, which I describe in the Stackdriver section below. We also integrate Postman with PagerDuty so we are alerted to any test failures.

You can also write tests in Postman for the E2E latency of your APIs. This sounds great at first, but proved to be troublesome.

Figure: Setting a latency threshold in Postman

The problem was that if the latency test failed on just one request in a Postman collection, the entire monitor run failed, and since our monitors are hooked up to PagerDuty, we would get paged.
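The all-or-nothing behavior is easy to see in a small model. This is only a sketch of the failure mode, not Postman's implementation; the `RequestResult` type, the request names, and the latency numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    name: str
    status_ok: bool
    latency_ms: float


def monitor_run_passes(results, latency_limit_ms: float) -> bool:
    """A run passes only if every request is healthy and under the latency limit."""
    return all(r.status_ok and r.latency_ms <= latency_limit_ms for r in results)


results = [
    RequestResult("list teams", True, 180.0),
    RequestResult("fan overlap", True, 2400.0),  # a single slow outlier
    RequestResult("health check", True, 35.0),
]

# One slow request is enough to fail the whole run, which then pages someone.
print(monitor_run_passes(results, latency_limit_ms=1000.0))  # False
```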

This might be acceptable if our APIs were always consistently fast, but these are E2E tests so they are by nature unpredictable as we’re at the mercy of the network and who knows how many hops in between Postman’s servers and ours.

On top of that, we rely heavily on caching in order to return data quickly to our users. It’s just not possible to do some of the massive calculations we need to do in the context of a typical web request. Unfortunately, caches expire, and if our automated test happens to query data that was cached but has since expired, the request shows up as a latency spike and some poor backend engineer gets rudely woken up by PagerDuty in the middle of the night.

So, we’re disabling our Postman latency tests in favor of Stackdriver …

Stackdriver provides the monitoring granularity we need

We already had some basic uptime checks for our web properties in Stackdriver. Stackdriver also has the capability to check latency on GET requests.

What makes it better than Postman for this is that it can measure latency for N minutes in a row and only alert PagerDuty if X out of N latency checks fail. Like Postman, it also measures latency from multiple locations around the world, which is cool.

Figure: Stackdriver policy details configuration for the FanAI custom latency alert. We alert PagerDuty if this API takes longer than 1000 ms for 2 minutes in a row.
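The "X out of N" idea can be sketched in a few lines. This is a simplified model of the alerting rule, not how Stackdriver is implemented; the class name `LatencyAlert` and the sample latencies are assumptions for illustration. With a window of 2 and 2 required breaches, it mirrors our "longer than 1000 ms for 2 minutes in a row" policy.

```python
from collections import deque


class LatencyAlert:
    """Fire only when at least `min_breaches` of the last `window` checks are slow."""

    def __init__(self, threshold_ms: float, window: int, min_breaches: int):
        if not 1 <= min_breaches <= window:
            raise ValueError("min_breaches must be between 1 and window")
        self.threshold_ms = threshold_ms
        self.min_breaches = min_breaches
        # deque(maxlen=...) silently drops the oldest check once the window is full.
        self.recent = deque(maxlen=window)

    def record(self, latency_ms: float) -> bool:
        """Record one check and return True if the alert should fire."""
        self.recent.append(latency_ms > self.threshold_ms)
        return sum(self.recent) >= self.min_breaches


alert = LatencyAlert(threshold_ms=1000, window=2, min_breaches=2)
for latency in [300, 2400, 350, 1500, 1800]:  # one check per minute
    print(latency, alert.record(latency))
# Only the last check fires: 1500 and 1800 are two slow minutes in a row.
```

The isolated 2400 ms spike, the kind a cold cache produces, never pages anyone, while a sustained slowdown still does.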

What’s next?

We’re in the process of polishing off uptime and latency monitors for all of our APIs. The next phase of API testing is to test the actual content of the response to ensure it meets consistent formatting standards for all our services and that individual APIs are returning the right fields with the right data.
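We haven't built this yet, so the following is only a sketch of what such content checks might look like. The shared envelope (`data` plus `meta.request_id`) and the per-API field types are hypothetical; our real formatting standards may look different.

```python
def envelope_problems(payload) -> list:
    """Check a hypothetical envelope that every service would share."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = []
    if "data" not in payload:
        problems.append("missing 'data'")
    meta = payload.get("meta")
    if not isinstance(meta, dict):
        problems.append("'meta' must be an object")
    elif not isinstance(meta.get("request_id"), str):
        problems.append("'meta.request_id' must be a string")
    return problems


def field_problems(record: dict, expected_types: dict) -> list:
    """Check that one API's record has the right fields with the right types."""
    problems = []
    for name, expected in expected_types.items():
        if name not in record:
            problems.append(f"missing '{name}'")
        elif type(record[name]) is not expected:  # strict: bool does not pass as int
            problems.append(f"'{name}' should be {expected.__name__}")
    return problems


payload = {
    "data": {"team_id": "lakers", "fan_count": "1200"},  # count sent as a string
    "meta": {"request_id": "abc123"},
}
print(envelope_problems(payload))  # []
print(field_problems(payload["data"], {"team_id": str, "fan_count": int}))
# ["'fan_count' should be int"]
```

Splitting the envelope check from the per-API field check keeps the cross-service standard in one place while each API declares only its own fields.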
