Retrying Flaky Data Requests

Published in NYC Planning Tech

3 min read · Jun 28, 2019

Sometimes, things go wrong. RARELY, of course, because test coverage is 100 percent perfect and, like, everything’s gonna be fine, yeah?

But when things do go wrong, it’s good to have sophisticated fallbacks. For example, when an app exceeds a bandwidth limit imposed by one of your third-party services, users should still be able to get their data if at all possible, because the underlying issue may not be solved within the next week.

In our case, requests to a rate-limited API threw a helpful error message:

query_timeout_exceeded

That’s explicit data our apps can react to.
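As a rough illustration of "reacting" to that message, here is a minimal sketch of a check that decides whether a failure is worth retrying. It is not taken from ZoLa's source: the helper name isRetryableError, the RETRYABLE_MESSAGES list, and the assumption that the failure arrives as an error object with a string message are all hypothetical.

```javascript
// Hypothetical helper, not part of ZoLa or any library.
// Assumes a failed request surfaces as an Error-like object whose
// `message` contains the error string returned by the API.
const RETRYABLE_MESSAGES = ['query_timeout_exceeded'];

function isRetryableError(error) {
  // Guard against null/undefined or errors without a string message.
  const message =
    error && typeof error.message === 'string' ? error.message : '';

  // Retry only when the message matches a known transient failure.
  return RETRYABLE_MESSAGES.some((needle) => message.includes(needle));
}
```

Anything that does not match the list (a 404, a malformed query) would fail immediately rather than being retried.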

But first some quick background: the City of New York’s Zoning and Land Use application, or ZoLa, is an important tool for all New Yorkers because it makes clear to everyone which physical parts of the city are affected by which rules, regulations, and landowners. When requests for this data fail, we’ve failed as a city government. So, let’s fix that.

ZoLa is built with EmberJS, a not-exactly-popular but nevertheless excellent framework with a rich addon community, so we can tap into some amazing software patterns that members of the Ember community have already figured out! One of these is retryable tasks: requests sent by a program that can be re-sent depending on whether the response succeeded.

Let’s look at some code:

This class, “CartoDataProvider” (“Carto” being the API mentioned earlier), exposes a “taskInstance” property to its Handlebars template. This getter wraps a call to the application’s data store in a cancellable “task”, a software pattern implemented by the ember-concurrency addon. (For the uninitiated, it’s best to refer to its delightful introduction.)

However, being able to cancel a request isn’t quite enough here, because the task still doesn’t know how to handle the error response from our API. Naturally, I tried to build this myself, but that code is just not appropriate for all audiences, so I’ll avoid reprinting it.

Thankfully, ember-concurrency-retryable exists, because this problem is hardly a special-snowflake scenario. Using it, I was able to simply add one configuration property to my findRecordTask's decorator: retryable: delayRetryPolicy.

You’ll notice that the policy is a DelayPolicy, something the maintainers of ember-concurrency-retryable created: a special object that describes how a task should be retried. In our case, we wanted it to wait one second and retry the task, then, should that first retry fail, wait two seconds before trying one more time:

const delayRetryPolicy = new DelayPolicy({
  delay: [1000, 2000],
});
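To make that behavior concrete, here is a plain-JavaScript sketch of the same idea: call a function, and after each failure wait the next delay in the list before trying again. This is not how ember-concurrency-retryable is implemented. The names retryWithDelays and sleep are hypothetical, and the injectable wait parameter is an assumption added only so the timing can be tested without real pauses.

```javascript
// Hypothetical sketch, not the addon's implementation.

// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn`. After each failure, waits the next delay and tries again.
// With delays [1000, 2000] that means up to three attempts in total.
// Once the delays are used up, the last error is rethrown to the caller.
async function retryWithDelays(fn, delays = [1000, 2000], wait = sleep) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // No delays left: give up and surface the failure.
      if (attempt >= delays.length) throw error;
      await wait(delays[attempt]);
    }
  }
}
```

A call such as retryWithDelays(() => fetchSomething()) (where fetchSomething stands in for any promise-returning request) would try once, wait one second, try again, wait two seconds, and try a final time. The sketch retries on every error; a real policy would probably want to retry only on known transient failures.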

To me, this is a wonderfully expressive API for describing retry behavior, which is not something that crops up regularly. It makes the intention transparent to future me (or future developers). I do worry that, should something go wrong with the retry handling itself, it will be hard to debug.

If you’re interested in seeing the full source, check it out here. Better yet, if you’re interested in trying out the latest features and improvements for ZoLa, check out our ZoLa Canary version.

Written by Matt Gardner