How to build a rock solid app

An overview of different app design options

Simone Di Maulo
Sep 3, 2018 · 6 min read

When we design software, we constantly think about error cases. Errors have a huge impact on the way we design and architect a solution. So much so, in fact, that there is a philosophy known as Let It Crash.

Let It Crash is the Erlang way of handling failures: the application is simply allowed to crash, and a supervisor restarts the crashed process from a clean state.

Supervisors restart the crashed process

Errors can occur anywhere, and the more your application grows, the more points of failure you need to keep under control. External service calls, sending email, and database queries are all operations that can fail.

Kinds of Failures

These kinds of errors are called Transient Errors: for example, the database server is temporarily overloaded but is going to come back soon.

Transient errors are not related to any problem in the application. They are usually caused by external conditions such as network failures, overloaded servers, or service rate limits. For that reason, it’s safe for a client to ignore them and retry the failed operation after a while.

These errors are much more frequent in cloud-native applications, because the apps are split into different services and deployed on different servers that communicate over the network.

Identifying Transient Errors

Treating the Errors

Even though simply retrying the failed operation might be fine in many cases, there are a lot of cases where it can degrade the app’s performance.

Let’s take the case of a network failure. Indefinitely retrying API calls to a disconnected service would result in continuous network timeouts, and the application would be stuck waiting for a response for a very long time.
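A naive retry loop of this kind can be sketched in a few lines. This is only an illustration in Python: the function name `retry_forever`, the fixed delay, and the choice of `ConnectionError` as the "transient" error are assumptions, not something taken from a specific library.

```python
import time


def retry_forever(operation, delay_seconds=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, waiting a fixed delay between attempts.

    Hypothetical helper: there is no upper bound on attempts, so if the
    remote service never comes back this loop never returns.
    """
    while True:
        try:
            return operation()
        except ConnectionError:
            # Assumed to be transient: wait a bit, then try again.
            sleep(delay_seconds)
```

The `sleep` parameter exists only so the waiting can be replaced in tests. Note that nothing here limits how long the caller stays blocked, which is exactly the weakness described above.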

Before going ahead with complex implementations, let’s evaluate the pros and cons of the “just-retry” option.

PROS

  • Trivial implementation.
  • Stateless (every retry request is isolated and you don’t need any extra information).

CONS

  • For heavily loaded applications, the caller will continuously send requests to the degraded server resulting in a denial of service.
  • Cannot provide a response until the server comes back.

This simple retry strategy can be considered a first approach to solving the issue. For low-traffic apps it would work, but if you have a more complex architecture, it’s definitely not enough.

So let’s discuss a more resilient approach.

Stealing an Idea from the IEEE

The concept of exponential backoff comes directly from the Ethernet network protocol (IEEE 802.3), where it’s used for packet collision resolution.

For our purposes, exponential backoff can be used to avoid wasting time between timed-out calls, or to avoid hammering an overloaded server with a continual flow of requests that cannot be resolved.

Binary exponential backoff for packet collisions can be summarized with the following definition:

After *c* collisions, a random number of slot times between 0 and 2^*c* - 1 is chosen. For the first collision, each sender will wait 0 or 1 slot times. After the second collision, the senders will wait anywhere from 0 to 3 slot times. After the third collision, the senders will wait anywhere from 0 to 7 slot times (inclusive), and so forth. As the number of retransmission attempts increases, the number of possibilities for delay increases exponentially. (Source: Exponential backoff, Wikipedia)
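The definition above maps almost directly to code. The following Python sketch is only an illustration; the function name `backoff_slots` is made up for this example.

```python
import random


def backoff_slots(collisions, rng=random):
    """Pick a random number of slot times in the range [0, 2**collisions - 1]."""
    # randint is inclusive on both ends, matching the definition above.
    return rng.randint(0, 2 ** collisions - 1)


for c in (1, 2, 3):
    upper = 2 ** c - 1
    print(f"after {c} collision(s): wait 0..{upper} slots, picked {backoff_slots(c)}")
```

The upper bounds for one, two, and three collisions are 1, 3, and 7 slots, the same values given in the quoted definition.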

This algorithm can be quickly adapted to many use cases, for example a PHP message handler class that waits exponentially longer between attempts to get a response from an API endpoint.
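As a rough sketch of how such an adaptation might look, here is a small retry helper written in Python rather than PHP. Everything in it is an assumption made for illustration: the `TransientError` class, the `call_with_backoff` name, the slot length, and the attempt limit.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def call_with_backoff(operation, max_attempts=5, slot_seconds=0.1, sleep=time.sleep):
    """Retry `operation`, waiting a random 0..(2**attempt - 1) slots after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller deal with the error
            # The window of possible delays doubles with every failed attempt.
            slots = random.randint(0, 2 ** attempt - 1)
            sleep(slots * slot_seconds)
```

Unlike a plain retry loop, this version gives up after `max_attempts` and spreads the retries out over time, so a struggling server is not hit at a constant rate.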

Retry vs Exponential Backoff

We may be lucky and receive a response after a couple of retries, or we could fall into an infinite retry-wait-retry-wait… loop and never receive the response.
You know, Murphy’s law is always here: “Anything that can go wrong will go wrong.”

As you might imagine, scaling a service-oriented infrastructure that, in case of failure, continuously retries requests to its dependent services is the perfect recipe for application collapse.

We need a stronger strategy to maintain infrastructure resilience.

Electronics May Help Us

Source: https://pixabay.com/en/circuit-breakers-rcds-fault-current-1167327/

In case of continuous errors, the right thing to do is clear: we do not want to keep looping and retrying calls to an external service. Instead, we simply stop calling it, borrowing the concept of circuit breakers from electronics.

From Electronics to Computer Science

Applied to software, the circuit breaker can autonomously monitor the service status and decide to open or close the circuit. In case of disconnection or server overload, the client stops sending new requests, and the degraded service can use more of its resources to come back to a healthy state.
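The state tracking behind this idea can be sketched as a small Python class. This is a simplified illustration, not a reference implementation: the class and method names, the default thresholds, and the injectable `clock` are all assumptions, and the sketch skips the intermediate "half-open" state that many circuit breaker implementations add.

```python
import time


class CircuitBreaker:
    """Minimal sketch: open the circuit after too many consecutive failures."""

    def __init__(self, max_failures=3, reset_timeout=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout  # seconds the circuit stays open
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        """Return True if a call to the protected service may be attempted."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_timeout:
            # The timeout elapsed: close the circuit and start counting again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open the circuit
```

A caller would check `allow_request()` before each call and report the outcome with `record_success()` or `record_failure()`, so the breaker decides on its own when to stop and when to resume traffic.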

In case of an open circuit, we could decide to quickly answer the client with a fallback response: for example, cached data, default data, or whatever makes sense for the particular application.

Let’s see a real example from the e-commerce world. We’re going to use the circuit breaker method to protect the product listing API call.

The circuit breaker will transparently handle all errors and show the default response in case of an API call failure. It also allows defining a max number of retries to avoid too many failed calls.

In this case, protecting a third-party service API call is a very simple task: we just need to provide the callback and the maximum number of failures allowed, after which the circuit breaker will be opened for 10 seconds and the default response is given back to the client.
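A rough Python sketch of this usage is shown below. It is an illustration under stated assumptions: `protect` and `fetch_products` are hypothetical names, the product service is simulated by a function that always fails, and the empty list stands in for whatever default response suits the application.

```python
import time


def protect(call, fallback, max_failures=3, open_seconds=10.0, clock=time.monotonic):
    """Wrap `call` so that repeated failures open the circuit for `open_seconds`."""
    state = {"failures": 0, "opened_at": None}

    def guarded(*args, **kwargs):
        opened_at = state["opened_at"]
        if opened_at is not None:
            if clock() - opened_at < open_seconds:
                return fallback  # circuit open: answer right away
            # Timeout elapsed: close the circuit and try the service again.
            state["opened_at"] = None
            state["failures"] = 0
        try:
            result = call(*args, **kwargs)
        except Exception:
            state["failures"] += 1
            if state["failures"] >= max_failures:
                state["opened_at"] = clock()
            return fallback
        state["failures"] = 0
        return result

    return guarded


def fetch_products():
    """Hypothetical stand-in for the product listing API call; it always fails here."""
    raise ConnectionError("product service unavailable")


list_products = protect(fetch_products, fallback=[], max_failures=3)
print(list_products())  # the call fails, so the default (empty) listing is returned
```

After three failed calls the wrapper stops contacting the service for 10 seconds and keeps returning the default listing, which is the behavior described above.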

Conclusion

These are some of the well-known tactics for building a truly rock-solid app: simple retries, exponential backoff, and circuit breakers.

freeCodeCamp.org

This is no longer updated. Go to https://freecodecamp.org/news instead

Simone Di Maulo

Written by

Software Engineer @ Hootsuite - Opinions are my own
