Practical Microservices: Retry, Circuit Breaker and Compensation Transaction

Intro

Today I want to share a few “responsibility” patterns. The thing is that after splitting a monolith application into services, we face new challenges. Even if we keep using the same procedure-call (or request-reply) techniques, they need to be rethought.

A monolith application can guarantee (to a certain degree) that all the calls you make inside your app will be available at any point in time. When we’re talking about microservices, we assume that each service can experience network issues at any moment. This is exactly where new responsibilities are born:

  • Who will repeat the logic in case of failure? Will it be the end client?
  • Who will clean up the changes made in the same transaction scope (the saga story) before the failure?

Service Call Responsibility

Let’s imagine the monolith case looks like this:

app.post("/users", (req, res) => {
  var transaction = Transaction.open();
  try {
    var userId = idIssuer(transaction, req.body.userAlias);

    profiles.create(transaction, userId);
    accounts.create(transaction, userId);
    transaction.commit(); // everything is cool, commit
  } catch (err) {
    transaction.rollBack(); // roll back all changes
  }
});

At the end of the call, we always arrive at a consistent data state that matches our expectations. But what if we split this scenario into services:

  • Users Service
  • User Profile Service
  • User Account Service

In this case, all the calls to the services become RPCs (remote procedure calls) or event triggers. How do we manage data consistency now?

Sure thing, we don’t want to maintain a distributed transaction (the price of strong consistency is very high: the availability of our services), and eventual consistency is good enough for us. With it we can get close to a consistent state at some moment in time. But what if one of the services in the flow is unavailable at the moment of the call? How do we handle that?
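To make the problem concrete, here is a minimal sketch of the split flow. The service clients (`usersService`, `profilesService`, `accountsService`) are hypothetical and simulated synchronously for illustration; real ones would make remote (async) HTTP/RPC calls:

```javascript
// Hypothetical service clients, simulated in-process for illustration.
const state = { users: [], profiles: [], accounts: [] };

const usersService = {
  issueId(alias) { state.users.push(alias); return "u-" + alias; },
};
const profilesService = {
  create(userId) { state.profiles.push(userId); },
};
const accountsService = {
  // Simulated outage: this service is unavailable right now.
  create(userId) { throw new Error("503 Service Unavailable"); },
};

function createUser(alias) {
  // No shared transaction anymore: each remote call commits on its own.
  const userId = usersService.issueId(alias);
  profilesService.create(userId);
  accountsService.create(userId); // fails -> user and profile are left orphaned
}

try {
  createUser("alice");
} catch (err) {
  console.log("partial state after failure:", state);
  // The user and profile exist, the account does not: inconsistent data.
}
```

The failing third call leaves the first two committed with nobody to clean them up — exactly the gap the patterns below address.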

Retry Pattern

The first thing that comes to mind is a simple Retry.

function createProfile(userId) {
  const callback = (res, err) => {
    if (err != null && err.code == 500) {
      profiles.create(userId, callback); // retry if unavailable
    }
  };
  profiles.create(userId, callback); // first call
}

The code above retries the logic requested by the client. The good part is that the services take responsibility for transient failures (e.g. network issues) without delegating it to the client; on the other hand, infinite retries can kill the performance of the whole system.

What can we do about it:

  • Do only a limited number of repeats (count++ on each retry)
  • Add a timeout (delay) before each retry
  • Make that timeout exponential (exponential backoff)

Which approach to choose depends on your business requirements.
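Putting those mitigations together, here is a sketch of a bounded retry with exponential backoff. `retryWithBackoff` is a hypothetical helper (not part of any library); it assumes the wrapped call returns a Promise, e.g. a promisified `profiles.create`:

```javascript
// Bounded retry with exponential backoff: give up after maxAttempts,
// doubling the delay before each new attempt.
function retryWithBackoff(call, maxAttempts = 3, baseDelayMs = 100) {
  return new Promise((resolve, reject) => {
    const attempt = (n) => {
      call().then(resolve, (err) => {
        if (n >= maxAttempts) {
          reject(err); // give up: let the caller (or a compensation) handle it
          return;
        }
        const delay = baseDelayMs * 2 ** (n - 1); // 100ms, 200ms, 400ms, ...
        setTimeout(() => attempt(n + 1), delay);
      });
    };
    attempt(1);
  });
}

// Example: a flaky call that succeeds on the 3rd attempt.
let calls = 0;
const flaky = () =>
  ++calls < 3 ? Promise.reject(new Error("503")) : Promise.resolve("ok");

retryWithBackoff(flaky).then((res) => console.log(res, "after", calls, "calls"));
// → "ok after 3 calls"
```

Rejecting after the last attempt (instead of retrying forever) is what keeps a struggling downstream service from being hammered indefinitely.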

Compensating Transaction

It’s fairly clear that we can repeat a service call in case of unavailability. It’s less clear how to behave if

  • we’re doing a sequence of calls,
  • part of them completed successfully,
  • some of the calls failed even after retries, and
  • we want to roll back the previous changes (theoretically we could do it manually, but I’d prefer to code against this scenario).

In this case, you should implement compensating transaction logic, something like this:

profiles.create(userId, (profiles_res, profiles_err) => {
  if (profiles_err == null) {
    accounts.create(userId, (accounts_res, accounts_err) => {
      if (accounts_err != null) {
        profiles.delete(userId); // time to roll back the first call
      }
    });
  }
});
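The same idea reads more clearly with async/await and generalizes to any number of steps: after each successful call, record an “undo” action, and run the recorded undos in reverse order when a later step fails. This is a sketch, assuming the service clients passed in (`profiles`, `accounts`) expose Promise-returning `create`/`delete` methods:

```javascript
// Saga-style compensation: undo completed steps in reverse order on failure.
// The Promise-based clients are passed in, so the saga stays independent
// of the transport (HTTP, message queue, ...).
async function createUserSaga(userId, profiles, accounts) {
  const compensations = []; // undo actions for the steps that succeeded

  try {
    await profiles.create(userId);
    compensations.push(() => profiles.delete(userId));

    await accounts.create(userId);
    compensations.push(() => accounts.delete(userId));
  } catch (err) {
    // Compensate completed steps, most recent first.
    for (const undo of compensations.reverse()) {
      await undo();
    }
    throw err; // let the caller know the whole operation failed
  }
}
```

Note that a compensation can itself fail (the profile service may be down at rollback time), so real implementations usually combine this with the Retry pattern above.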

I hope the principles above will help you create more robust services and applications.

You can find more details about the possible retry and compensation techniques in the following articles: