Success in Microservices: The Art of Failing Well

Murat Karakurt · Published in folksdev · 6 min read · Jan 9, 2024

When we create a system, we don’t know whether the software will be a success or not. There can always be an error. You may think your software doesn’t have any bugs, but who knows?

I find it quite reasonable to say that “there is no software without bugs.” Usually the software has high test coverage, and the business domain is covered by automated unit and integration tests. Looked at that way, everything seems fine.

When you are talking about microservices, however, you should also consider a few potential risks that you cannot control, for example network connectivity, load-balancing problems, external service errors…

It is clear that your microservice will fail at some point. In that case you have to fail successfully. You may be asking: is it possible to fail successfully? The answer is definitely YES! Failing successfully means that when a failure occurs, the service does not become unavailable. To achieve this, let’s discuss some topics.

Separating Containers

It is important to understand how failures happen in order to know how to prevent them. Let’s think about monolithic applications, where the common approach is “everything in a single repository”. That means all the software pieces run on the same machine, for example the code, the database, the cache, and so on.

At first glance at the image below, there is nothing wrong with that. Deploying everything on the same machine avoids latency, packet loss, and complexity.

Everything is on the same machine

Let’s imagine a scenario where these components begin to fail one by one. It is hard to identify which component is responsible for the failure. Maybe a flaw in the cache is what actually brings down our API.

After that, you have to restart the entire application; there is no way to treat the components in isolation. There can also be data loss and inconsistency. With a systemic failure, not only is the availability of your application compromised, but customers and investors are affected as well.

This isn’t even the worst-case scenario. If you look at my previous Medium article, “Understanding the microservices concepts”, success is the keyword for microservices. With success comes the need to scale the app. This means you multiply this machine over and over, but you don’t improve the resilience.

Scaling an app whose whole stack sits on the same machine

The separation of the components of a microservice can be done in many ways. Something that I recommend is using Docker to divide the components. In the following diagram you can see a better version, with separated containers.

Separation of containers
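
As a rough illustration of what that separation means for the code, once the API, the database, and the cache run in their own containers, the application reaches them over the network through configuration instead of assuming they share a machine. The sketch below is plain Java; the environment variable names (DB_HOST, CACHE_HOST, and so on) and the default ports are hypothetical examples, not a prescribed layout.

```java
// Minimal sketch: when the API, database, and cache live in separate
// containers, the application reaches them over the network through
// configuration instead of assuming "everything on the same machine".
// The environment variable names below are hypothetical examples.
public class ServiceConfig {

    public record Endpoint(String host, int port) { }

    // Reads an endpoint such as DB_HOST/DB_PORT, falling back to defaults
    // so the sketch still runs locally.
    static Endpoint fromEnv(String prefix, String defaultHost, int defaultPort) {
        String host = System.getenv().getOrDefault(prefix + "_HOST", defaultHost);
        int port = Integer.parseInt(
                System.getenv().getOrDefault(prefix + "_PORT", String.valueOf(defaultPort)));
        return new Endpoint(host, port);
    }

    public static void main(String[] args) {
        Endpoint db = fromEnv("DB", "localhost", 5432);       // e.g. the database container
        Endpoint cache = fromEnv("CACHE", "localhost", 6379); // e.g. the cache container

        // Each dependency is now an independent container that can fail,
        // restart, or scale without taking the API process down with it.
        System.out.println("Database at " + db + ", cache at " + cache);
    }
}
```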

Depreciating Data

In the era of cloud computing, every company wants to analyze data, so deleting data sounds absurd. As a result you accumulate millions of records, and huge tables bring increasingly slow queries.

The question is: if you cannot delete data, what should you do so that queries do not keep slowing down? Maybe you think of patterns like CQRS, but patterns are not always sufficient when it comes to performance. The answer is depreciating data.

Depreciating data consists of dividing the data into active and inactive data, and moving the inactive data to storage that has no impact on the real-time application layer.
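
To make this concrete, here is a minimal sketch of such a depreciation job, assuming a relational database reachable over JDBC. The table names (orders, orders_archive), the created_at column, the connection details, and the one-year cutoff are all hypothetical; the point is only the shape of the move-then-delete step.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;

// Minimal sketch of a periodic "depreciation" job over a relational store.
// Table and column names (orders, orders_archive, created_at) are hypothetical.
public class DataDepreciationJob {

    // Copies rows older than the cutoff into an archive table that the
    // real-time application layer never queries, then removes them from
    // the hot table. Both statements run in one transaction.
    static void archiveInactiveData(Connection conn, LocalDate cutoff) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement copy = conn.prepareStatement(
                 "INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < ?");
             PreparedStatement delete = conn.prepareStatement(
                 "DELETE FROM orders WHERE created_at < ?")) {
            copy.setObject(1, cutoff);
            delete.setObject(1, cutoff);
            copy.executeUpdate();
            delete.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders; in practice a job like this
        // would run on a schedule (e.g. nightly).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/shop", "user", "password")) {
            archiveInactiveData(conn, LocalDate.now().minusYears(1));
        }
    }
}
```

Because the archive table (or a separate cold store) is never touched by the real-time queries, the hot table stays small and the queries stay fast.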

Regionalizing Data

If our application operates globally, deploying it on geographically distributed servers is usually good practice. The problem arises with the location of the database servers.

For example, I am in the UK and the database servers are in Austria. This means that when I try to fetch data, the physical distance introduces latency and possible packet loss. The solution to this is regionalizing the data.

Think about a news app: European readers are more interested in information about Europe than about South Africa. This means that when editors publish news about Europe, we should store the data by region first and let it be standardized across regions afterwards, in a process similar to CQRS. You can even think of it as eventual consistency.
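
As a small sketch of this idea, the class below fakes per-region databases with in-memory maps: a publish lands in the editor’s home region first and is copied to the other regions a moment later. The region names and the replication delay are illustrative assumptions, not part of any real design.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of regionalized writes with eventual consistency,
// using in-memory maps to stand in for per-region databases.
public class RegionalNewsStore {

    private final Map<String, Map<String, String>> regionStores = new ConcurrentHashMap<>();
    private final List<String> regions = List.of("eu", "us", "za"); // illustrative regions
    private final ScheduledExecutorService replicator = Executors.newSingleThreadScheduledExecutor();

    public RegionalNewsStore() {
        regions.forEach(r -> regionStores.put(r, new ConcurrentHashMap<>()));
    }

    // Writes go to the editor's own region first, so local readers see the
    // news immediately; other regions catch up asynchronously.
    public void publish(String homeRegion, String articleId, String body) {
        regionStores.get(homeRegion).put(articleId, body);
        replicator.schedule(() -> replicate(homeRegion, articleId, body), 2, TimeUnit.SECONDS);
    }

    private void replicate(String sourceRegion, String articleId, String body) {
        regions.stream()
               .filter(r -> !r.equals(sourceRegion))
               .forEach(r -> regionStores.get(r).put(articleId, body));
    }

    public String read(String region, String articleId) {
        return regionStores.get(region).get(articleId); // may be null until replication runs
    }

    public static void main(String[] args) throws InterruptedException {
        RegionalNewsStore store = new RegionalNewsStore();
        store.publish("eu", "article-1", "News about Europe");
        System.out.println("eu sees: " + store.read("eu", "article-1")); // visible immediately
        System.out.println("za sees: " + store.read("za", "article-1")); // likely null, not yet replicated
        Thread.sleep(3000);
        System.out.println("za sees: " + store.read("za", "article-1")); // visible after replication
        store.replicator.shutdown();
    }
}
```

Until replication runs, readers in other regions simply see the previous state, which is exactly the eventual-consistency trade-off described above.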

So far we’ve talked about problems and solutions: how can we develop an application with high availability and resilience? From here on, we will look at structures that can protect us.

Redundancy

Even if we design the system to be flexible and distributed, it still has no resilience against systemic failures. Redundancy helps with this problem tremendously. With redundancy, even if one node of the application is lost, the others can continue responding.

A load balancer that uses a policy to redirect requests is a good example for microservices. You can create many nodes, and even if one of them fails the client still gets a response. In the structure below, the load balancer basically receives the request and directs it to the business layer. You have to be careful when working with load balancers because of version mismatches between the nodes.

Load Balancer Implementation
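
Here is a minimal sketch, in plain Java, of the round-robin policy described above: the balancer rotates through a list of identical business-layer nodes and quietly moves on when one of them fails, so the client still gets a response. The node addresses and the simulated call are placeholders.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Minimal sketch of redundancy behind a round-robin policy: identical nodes,
// and a failed node is skipped in favour of the next one.
public class RoundRobinBalancer {

    private final List<String> nodes;
    private final AtomicInteger cursor = new AtomicInteger();

    public RoundRobinBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // Tries each node at most once, starting from the next position in the
    // rotation; the client still gets a response as long as one node is up.
    public <R> R execute(Function<String, R> call) {
        int start = Math.floorMod(cursor.getAndIncrement(), nodes.size());
        RuntimeException lastError = null;
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get((start + i) % nodes.size());
            try {
                return call.apply(node);
            } catch (RuntimeException e) {
                lastError = e; // node is down or misbehaving; try the next one
            }
        }
        throw new IllegalStateException("All nodes failed", lastError);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer =
                new RoundRobinBalancer(List.of("node-a:8080", "node-b:8080", "node-c:8080"));
        // Simulated call: pretend node-b is down, the balancer moves on.
        String reply = balancer.execute(node -> {
            if (node.startsWith("node-b")) throw new RuntimeException("connection refused");
            return "handled by " + node;
        });
        System.out.println(reply);
    }
}
```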

Let’s imagine you are working on an e-commerce platform that sells products. Whenever the system is under stress, you can address the problem with horizontal scalability. The problem is that horizontal scalability is not an absolute answer for this type of application: any latency in the purchasing flow represents an unsatisfied customer. For example, on Black Friday the system can observe millions of hits that never turn into a sale, and because of the millions of hits on the product views, the payment part may end up blocked. Clearly, the number of hits will be far greater than the number of users who complete the entire purchasing flow. You can solve this problem by using multiple load balancers.

Multiple Load Balancer Implementation
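
A minimal sketch of that split, reusing the RoundRobinBalancer from the previous example, might look like this: browsing traffic and checkout traffic go to separate node pools, so each pool can be sized and scaled on its own. The URL prefixes and pool sizes are assumptions for illustration only.

```java
import java.util.List;

// Minimal sketch of the "multiple load balancers" idea: browsing traffic and
// payment traffic are sent to separate node pools, so millions of product
// views cannot starve the checkout flow.
public class FlowAwareRouter {

    private final RoundRobinBalancer browsePool =
            new RoundRobinBalancer(List.of("browse-1:8080", "browse-2:8080", "browse-3:8080"));
    private final RoundRobinBalancer paymentPool =
            new RoundRobinBalancer(List.of("payment-1:8080", "payment-2:8080"));

    // Chooses the pool by URL prefix; each pool can be scaled independently.
    public String route(String path) {
        RoundRobinBalancer pool = path.startsWith("/checkout") ? paymentPool : browsePool;
        return pool.execute(node -> "forwarded " + path + " to " + node);
    }

    public static void main(String[] args) {
        FlowAwareRouter router = new FlowAwareRouter();
        System.out.println(router.route("/products/42"));
        System.out.println(router.route("/checkout/payment"));
    }
}
```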

Isolation

Another mistake when designing microservices is the reuse of components. A classic example of bad resource sharing is the reuse of a database. “No matter how well optimized your load balancer is, how well tuned the threading level of your application is, or how well you have divided your domain, if all or most of your application depends on a single component, collapse is imminent.”

Look at the diagram below: all the components of the application depend on the same physical component. This type of mistake is also common with the cache and the message broker.

Reuse DB

You can view the improved version of the system below.
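
As a small illustration of the improved version, each service owns its own database and connects with its own credentials, so a collapse of one store cannot drag the others down. The JDBC URLs and credentials below are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Minimal sketch of isolation: each service has its own database instance
// and its own credentials, instead of every service pointing at one shared
// physical store. URLs and credentials are placeholders.
public class ServiceDataSources {

    // The orders service talks only to the orders database.
    static Connection ordersDb() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:postgresql://orders-db:5432/orders", "orders_svc", "secret");
    }

    // The catalog service talks only to the catalog database; if the orders
    // database collapses, catalog queries are unaffected.
    static Connection catalogDb() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:postgresql://catalog-db:5432/catalog", "catalog_svc", "secret");
    }
}
```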

Circuit Breaker

Microservices involve breaking down an application into small, independent services that communicate with each other over a network. While this architecture offers benefits such as scalability and agility, it introduces challenges related to service dependencies and potential failures.

Circuit breakers in microservices act as a protective barrier, preventing the propagation of faults and failures throughout the system. Inspired by their electrical counterparts, microservices circuit breakers operate based on three key states: closed, open, and half-open.

  1. Closed State: In normal operation, the circuit breaker is in the closed state, allowing requests to flow between microservices unimpeded.
  2. Open State: When a microservice experiences a failure or becomes unresponsive, the circuit breaker transitions to the open state. In this state, the circuit breaker prevents further requests from reaching the problematic service, effectively isolating it.
  3. Half-Open State: After a predefined time, the circuit breaker enters a half-open state, allowing a limited number of requests to pass through. This serves as a diagnostic phase to determine whether the previously problematic microservice has recovered (see the sketch after this list).
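
To make the three states tangible, here is a deliberately simplified, hand-rolled circuit breaker in plain Java. The failure threshold and open timeout are arbitrary example values, and in a real project you would more likely use an established library such as Resilience4j rather than writing your own.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Simplified circuit breaker illustrating the closed, open, and half-open
// states described above. Thresholds and timings are example values.
public class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openTimeout;

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;

    public SimpleCircuitBreaker(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    public synchronized <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openTimeout))) {
                state = State.HALF_OPEN;       // probe the service again
            } else {
                return fallback.get();         // fail fast, do not touch the service
            }
        }
        try {
            T result = downstream.get();
            consecutiveFailures = 0;
            state = State.CLOSED;              // success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;            // isolate the failing service
                openedAt = Instant.now();
            }
            return fallback.get();             // graceful degradation
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, Duration.ofSeconds(30));
        String response = breaker.call(
                () -> { throw new RuntimeException("service down"); }, // simulated failing call
                () -> "cached fallback response");
        System.out.println(response);
    }
}
```

The fallback supplier is what gives you graceful degradation: while the breaker is open, callers immediately receive a cached or default response instead of waiting on a service that is known to be failing.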

Benefits of Microservices Circuit Breakers:

  • Fault Isolation: Circuit breakers prevent cascading failures by isolating faulty microservices, ensuring that the overall system remains functional.
  • Graceful Degradation: By providing fallback mechanisms, circuit breakers enable microservices to gracefully degrade their functionality when issues arise.

In the dynamic landscape of microservices architecture, circuit breakers play a pivotal role in maintaining system resilience. These mechanisms contribute to the overall stability and reliability of microservices by preventing the spread of failures and allowing for graceful degradation. As developers continue to embrace microservices, understanding and implementing effective circuit breaker strategies become imperative for building robust and fault-tolerant applications.
