Single Point of Failure

InterviewReady
Jul 26, 2022

By Avash Mitra

What is a Single Point of Failure?

A single point of failure is a component whose crash brings down the entire system.

For example, if a system has only one database and it crashes, the whole system stops functioning. It does not matter that the other services are still running.

Identifying a Single Point of Failure in a system

If every component in the system depends on one particular component, that component is a single point of failure.

Suppose we have only one gateway node. It receives all the external traffic, and all internal services send their responses through it. If the gateway crashes, no request can reach any service, so the gateway is a single point of failure.
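One way to make this check concrete is to model the system as a graph and remove one component at a time: if the client can no longer reach every service, the removed component is a single point of failure. The sketch below is only an illustration; the topology and the names (`client`, `gateway`, `orders`, `payments`) are hypothetical and not taken from any real system.

```python
from collections import deque


def reachable(graph, start, removed):
    """Return every node reachable from `start` while `removed` is down."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(graph, client, services):
    """List the components whose crash cuts the client off from a service."""
    candidates = set(graph) - {client} - set(services)
    return [
        node
        for node in sorted(candidates)
        if not set(services) <= reachable(graph, client, node)
    ]


# Hypothetical topology: all traffic flows through one gateway.
one_gateway = {
    "client": ["gateway"],
    "gateway": ["orders", "payments"],
    "orders": [],
    "payments": [],
}

# The same system with a second gateway added.
two_gateways = {
    "client": ["gateway-1", "gateway-2"],
    "gateway-1": ["orders", "payments"],
    "gateway-2": ["orders", "payments"],
    "orders": [],
    "payments": [],
}

print(single_points_of_failure(one_gateway, "client", ["orders", "payments"]))
print(single_points_of_failure(two_gateways, "client", ["orders", "payments"]))
```

With one gateway the function reports `['gateway']`; with two gateways it reports an empty list, because no single crash disconnects the client.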

Avoiding Single Point of Failure

  • Adding more nodes

If the gateway service is a single point of failure, we can add a second node. If the first node crashes, the second one takes over. Alternatively, we can keep both nodes active and distribute the load across them.
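The failover idea can be sketched in a few lines. This is a minimal in-memory simulation, not a real network client: the `Node` class, its `healthy` flag, and `send_with_failover` are hypothetical names introduced for illustration.

```python
class Node:
    """A stand-in for one gateway node (hypothetical, in-memory only)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def send_with_failover(nodes, request):
    """Try each node in order and return the first successful response."""
    last_error = None
    for node in nodes:
        try:
            return node.handle(request)
        except ConnectionError as error:
            last_error = error  # remember the failure and try the next node
    raise RuntimeError("all nodes are down") from last_error


primary = Node("gateway-1", healthy=False)  # simulate a crashed node
secondary = Node("gateway-2")

print(send_with_failover([primary, secondary], "GET /orders"))
# prints "gateway-2 handled GET /orders"
```

The request only fails when every node is down, which is exactly the property the extra node buys us.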

  • Adding Load Balancers

When a service runs on multiple nodes, we need to distribute the load across them. For that, we use a load balancer. But a single load balancer is itself a single point of failure, so we use multiple load balancers.

With multiple load balancers, we also need DNS (the Domain Name System) so that clients know which load balancer to connect to. DNS maps the domain name to the IP addresses of the load balancers. DNS is itself a distributed system, so it does not become a new single point of failure.
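The interaction between DNS, load balancers, and service nodes can be sketched as below. Everything here is simulated in memory under stated assumptions: the domain `shop.example.com`, the IP addresses, the `DNS_RECORDS` table, and round-robin as the balancing policy are all hypothetical choices for illustration, and no real DNS lookup is performed.

```python
import itertools


class LoadBalancer:
    """Distributes requests across service nodes in round-robin order."""

    def __init__(self, address, backends):
        self.address = address
        self.healthy = True
        self._backends = itertools.cycle(backends)

    def route(self, request):
        if not self.healthy:
            raise ConnectionError(f"load balancer {self.address} is down")
        backend = next(self._backends)
        return f"{request} -> {self.address} -> {backend}"


# Simulated DNS: one domain name maps to several load balancer addresses.
DNS_RECORDS = {"shop.example.com": ["10.0.0.1", "10.0.0.2"]}


def resolve(domain):
    """Return the load balancer addresses registered for a domain."""
    return DNS_RECORDS[domain]


def send(domain, request, balancers):
    """Try each resolved load balancer until one accepts the request."""
    for address in resolve(domain):
        try:
            return balancers[address].route(request)
        except ConnectionError:
            continue  # this load balancer is down, try the next address
    raise RuntimeError(f"no load balancer for {domain} is reachable")


backends = ["app-1", "app-2"]
balancers = {
    "10.0.0.1": LoadBalancer("10.0.0.1", backends),
    "10.0.0.2": LoadBalancer("10.0.0.2", backends),
}

print(send("shop.example.com", "GET /", balancers))
balancers["10.0.0.1"].healthy = False  # simulate a load balancer crash
print(send("shop.example.com", "GET /", balancers))
```

The first request goes through `10.0.0.1`; after that load balancer is marked as down, the second request is served through `10.0.0.2` without the client changing the domain name it uses.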

  • Using Master-Follower architecture for databases

For a service, it does not make sense to keep an idle backup, but for a database we can keep backup copies. If the original database crashes, we can switch to a backup. The database that receives the write requests is the master, and its replicas are known as followers. We can also serve read requests from the followers.
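A minimal sketch of this routing is shown below. It is an in-memory toy, not how any particular database implements replication: the `Database` and `ReplicatedStore` classes are hypothetical, writes are copied to followers synchronously for simplicity (real systems often replicate asynchronously), and at least one follower is assumed to exist.

```python
import itertools


class Database:
    """A toy key-value database (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Sends writes to the master and spreads reads across followers."""

    def __init__(self, master, followers):
        self.master = master
        self.followers = list(followers)  # assumed to be non-empty
        self._read_order = itertools.cycle(self.followers)

    def write(self, key, value):
        self.master.data[key] = value
        # Simplification: replicate synchronously to every follower.
        for follower in self.followers:
            follower.data[key] = value

    def read(self, key):
        return next(self._read_order).data.get(key)

    def promote_follower(self):
        """Call when the master crashes: the first follower takes over."""
        self.master = self.followers.pop(0)
        # If no followers remain, fall back to reading from the new master.
        self._read_order = itertools.cycle(self.followers or [self.master])


store = ReplicatedStore(Database("db-1"), [Database("db-2"), Database("db-3")])
store.write("user:1", "Asha")

store.promote_follower()      # simulate a crash of db-1
print(store.master.name)      # prints "db-2"
print(store.read("user:1"))   # prints "Asha"
```

Because every follower already holds a copy of the data, the crash of the master does not lose the value written before it.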

  • Hosting your services in multiple regions

If your entire system in one region goes down, users can be redirected to the systems in other regions.

For example, you can host one set of services in the USA and another in Europe. This makes the system more resilient because the probability of both regions failing at the same time is very low.
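A quick calculation shows why the probability drops. The sketch below assumes that regions fail independently of each other, which is a simplification (a bad deployment or a shared dependency can take down several regions at once), and the 99% availability figure is a made-up example, not a measurement.

```python
def combined_availability(region_availabilities):
    """Probability that at least one region is up, assuming independence."""
    all_down = 1.0
    for availability in region_availabilities:
        all_down *= 1.0 - availability  # every region must be down at once
    return 1.0 - all_down


# Hypothetical figures: each region is available 99% of the time.
print(f"{combined_availability([0.99]):.4%}")        # prints "99.0000%"
print(f"{combined_availability([0.99, 0.99]):.4%}")  # prints "99.9900%"
```

Under these assumptions, adding a second region cuts the chance of a full outage from 1% to 0.01%.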

That’s it for now!

You can check out more designs on our video course at InterviewReady.
