Is your microservice truly HA with databases?

Rohit Yadav
Published in Naukri Engineering
4 min read · Feb 29, 2024

Keeping a website continuously available demands resilience and high availability (HA) at every layer: automatic failover, load balancing, and the elimination of single points of failure.

At a broad level, incoming traffic is handled by an LVS server, routed to an Nginx layer, and finally reaches the application, which in turn depends on a specific database.

In the diagram provided, high availability (HA) is implemented at the front layer, at the LVS. Behind it, multiple Nginx instances are configured with load balancing and automatic failover, and the application itself has multiple deployments to ensure HA.

The application relies on a single database as its data source. Although there are multiple database servers, only one is active while the rest remain passive. In the event of a failure or slowdown, manual intervention is needed to repoint the application, and load balancing may be required at this layer as well.

The database is therefore a weak link that can bring down the entire system, so this layer needs to be fortified as well.

Typically, for any infra-related issue, our recourse is alerting and monitoring systems. When we discover that the current MySQL server cannot handle the traffic and is partially failing under high load, we switch to pointing at a passive or backup server. In our specific case at Naukri.com, an operating-system issue rendered the server unreachable, and we followed the same procedure of pointing to the passive server. Across this whole cycle of alerting, monitoring, debugging, and switching servers, we lost roughly 20 minutes of traffic.

Following similar incidents, we realized that even with highly available servers, the system was not as resilient as anticipated. This prompted us to explore various solutions in our quest for improved resilience.

Options:

To address the issue at hand, namely efficient automatic failover and load balancing at the database layer, our primary options were:

- ProxySQL:
— Designed exclusively for MySQL.
— Features advanced functionalities such as query routing and connection pooling.
— Limited adoption due to MySQL specificity.
— Perceived complexity and learning curve.
— Concerns about stability and community support.

- NGINX Plus:
— Widely used in web servers.
— Minimal learning curve.
— Strong reputation.
— Not open source, which may be a drawback.

- HAProxy:
— Open-source load balancer and proxy server.
— Known for reliability, high performance, and flexibility.
— Handles network traffic effectively.
— Straightforward text-based configuration.

In weighing our options, it’s clear that each solution brings its own strengths and trade-offs. ProxySQL’s MySQL specialization and NGINX Plus’s reputation in the web-server arena are compelling, but HAProxy stands out for reliability, performance, and adaptability in managing network traffic. Its open-source nature and straightforward configuration provide a robust foundation for our needs, so after careful evaluation we selected HAProxy as our preferred solution.

HAProxy:

The fundamental architecture of HAProxy involves ensuring high availability (HA) of HAProxy itself through the use of keepalived. The HAProxy configuration file is structured into four key sections: global, defaults, frontend, and backend. These components collaboratively determine the overarching server behavior, set default parameters, and define the reception and routing of client requests to backend servers.
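For the keepalived piece, a minimal sketch might look like the following; the interface name, virtual IP, and priorities here are assumptions, and a second node would run the same configuration with state BACKUP and a lower priority so that the virtual IP floats over if the primary HAProxy dies. The HAProxy configuration itself is shown next.

vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # succeeds as long as an haproxy process exists
    interval 2                    # run the check every 2 seconds
    weight 2                      # raise priority by 2 while the check passes
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0                # assumed network interface
    virtual_router_id 51
    priority 101                  # the BACKUP peer uses a lower priority
    advert_int 1
    virtual_ipaddress {
        192.168.1.100             # the VIP that applications connect to
    }
    track_script {
        chk_haproxy
    }
}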


global
    # process-wide settings
    log /dev/log local0        # syslog socket; must be reachable inside the chroot
    chroot /var/lib/haproxy
    maxconn 5000

defaults
    # defaults inherited by every frontend and backend
    mode tcp
    log global
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend mysql_front
    # accepts MySQL connections from the application
    mode tcp
    bind *:3306
    default_backend mysql_back

backend mysql_back
    # database servers that fulfill the requests
    mode tcp
    server server1 192.168.1.101:3306 check
    server server2 192.168.1.102:3306 check backup

In this configuration, two database servers are defined, each with health checks enabled. Server2 is designated as the backup: if server1 becomes unresponsive, HAProxy redirects traffic to server2. This is a simplistic illustration, though; in real scenarios, fine-tuning is essential for optimal performance.
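For example, the generic check above only verifies that the TCP port accepts connections. HAProxy also ships a MySQL-aware check that completes a MySQL handshake, which catches a server that is up but not accepting logins. A minimal sketch of the backend with tuned checks follows; the haproxy_check user is an assumption and would have to be created on both MySQL servers (typically with no password, restricted to the HAProxy hosts):

backend mysql_back
    mode tcp
    # complete a MySQL handshake instead of a bare TCP connect
    option mysql-check user haproxy_check
    # inter: interval between checks; fall/rise: how many failed/successful
    # checks before a server is marked down/up
    server server1 192.168.1.101:3306 check inter 2s fall 3 rise 2
    server server2 192.168.1.102:3306 check inter 2s fall 3 rise 2 backup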

After plugging in HAProxy, the architecture looks roughly like the diagram below:

As noted previously, fine-tuning is essential prior to implementation, especially given the different strategies required for reads and writes. Establishing effective health checks, connection pooling, and connection keep-alive between applications and databases will also introduce a new layer of complexity, and ensuring the high availability of HAProxy itself is paramount. Stay tuned for our upcoming blog, where we’ll delve into these topics and explore production considerations and best practices.
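In the meantime, as a small preview, one common approach to the read/write split is to expose writes and reads on separate ports, sending writes only to the primary and load balancing reads across replicas. The sketch below assumes the ports, backend names, and server addresses, plus the same hypothetical haproxy_check user as above:

frontend mysql_write
    mode tcp
    bind *:3306                  # application sends writes here
    default_backend mysql_primary

frontend mysql_read
    mode tcp
    bind *:3307                  # application sends reads here
    default_backend mysql_replicas

backend mysql_primary
    mode tcp
    option mysql-check user haproxy_check
    server primary 192.168.1.101:3306 check

backend mysql_replicas
    mode tcp
    balance roundrobin           # spread reads across the replicas
    option mysql-check user haproxy_check
    server replica1 192.168.1.102:3306 check
    server replica2 192.168.1.103:3306 check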

Thanks!
