Microservices Resilience and Fault Tolerance using AWS JDBC driver

Kangsheng Wong
Government Digital Products, Singapore
Oct 25, 2023

Developing reactive and resilient microservices entails creating a highly scalable and distributed architecture. To meet the demands of resiliency and fault tolerance, it is crucial to address the following aspects of failure management:

  • Proactively Avoiding Faults
  • Effectively Isolating Faults
  • Swiftly Detecting Faults
  • Ensuring Efficient Recovery

Typical modern microservices architecture

In many scenarios, the architectural setup comprises an application layer and a data layer. The application microservices are typically deployed within a Kubernetes environment, while the data tier is managed using services such as Amazon RDS.

Typical 2-tier architecture

Hosting applications in Kubernetes (K8s) simplifies the deployment, scaling, and management of containerized applications. Hosting the database cluster in the cloud with Amazon Aurora gives users a set of features and configurations for maximizing performance and availability, such as automatic database failover.

Existing JDBC drivers do not support database failover

However, most existing JDBC drivers do not support these features, or cannot take full advantage of them. After a failover, a standard driver only sees a broken connection and has to reconnect through the cluster endpoint, whose DNS record may still point at the old primary. Application developers therefore have to write custom logic to handle such cases.
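To illustrate the kind of custom logic this implies, here is a minimal sketch in plain Java using only `java.sql`. It assumes, hypothetically, that each unit of work obtains its own connection through the cluster endpoint (for example from a connection pool), so a retry naturally reconnects. Errors whose SQLState belongs to class `08` (the SQL-standard class for connection exceptions) are treated as transient and retried with exponential backoff, giving DNS time to point the cluster endpoint at the new primary. `SqlWork`, `retryOnConnectionError` and the backoff values are illustrative names, not part of any library.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

public class NaiveFailoverRetry {

    /** Hypothetical unit of work: obtains its own connection, runs SQL, returns a result. */
    @FunctionalInterface
    public interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** SQLState class "08" is the SQL-standard class for connection exceptions. */
    public static boolean isConnectionError(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    /**
     * Retries work after connection errors, sleeping with exponential backoff so that
     * the cluster endpoint's DNS record has time to point at the newly promoted primary.
     */
    public static <T> T retryOnConnectionError(SqlWork<T> work, int maxAttempts, long initialBackoffMs)
            throws SQLException, InterruptedException {
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isConnectionError(e) || attempt >= maxAttempts) {
                    throw e; // not a connection problem, or out of attempts
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2; // wait longer before each subsequent attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated work: the first two calls fail as if the primary had just gone away.
        int result = retryOnConnectionError(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new SQLException("simulated connection reset", "08006");
            }
            return 42;
        }, 5, 10);
        System.out.println("result=" + result + " calls=" + calls.get());
    }
}
```

The weakness of this approach is that every retry goes back through DNS, so the application may keep trying to reach the old, unavailable instance until the record is updated, and the backoff has to be tuned to cover that window.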

What is database failover?

Failover is the mechanism by which Aurora automatically restores a DB cluster when its primary DB instance becomes unavailable. Aurora does this by promoting one of the Aurora Replicas to become the new primary DB instance, so that the cluster can resume serving read-write traffic as quickly as possible.
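To make the election step concrete, here is a toy Java model (not AWS code) of how Aurora chooses the replica to promote, following its documented rule: the replica in the highest-priority promotion tier wins (tiers range from 0 to 15, with 0 the highest), ties are broken by the largest instance size, and any remaining tie is resolved arbitrarily. The `Replica` record, its fields (including measuring size in vCPUs), and the `electNewPrimary` method are hypothetical. It uses a record, so it needs Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class FailoverElection {

    /** Hypothetical view of an Aurora Replica: promotion tier (0 = highest priority), size, health. */
    public record Replica(String id, int promotionTier, int sizeVcpus, boolean healthy) {}

    /**
     * Picks the replica to promote: lowest tier number first, then the largest size.
     * Aurora breaks remaining ties arbitrarily; this sketch keeps the first one in list order.
     */
    public static Optional<Replica> electNewPrimary(List<Replica> replicas) {
        return replicas.stream()
                .filter(Replica::healthy)
                .min(Comparator.comparingInt(Replica::promotionTier)
                        .thenComparing(Comparator.comparingInt(Replica::sizeVcpus).reversed()));
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("instance-a", 1, 16, true),
                new Replica("instance-b", 0, 4, true),
                new Replica("instance-d", 0, 8, true));
        // Tier 0 beats tier 1; within tier 0, the larger instance-d wins.
        System.out.println(electNewPrimary(replicas).map(Replica::id).orElse("none"));
    }
}
```

Running `main` prints `instance-d`: tier matters more than size, so the 16-vCPU replica in tier 1 is not chosen.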

AWS Advanced JDBC Driver

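To make microservices tolerant of database failures, we can use the AWS Advanced JDBC Driver (the AWS Advanced JDBC Wrapper), which wraps an existing JDBC driver such as the PostgreSQL or MySQL driver and adds failover handling on top of it.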
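As a minimal configuration sketch, assuming a PostgreSQL-compatible Aurora cluster with the wrapper (published on Maven Central as `software.amazon.jdbc:aws-advanced-jdbc-wrapper`) and the PostgreSQL JDBC driver on the classpath: the application keeps using the standard `java.sql` API and only changes the URL prefix to `jdbc:aws-wrapper:`, plus the `wrapperPlugins` property that enables the `failover` and `efm` (enhanced failure monitoring) plugins. The endpoint, database name, credentials and helper method names are placeholders; check the wrapper's documentation for the plugin codes supported by your version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class AwsWrapperConfig {

    /** The wrapper is selected by the "aws-wrapper:" segment in front of the target driver's URL. */
    public static String buildUrl(String clusterEndpoint, int port, String database) {
        return "jdbc:aws-wrapper:postgresql://" + clusterEndpoint + ":" + port + "/" + database;
    }

    public static Properties buildProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);         // passed through to the underlying PostgreSQL driver
        props.setProperty("password", password);
        // Enable the failover plugin and enhanced failure monitoring ("efm").
        props.setProperty("wrapperPlugins", "failover,efm");
        return props;
    }

    /** Requires the wrapper and the PostgreSQL JDBC driver on the classpath and a reachable cluster. */
    public static Connection connect(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }

    public static void main(String[] args) {
        // Placeholder Aurora cluster endpoint; replace with your own.
        String url = buildUrl("my-cluster.cluster-xxxxxxxxxxxx.ap-southeast-1.rds.amazonaws.com", 5432, "appdb");
        Properties props = buildProperties("app_user", "change-me");
        System.out.println(url);
        System.out.println(props.getProperty("wrapperPlugins"));
    }
}
```

Because the wrapper sits behind `DriverManager`, connection pools such as HikariCP can usually be pointed at the same URL without code changes.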

source: https://github.com/awslabs/aws-advanced-jdbc-wrapper/blob/main/docs/images/failover_diagram.png

The diagram above gives a simplified view of how the AWS Advanced JDBC Driver handles Aurora database failover. The application uses the driver to open a logical connection to the Aurora DB cluster endpoint, which connects it to the primary DB instance (DB instance C).

If DB instance C fails, Aurora initiates failover and promotes a new primary DB instance. At the same time, the driver intercepts the resulting communication exception and starts its own internal failover process.

The driver then switches the connection over to the new primary DB instance and returns control to the application by throwing a FailoverSuccessSQLException. This tells the application that the connection is usable again, but that the statement that triggered the failure must be re-executed. Although the cluster endpoint's DNS record may not yet resolve to the new primary DB instance, the driver has already discovered the new instance during its failover process and is connected to it directly when the application continues executing statements. The application therefore reconnects to the newly promoted instance faster than it would by waiting for DNS to update, which increases the availability of the DB cluster.
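On the application side, the usual pattern is to catch this exception, restore any session state, and re-run the statement on the same connection. The sketch below avoids a compile-time dependency on the wrapper by checking the SQLState rather than the exception class; to our understanding the wrapper documents SQLState `08S02` for `FailoverSuccessSQLException`, `08001` for `FailoverFailedSQLException` and `08007` for `TransactionStateUnknownSQLException`, but verify these against the version you use. `SqlWork` and `runWithFailoverRetry` are hypothetical names.

```java
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

public class FailoverAwareExecutor {

    /** Hypothetical unit of work run on a connection managed by the wrapper. */
    @FunctionalInterface
    public interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** SQLState assumed for FailoverSuccessSQLException (verify for your driver version). */
    public static final String FAILOVER_SUCCESS = "08S02";

    /**
     * If the driver reports a successful failover, the connection already points at the new
     * primary, so the work is retried immediately without sleeping or reconnecting.
     * Any other error, or running out of attempts, rethrows the exception.
     */
    public static <T> T runWithFailoverRetry(SqlWork<T> work, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!FAILOVER_SUCCESS.equals(e.getSQLState()) || attempt >= maxAttempts) {
                    // 08001 (failover failed) or 08007 (transaction state unknown) need
                    // application-specific handling, e.g. a new connection or a commit check.
                    throw e;
                }
                // Session state (e.g. autocommit, isolation level) may need re-applying here.
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        AtomicInteger calls = new AtomicInteger();
        String result = runWithFailoverRetry(() -> {
            if (calls.incrementAndGet() == 1) {
                // Simulates the driver interrupting the first attempt after switching to the new primary.
                throw new SQLException("simulated failover", FAILOVER_SUCCESS);
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Compared with the plain-driver approach, there is no backoff: the retry is immediate because the driver has already done the reconnection. With the wrapper on the classpath, catching `FailoverSuccessSQLException` directly is the more explicit alternative.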

Key Takeaway

Kubernetes and Amazon RDS provide the foundational building blocks for a resilient, fault-tolerant system, but using and configuring these tools effectively is up to the software developers and DevOps teams. Careful planning, monitoring, and continuous improvement are crucial for achieving high levels of resiliency in a production environment.

Credit

Thanks to my teammate Jiajun (https://github.com/beginner349) for helping implement this for the NPHC project.
