Improving Bigtable Availability with Replication

Loh Jia Sin
Google Cloud - Community
9 min read · Aug 29, 2023

In the realm of application performance, even rare occurrences can have a significant impact on user experience. Recently, I encountered a case highlighting this point — a situation where an application faced occasional high read latencies while retrieving data from its Bigtable instance. Although these incidents were infrequent (around three times in six months), the application’s sensitivity to latency made each occurrence disruptive, affecting how users interacted with the platform.

Understanding the Issue

So, what caused these sporadic latency spikes? These were primarily triggered by unexpected hardware or software failures leading to temporary node outages within the Bigtable cluster. In the realm of hyperscale infrastructure provided by cloud providers such as Google Cloud, such failures are rare but inevitable.

When these rare events occur, the Bigtable cluster needs to quickly rebalance work across the remaining nodes while the affected nodes recover. Together, these can cause CPU utilisation spikes, which might make the cluster unresponsive or lead to higher latencies for incoming read and write requests.

Finding a Solution: Bigtable Cluster Replication

In this post, we will work through a practical solution to these latency issues and improve application reliability. We will also discuss some key considerations when implementing it.

At a high level, our solution involves replicating data to an alternate cluster using Bigtable Cluster Replication and creating a routing setup that can automatically fail over requests when they encounter high latencies.

Replication

Initially, our application reads and writes data against a single Bigtable cluster (a multi-node, single-cluster deployment) without any form of replication enabled.

To minimise the cost increase, we will implement regional replication, whereby data is replicated to a second zone within a single region. Multi-regional replication would increase availability beyond one region, but it incurs additional cross-region network egress charges for both application traffic and replication traffic.

Currently, my single cluster (js-demo-cluster-1) is in the asia-southeast1-b zone. To enable replication to another zone, we will add a second cluster (js-demo-cluster-2) to this Bigtable instance and place it in the asia-southeast1-a zone. Once the cluster is added, data is automatically copied from the original cluster to the new one, and the two clusters are continuously kept in sync.
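
If you prefer to script this step instead of using the Cloud Console, the second cluster can be added with the Bigtable admin client. Below is a minimal sketch using the Python client library; the project and instance IDs (my-project, js-demo-instance) are placeholders for illustration.

from google.cloud import bigtable
from google.cloud.bigtable import enums

# Placeholder project and instance IDs - substitute your own.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("js-demo-instance")

# Define the second cluster in a different zone of the same region.
cluster = instance.cluster(
    "js-demo-cluster-2",
    location_id="asia-southeast1-a",
    serve_nodes=1,
    default_storage_type=enums.StorageType.SSD,
)

# create() returns a long-running operation; wait for it to complete.
operation = cluster.create()
operation.result(timeout=600)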

Once the initial copy is complete, you can see that our instance now has:

  • 2 clusters in 2 zones (asia-southeast1-a, asia-southeast1-b)
  • 1 node in each of the 2 clusters (the Nodes column represents the number of nodes allocated to the clusters in this instance)

Note that each cluster can be treated as a primary cluster, i.e. we can read from and write to either of them. However, a write request is only committed once, on a single cluster, with strong consistency; the update is then automatically replicated to the other cluster with eventual consistency.

  • Replication Latency and Performance

In our case, with both clusters in the same region, we can expect low replication latency: data should replicate within milliseconds, or within seconds on busier clusters.

In terms of performance, the destination cluster always uses some CPU to pull changes from the source cluster. Although user traffic is prioritised over replication activity, the cluster can still see a performance impact if its overall CPU utilisation becomes too high. We therefore need to factor in some additional CPU headroom for background replication.

Next, we will decide how our requests are routed by configuring the application profile of our Bigtable instance.

1. Application Profile: Multi-Cluster Routing

The default application profile uses Single-Cluster Routing, which means that all incoming requests to the Bigtable instance are routed to one cluster (in our case, the original cluster, js-demo-cluster-1).

To achieve improved availability, we will change the cluster routing configuration to Multi-Cluster Routing. Once this is done, any incoming request will be routed to either cluster (depending on which is the nearest available cluster).

With Multi-Cluster Routing, any incoming request can be routed to either cluster

As a write request could go to either of the two clusters, either cluster can be the destination cluster pulling in replication changes from the other.
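
For reference, here is a minimal sketch of creating such a multi-cluster routing app profile with the Python admin client, as an alternative to editing the profile in the console. The project, instance, and profile IDs are placeholders.

from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("js-demo-instance")

# RoutingPolicyType.ANY corresponds to Multi-Cluster Routing.
app_profile = instance.app_profile(
    "js-demo-multi-cluster",
    routing_policy_type=enums.RoutingPolicyType.ANY,
    description="Route requests to the nearest available cluster",
)

# ignore_warnings acknowledges the eventual-consistency trade-off.
app_profile.create(ignore_warnings=True)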

With this change, there are also some key considerations to note:

  • Improvement of Performance

For read requests, throughput will improve as there are now two clusters available to serve them. Read latency could also be reduced, because each request is automatically routed to the cluster closest to where it originates.

However, for write requests, we will not see an improvement in throughput as the request will still be written once to a single cluster, before it is replicated to the other cluster.

  • Improvement of Availability

In our scenario, if Bigtable routes a request to Cluster 2 in zone A and that cluster is excessively slow to reply or returns a transient error, Bigtable will automatically resend the request to Cluster 1 in zone B.

Multi-Cluster Routing behaviour

  • Setting a deadline threshold

How do we then define what is excessively slow? Automatic request failovers occur when a request takes longer than half the configured deadline, or a default maximum of 2s if no deadline is set. When a request hits this threshold, Cloud Bigtable re-routes it to the next nearest cluster.

In the code sample below, we define a 50 ms deadline for our mutate request. If the request latency exceeds 25 ms (half the deadline), the request is routed to the other cluster.

deadline = 0.05  # 50 ms; the timeout argument is a duration in seconds, not an absolute time
table.mutate_rows(rows, timeout=deadline)

For a latency-sensitive application like ours, it is important to set the request deadline deliberately, so that requests fail over at the appropriate time. A tighter deadline results in more failovers, but it helps keep request latencies low in the event of temporary unavailability or high latency.

  • Conflict Resolution

What if you have 2 writes to the exact same four-tuple (row key, column family, column qualifier, timestamp) going to 2 different clusters? Which write will be respected during the replication?

Bigtable automatically resolves the conflict using an internal last write wins algorithm based on the server-side time.

  • Eventual consistency

With Multi-Cluster Routing, another question arises: could my read request go to Cluster 2 while my write request is writing to Cluster 1 at the same time? Given that replication does not happen in real time, could my read request then return outdated data?

Yes. Even though data typically replicates within milliseconds (and in some cases, seconds), there is a chance that our read request returns outdated data, as replication is eventually consistent when we adopt a multi-cluster routing strategy. When we write a change to one cluster, we will eventually be able to read that change from the other cluster, but only after the change has been replicated.

Hence, with Multi-Cluster Routing, we are trading off our read-your-writes consistency for improved performance and availability.
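
If a particular workflow does need to confirm that earlier writes have reached every cluster before reading, the Bigtable Table Admin API offers consistency tokens for exactly this check. Below is a rough sketch using the Python admin client; the resource names are placeholders, and this is an illustration of the idea rather than production-ready polling logic.

from google.cloud import bigtable_admin_v2

# Placeholder resource name - substitute your project, instance, and table.
table_name = "projects/my-project/instances/js-demo-instance/tables/js-demo-table"

admin_client = bigtable_admin_v2.BigtableTableAdminClient()

# Ask Bigtable for a token representing all writes received so far...
token = admin_client.generate_consistency_token(name=table_name).consistency_token

# ...then check whether those writes are visible on every cluster.
response = admin_client.check_consistency(name=table_name, consistency_token=token)
print("Replication caught up:", response.consistent)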

2. Application Profile: Separate profiles for Read and Write Traffic

An additional option that we can consider is to have separate application profiles for different types of traffic.

In this option, we can choose to design our application profiles such that:

  • All write requests use a Single-Cluster Routing application profile, which ensures that every write goes to only one cluster.
  • All read requests use a Multi-Cluster Routing application profile, which routes each read to either cluster.

For this case, we can isolate the write workloads on Cluster 2 and ensure that Cluster 1 is provisioned with additional CPU resources to pull in all the replication changes.

This ensures that we can keep our latencies for read requests low, as requests hitting high latency (beyond the configured deadline) will always be failed over to the other cluster.

It also simplifies our routing set-up and reduces the chances of our read traffic reading outdated data (as compared to the first option). Reading outdated data only happens during the few milliseconds when read traffic is reading from Cluster 1 and the written change in Cluster 2 is yet to be replicated to Cluster 1.
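
A minimal sketch of this setup with the Python client is shown below: one single-cluster profile pinned to js-demo-cluster-2 for writes, one multi-cluster profile for reads, and the data path selecting a profile per table handle. The profile IDs and the table ID (js-demo-table) are placeholders.

from google.cloud import bigtable
from google.cloud.bigtable import enums

admin_client = bigtable.Client(project="my-project", admin=True)
instance = admin_client.instance("js-demo-instance")

# Writes: single-cluster routing pinned to js-demo-cluster-2.
write_profile = instance.app_profile(
    "js-demo-writes",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="js-demo-cluster-2",
    allow_transactional_writes=False,
)
write_profile.create(ignore_warnings=True)

# Reads: multi-cluster routing so reads can fail over between clusters.
read_profile = instance.app_profile(
    "js-demo-reads",
    routing_policy_type=enums.RoutingPolicyType.ANY,
)
read_profile.create(ignore_warnings=True)

# The data path then picks the appropriate profile per table handle.
data_client = bigtable.Client(project="my-project")
data_instance = data_client.instance("js-demo-instance")
read_table = data_instance.table("js-demo-table", app_profile_id="js-demo-reads")
write_table = data_instance.table("js-demo-table", app_profile_id="js-demo-writes")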

3. Application Profile: Single-Cluster Routing

However, if preserving Read-Your-Writes consistency is critical to our application, are there other approaches that we can consider to improve availability without compromising consistency?

For this, we can set the cluster routing on our application profile back to Single-Cluster Routing. In this case, all requests (reads and writes) go only to Cluster 2, and Cluster 1 continuously pulls all data changes from Cluster 2 to stay synchronised. This preserves read-your-writes consistency, ensuring that all read requests are served from Cluster 2, where the writes are committed.

This also means that when Cluster 2 is unavailable, all requests going to it will fail, and we would need to manually update our app profile so that requests are re-routed to Cluster 1.

Instead of doing this manually, a possible solution to automate this could be as follows:

a) We have an alerting policy in Cloud Monitoring, which is actively monitoring a particular Bigtable metric (e.g. Server Error Count, Server Latencies, etc).

b) Once the metric threshold is exceeded, the alerting policy sends an alert notification to a Pub/Sub topic.

c) Once the Pub/Sub topic receives a notification, it triggers a Cloud Function, which updates the app profile of the Bigtable Instance to route all requests to the other cluster instead.
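
A rough sketch of step (c) is shown below, assuming a Pub/Sub-triggered Cloud Function (1st gen signature) and placeholder project, instance, profile, and cluster IDs. It re-points a single-cluster app profile at the healthy cluster.

from google.cloud import bigtable
from google.cloud.bigtable import enums

def failover_bigtable(event, context):
    """Triggered by the Pub/Sub notification from the Cloud Monitoring alert."""
    client = bigtable.Client(project="my-project", admin=True)
    instance = client.instance("js-demo-instance")

    # Placeholder profile ID; after the update, all traffic goes to cluster 1.
    app_profile = instance.app_profile(
        "js-demo-single-cluster",
        routing_policy_type=enums.RoutingPolicyType.SINGLE,
        cluster_id="js-demo-cluster-1",
        allow_transactional_writes=False,
    )
    # ignore_warnings acknowledges that recent writes may not have replicated yet.
    app_profile.update(ignore_warnings=True)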

However, there are a few cons to this approach as well:

(1) The replicated cluster is not being utilised whenever there are no issues with the primary cluster.

(2) As Cloud Monitoring samples metrics with a period of at least one minute, this approach might take up to a minute before the app profile update is triggered to fail over the requests.

Conclusion

For Bigtable, it is always important to design the right application profiles to determine how our requests are being routed to different clusters. This will help us to achieve the desired balance between these three factors.

  • Performance: If performance is our top priority, we may want to use multi-cluster routing. This will ensure that our requests are routed to the closest cluster, which can reduce latency. However, multi-cluster routing can impact consistency, as it may take some time for the data to be replicated to all clusters.
  • Consistency: If consistency is our top priority, we may want to use single-cluster routing. This will ensure that all of our reads return the most recent data, regardless of which cluster the data is stored on. However, single-cluster routing can impact performance, as all of our requests will be routed to a single cluster.
  • Reliability: If reliability is our top priority, we may want to use multi-cluster routing with additional replicated clusters (Bigtable supports multiple clusters per instance, across zones and regions). This ensures that our data is stored on multiple clusters, so it remains available even if one cluster becomes unavailable. However, this can impact performance and consistency, as it takes time for data to replicate across all the clusters.

By carefully designing our application profiles, we can take advantage of the benefits of Bigtable replication to improve the performance, consistency, and reliability of our applications.


Loh Jia Sin
Google Cloud - Community

Google Cloud Technical Account Manager based in Singapore. The content shared here is solely my own and does not necessarily reflect the position of my employer.