When you’re dealing with something as big and powerful as Cloud Bigtable, it’s important to make sure you’re looking at the right data and tests to ensure performance is awesome. As we talked about before, most of the magic of how Cloud Bigtable handles such scale happens behind the scenes, where frontends coordinate, balance, and optimize your access patterns by moving tablets of data between compute nodes as needed.
Because so much of the heavy lifting of Cloud Bigtable performance happens behind the scenes, it might not be obvious how to figure out whether an instance is performing well. Are you supposed to set a breakpoint in the middle of Cloud Bigtable’s tablet sharding or something? Do you even want to? (Spoiler alert: You don’t.)
Thankfully, Cloud Bigtable Monitoring gives you a much easier way to see how your instances are performing over time.
The monitoring UI
Cloud Bigtable has a built-in monitoring tool that automatically reports how your nodes, clusters, and instances are performing. After your instance is created, click the Monitoring tab to see metrics such as CPU utilization, error rate, storage utilization, rows read/written, read/write requests, and read/write throughput, all in handy graph form so you can see how these values have changed over time.
Now, having all this data is great, but what’s really important is spotting the situations where it actually tells you something. Here are a few common cases worth keeping an eye on.
Is it time to add more nodes?
As mentioned before, Cloud Bigtable works by keeping a set of front-end nodes which are responsible for routing and balancing traffic to the backend tables. When you create your Cloud Bigtable instances and clusters, you specify the number of nodes you want, and using the monitoring tool, you can quickly see if it’s time to add more nodes.
On the instance’s Monitoring tab, look at the CPU utilization graph, which has a “recommended max” line. This line represents the best-practice ceiling for CPU utilization; if you exceed it for more than a few minutes, it’s generally a good idea to add more nodes to your cluster.
In addition, once CPU utilization gets high enough, the nodes become a bottleneck, and both read throughput and write throughput suffer as a result.
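To make the “exceed the line for a number of minutes” rule concrete, here is a minimal Python sketch that scans a series of CPU utilization samples (fractions from 0 to 1) for a sustained breach. The 70% default and the 10-minute window are assumptions chosen for illustration, not official thresholds; check the documentation for the recommended values that apply to your configuration.

```python
import math
from typing import Sequence


def cpu_needs_more_nodes(
    cpu_samples: Sequence[float],
    recommended_max: float = 0.70,  # assumed threshold, not an official value
    sustained_minutes: int = 10,  # assumed "number of minutes"
    sample_interval_minutes: int = 1,
) -> bool:
    """Return True if CPU utilization stays above recommended_max for at
    least sustained_minutes worth of consecutive samples."""
    # Number of consecutive over-the-line samples needed to cover the window.
    needed = max(1, math.ceil(sustained_minutes / sample_interval_minutes))
    run = 0
    for value in cpu_samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > recommended_max else 0
        if run >= needed:
            return True
    return False


# One sample per minute: 5 quiet minutes, then 10 minutes above the line.
samples = [0.45] * 5 + [0.82] * 10
print(cpu_needs_more_nodes(samples))  # True
```

Requiring a consecutive run, rather than reacting to any single spike, keeps a brief burst of traffic from triggering a resize.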
On the non-CPU side, the “Disk Load” graph could also point to a need to add more nodes. If this value is frequently at 100%, you might experience increased latency. Add nodes to the cluster to reduce the disk load percentage.
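The disk load check can be sketched the same way. In this hypothetical Python helper, “frequently” is assumed to mean “more than a quarter of the samples”; that cutoff is an illustrative choice, not a documented rule.

```python
from typing import Sequence


def disk_load_is_saturated(
    disk_load_samples: Sequence[float],
    saturated: float = 1.0,  # 100% disk load
    max_fraction: float = 0.25,  # assumed meaning of "frequently"
) -> bool:
    """Return True if more than max_fraction of the samples sit at (or
    above) full disk load."""
    if not disk_load_samples:
        return False
    at_limit = sum(1 for value in disk_load_samples if value >= saturated)
    return at_limit / len(disk_load_samples) > max_fraction


print(disk_load_is_saturated([1.0, 1.0, 0.6, 0.4]))  # True (50% of samples)
print(disk_load_is_saturated([1.0, 0.3, 0.2, 0.2, 0.1]))  # False (20% of samples)
```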
How balanced are my reads/writes?
You might see another common performance problem with Cloud Bigtable if reads and writes aren’t evenly distributed across all of your data.
Even with a well-designed schema, there are some cases where you can’t avoid accessing certain rows more frequently than others. Cloud Bigtable helps you deal with these cases by taking reads and writes into account when it balances tablets across nodes.
For example, suppose that 25% of reads are going to a small number of tablets within a cluster, and the remaining reads are spread evenly across all other tablets.
Cloud Bigtable will redistribute the existing tablets so that reads are spread as evenly as possible across the entire cluster.
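Cloud Bigtable’s actual rebalancing logic isn’t described here, so the following is only a simplified illustration of the idea: a greedy Python sketch that assigns tablets, heaviest first, to whichever node currently carries the least load. The tablet names and load numbers are hypothetical, set up so that two tablets receive 25% of the reads.

```python
import heapq
from typing import Dict, List


def balance_tablets(tablet_load: Dict[str, float], num_nodes: int) -> List[List[str]]:
    """Greedily assign tablets to nodes, heaviest tablet first, always
    picking the node with the lowest total load so far."""
    # Min-heap of (current load, node index); ties go to the lower index.
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_nodes)]
    for tablet, load in sorted(tablet_load.items(), key=lambda kv: kv[1], reverse=True):
        node_load, node = heapq.heappop(heap)
        assignment[node].append(tablet)
        heapq.heappush(heap, (node_load + load, node))
    return assignment


# Hypothetical workload: two hot tablets get 25% of reads, 15 others share 75%.
loads = {"hot1": 12.5, "hot2": 12.5}
loads.update({f"cold{i}": 5.0 for i in range(15)})

for node, tablets in enumerate(balance_tablets(loads, num_nodes=3)):
    print(node, sum(loads[t] for t in tablets))
# 0 32.5
# 1 32.5
# 2 35.0
```

Because this sketch never splits a tablet, a single tablet that is busier than an even per-node share would still overload whichever node holds it, which is one simple way to see why balancing alone can’t fix every skewed workload.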
However, there are times when Cloud Bigtable can’t properly spread out the read/write volume across all nodes, and as a result, performance will suffer.
We can detect this situation in the monitoring tool by keeping an eye on the “CPU utilization of the hottest node” graph. If the hottest node is frequently above the recommended value, even when your average CPU utilization is reasonable, you might be accessing a small part of your data much more frequently than the rest of your data.
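As a rough illustration, the check described above can be expressed over a series of snapshots of per-node CPU utilization. The 90% hottest-node limit, the 70% average limit, and the “at least half the snapshots” reading of “frequently” are all assumptions for this sketch.

```python
from statistics import mean
from typing import Sequence


def looks_like_hotspot(
    snapshots: Sequence[Sequence[float]],
    hottest_node_max: float = 0.90,  # assumed limit for the hottest node
    average_max: float = 0.70,  # assumed "reasonable" cluster average
    min_fraction: float = 0.5,  # assumed meaning of "frequently"
) -> bool:
    """Return True if, in at least min_fraction of the snapshots, the
    hottest node is over its limit while the cluster average is not."""
    if not snapshots:
        return False
    flagged = sum(
        1
        for node_cpu in snapshots
        if max(node_cpu) > hottest_node_max and mean(node_cpu) <= average_max
    )
    return flagged / len(snapshots) >= min_fraction


# One node pinned near 95% while the other three stay light: a likely hotspot.
skewed = [[0.95, 0.30, 0.30, 0.30]] * 6 + [[0.50, 0.50, 0.50, 0.50]] * 4
# Every node busy: a capacity problem rather than a hotspot.
busy = [[0.92, 0.91, 0.93, 0.90]] * 10
print(looks_like_hotspot(skewed), looks_like_hotspot(busy))  # True False
```

The `busy` case is deliberately not flagged: when every node is hot, the earlier advice about adding nodes applies instead.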
To fix this, you’ll need to go back and check your schema design and access patterns, and make sure they support a more even distribution of reads and writes across each table.
Digging deeper with Key Visualizer
The monitoring UI is a great way to find high-level issues with your cluster and see whether there are things you can address without too much heavy lifting. Often, though, you really need to dig into the specifics of your performance, and for that, the Key Visualizer for Cloud Bigtable is exactly the tool for you, because it provides insights into usage patterns at scale that are difficult to understand otherwise. For example, you can:
- Check whether your reads or writes are creating hotspots on specific rows
- Find rows that contain too much data
- Look at whether your access patterns are balanced across all of the rows in a table
As the examples in the official documentation show, this tool lets you find specific usage patterns visually, so you can quickly track them down.
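Key Visualizer builds its heatmaps for you in the Cloud Console. Purely to illustrate the underlying idea of spotting an unbalanced access pattern, here is a hypothetical Python sketch that groups an access log of row keys by a fixed-length key prefix and reports prefixes receiving several times their even share of traffic. The prefix length, the 3x factor, and the sample keys are all assumptions, and this is not how Key Visualizer itself works.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hot_key_prefixes(
    row_keys: Iterable[str],
    prefix_len: int = 4,  # assumed bucket granularity
    factor: float = 3.0,  # assumed "much more than its share" multiplier
) -> List[Tuple[str, int]]:
    """Group accessed row keys by prefix and return the prefixes whose
    access count exceeds factor times the average count per prefix."""
    counts = Counter(key[:prefix_len] for key in row_keys)
    if not counts:
        return []
    even_share = sum(counts.values()) / len(counts)
    hot = [(prefix, n) for prefix, n in counts.items() if n > factor * even_share]
    # Hottest prefixes first.
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical access log: one prefix takes most of the traffic.
log = (
    ["aaaa#001"] * 100
    + ["bbbb#001"] * 5
    + ["cccc#001"] * 5
    + ["dddd#001"] * 5
    + ["eeee#001"] * 5
)
print(hot_key_prefixes(log))  # [('aaaa', 100)]
```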
There’s more to come!
Now, it’s worth noting that the values I’ve discussed above are for instances that don’t use replication. If you’re using replication, you should check out the official documentation to keep up-to-date with the best practices and performance thresholds.
If you’d rather monitor your Cloud Bigtable instances programmatically, we’ve got tools for that, too. You can use Stackdriver Monitoring to keep an eye on usage metrics, and Stackdriver Logging lets you audit access to your instances. If you’re using our HBase-compatible client library for Java, you can even get client-side performance metrics.
In upcoming posts we’ll be taking a look at how to properly design your schema, update hotspots, and design your data so you can run as fast as possible. Stay tuned!