How does Bigtable work / Best practices:
Cloud Bigtable can be described as three systems working together.
- A front-end server pool.
- A set of compute nodes to handle connections (Cloud Bigtable Cluster).
- A scalable backend of storage.
All client requests go through the front-end server pool before they are sent to a Cloud Bigtable node. Each node in the cluster handles a subset of the requests to the entire system, and the front ends balance these connections depending on the type of operation a client wants to perform and the part of your data it needs to work on. To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets, where each tablet contains a contiguous range of rows within the table.
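The tablet sharding described above can be sketched as a toy routing model. The split points and row keys below are hypothetical (in practice Bigtable chooses and rebalances split points automatically); the point is only that contiguous, lexicographically sorted key ranges map to tablets:

```python
import bisect

# Hypothetical tablet split points: each tablet holds a contiguous,
# lexicographically sorted range of row keys.
tablet_splits = ["device#200", "device#400", "device#600"]  # assumption

def tablet_for(row_key: str) -> int:
    """Route a row key to the index of the tablet whose range contains it."""
    # bisect_right counts how many split points the key is >= to,
    # which is exactly the tablet index for contiguous ranges.
    return bisect.bisect_right(tablet_splits, row_key)

print(tablet_for("device#150"))  # first tablet: keys before "device#200"
print(tablet_for("device#450"))  # third tablet: between "device#400" and "device#600"
print(tablet_for("device#999"))  # last tablet: keys from "device#600" onward
```

Because ranges are contiguous, a scan over adjacent row keys touches few tablets, which is why row-key design matters so much for performance.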
Expected performance:
Cloud Bigtable delivers highly predictable performance. The official documentation lists the throughput you can expect for each node in your Cloud Bigtable cluster, which depends on whether the cluster uses SSD or HDD storage.
Bigtable’s performance can degrade for several reasons:
- Reading many row keys or ranges in a single read request.
- Rows that contain large amounts of data or a large number of cells.
- An improperly designed table schema.
- Network connection issues.
- An out-of-date client library in the application.
- Too few nodes in the cluster.
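One of the factors above, improper schema design, most often shows up as hotspotting: row keys that increase monotonically (such as a bare timestamp) send every new write to the end of the keyspace, and therefore to a single tablet. A common mitigation is field promotion, putting a high-cardinality identifier at the front of the key. The key formats and IDs below are illustrative, not a Bigtable API:

```python
# Hotspotting key: a bare timestamp means all writes land at the end of
# the keyspace, on one tablet.
def hotspotting_key(timestamp: int) -> str:
    return f"{timestamp}"

# Field-promoted key: leading with a high-cardinality device ID spreads
# consecutive writes across the keyspace.
def field_promoted_key(device_id: str, timestamp: int) -> str:
    return f"{device_id}#{timestamp}"

keys = [field_promoted_key(d, t)
        for t in (1000, 1001)
        for d in ("sensor-a", "sensor-b")]
print(sorted(keys))
# Keys for different devices interleave in sorted order, so consecutive
# writes hit different parts of the keyspace instead of one tablet.
```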
Bigtable capacity:
Performance scales linearly as more nodes are added to a cluster. The capacity of a Bigtable cluster should be determined by the trade-off between throughput and latency, and the trade-off between storage and performance. Bigtable offers optimal latency when the CPU load of a cluster is below 70%. For latency-sensitive applications, plan at least twice the capacity needed for the application’s maximum queries per second (QPS); this keeps the cluster below 50% CPU load and lets it serve front-end services with low latency.
Bigtable optimizes storage by distributing data across all the cluster’s nodes as the data volume increases. Storage usage per node is the cluster’s storage utilization divided by the number of nodes in the cluster. For latency-sensitive applications, keep storage utilization at or below 60%: as the storage per node increases, more background work is required, and workloads can experience higher query-processing latency and lower throughput even when the nodes meet overall CPU needs.
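The per-node storage calculation is straightforward division. The node capacity figure below is an assumption for illustration; check the official documentation for the current per-node storage limits:

```python
def storage_per_node_utilization(cluster_storage_tib: float,
                                 nodes: int,
                                 node_capacity_tib: float) -> float:
    """Storage utilization = cluster storage / (nodes * per-node capacity)."""
    return cluster_storage_tib / (nodes * node_capacity_tib)

# Illustrative per-node capacity; verify against current documented limits.
NODE_CAPACITY_TIB = 5.0  # assumption

util = storage_per_node_utilization(cluster_storage_tib=12.0,
                                    nodes=5,
                                    node_capacity_tib=NODE_CAPACITY_TIB)
print(f"{util:.0%}")  # 48% -- below the 60% guideline for latency-sensitive apps
```

If the dataset grew to push this figure past 60%, the guideline says to add nodes even if CPU load is still comfortable.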
Replication:
Cloud Bigtable can replicate data across multiple regions and zones to increase availability and durability. Replication also lets you isolate serving applications from batch reads and keep copies of your data in different parts of the world. Replication can increase read throughput, but it does not increase write throughput. Without replication, doubling the number of nodes in a cluster doubles an instance’s write throughput. With replication, however, each piece of data is written twice: once when the write is received, and again when it is replicated to the other cluster.
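The write-twice behavior can be modeled with simple arithmetic. The per-node write rate below is an assumed placeholder, not an official figure; the point is only that adding a replicated cluster adds replication work rather than write capacity:

```python
def effective_write_capacity(nodes_per_cluster: int,
                             per_node_writes: float,
                             clusters: int) -> float:
    """With replication, a write received by one cluster is also applied
    on every other cluster, so total hardware grows but each logical
    write consumes capacity on all clusters."""
    total_capacity = clusters * nodes_per_cluster * per_node_writes
    writes_per_logical_write = clusters  # each write lands on every cluster
    return total_capacity / writes_per_logical_write

PER_NODE_WRITES = 10_000  # assumed placeholder

print(effective_write_capacity(4, PER_NODE_WRITES, clusters=1))  # 40000.0
print(effective_write_capacity(4, PER_NODE_WRITES, clusters=2))  # still 40000.0
```

Doubling the clusters doubles read capacity and availability, but the effective write capacity stays flat, matching the behavior described above.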
Monitoring Bigtable:
Monitoring gives a detailed view of the usage of Bigtable. This can be done from the following places in the console:
- Bigtable instance overview
- Bigtable cluster overview
- Bigtable table overview
- Bigtable monitoring page
- Key Visualizer
- Cloud Monitoring (part of Google Cloud’s operations suite)
Bigtable can also be monitored using the Cloud Monitoring API. Users can track performance over time by breaking metrics down by resource and charting them over a chosen period. Cloud Monitoring imports usage metrics from Bigtable; these metrics can be viewed in the Metrics Explorer on the Resources page under Monitoring.
CPU Utilization and Disk Usage:
Nodes in a cluster perform various operations, such as reads, writes, and administrative tasks, all of which consume CPU resources. The CPU utilization metrics are:
- Average CPU utilization across all the cluster’s nodes.
- CPU utilization of the hottest node: the busiest node at a given moment, which can change frequently.
- CPU utilization by method, table, and app profile.
Bigtable measures disk usage in binary units: binary gigabytes, also known as gibibytes (GiB), where 1 GiB is 2^30 bytes. Storage metrics report the data on disk as of the last computation.
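The binary units are easy to get wrong when comparing against decimal gigabytes. This is ordinary unit arithmetic, not a Bigtable API:

```python
GIB = 2 ** 30  # one gibibyte (binary gigabyte) in bytes

def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count into gibibytes, the unit Bigtable reports."""
    return n_bytes / GIB

# A decimal gigabyte (10**9 bytes) is slightly smaller than a GiB,
# so disk-usage figures in GiB look smaller than decimal-GB figures.
print(round(bytes_to_gib(10 ** 9), 2))   # ~0.93 GiB
print(bytes_to_gib(256 * GIB))           # 256.0
```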
Key Visualizer:
Key Visualizer is a tool that helps users analyze and diagnose Bigtable usage patterns. The visual reports it generates give detailed insight into usage patterns that would be difficult to analyze otherwise, and they can be used to improve existing schema designs and troubleshoot performance issues. Key Visualizer does not display every metric that affects Bigtable’s performance, so additional troubleshooting alongside Key Visualizer scans is needed to identify the causes of performance issues.
A Key Visualizer scan consists of a heatmap with aggregate values on each axis, showing the patterns of a group of keys over time. The x-axis represents time, and the y-axis represents the row keys. Low metric values for a row key are said to be cold and are denoted in dark colors; high values appear as light colors. Such visual patterns make it easy to diagnose problems at a glance.
Issues identified by Key Visualizer are displayed above the heatmap as diagnostic messages; when warning and performance metrics appear in bright colors, Key Visualizer has detected a potential problem. The usage of a particular resource determines what counts as a high or low value.
Colors in the heatmap can be adjusted using the +/- buttons on either side of the Adjust Brightness option: increasing the brightness decreases the range of values represented by each color, and vice versa. Rectangular Zoom enlarges a particular area of the heatmap for a closer, more detailed look, which helps in spotting issues within a specific period.
Row keys represent a hierarchy of values, each having an identifier to capture usage and a timestamp. Users can drill down into the data of a heatmap using a common prefix shared by a group of row keys; specific row-key hierarchies can be selected from the left side of the heatmap, where the key prefix for all the row keys at that level is also displayed. Details about a metric are shown as a tooltip when the cursor moves over the heatmap, and tooltips can be pinned by clicking on the heatmap. The ops metric gives an overview of the usage pattern for a table, and users can switch metrics by choosing one from the Metric drop-down list above the heatmap.
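A heatmap of this shape can be sketched as a toy aggregation over (row-key prefix, time window) cells. The operation stream, key prefixes, and window size below are all fabricated for illustration:

```python
from collections import defaultdict

# Fabricated (row_key, time) operation stream.
ops = [
    ("sensor-a#1", 0), ("sensor-a#2", 0), ("sensor-a#3", 1),
    ("sensor-b#1", 0), ("sensor-b#2", 1), ("sensor-b#3", 1),
]

WINDOW = 1  # width of each time bucket (x-axis)

# Heatmap cells: y-axis = row-key prefix, x-axis = time bucket.
heatmap = defaultdict(int)
for row_key, t in ops:
    prefix = row_key.split("#")[0]     # group rows by a shared key prefix
    heatmap[(prefix, t // WINDOW)] += 1

for cell, count in sorted(heatmap.items()):
    # High counts would render as light ("hot") cells, low counts as dark.
    print(cell, count)
```

Drilling down by prefix, as described above, corresponds to re-running this aggregation with a longer prefix on the y-axis.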
Heatmap Patterns:
Five main heatmap patterns are frequently spotted:
- A diagonal line denotes sequential reads and writes: contiguous key ranges accessed in sequential order.
- A fine-grained, even texture represents evenly distributed reads and writes, which is an effective usage pattern.
- Alternating bands of dark and light colors show key ranges that are accessed only during specific periods, not continuously.
- An abrupt change from dark to light denotes a sudden increase in rows being added or accessed during a specific period.
- Horizontal lines of light and dark colors can indicate hot key ranges, usually during large reads and writes.
Diagnostics:
Diagnostic messages help identify issues in performance data while observing a Key Visualizer scan. A message can include a Warning or Danger symbol to mark the problem-causing rows. The following are some of the diagnostic messages.
- High read pressure
- High write pressure
- Large rows — It notifies that some rows in the table exceed 256 MB of data.
- No data scanned — This implies no performance data for the table.
- Keyspace not to scale — If a table contains a small number of rows, the Key Visualizer cannot evenly distribute the row keys into buckets.
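The "Large rows" diagnostic above can be sketched as a simple threshold test. The row names and sizes here are hypothetical; in practice you would measure actual row sizes from your own data:

```python
# 256 MB threshold from the "Large rows" diagnostic, in binary units.
LARGE_ROW_THRESHOLD = 256 * 2 ** 20  # 256 MiB in bytes

# Hypothetical per-row sizes in bytes.
row_sizes = {
    "user#1": 12 * 2 ** 20,    # 12 MiB -- fine
    "user#2": 300 * 2 ** 20,   # 300 MiB -- would trigger the warning
}

large_rows = [key for key, size in row_sizes.items()
              if size > LARGE_ROW_THRESHOLD]
print(large_rows)  # ['user#2']
```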
Plan your Bigtable capacity:
- High throughput and low latency: Bigtable offers optimal latency when the CPU load for a cluster is under 70%. For latency-sensitive applications, however, we recommend that you plan at least 2x capacity for your application’s max Bigtable queries per second (QPS). This capacity ensures that your Bigtable cluster runs at less than 50% CPU load, so it can offer low latency to front-end services. This capacity also provides a buffer for traffic spikes or key-access hotspots, which can cause imbalanced traffic among nodes in the cluster.
- Storage usage and performance: When storage utilization increases, workloads can experience an increase in query processing latency even if the cluster has enough nodes to meet overall CPU needs. This is because the higher the storage per node, the more background work such as indexing is required. The increase in background work to handle more storage can result in higher latency and lower throughput.
- For latency-sensitive applications: Google recommends that you keep storage utilization per node below 60%. If your dataset grows, add more nodes to maintain low latency. For further information on storage, see the official documentation.
- Capacity planning: Always run your own typical workloads against a Bigtable cluster when doing capacity planning, so you can figure out the best resource allocation for your applications. You can follow the PerfKitBenchmarker tutorial for Bigtable to create tests for your own workloads.
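The guidelines above (2x QPS headroom and storage utilization below 60%) can be combined into a single sizing sketch by taking whichever constraint demands more nodes. Both per-node figures are assumptions for illustration; as the capacity-planning advice says, measure your own workload rather than relying on them:

```python
import math

def plan_nodes(peak_qps: float, per_node_qps: float,
               data_tib: float, node_capacity_tib: float,
               qps_headroom: float = 2.0, storage_target: float = 0.6) -> int:
    """Take the larger of the throughput-driven and storage-driven node
    counts, applying the 2x QPS headroom and <=60% storage guidelines."""
    for_qps = math.ceil(peak_qps * qps_headroom / per_node_qps)
    for_storage = math.ceil(data_tib / (node_capacity_tib * storage_target))
    return max(for_qps, for_storage)

# Assumed per-node throughput and capacity, for illustration only.
print(plan_nodes(peak_qps=35_000, per_node_qps=10_000,
                 data_tib=24.0, node_capacity_tib=5.0))  # -> 8
```

Here the QPS constraint alone needs 7 nodes, but keeping 24 TiB under 60% storage utilization needs 8, so storage is the binding constraint.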