When it comes to performance at scale, there are very few products that can truly live up to the “at scale” part the way that Cloud Bigtable does. To date, Bigtable is the technology behind many of Google’s flagship products, including Gmail, Maps, and YouTube, each of which serves over a billion active users.
What’s really amazing is that this same technology that powers Google’s most popular products is also available for you to use in your own applications. But be warned: you’d think all that scale and performance would make it the silver bullet for all your backend needs, but when it comes to performance, just like anything else, there are some things it’s good at, and some things it’s not.
How it works
As a massively oversimplified generalization, Cloud Bigtable can be described as three systems working together:
- A frontend server pool.
- A set of compute nodes to handle connections (aka a “Cloud Bigtable Cluster”).
- A scalable backend of storage.
In more detail, all client requests go through the frontend server pool before they are sent to a Cloud Bigtable node. Each node in the cluster handles a subset of the requests for the entire system, and the frontends balance these connections based on the type of operation each request is performing and the part of your data it needs to work on.
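To make that routing idea concrete, here’s a hypothetical sketch (not Bigtable’s actual internals) of how a frontend could map a request’s row key to the node currently serving the tablet that covers it. The tablet boundaries, node names, and `route` function are all illustrative assumptions:

```python
# Hypothetical routing sketch: each tablet covers a contiguous row-key
# range and is assigned to a node; a frontend finds the covering tablet.
from bisect import bisect_right

# Assumed layout: four tablets covering the keyspace, identified by their
# (sorted) start keys, each assigned to a node in the cluster.
tablet_starts = ["", "g", "n", "t"]
tablet_to_node = {0: "node-1", 1: "node-2", 2: "node-1", 3: "node-3"}

def route(row_key: str) -> str:
    """Return the node serving the tablet whose range contains row_key."""
    tablet = bisect_right(tablet_starts, row_key) - 1
    return tablet_to_node[tablet]

print(route("apple"))  # tablet 0 ("" .. "g") -> node-1
print(route("zebra"))  # tablet 3 ("t" .. end) -> node-3
```

Because the mapping from tablets to nodes is just metadata, reassigning a tablet to another node doesn’t require moving the underlying data, which is what makes the rebalancing described below cheap.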
To store the underlying data for each of your tables, Cloud Bigtable shards the data into multiple tablets (Not a typo! Tablets and tables are different things.), where each tablet contains a contiguous range of rows within the table.
And here’s the important thing when it comes to tablets: they can be reassigned to different nodes in your cluster, on demand, allowing Cloud Bigtable to scale and re-balance seamlessly as your use patterns change.
For example, as reads/writes to a specific tablet increase, Cloud Bigtable might split a tablet into two or more smaller tablets, either to reduce a tablet’s size or to isolate hot rows within an existing tablet. Likewise, if one tablet’s rows are read extremely frequently, Cloud Bigtable might store that tablet on its own node, even though this causes some nodes to store more data than others.
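The splitting behavior can be illustrated with a toy sketch (again, not the real implementation): once traffic to a tablet crosses some threshold, the tablet’s row range is split so the halves can live on different nodes. The row keys, counts, and threshold here are invented for illustration:

```python
# Illustrative tablet split: if total traffic to a tablet passes a
# threshold, split its (sorted) row range at the median.
def split_tablet(rows, access_counts, threshold):
    """Return the tablet unchanged, or as two smaller tablets."""
    total = sum(access_counts[r] for r in rows)
    if total <= threshold or len(rows) < 2:
        return [rows]
    mid = len(rows) // 2
    return [rows[:mid], rows[mid:]]

rows = ["row1", "row2", "row3", "row4"]
counts = {"row1": 50, "row2": 900, "row3": 40, "row4": 10}
print(split_tablet(rows, counts, threshold=500))
# total traffic (1000) exceeds the threshold, so the tablet splits in two
```

In the real system the hot half (here, the tablet containing `row2`) could then be reassigned to a less-loaded node, isolating the hotspot.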
Under these typical workloads, Cloud Bigtable delivers highly predictable performance, and according to the official documentation, you can expect to achieve the following performance for each node in your Cloud Bigtable cluster, depending on which type of storage your cluster uses:
(Standard disclaimer: these performance numbers are guidelines, not hard-and-fast rules. Per-node performance may vary based on your workload and the typical size of each row in your table. Please see the official documentation for the latest numbers.)
In general, a cluster’s performance increases linearly as you add nodes to the cluster. For example, if you create an SSD cluster with 10 nodes, the cluster can support up to 100,000 QPS for a typical read-only or write-only workload, assuming that each row contains 1 KB of data.
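That linear relationship makes capacity planning a back-of-the-envelope calculation. Here’s a minimal sketch using the per-node figure implied by the example above (10 nodes supporting 100,000 QPS means roughly 10,000 QPS per SSD node for 1 KB rows):

```python
# Back-of-the-envelope capacity estimate, assuming linear scaling and the
# ~10,000 QPS/node SSD figure implied by the document's example.
QPS_PER_SSD_NODE = 10_000

def estimated_cluster_qps(nodes: int) -> int:
    """Estimate total QPS for a typical read-only or write-only workload."""
    return nodes * QPS_PER_SSD_NODE

print(estimated_cluster_qps(10))  # 100000, matching the 10-node example
```

Remember the disclaimer above: actual throughput depends on your workload and row sizes, so treat this as a starting point, not a guarantee.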
When to use it (and when not to)
From here, you can clearly see that Cloud Bigtable does not act like a relational database, and more to the point: while it scales amazingly well and isolates hotspots gracefully, that doesn’t make it a perfect fit for every scenario.
Cloud Bigtable excels at handling very large amounts of data (terabytes or petabytes) over a relatively long period of time (hours or days). This is because Cloud Bigtable needs time to learn your access patterns, and the data needs to be large enough to warrant the use of all nodes in your cluster (otherwise, you might get hotspotting on a single node).
Which means that Cloud Bigtable is a great solution for:
- Time-series data, such as CPU and memory usage over time for multiple servers.
- Marketing data, such as purchase histories and customer preferences.
- Low-latency serving, such as ad-tech and mar-tech recommendations.
- Financial data, such as transaction histories, stock prices, and currency exchange rates.
- Internet of Things data, such as usage reports from energy meters and home appliances.
- Graph data, such as information about how users/machines/events are connected to one another.
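For workloads like the time-series case above, how you design your row keys matters: since tablets hold contiguous row ranges, keys that all start with a timestamp would funnel every write to the same tablet. A commonly recommended pattern is to lead with an identifier and put the timestamp second. This sketch uses a hypothetical `server_id#timestamp` scheme; the format and names are assumptions, not a prescribed API:

```python
# Hypothetical row-key scheme for time-series data: server ID first so
# writes spread across tablets, zero-padded timestamp second so each
# server's readings stay in one contiguous, scannable range.
def make_row_key(server_id: str, ts_epoch: int) -> str:
    return f"{server_id}#{ts_epoch:010d}"

print(make_row_key("web-42", 1500000000))  # 'web-42#1500000000'
```

With keys like these, a scan for one server’s history is a single contiguous range read, while writes from many servers land on many different tablets.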
On the other hand, if you’re storing a small amount of data (< 300 GB), or if your Cloud Bigtable interactions last for a very short period of time (seconds rather than minutes or hours), Cloud Bigtable won’t be able to balance your data in a way that gives you good performance, and as such, it might not be the right technological solution for your application. Fret not! Google Cloud Platform has a great set of other offerings that might give you what you need, for example:
- If you need relational joins, like for online transaction processing — use Cloud SQL
- If you’re storing blobs larger than 10 MB — use Cloud Storage
- If you need ACID transactions — use Cloud Datastore or Cloud Spanner
- If you don’t have much data yet — use Cloud Datastore, the Firebase Realtime Database, or Cloud SQL
If your workload falls into the Cloud Bigtable space, then stay tuned to Cloud Performance Atlas, because we’ve got more content on Cloud Bigtable coming soon!