Explore your Bigtable data with the new Bigtable Studio query builder
I’ve used Bigtable, the NoSQL wide-column store, for many years in my role as a developer advocate at Google Cloud. Until now, when I needed to debug an application or a new data pipeline, my options were limited to using a command-line tool or writing code to query the database. These options can be cumbersome and can slow down even the most experienced users, let alone folks who are new to the product. Fortunately, there’s a new solution: the Bigtable Studio query builder, which launched earlier this month and provides a simple, interactive way to review and explore Bigtable data.
Bigtable Studio query builder can be valuable regardless of your experience level and fits in nicely with your developer workflow. In this blog, we’ll look at using the query builder to view data, manage access to your data, and troubleshoot your applications.
Viewing data
I am so happy to finally see my data in the Cloud Console alongside my instances and tables. To navigate to this page, I selected my instance from the Bigtable homepage, then clicked the new Bigtable Studio option in the sidebar.
In the studio, you’ll select the table you’d like to query and then you’ll see the data including row keys, columns and cell values. You’re also able to select or hide columns and view cell timestamps.
Bigtable is an open sandbox for development, so developing a mental model of your data quickly is important. If you have hundreds of tables or new developers joining your team, now you can easily answer common questions: What do the row keys look like? What columns are there? What does the data look like? Plus, you can get this information without needing to jump into a command line or write code.
Data access
Bigtable is a petabyte-scale database often used by large organizations, so most developers choose to restrict access to it to minimize risk to production. However, this can be limiting for data scientists, business analysts, and anyone else who could help extract more value from the data. It’s dangerously easy for someone used to SQL databases to query data in a way that performs multiple table scans, causing hotspotting, latency issues, or excessive CPU consumption.
With the query builder, these users can explore the list of tables and their column families, then click through to see the data in any tables that pique their interest. And for those worried about production performance, the query builder’s built-in limitations prevent unintentionally large queries that could consume excessive CPU and affect overall system performance.
Troubleshooting
Bigtable provides an arsenal of tools for troubleshooting and debugging your application: Key Visualizer, monitoring tools, client metrics, and more. So let’s see where Bigtable Studio fits.
Query development
Filters that you use in the query builder work the same way as when you use them in the client libraries. You won’t have to second-guess your filters anymore or try to type them out in the CLI. Depending on your schema, you might regularly combine several filters: narrow down to a row prefix, select some columns, and filter on timestamp. This is where I see a lot of value for database developers: get instant feedback on whether your filter is returning what you expect, then simply translate it to your client of choice with our extensive examples on using filters with Bigtable.
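To make that chain concrete, here’s a plain-Python sketch of what a prefix, column family, and timestamp filter chain does to a handful of made-up rows. This is just the mental model, not the client API; in real code you’d build the equivalent with the filter classes in your client library (for example, a filter chain in the Python client), and the sample keys and values below are hypothetical.

```python
from datetime import datetime

# A toy in-memory "table": row key -> {(family, qualifier): [(timestamp, value), ...]}
# Hypothetical sample data; real reads would go through the Bigtable client.
rows = {
    b"device#42#2024-01-01": {("stats", b"temp"): [(datetime(2024, 1, 1, 12), b"21.5")]},
    b"device#42#2024-01-02": {("stats", b"temp"): [(datetime(2024, 1, 2, 12), b"22.0")],
                              ("meta", b"fw"):   [(datetime(2024, 1, 2, 12), b"v2")]},
    b"device#99#2024-01-02": {("stats", b"temp"): [(datetime(2024, 1, 2, 12), b"19.8")]},
}

def chained_filter(rows, prefix, family, start_ts):
    """Mimic a filter chain: row-key prefix, then column family, then timestamp range."""
    out = {}
    for key, cells in rows.items():
        if not key.startswith(prefix):  # 1. row-key prefix filter
            continue
        kept = {col: [(ts, v) for ts, v in versions if ts >= start_ts]  # 3. timestamp filter
                for col, versions in cells.items()
                if col[0] == family}  # 2. column family filter
        kept = {col: versions for col, versions in kept.items() if versions}
        if kept:
            out[key] = kept
    return out

result = chained_filter(rows, b"device#42", "stats", datetime(2024, 1, 2))
print(sorted(result))  # prints [b'device#42#2024-01-02']
```

Each stage narrows the result the same way the corresponding query builder filter does, which is why a filter that behaves as expected in the studio translates directly to your client code.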
Data pipeline validation
As someone who writes a lot of data pipelines, I know how frustrating it is to run a pipeline for hours and then realize the result is not what you expected. The tools in the studio allow for quick spot checks of data to catch issues faster.
For a pipeline that’s writing to Bigtable, I’d start by running it on your local machine with a subset of your data. Run the code, then just hit refresh to see if the data is in the expected format. In the past few weeks, I’ve been writing data to Bigtable with a Raspberry Pi (stay tuned!), and I found it valuable to debug using Bigtable Studio instead of an additional command-line tool. Also, I’m working on this project with a developer new to Bigtable, so I can easily link him to Bigtable Studio without needing to set him up with additional tools.
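If you want to automate that same spot check in code, a tiny stdlib sketch can verify a sample of written rows against the format you expect to see in the studio. The key pattern and column names here are hypothetical stand-ins for whatever your pipeline produces.

```python
import re

# Hypothetical expectations for rows written by a local pipeline run:
# keys like b"sensor#7#2024-05-01" and two required columns.
KEY_PATTERN = re.compile(rb"^sensor#\d+#\d{4}-\d{2}-\d{2}$")
REQUIRED_COLUMNS = {("readings", b"temp"), ("readings", b"humidity")}

def check_row(row_key, columns):
    """Return a list of problems with one written row (empty list = looks good)."""
    problems = []
    if not KEY_PATTERN.match(row_key):
        problems.append(f"unexpected row key: {row_key!r}")
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    return problems

good = check_row(b"sensor#7#2024-05-01", {("readings", b"temp"), ("readings", b"humidity")})
bad = check_row(b"sensor-7", {("readings", b"temp")})
print(len(good), len(bad))  # prints 0 2
```

In practice you’d feed this a handful of rows read back from the table after a pipeline run, the same rows you’d eyeball in the studio.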
For 1:1 migrations, such as HBase to Bigtable, I’d recommend using filters to spot-check a few rows. Narrow down with row key filters and check the results against your source of truth. The query builder doesn’t currently offer aggregation functions, so after a spot check in the studio, compute a hash over the before and after datasets for full validation.
All of this is available now, so go to your Bigtable console and start exploring! Or, if you’re looking for a comprehensive guide to setting up Bigtable Studio, its features, and its limitations, check out building queries in the console in the documentation.