Using Bigtable’s monitoring tools, meant for a petabyte-scale database, to… make art

Billy Jacobson
May 5, 2020 · 7 min read

For two years, I’ve been working on developer relations for Google’s Cloud Bigtable. It’s a database built to handle petabytes of data, and it powers many core Google services, including Search, Analytics, Maps, and Gmail. However, the largest table I’d ever created was around 100MB, not even close to what Bigtable can support.

The Bigtable team recently launched an improved version of their monitoring tool Key Visualizer, and given some recently acquired extra free time, now seemed like a great time to try loading in a ton of data and using the updated tool.

In the end, I wrote 10TB of data and discovered that I could reverse-engineer Key Visualizer to create works of art.

What is Bigtable?

If you’re building a mental model of Bigtable: there are rows and columns, and each row/column intersection is a cell. A cell can hold multiple values stored as timestamped versions, so a Bigtable table is effectively three-dimensional: row, column, and version. Bigtable is a fairly low-level database, which is how it provides great QPS and scalability, but its querying capabilities are basic and centered on the rowkey: you can get a single row or scan a range of rows by rowkey.

An example database about mobile devices. The first cell has multiple timestamped versions of the connected_wifi value.
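
For illustration, here is roughly what those two access patterns look like with the Python client library (the project, instance, table, and rowkey names here are placeholders):

```python
from google.cloud import bigtable  # pip install google-cloud-bigtable

# Placeholder names; substitute your own project, instance, and table.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("mobile-devices")

# Get a single row by its exact rowkey.
row = table.read_row(b"phone#4c410523#20200501")

# Scan a range of rows: streams every row with a key in [start_key, end_key).
for row in table.read_rows(start_key=b"phone#4c410523", end_key=b"phone#4c410524"):
    print(row.row_key)
```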

There are many potential arrangements for your data, but the general rule is that you don’t want to hit the same rowkey, or range of rowkeys, too frequently, since that creates hotspots and hurts performance. That’s where the Key Visualizer tool comes in: it shows you which rowkeys, or groups of rowkeys, are being queried too frequently.

I saw that Key Visualizer could produce detailed images like the one below based on read throughput, so I wondered: could I reverse the process and come up with a set of reads that would produce a specific image?

Loading 10TB of data into Bigtable (don’t try this at home)

All of the code and instructions on how to run it are available on GitHub, so I’ll give a high-level overview of what the code does here without going into too many details.

It’s very easy to create and scale Bigtable instances through the Cloud Console. For 10TB, eight nodes give me more than enough storage and throughput to load my data quickly.

Creating a Bigtable instance through the user interface.

You may notice in the sidebar that this is fairly expensive, so don’t try this at home! For a cheaper at-home alternative, you can create a one-node instance, which you should shut off once you’re done.

Once my instance is created, I use Dataflow to write rows of 100MB of random bytes each to my table. At 10TB total and 100MB per row, that comes to 100,000 rows. To avoid hotspotting from sequential writes, I use a rowkey that reverses the iteration number and pads it with zeroes.
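
The load itself runs as a Dataflow job (the full pipeline is in the repo), but the rowkey trick is small enough to sketch on its own. A minimal version, assuming a six-digit iteration counter:

```python
def make_rowkey(i: int, width: int = 6) -> bytes:
    # Zero-pad the iteration number, then reverse the string, so that
    # consecutive iterations land on distant parts of the keyspace
    # instead of all hammering the same tablet.
    return str(i).zfill(width)[::-1].encode("utf-8")

# Sequential iterations scatter across the keyspace:
#   make_rowkey(1)   -> b"100000"
#   make_rowkey(2)   -> b"200000"
#   make_rowkey(123) -> b"321000"
```

Reversing puts the fastest-changing digit first, which spreads a sequential workload evenly across Bigtable’s tablets.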

Running the Dataflow job to load in data
I monitored the Dataflow job as it ramped up (left) and saw a similar increase in activity in the Bigtable monitoring tools (right)

The reversed rowkey helped, but I still ended up adding a few more nodes to stay under the recommended maximum CPU utilization. I reduced the node count once the data was loaded.

Creating queries that will activate pixels in KeyViz

The visualization produced by the load data job.

I discovered that if I continuously run range scans over certain rowkey ranges, the corresponding pixels light up in Key Visualizer. I wrote a Dataflow job that performs range scans based on an input CSV, and started out with a simple drawing of a smiley face.
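
The actual job is a Dataflow pipeline (it’s in the repo), but conceptually each pixel is just a scan over one band of the keyspace. A minimal sketch using the Python client, reusing the table handle from the earlier snippet (the boundary keys are made up):

```python
from google.cloud.bigtable.row_set import RowSet

# One pixel = one band of the keyspace. These boundaries are illustrative;
# the real job derives them from the CSV's dimensions.
row_set = RowSet()
row_set.add_row_range_from_keys(start_key=b"100000", end_key=b"200000")

# Simply reading the band is enough; Key Visualizer records the activity.
for _ in table.read_rows(row_set=row_set):
    pass
```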

This told me I’d be able to draw arbitrary images once I had them in that format, but I wondered if I could add more depth and use gradients by scanning certain areas more frequently than others. Key Visualizer reports its output in 15-minute windows, so if my job reads the value .7 for a range, it scans that range with a 70% chance on each pass. With hundreds of scans in each window, I hoped the measured usage would average out accordingly. I tried a scan with the CSV below and was happy to see I’d be able to include depth in my images.
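
In other words, a CSV value becomes a scan probability. A sketch of that coin flip, assuming intensities between 0 and 1 and the same table handle as above:

```python
import random

def maybe_scan(intensity: float, start_key: bytes, end_key: bytes) -> None:
    # Scan the range with probability `intensity`. Over hundreds of passes
    # in one 15-minute window, a 0.7 band averages about 70% of the read
    # volume of a 1.0 band, which renders as a dimmer pixel.
    if random.random() < intensity:
        for _ in table.read_rows(start_key=start_key, end_key=end_key):
            pass
```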

Once I knew Key Visualizer’s capabilities, I did a bit of math and scripting to take any image and convert it into the required CSV. I wrote a handy CodePen to do this, so I’d have an easy way to add more images. I also added an input for the number of hours, which determines the quality of the image produced.
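
My converter lives in a CodePen, but the pixel-to-intensity math is simple; an equivalent sketch in Python with Pillow (the four-columns-per-hour figure follows from the 15-minute windows):

```python
from PIL import Image  # pip install Pillow

def image_to_csv(path: str, bands: int, hours: float) -> str:
    # Each output column is one 15-minute Key Visualizer window, and each
    # output row is one band of the keyspace.
    cols = int(hours * 4)  # four 15-minute windows per hour
    img = Image.open(path).convert("L").resize((cols, bands))
    # Map grayscale (0-255) to a scan probability in [0, 1]; dark pixels
    # scan most often, so they render hottest.
    return "\n".join(
        ",".join(f"{(255 - img.getpixel((x, y))) / 255:.2f}" for x in range(cols))
        for y in range(bands)
    )
```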

Once I have that image CSV created and uploaded to a public bucket, I run a pipeline that does the following (a condensed sketch follows the list):

  1. Download the image CSV
  2. Divide all the rowkeys evenly based on the dimensions of the CSV
  3. Create a scan spanning several rowkey ranges; to get different intensities/depths in the visualizer, use the pixel values in the CSV to decide probabilistically whether each range is included
  4. Scan the table
  5. Repeat this every second, moving on to the next column every 15 minutes (the minimum width of each monitoring update)
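
Condensed into Python, the loop looks something like this (band_to_key_range is a hypothetical helper that maps a CSV row index to its slice of the evenly divided keyspace, and maybe_scan is the probabilistic scan from above):

```python
import time

WINDOW_SECONDS = 15 * 60  # one Key Visualizer window per image column

def draw(csv_grid):
    # csv_grid[col][band] holds the scan probability for one pixel.
    for column in csv_grid:
        deadline = time.time() + WINDOW_SECONDS
        while time.time() < deadline:
            for band, intensity in enumerate(column):
                # band_to_key_range is a hypothetical helper mapping a band
                # index to (start_key, end_key) over the divided keyspace.
                start_key, end_key = band_to_key_range(band, len(column))
                maybe_scan(intensity, start_key, end_key)
            time.sleep(1)  # roughly one pass per second
```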

You can also view and run the ReadData pipeline from GitHub.

The finale!

I set up a few tables and jobs to run at the same time so I could get more results, and here they are:

There are several parameters you can play with. Brightness changes the scaling of the image, which is helpful if you want to take an in-depth look at a smaller area.

You can also adjust which metric is displayed. “Read bytes client” seems to produce smooth images while “Ops” produces images with more lines which can look really cool on some images.

And finally, if you’re as big a fan of drag and RuPaul’s Drag Race as I am, then you’ll understand why I had to immortalize the queen of drag in Key Visualizer as well.
