Using the Bigtable emulator with Apache Beam and BigtableIO

Israel Herraiz
May 27, 2020 · 3 min read
Using Beam + Bigtable locally, with no cloud resources

Cloud Bigtable is a high-performance distributed NoSQL database that can store petabytes of data, but sometimes you only need to test it with a tiny amount of data.

Spawning a remote instance just for that is cumbersome, and you would be spending money and cloud resources on a small test.

Fortunately, Bigtable comes with an emulator that is very handy for simple tests with tiny amounts of data, because you can run it on your local computer.

The emulator works in memory, exposing the same interface as Bigtable, so any system that is able to connect to a real Bigtable instance should be able to use the emulator too.

The documentation is quite clear about how to set the right environment variables to use the emulator. Start the emulator first (for instance, with gcloud beta emulators bigtable start), and then execute this in the shell session where you want to use the emulator:

$(gcloud beta emulators bigtable env-init)

That will set an environment variable named BIGTABLE_EMULATOR_HOST with the host and port needed to connect to the emulator.

The emulator is an in-memory database that will be initially empty. You need to create tables, populate them, etc. To create tables, you can use the cbt utility. You need to configure it with some settings in the file ~/.cbtrc, including the project and the instance name.

But what project and instance name should you use for the emulator?

That's easy: any.

Whatever project and instance values you choose, as long as you set the same values in your Bigtable client, the emulator will serve your tables as if they were under that project and instance. So to create local tables, just configure your ~/.cbtrc file with values like the following:

project = fake-project
instance = fake-instance

Then create tables using the cbt utility (for example, cbt createtable mytable), in a shell where BIGTABLE_EMULATOR_HOST is set.

If you want to access a table in the emulator from Apache Beam, using BigtableIO, you will have to use the same project and instance names as in your ~/.cbtrc file.

If you are using the DirectRunner in a shell with the BIGTABLE_EMULATOR_HOST variable defined, BigtableIO will try to connect to the emulator. For instance, a pipeline that applies BigtableIO.read() with the project fake-project, the instance fake-instance and the table mytable would attempt to read from the table named mytable in the emulator.
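A minimal sketch of such a read pipeline could look like the following. It assumes the Beam Java SDK with the Google Cloud Platform IO module and the DirectRunner on the classpath, and it reuses the fake-project and fake-instance values from the ~/.cbtrc example above. The class name ReadFromEmulator and the helper buildRead are hypothetical names chosen for illustration.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class, not taken from the original article.
public class ReadFromEmulator {

  // Builds the read transform. The project and instance must match the values
  // used when the table was created with cbt (here, the ones in ~/.cbtrc).
  static BigtableIO.Read buildRead() {
    return BigtableIO.read()
        .withProjectId("fake-project")
        .withInstanceId("fake-instance")
        .withTableId("mytable");
  }

  public static void main(String[] args) {
    // With no --runner argument, Beam falls back to the DirectRunner
    // if it is available on the classpath.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // Each element is one Bigtable row, as a protobuf Row message.
    PCollection<Row> rows = pipeline.apply("ReadMyTable", buildRead());

    // Further transforms on `rows` would go here.
    pipeline.run().waitUntilFinish();
  }
}
```

Run it from a shell where BIGTABLE_EMULATOR_HOST is already set, so that the connection goes to the emulator rather than to a remote instance.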

If you want to write rather than read, it works in the same way.

If you are using any other client, just make sure that the environment variable BIGTABLE_EMULATOR_HOST is set, and use project and instance names of your choice. All the Bigtable libraries check for that environment variable, and if it exists, they will use it to connect to the emulator instead of connecting to a remote instance.

Combining the Bigtable emulator and the DirectRunner of Apache Beam, you can run locally any pipeline that uses Bigtable, without having to spawn an instance or spend any cloud resources just for small tests.
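To make that check concrete, here is a rough sketch of the kind of decision a client makes. This is not the code of any actual Bigtable library; the class EmulatorEndpoint and its method fromEnv are hypothetical, and the sketch assumes the variable holds a host:port value.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; real client libraries do this internally.
public class EmulatorEndpoint {
  static final String ENV_VAR = "BIGTABLE_EMULATOR_HOST";

  final String host;
  final int port;

  EmulatorEndpoint(String host, int port) {
    this.host = host;
    this.port = port;
  }

  // Returns the emulator endpoint if the variable is set, or empty otherwise.
  // The environment is passed in as a map so the logic is easy to test.
  static Optional<EmulatorEndpoint> fromEnv(Map<String, String> env) {
    String value = env.get(ENV_VAR);
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    value = value.trim();
    int colon = value.lastIndexOf(':');
    if (colon <= 0 || colon == value.length() - 1) {
      throw new IllegalArgumentException(
          ENV_VAR + " should look like host:port, got: " + value);
    }
    String host = value.substring(0, colon);
    int port = Integer.parseInt(value.substring(colon + 1));
    return Optional.of(new EmulatorEndpoint(host, port));
  }

  public static void main(String[] args) {
    Optional<EmulatorEndpoint> endpoint = fromEnv(System.getenv());
    if (endpoint.isPresent()) {
      EmulatorEndpoint e = endpoint.get();
      System.out.println("Would connect to the emulator at " + e.host + ":" + e.port);
    } else {
      System.out.println(ENV_VAR + " is not set; would connect to a remote instance");
    }
  }
}
```

Running the main method in the same shell as your pipeline is a quick way to confirm that the variable is visible to the JVM before you launch anything else.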

Google Cloud - Community
