Using the Bigtable emulator with Apache Beam and BigtableIO

Israel Herraiz
Google Cloud - Community
May 27, 2020
Using Beam + Bigtable locally, with no cloud resources

Cloud Bigtable is a great high-performance distributed NoSQL database that can store petabytes of data, but sometimes you just need to test against it with a tiny amount of data.

Spawning a remote instance just for that is cumbersome: you burn money and cloud resources on a small test.

Fortunately, Bigtable comes with an emulator that is very handy for simple tests with tiny amounts of data, because you can run it on your local computer.

The emulator works in memory, exposing the same interface as Bigtable, so any system that is able to connect to a real Bigtable instance should be able to use the emulator too.

The documentation is quite clear about how to set the right environment variables to use the emulator. Start the emulator first (gcloud beta emulators bigtable start), then execute this in the shell session where you want to use it:

$(gcloud beta emulators bigtable env-init)

That will add an environment variable named BIGTABLE_EMULATOR_HOST with the details to connect to the emulator.

The emulator is an in-memory database that is initially empty. You need to create tables, populate them, etc. To create tables, you can use the cbt utility. You configure it with some settings in the file ~/.cbtrc, including the project and the instance name.

But what project and instance name should you use for the emulator?

That's easy: any.

Whatever project and instance values you choose, as long as your Bigtable client is configured with the same ones, the emulator will serve your tables as if they lived under that project and instance. So to create local tables, just configure your ~/.cbtrc file with values like the following:

project = fake-project
instance = fake-instance

And then create tables using the cbt utility.

If you want to access a table in the emulator from Apache Beam, using BigtableIO, you will have to use the same project and instance names as in your ~/.cbtrc file.

If you are using the DirectRunner in a shell with the BIGTABLE_EMULATOR_HOST variable defined, BigtableIO will try to connect to the emulator. For instance, a BigtableIO read configured with the project and instance above and the table ID mytable would attempt to read from a table named mytable in the emulator.
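
A minimal sketch of such a read is shown here. It assumes the Beam Java SDK with the Google Cloud Platform IO module and the DirectRunner on the classpath, the fake-project and fake-instance values from the ~/.cbtrc example, and a table called mytable already created with cbt. The class name ReadFromEmulator and the step names are hypothetical, chosen only for illustration.

```java
import com.google.bigtable.v2.Row;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

// Hypothetical class name; a sketch, not the article's original listing.
public class ReadFromEmulator {

  // Builds the pipeline without running it. Nothing here mentions the
  // emulator: per the article, the connection is redirected when
  // BIGTABLE_EMULATOR_HOST is set in the environment.
  static Pipeline buildPipeline(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    // Same project and instance names as in ~/.cbtrc.
    PCollection<Row> rows =
        pipeline.apply(
            "ReadFromBigtable",
            BigtableIO.read()
                .withProjectId("fake-project")
                .withInstanceId("fake-instance")
                .withTableId("mytable"));

    // Extract each row key as a UTF-8 string, just to do something with the rows.
    rows.apply(
        "RowKeyToString",
        MapElements.into(TypeDescriptors.strings())
            .via((Row row) -> row.getKey().toStringUtf8()));

    return pipeline;
  }

  public static void main(String[] args) {
    // With no --runner argument, Beam falls back to the DirectRunner
    // when it is available on the classpath.
    buildPipeline(args).run().waitUntilFinish();
  }
}
```

Run it from the same shell session in which the env-init command was executed, so that the JVM inherits BIGTABLE_EMULATOR_HOST.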

If you want to write rather than read, it works in the same way.

If you are using any other client, just make sure that the environment variable BIGTABLE_EMULATOR_HOST is set, and use project and instance names of your choice. All the Bigtable libraries check for that environment variable, and if it exists, they use it to connect to the emulator instead of connecting to a remote instance.
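
To make that check concrete, here is a small illustration of the idea in plain Java. It is not the code of any actual Bigtable library: the class BigtableEndpoint and its resolve method are hypothetical, and the sketch assumes the variable holds a host:port value such as localhost:8086 and that the remote data endpoint is bigtable.googleapis.com:443.

```java
import java.util.Map;

// Hypothetical helper illustrating the "emulator if the variable is set,
// remote otherwise" decision described in the text.
public class BigtableEndpoint {
  static final String EMULATOR_ENV_VAR = "BIGTABLE_EMULATOR_HOST";
  // Assumed remote endpoint, used when the variable is absent.
  static final String REMOTE_ENDPOINT = "bigtable.googleapis.com:443";

  final String host;
  final int port;
  final boolean emulator;

  BigtableEndpoint(String host, int port, boolean emulator) {
    this.host = host;
    this.port = port;
    this.emulator = emulator;
  }

  // Picks the emulator address when the variable is set and non-empty,
  // and falls back to the remote endpoint otherwise.
  static BigtableEndpoint resolve(Map<String, String> env) {
    String value = env.get(EMULATOR_ENV_VAR);
    boolean useEmulator = value != null && !value.trim().isEmpty();
    String target = useEmulator ? value.trim() : REMOTE_ENDPOINT;

    int colon = target.lastIndexOf(':');
    if (colon <= 0 || colon == target.length() - 1) {
      throw new IllegalArgumentException("Expected host:port but got: " + target);
    }
    String host = target.substring(0, colon);
    int port = Integer.parseInt(target.substring(colon + 1));
    return new BigtableEndpoint(host, port, useEmulator);
  }

  public static void main(String[] args) {
    BigtableEndpoint endpoint = resolve(System.getenv());
    System.out.println(
        (endpoint.emulator ? "emulator" : "remote")
            + " at " + endpoint.host + ":" + endpoint.port);
  }
}
```

Passing the environment in as a map, rather than calling System.getenv() inside resolve, keeps the decision easy to exercise with and without the variable.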

By combining the Bigtable emulator with the DirectRunner of Apache Beam, you can run any pipeline that uses Bigtable locally, without spawning an instance or spending cloud resources on small tests.
