Wait! What? There’s a new Java client for Bigtable?!

Daniel Bergqvist
Google Cloud - Community
Oct 21, 2019

This post was created in collaboration with Kristen O’Leary.

For years, the Java client we provided for Bigtable was the HBase-compatible one. Lately, however, we have seen many developers adopt Bigtable who are not migrating from HBase, but from Cassandra or another database.

For developers who don’t need HBase compatibility, the HBase client can be overly complex for some operations. That is why we developed a new Java client for Bigtable. Please note that this doesn’t mean the HBase client will be deprecated. The two clients will live side by side, and you can choose the one that best fits your use case.

The main features of the new client are:
- More flexible filters
- Fewer dependencies
- Fully asynchronous and backpressure-aware

A blog post dissecting the features of the new client is in the making, so stay tuned. This post focuses on the developer experience.

To be more hands-on, we decided to take the codelab Introduction to Cloud Bigtable, created by our friend and colleague Billy Jacobson, which uses the HBase client, and translate its queries so that they work with the new client.

The codelab uses a public dataset that can be found over at Kaggle.

This dataset is from the NYC MTA bus data stream service. Each row includes the bus location, route, bus stop, and more, recorded in roughly 10-minute increments. The scheduled arrival time from the bus schedule is also included, to indicate where the bus should be (how far behind schedule, on time, or even ahead of schedule).

Perform a lookup

The first query you’ll perform is a simple row lookup. You’ll get the data for a bus on the M86-SBS line on June 1, 2017 from 12:00 am to 1:00 am. A vehicle with id NYCT_5824 is on the bus line then.

With that information, and knowing the schema design (Bus company/Bus line/Timestamp rounded down to the hour/Vehicle ID), you can deduce that the row key is:

MTA/M86-SBS/1496275200000/NYCT_5824
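
As a minimal sketch of how such a key could be assembled from its parts, the snippet below uses only the Java standard library. The class and method names are hypothetical, the observation time of 00:25:13 is made up for illustration, and it assumes the timestamp component is epoch milliseconds in UTC.

```java
import java.time.Instant;

// Hypothetical helper, not part of any Bigtable client library.
final class BusRowKeys {
    private static final long HOUR_MILLIS = 3_600_000L;

    /** Rounds an epoch-millisecond timestamp down to the start of its hour (UTC assumed). */
    static long floorToHour(long epochMillis) {
        return Math.floorDiv(epochMillis, HOUR_MILLIS) * HOUR_MILLIS;
    }

    /** Joins the schema parts: company/line/hour-timestamp/vehicle id. */
    static String rowKey(String company, String line, long epochMillis, String vehicleId) {
        return String.join("/", company, line,
                Long.toString(floorToHour(epochMillis)), vehicleId);
    }

    public static void main(String[] args) {
        // Illustrative observation time inside the hour discussed in the post.
        long observedAt = Instant.parse("2017-06-01T00:25:13Z").toEpochMilli();
        // Prints MTA/M86-SBS/1496275200000/NYCT_5824
        System.out.println(rowKey("MTA", "M86-SBS", observedAt, "NYCT_5824"));
    }
}
```

Rounding down to the hour means every observation of one vehicle within that hour shares a single row key.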

The result should contain the most recent location of the bus within that hour. But you want to see all the locations, so set the maximum number of versions on the get request.
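
To make the idea of versions concrete, here is a small in-memory model of a single cell, not the client API: each write is stored under a timestamp, and a read returns the newest values first, up to a limit. The class name, timestamps, and latitude values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of one cell with timestamped versions.
final class CellVersions {
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());

    void write(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Returns at most maxVersions values, newest first. */
    List<String> read(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() >= maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        CellVersions latitude = new CellVersions();
        latitude.write(1000L, "40.7810");
        latitude.write(2000L, "40.7795");
        latitude.write(3000L, "40.7772");
        System.out.println(latitude.read(1));                 // only the most recent value
        System.out.println(latitude.read(Integer.MAX_VALUE)); // every stored version
    }
}
```

With a limit of one you only see the latest location; raising the limit is what exposes the full history for that hour.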

So let’s see how this would look with the new Java client.

Perform a scan
Now, let’s view all the data for the bus line for that hour. The scan code looks pretty similar to the get code. You give the scanner a starting position and then indicate you only want rows for the M86-SBS bus line within the hour denoted by the timestamp 1496275200000.

And here is the same query with the new Java client.

An interesting modification to this query is to view the entire month of data for the M86-SBS bus line, and this is very easy to do. Remove the timestamp from the start row and prefix filter to get the result.
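
Why dropping the timestamp widens the scan is easiest to see on a sorted set of keys. The sketch below is an in-memory model using a `TreeSet`, not the Bigtable client: it assumes ASCII row keys (so Java string order matches byte order), and every key other than the NYCT_5824 one from the post is invented for illustration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a prefix scan over sorted row keys.
final class PrefixScan {
    /**
     * Smallest string that sorts after every string starting with prefix.
     * Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
     */
    static String prefixEnd(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Returns all keys that start with prefix, as a contiguous slice of the sorted set. */
    static SortedSet<String> scan(NavigableSet<String> sortedKeys, String prefix) {
        return sortedKeys.subSet(prefix, true, prefixEnd(prefix), false);
    }

    /** Illustrative keys following the company/line/hour/vehicle schema. */
    static NavigableSet<String> sampleKeys() {
        return new TreeSet<>(Arrays.asList(
                "MTA/M15/1496275200000/NYCT_1234",
                "MTA/M86-SBS/1496275200000/NYCT_5824",
                "MTA/M86-SBS/1496275200000/NYCT_5840",
                "MTA/M86-SBS/1496278800000/NYCT_5824"));
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = sampleKeys();
        // One hour of the M86-SBS line: the prefix includes the timestamp.
        System.out.println(scan(keys, "MTA/M86-SBS/1496275200000/"));
        // All stored hours of the M86-SBS line: the timestamp is dropped from the prefix.
        System.out.println(scan(keys, "MTA/M86-SBS/"));
    }
}
```

Because the keys are sorted, a shorter prefix simply selects a larger contiguous slice; no other part of the query has to change.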

The version for the new Java client would then look like this.

Introduce Filters
Next, you will filter on buses heading east and buses heading west.

And here’s *drum roll* the new Java client version of the query.

To get the buses going west, change the string in the valueFilter:

And here’s how to change the filter for the new Java client.

Perform a multi-range scan
For the final query, you’ll address the case when you care about many bus lines in an area:
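
Conceptually, a multi-range scan is the union of several contiguous slices of the sorted key space, one per bus line. The following is an in-memory sketch of that idea rather than client code; the class name, the choice of lines (M15 and M86-SBS), and all keys except the NYCT_5824 one are assumptions made for illustration, and ASCII row keys are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory model of a scan over several key ranges.
final class MultiRangeScan {
    /** Smallest string that sorts after every string starting with prefix (non-empty prefix assumed). */
    static String prefixEnd(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    /** Collects the keys matching any of the prefixes, in sorted order and without duplicates. */
    static List<String> scanAll(NavigableSet<String> sortedKeys, List<String> prefixes) {
        TreeSet<String> result = new TreeSet<>();
        for (String prefix : prefixes) {
            result.addAll(sortedKeys.subSet(prefix, true, prefixEnd(prefix), false));
        }
        return new ArrayList<>(result);
    }

    /** Illustrative keys following the company/line/hour/vehicle schema. */
    static NavigableSet<String> sampleKeys() {
        return new TreeSet<>(Arrays.asList(
                "MTA/B1/1496275200000/NYCT_0001",
                "MTA/M15/1496275200000/NYCT_1234",
                "MTA/M15/1496278800000/NYCT_1234",
                "MTA/M86-SBS/1496275200000/NYCT_5824"));
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList(
                "MTA/M15/1496275200000/",
                "MTA/M86-SBS/1496275200000/");
        // Keys for both lines within the same hour; the B1 line and the later hour are skipped.
        System.out.println(scanAll(sampleKeys(), prefixes));
    }
}
```

Each prefix maps to its own range, so adding another bus line to the query means adding one more range rather than widening a single scan over everything in between.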

Lastly, here is the multi-range scan for the new client.

All the code in the blog post can be found in this GitHub repo. If something doesn’t work as expected, please reach out to me.
