OpenTSDB Metric HBase Region Finder

Kunal Nawale
Salesforce Engineering
3 min read · Jul 9, 2019

Kunal Nawale is a Software Architect at Salesforce who designs, architects, and builds Big Data Systems. Kunal did the design and architecture of this open source project. Colbert Guan did the implementation and testing.

Argus is a time series solution that powers the majority of Salesforce’s monitoring and alerting needs. Argus uses OpenTSDB and HBase as the backend store. In a previous blog post, we discussed the challenges of discovering metric names. In this blog post, we will discuss the challenges of locating a specific metric in HBase.

When you have billions of metrics stored on hundreds of HBase region servers, it becomes very hard to determine which region server is serving a given metric query. When customers complain that some queries are delayed while the rest work fine, finding the region and region server behind those queries becomes critical, because that is where the investigation has to start.

This blog post explains how to find those region servers. First, though, let’s look at how OpenTSDB metrics are stored in HBase. All metrics live in a single table, which is sharded into regions based on the row keys. Each region has a start key and a stop key. There are several methods of sharding/splitting as explained here, controlled by the HBase configuration parameter hbase.regionserver.region.split.policy.

While HBase controls how the regions are split, OpenTSDB decides how the row keys are created, and this influences the region splits. The algorithm that OpenTSDB uses to create the row keys is robust and distributes keys well across the key space. It builds the row key from the metric name and the timestamp of the data points. Because the epoch timestamp is part of the row key, new data points for a heavy-traffic metric are added to new rows, so no single row gets overloaded. The format it uses is as follows:
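As a rough illustration of that layout, the sketch below assembles a row key the way the OpenTSDB documentation describes the default schema: a metric UID, then a 4-byte base timestamp aligned to the hour, then tag key/value UID pairs. The 3-byte UID width, the absence of salting, the `build_row_key` helper, and the UID values are all assumptions made for this example, not details taken from this post.

```python
import struct
from typing import Dict

ROW_SPAN_SECONDS = 3600  # assumption: default one-hour rows


def build_row_key(metric_uid: bytes, timestamp: int, tag_uids: Dict[bytes, bytes]) -> bytes:
    """Concatenate metric UID, hour-aligned base time, and sorted tag UID pairs."""
    base_time = timestamp - (timestamp % ROW_SPAN_SECONDS)
    key = bytearray(metric_uid)
    key += struct.pack(">I", base_time)  # 4-byte big-endian epoch seconds
    for tagk, tagv in sorted(tag_uids.items()):
        key += tagk + tagv
    return bytes(key)


# Hypothetical UIDs; real ones are assigned by OpenTSDB and stored in its UID table.
metric_uid = bytes([0, 0, 1])
tags = {bytes([0, 0, 1]): bytes([0, 0, 2])}
hour_start = 1562630400  # an hour-aligned epoch timestamp

row_a = build_row_key(metric_uid, hour_start, tags)
row_b = build_row_key(metric_uid, hour_start + 1800, tags)  # same hour
row_c = build_row_key(metric_uid, hour_start + 3600, tags)  # next hour
```

Here `row_a` and `row_b` are identical because both data points fall in the same hour, while `row_c` is a different key that sorts after them. This is why a busy metric keeps moving on to new rows over time.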

As noted above, the HBase table is split into regions, each with a start row key and a stop row key. HBase stores the information about each region in a data structure called RegionInfo, which has the following format:

[TableName],[CreationEpoch],.[RegionNameMD5Digest].,[StartKey],[StopKey]

On startup, the OpenTSDB read daemons connect to the HBase master and retrieve the RegionInfo map for each region server. This map is stored in a local cache and refreshed often. At query execution time, the query is mapped to regions, the regions are mapped to region servers, and those region servers are then queried for the metric data.
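The mapping from a query to regions can be pictured as a range lookup over the sorted region boundaries. The sketch below is only an illustration of that idea, not the code used by OpenTSDB or by the tool. The `Region` tuple, the `regions_for_range` helper, the host names, and the boundary keys are hypothetical, and it assumes the regions are contiguous, sorted by start key, and that the scan has a non-empty stop key.

```python
from bisect import bisect_right
from typing import List, NamedTuple


class Region(NamedTuple):
    start_key: bytes  # inclusive; b"" means the start of the table
    stop_key: bytes   # exclusive; b"" means the end of the table
    server: str


def regions_for_range(regions: List[Region], scan_start: bytes, scan_stop: bytes) -> List[Region]:
    """Return the regions that overlap the row key range [scan_start, scan_stop)."""
    starts = [r.start_key for r in regions]
    # The last region whose start key is <= scan_start holds the first row.
    first = max(bisect_right(starts, scan_start) - 1, 0)
    hits = []
    for region in regions[first:]:
        if region.start_key >= scan_stop:
            break  # this region begins after the scan ends
        hits.append(region)
    return hits


# Hypothetical cached region map, split on 3-byte metric UID prefixes.
regions = [
    Region(b"", b"\x00\x00\x05", "rs1.example.com"),
    Region(b"\x00\x00\x05", b"\x00\x00\x0a", "rs2.example.com"),
    Region(b"\x00\x00\x0a", b"", "rs3.example.com"),
]

# All rows of metric UID 00 00 07 sort between that UID and the next one.
single = regions_for_range(regions, b"\x00\x00\x07", b"\x00\x00\x08")
# A wider range touches every region.
spanning = regions_for_range(regions, b"\x00\x00\x04", b"\x00\x00\x0b")
```

In this example `single` contains only the region on `rs2.example.com`, whereas `spanning` contains all three regions. Byte strings compare lexicographically in Python, which matches how HBase orders row keys.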

So, let’s return to our original problem: how to determine the region and region server for a metric on demand, without a painstaking log search. This was a question we asked ourselves repeatedly. After looking around, we could not find a suitable tool in the open source community. Therefore, we decided to build one that would ease our problem and hopefully yours, too. We are very happy to announce that we have built this tool and are open sourcing it. TSDB HBase Region Finder is now available here. The tool includes both a web server and a command line version.

Here is an example of how to use the web server version:

The CLI version operates similarly:

$ bin/cli envoy.server.uptime
RegionServer|RegionName
http://host1.hbase.com:60030/|tsdb,1510122330068.05714303d5f4fac661199d2cbb343
http://host1.hbase.com:60030/|tsdb,1510122330068.05714303d55bfac661199d2cbb343
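If you want to post-process this output, for example to see how many regions each region server holds for a metric, a small script is enough. The sketch below assumes the pipe-delimited format shown above, including the `RegionServer|RegionName` header line. The `group_regions_by_server` helper is hypothetical and not part of the tool.

```python
from collections import defaultdict
from typing import Dict, List

HEADER = "RegionServer|RegionName"


def group_regions_by_server(cli_output: str) -> Dict[str, List[str]]:
    """Group region names by region server from pipe-delimited CLI output."""
    grouped = defaultdict(list)
    for raw_line in cli_output.splitlines():
        line = raw_line.strip()
        if not line or line == HEADER:
            continue  # skip blank lines and the header row
        server, _, region = line.partition("|")
        grouped[server].append(region)
    return dict(grouped)


# Sample output copied from the example above.
sample = """RegionServer|RegionName
http://host1.hbase.com:60030/|tsdb,1510122330068.05714303d5f4fac661199d2cbb343
http://host1.hbase.com:60030/|tsdb,1510122330068.05714303d55bfac661199d2cbb343
"""

result = group_regions_by_server(sample)
for server, region_names in result.items():
    print(server, len(region_names))
```

For the sample above, both regions are grouped under the single region server `http://host1.hbase.com:60030/`.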

If you would like us to add any features, or have any suggestions on the tool, please let us know via GitHub. (Or better yet, send us a Pull Request!)

The Salesforce Infrastructure organization has many such exciting problems to solve. Some of them present very difficult scale challenges that exist at very few companies in the world. If such challenges excite you, please reach out to us via our careers page.
