How to Regex in HBase Shell

Oleksandr Tarasenko
Oct 9, 2022


If you’ve been working in the NoSQL world for years, you may stop noticing the differences between databases and their concepts. A couple of weeks ago I was asked to help with querying by a regular expression in the HBase shell, and I suddenly realised that HBase might be the most difficult database to work with at the command-line level: its shell differs significantly from other database clients. That’s why a colleague suggested sharing this technique, in the hope of helping others who have to deal with HBase without any preparation or proper training.

The task is to retrieve all Formula 1 drivers in the table whose last names start with “Ham”. To keep the explanation short and clear, I start from the equivalent SQL query and map it to HBase’s syntax.

Figure: Mapping SQL pattern matching to its HBase command-line equivalent
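The figure itself did not survive, so the SQL side is assumed here to be a `LIKE 'Ham%'` condition on the last name. The sketch below shows how such a pattern translates into an anchored Java regular expression: `%` becomes `.*`, `_` becomes `.`, and every literal run is quoted. The `likeToRegex` helper is hypothetical and written only for illustration. It is not part of HBase, and it does not handle SQL `ESCAPE` clauses.

```java
import java.util.regex.Pattern;

public class LikeToRegex {
    // Hypothetical helper: translates a SQL LIKE pattern ('%' = any run of characters,
    // '_' = exactly one character) into an anchored java.util.regex expression.
    // ESCAPE clauses are deliberately not supported in this sketch.
    static String likeToRegex(String like) {
        StringBuilder regex = new StringBuilder("^");
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                // Flush the pending literal text, quoted so regex metacharacters stay literal
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.append('$').toString();
    }

    public static void main(String[] args) {
        String regex = likeToRegex("Ham%");
        System.out.println(regex); // ^\QHam\E.*$
        Pattern p = Pattern.compile(regex);
        System.out.println(p.matcher("Hamilton").find()); // true
        System.out.println(p.matcher("Graham").find());   // false
    }
}
```

For a simple prefix like this, the hand-written `^Ham.*` carries the same meaning. The `^` anchor is the part that matters.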

Before running the query, import the following classes: just copy them, paste them into the HBase shell and run.

import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.RegexStringComparator
# Bytes may already be in scope in your shell; importing it again is harmless
import org.apache.hadoop.hbase.util.Bytes

Then adapt the scan command to your own table, column family and column names, and execute it.

# Return up to 10 rows whose name:last_name value starts with "Ham" (the match is case-sensitive)
scan 'f1Drivers', {FILTER => SingleColumnValueFilter.new(Bytes.toBytes('name'), Bytes.toBytes('last_name'), CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new("^Ham.*")), LIMIT => 10}
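RegexStringComparator compiles its expression with java.util.regex. As far as I know, its default engine then tests each cell value with `Matcher.find()`. That means an unanchored pattern matches anywhere in the value, and matching is case-sensitive unless you pass flags. The sketch below mimics that check in plain Java so you can try a pattern before running the scan. The `cellMatches` method and the sample last names are hypothetical, and the exact comparator internals may differ between HBase versions.

```java
import java.util.regex.Pattern;

public class RegexCheck {
    // Approximates (assumption) RegexStringComparator's default Java engine:
    // compile the expression with optional Pattern flags, then search the cell value with find().
    static boolean cellMatches(String expr, int flags, String cellValue) {
        return Pattern.compile(expr, flags).matcher(cellValue).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample last names, not taken from the article's table
        String[] lastNames = {"Hamilton", "HAMILTON", "Graham", "Verstappen"};
        for (String name : lastNames) {
            System.out.printf("%-10s ^HAM.*=%b  ^Ham.*=%b  ^ham (case-insensitive)=%b%n",
                    name,
                    cellMatches("^HAM.*", 0, name),
                    cellMatches("^Ham.*", 0, name),
                    cellMatches("^ham", Pattern.CASE_INSENSITIVE, name));
        }
    }
}
```

`^HAM.*` only matches values stored in upper case, which is why the prefix in the query must match the stored casing. RegexStringComparator also has a constructor that accepts `java.util.regex.Pattern` flags, so `RegexStringComparator.new("^ham", java.util.regex.Pattern::CASE_INSENSITIVE)` should make the shell query case-insensitive. Treat that line as a suggestion to verify against your HBase version.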

If you have questions in the #bigdata #hbase #datawrangling #datalake #dataengineering #datamodeling #spark #nosql #java #scala areas, please let me know and I’ll try to write a similar answer for you :)
