How to Regex in HBase Shell
If you've been living in the NoSQL world for years, you can stop noticing the differences between databases and their concepts. A couple of weeks ago I was asked to help with querying by a regular expression in the HBase shell. Suddenly I realised that HBase might be the most difficult database to deal with at the command-line level: its shell differs significantly from other database clients. That is why my colleague suggested sharing this technique, in the hope of helping other people who have to deal with HBase without any preparation or proper training.
The task is to fetch from the table all Formula 1 drivers whose last names start with "Ham". To keep the explanation short and clear, think of it as a SQL request (roughly SELECT * FROM f1Drivers WHERE last_name LIKE 'Ham%' LIMIT 10) mapped to HBase's syntax.
Import all of the following classes before you run the query: copy the lines, paste them into the HBase shell, and run them.
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.RegexStringComparator
Then adapt the following "scan" request to your table, column family, and column, and execute it.
scan 'f1Drivers', {FILTER => SingleColumnValueFilter.new(Bytes.toBytes('name'), Bytes.toBytes('last_name'), CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new("^Ham.*")), LIMIT => 10}
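One detail that is easy to trip over: the regular expression given to RegexStringComparator is case-sensitive by default, so the casing of the pattern has to match the casing of the values stored in the column. The sketch below is plain Ruby, not HBase code. It uses Ruby's own regex engine as a stand-in for the Java one, and the last names and the helper matching_names are made up for illustration.

```ruby
# Hypothetical sample values for the name:last_name column (not real table data).
LAST_NAMES = ["Hamilton", "HAMILTON", "Graham", "Verstappen"].freeze

# Return the values in which the pattern is found, keeping the input order.
# This is a stand-in for what the filter decides per row; it is not HBase API.
def matching_names(names, pattern)
  names.select { |name| pattern.match?(name) }
end

# Case-sensitive patterns only match values with the same casing.
puts matching_names(LAST_NAMES, /^Ham.*/).inspect   # ["Hamilton"]
puts matching_names(LAST_NAMES, /^HAM.*/).inspect   # ["HAMILTON"]

# A case-insensitive pattern matches both spellings. "Graham" is still
# excluded because "ham" is not at the start of the value.
puts matching_names(LAST_NAMES, /^ham.*/i).inspect  # ["Hamilton", "HAMILTON"]
```

If the scan returns nothing, checking how the values are actually cased in the table is a good first step.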
If you have questions about #bigdata #hbase #datawrangling #datalake #dataengineering #datamodeling #spark #nosql #java #scala, please let me know and I'll try to write a similar answer for you :)