Hbase at a glance

With a rise of data processing use cases comes varied challenges every data acquisition platforms face. Processing contiguous streams of data from varied sources is one of the important use cases every big data company solves.

Apache HBase is a technology that helps solve these use cases very efficiently. HBase is able to provide real-time read/write access to your data at a very massive scale. Coming from the prestigious No SQL family of databases HBase provides unstructured, schemaless, storage for all your data ingestion requirements.

Features

HBase at its core is nothing more than a key-value store on top of HDFS. HBase can be customized to stored data in any file storage system local or cloud. However, HBase is predominantly used with HDFS due to its inheritance and compliance with Hadoop technology. Here are the following features HBase that has great benefits for a master data store:

  • Dynamically add or remove columns
  • Ability to massively scale horizontally
  • Extremely efficient random read/write access
  • Compliant with Hadoop ecosystem

Challenges

Few challenges faced by companies using HBase as their master data storage is their lack of control over how the data is accessed. This is mostly due to improper schema design and lack of thorough knowledge on the access pattern.

Solutions

Here are the following guidelines KloudOne uses to ensure efficient read/write in HBase

$ Ensure limited column families (3 or 4 max)

Although one can use large no of column families it is generally accepted to limit the column families to about 3 or 4.

$ Column families with similar access pattern should be grouped

This will help during the data retrieval as column families with similar access pattern are easily scanned and retrieved in HBase.

$ Ensure generation of unique row key

At KloudOne we ensure that the row key designed is unique in all instance unless any update or delete has to happen over it. This can be easily achieved by compounding the row key with date or uuid as suffix

Eg: user_id | date

The unique row key design also ensures that the data is always sharded properly and not just stored on same region server.

At KloudOne

Kloudone is as a premier cloud solutions company with expertise in solving big data problems. As part of our solutions stack, we’ve been using Hbase quiet frequently for its consistency and partition tolerance. Following are few use cases where Kloudone has utilized Hbase -

  • To store reporting metrics like impression, clicks, views for one of our advertising customers. Hbase native support for counters was instrumental in this design.
  • Backbone to build a massive real-time Data Management Platform. Hbase unstructured, schemaless storage was vital to this design and approach.

Conclusion

As part of our upcoming series we are looking to drop technology content that are essential to our solution stack. Do let us know what you’d like us to talk about. For queries and doubts you can email me at fauzan@kloudone.com