Loading CSV Into HBase Table In Kerberized Hadoop Cluster
Looking for a quick, step-by-step method to bulk-load data into an HBase table in a Kerberos-enabled Hadoop cluster? Well, this post walks you through all the steps to load a CSV data file into a "kerberized" Hadoop cluster.
While testing HDFS encryption recently, I needed to pump a batch of data into an HBase table in one go, and I have documented the steps here for anyone with the same requirement. So let's get started without wasting much time.
Copy CSV Data To HDFS Filesystem
First of all, download the CSV data file from an external or internal data source and copy it into the HDFS filesystem, from where HBase can read it to load the data into the table.
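If the target HDFS directory does not exist yet, create it first; the path here simply mirrors the one used in the copy command below:

hdfs dfs -mkdir -p /user/hbase/opsuser/data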
hdfs dfs -copyFromLocal emp_data.csv /user/hbase/opsuser/data
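To confirm that the file actually landed in HDFS, you can list the target directory:

hdfs dfs -ls /user/hbase/opsuser/data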
Create HBase Table
First, log in as the "hbase" user.
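On most nodes this is just a local user switch (assuming you have root or sudo privileges on the host):

su - hbase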
Connect With HBase Service Principal (For Kerberos Authentication)
kinit -k -t /etc/security/keytabs/hbase.service.keytab hbase-kerbodemo@EXAMPLE.COM
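You can verify that the ticket was granted with klist; the default principal shown should match the one used above:

klist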
Start HBase Shell
hbase shell
Create The Table
create 'emp_data',{NAME => 'cf'}
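You can confirm the table and its 'cf' column family were created as expected:

describe 'emp_data'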
Load Data Into HBase Table
To load the CSV data into the HBase table, we use the ImportTsv MapReduce job, which reads the CSV file from HDFS and writes its rows into the table.
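ImportTsv maps fields positionally: the first field becomes the row key (HBASE_ROW_KEY) and the remaining fields map, in order, to the columns listed in -Dimporttsv.columns. A row of the input file is therefore expected to look something like this (the values below are purely illustrative, not from the original data set):

7499,ALLEN,SALESMAN,7698,1981-02-20,1600,30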
cd /usr/hdp/2.6.3.0-235/hbase/bin

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:ename,cf:designation,cf:manager,cf:hire_date,cf:sal,cf:deptno' emp_data /user/hbase/opsuser/data/emp_data.csv
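Note that this runs ImportTsv in its direct-write mode, issuing puts against the table as the job runs. For very large files you may prefer the two-step bulk-load path, where ImportTsv first writes HFiles and LoadIncrementalHFiles then moves them into the table. A sketch of that variant (the HFile output path here is just an example):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:ename,cf:designation,cf:manager,cf:hire_date,cf:sal,cf:deptno' -Dimporttsv.bulk.output=/user/hbase/opsuser/hfiles emp_data /user/hbase/opsuser/data/emp_data.csv

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/opsuser/hfiles emp_data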
Scan The HBase Table
scan 'emp_data'
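As a quick sanity check on the load, count tallies the rows without printing every cell:

count 'emp_data'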
Here you go, all the data has been loaded into the HBase table successfully.