Associate Apache Hive Client with HDFS
This note assumes you already have an HDFS cluster available and focuses on configuring the Hive client.
Hive
Hive is a query tool for efficiently querying files stored in an HDFS cluster; refer to the official page for more detail. The package can be downloaded and unpacked with the following commands:
$ wget http://apache.stu.edu.tw/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz
$ tar xzvf apache-hive-2.3.7-bin.tar.gz
Move the extracted directory wherever you like, then append the following lines to /etc/profile.
export HIVE_HOME=/path/to/your/hive/project
export PATH=$HIVE_HOME/bin:$PATH
Remember to reload the profile so the changes take effect.
$ source /etc/profile
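A quick sanity check can confirm that the profile change took effect. The sketch below is only an illustration: the function name check_hive_env is made up for this note, and it only checks that HIVE_HOME points at a directory containing an executable bin/hive and that this bin directory is on PATH.

```shell
# Hypothetical helper (not part of Hive): verify the variables exported in /etc/profile.
check_hive_env() {
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$HIVE_HOME/bin/hive" ]; then
    echo "no executable found at $HIVE_HOME/bin/hive" >&2
    return 1
  fi
  # Wrap PATH in colons so the match also works for the first and last entry.
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) ;;
    *) echo "$HIVE_HOME/bin is not on PATH" >&2; return 1 ;;
  esac
  echo "Hive environment looks fine: $HIVE_HOME"
}
```

Run check_hive_env in a fresh shell after sourcing /etc/profile; a non-zero exit status means one of the two exports is missing or points at the wrong directory.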
Configurations
Apache Hive has tons of configuration options, but I only studied the ones I would use because I’m too lazy :). The main configuration file we need to provide is hive-site.xml, which we can copy from the template file hive-default.xml.template shipped in the conf directory of the distribution.
$ cp hive-default.xml.template hive-site.xml
$ vi hive-site.xml
Then comes the most troublesome part: editing hive-site.xml. Find the following sections and update them with appropriate values.
The configuration will differ depending on your purpose. Your values may differ from mine, but these are probably the settings you need to change.
The first part is the PostgreSQL database connection. Apache Hive can store its metastore schema in a relational database, so I chose the one I am most familiar with.
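As a rough sketch of what this part might look like, the script below writes the four standard metastore connection properties to a scratch file for reference. The host, port, scratch file name, and {YOUR_...} placeholders are assumptions, and the PostgreSQL JDBC driver jar is assumed to be available on Hive's classpath (for example under $HIVE_HOME/lib). The script does not touch hive-site.xml; copy each value into the property with the same name there.

```shell
# Hypothetical scratch file used only for reference; nothing here edits hive-site.xml.
fragment="${TMPDIR:-/tmp}/hive-metastore-postgres.xml"

# The quoted 'EOF' keeps the shell from expanding anything inside the fragment.
cat > "$fragment" <<'EOF'
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- Assumed host, port, and database name; adjust to your instance. -->
  <value>jdbc:postgresql://localhost:5432/{YOUR_POSTGRES_USER}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>{YOUR_POSTGRES_USER}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>{YOUR_POSTGRES_PASSWORD}</value>
</property>
EOF

# Print the fragment so the values can be compared with hive-site.xml.
cat "$fragment"
```

The database name in the URL reuses the user name because the official postgres Docker image creates a database named after POSTGRES_USER when POSTGRES_DB is not set; with a different setup, use your own database name.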
If you don’t have a PostgreSQL instance, you can create one with a Docker container.
docker run \
--name hive_pg \
-p 5432:5432 \
-e POSTGRES_USER={YOUR_POSTGRES_USER} \
-e POSTGRES_PASSWORD={YOUR_POSTGRES_PASSWORD} \
-d \
postgres
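The container may need a few seconds before it accepts connections, and initializing the schema too early would fail. A small retry helper is one way to wait for it; wait_for is a made-up name for this note, and the sketch assumes the container name hive_pg from the command above.

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts is used up.
wait_for() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the remaining arguments as the command, discarding its output.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, `wait_for 30 docker exec hive_pg pg_isready -U {YOUR_POSTGRES_USER}` returns once pg_isready, which ships with the postgres image, reports that the server is accepting connections, and fails after 30 unsuccessful attempts.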
The second part sets the working and temporary directories.
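Assuming this part refers to the local paths that the template defines with ${system:java.io.tmpdir} and ${system:user.name} (used by properties such as hive.exec.local.scratchdir and hive.querylog.location), one option is to replace those placeholders in bulk rather than editing each property by hand. The helper name fix_tmpdirs and the target directory are illustrative choices, not something Hive provides.

```shell
# Hypothetical helper: replace the ${system:...} placeholders copied from the
# template with concrete values. A backup is kept next to the file as *.bak.
# Note: the directory must not contain '|' or '&', which are special to sed here.
fix_tmpdirs() {
  site_file=$1   # path to hive-site.xml
  local_dir=$2   # local directory Hive may write to, e.g. /tmp/hive
  user_name=$(id -un)
  sed -i.bak \
    -e 's|\${system:java\.io\.tmpdir}|'"$local_dir"'|g' \
    -e 's|\${system:user\.name}|'"$user_name"'|g' \
    "$site_file"
}
```

A call such as `fix_tmpdirs "$HIVE_HOME/conf/hive-site.xml" /tmp/hive` would turn a value like ${system:java.io.tmpdir}/${system:user.name} into /tmp/hive/ followed by the current user name. Make sure the chosen directory exists and is writable by the user who runs hive.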
The last part is for verification. If you plan to use another verification mechanism, you can leave it alone.
Once everything is configured, initialize the schema in the database.
$ schematool -dbType postgres -initSchema
At this point, you should be able to start the Hive client. Good luck!
$ hive
Troubleshooting
In fact, apart from the very troublesome configuration file, I did not run into many unexpected problems. The only one I hit was a Permission denied error, caused by insufficient permissions.
If you get stuck on it, you can refer to the solution below; it worked for me.
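One frequent cause of this error is that the HDFS directories Hive writes to do not exist yet or are not writable by the user running hive. Assuming that is the cause here, and assuming the default locations /tmp and /user/hive/warehouse, a sketch of the fix could look like this. The function name and the HDFS_CMD override are made up for this note.

```shell
# Hypothetical helper: create Hive's default HDFS directories and make them
# group-writable. HDFS_CMD can be overridden, e.g. HDFS_CMD="echo hdfs",
# to preview the commands without touching the cluster.
prepare_hive_dirs() {
  for dir in /tmp /user/hive/warehouse; do
    ${HDFS_CMD:-hdfs} dfs -mkdir -p "$dir" || return 1
    ${HDFS_CMD:-hdfs} dfs -chmod g+w "$dir" || return 1
  done
}
```

Run prepare_hive_dirs as a user who is allowed to create directories at the HDFS root, typically the HDFS superuser. If hive.metastore.warehouse.dir is set to something else in your hive-site.xml, adjust the paths accordingly.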