Associate Apache Hive Client with HDFS

Rain Wu
Random Life Journal
2 min read · Apr 20, 2020

This note assumes you already have an HDFS cluster available and focuses on configuring the Hive client.

Hive

Hive is a tool for efficiently querying files stored in an HDFS cluster; refer to the official page for more information. The package can be downloaded and unpacked with:

$ wget http://apache.stu.edu.tw/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz
$ tar xzvf apache-hive-2.3.7-bin.tar.gz

Move the extracted directory wherever you like, then append the following lines to your /etc/profile.

export HIVE_HOME=/path/to/your/hive/project
export PATH=$HIVE_HOME/bin:$PATH

Remember to reload the profile so the changes take effect.

$ source /etc/profile
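
As a quick sanity check after reloading the profile, the sketch below verifies that HIVE_HOME is set and points at a directory containing an executable bin/hive. The function name check_hive_home is made up for illustration; it is not part of Hive.

```shell
# Hypothetical helper (not part of Hive): verify that HIVE_HOME is set
# and that it contains an executable bin/hive launcher.
check_hive_home() {
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set"
    return 1
  fi
  if [ ! -x "$HIVE_HOME/bin/hive" ]; then
    echo "no executable at $HIVE_HOME/bin/hive"
    return 1
  fi
  echo "HIVE_HOME looks fine: $HIVE_HOME"
}
```

Running check_hive_home in a fresh shell should print the "looks fine" line if /etc/profile was picked up correctly.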

Configurations

Apache Hive has tons of configuration options, but I only studied the ones I needed because I’m too lazy :). The main configuration file we need to provide is hive-site.xml, which can be copied from the template file hive-default.xml.template in the conf directory of the distribution.

$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
$ vi hive-site.xml

Then comes the most troublesome part: editing hive-site.xml. Find the properties described below and update them to appropriate values.

The configuration depends on your purpose, so your values may differ from mine, but these are probably the settings that need to change.

The first part is the PostgreSQL database connection. Apache Hive can store its schema in a relational database, so I chose the one I am most familiar with.
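
A minimal sketch, assuming the properties in question are the standard JDBC metastore ones (javax.jdo.option.ConnectionURL, ConnectionDriverName, ConnectionUserName and ConnectionPassword): the helper below prints where each one appears in hive-site.xml together with its current value, which helps when navigating such a large file. The function name show_metastore_props is hypothetical, and it assumes the template layout where <value> sits on the line right after <name>. For PostgreSQL, the driver class is org.postgresql.Driver and the URL takes the form jdbc:postgresql://host:port/database.

```shell
# Hypothetical helper: print the line number and current value of each
# JDBC metastore connection property in a hive-site.xml file.
# Assumes <value> sits on the line directly after <name>, as in the template.
# $1 = path to hive-site.xml
show_metastore_props() {
  for key in \
    javax.jdo.option.ConnectionURL \
    javax.jdo.option.ConnectionDriverName \
    javax.jdo.option.ConnectionUserName \
    javax.jdo.option.ConnectionPassword
  do
    grep -n -A 1 "<name>${key}</name>" "$1"
  done
}
```

For example, show_metastore_props "$HIVE_HOME/conf/hive-site.xml" lists the four entries so you can jump straight to those line numbers in your editor.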

If you don’t have a PostgreSQL instance, you can create one with a Docker container.

docker run \
--name hive_pg \
-p 5432:5432 \
-e POSTGRES_USER={YOUR_POSTGRES_USER} \
-e POSTGRES_PASSWORD={YOUR_POSTGRES_PASSWORD} \
-d \
postgres

The second part sets the working and temporary directories.
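
A minimal sketch, assuming the directories in question are the template entries whose values contain ${system:java.io.tmpdir} and ${system:user.name} (for example hive.exec.local.scratchdir, hive.downloaded.resources.dir and hive.querylog.location). In my understanding these placeholders usually have to be replaced with concrete paths before the client starts cleanly. The helper name preview_local_dirs and the /tmp/hive directory in the usage note are illustrative only; the result is written to stdout, so the original file is not modified.

```shell
# Hypothetical helper: print hive-site.xml with the template's
# ${system:java.io.tmpdir} and ${system:user.name} placeholders replaced.
# $1 = path to hive-site.xml, $2 = local directory, $3 = user name
# Note: $2 and $3 must not contain '|', '&' or backslashes (sed metacharacters).
preview_local_dirs() {
  sed -e 's|\${system:java.io.tmpdir}|'"$2"'|g' \
      -e 's|\${system:user.name}|'"$3"'|g' "$1"
}
```

One way to use it is preview_local_dirs hive-site.xml /tmp/hive "$USER" > hive-site.new.xml, then review the new file before swapping it in.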

The last part is for verification; if you plan to use a different verification mechanism, you can leave it alone.

Once everything is configured, initialize the schema in the database.

$ schematool -dbType postgres -initSchema

At this point, you should be able to start the Hive client. Good luck!

$ hive

Troubleshooting

In fact, apart from the very troublesome configuration file, I rarely ran into unexpected problems. The only one I hit was a Permission denied error, caused by insufficient permissions.

If you get stuck on it, you can refer to the solution below; it worked for me.
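
One common remedy, sketched here under the assumption that the error comes from HDFS directory permissions, follows the Hive Getting Started guide: create /tmp and /user/hive/warehouse in HDFS and make them group-writable. The function name prepare_hive_dirs is made up for illustration, and the hdfs dfs commands must be run by a user with write access to those paths.

```shell
# Hypothetical helper: create the HDFS directories Hive uses by default
# and make them group-writable. Requires the hdfs CLI on PATH.
prepare_hive_dirs() {
  for dir in /tmp /user/hive/warehouse; do
    hdfs dfs -mkdir -p "$dir" || return 1
    hdfs dfs -chmod g+w "$dir" || return 1
  done
}
```

After defining the function, run prepare_hive_dirs once and then try starting hive again. If your hive-site.xml points the warehouse or scratch directory elsewhere, adjust the paths in the loop accordingly.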

Rain Wu
Random Life Journal

A software engineer specializing in distributed systems and cloud services, with a desire to realize various visions of future life through technology.