Hive Installation on ubuntu 18.04 | MySQL Metastore

Adarsh Ms
Analytics Vidhya
Published in
5 min readApr 17, 2020

--

Apache Hive is a data warehouse infrastructure that facilitates querying and managing large data sets which resides in distributed storage system. It is developed on top of Hadoop. Hive has its own SQL-like query language called HiveQL (Hive Query Language).

Hive query language is similar to SQL wherein it supports subqueries. With Hive query language, it is possible to take a MapReduce joins across Hive tables.

Since, Hive is built on top of Hadoop, Java and Hadoop needs to be installed in your system. If Hadoop is not configured on your system, you can follow the step by step installation insctrutions in: https://medium.com/@adarshms1997/hadoop-single-node-cluster-setup-b11b957681f2?sk=6bcd2f534d70f8d7c34e86241f43f16e

Before installing the Hive, make sure your Hadoop installation is fine and all the core services of Hadoop are up and running.
Environment used in this setup is ubuntu 18.04 and hive version is 3.1.2.

Note: Prefer java 8, as newer versions no longer has URLClassLoader which is required for running hive.

Hive 3.x installation

Now. lets begin the installation process of hive by downloading the latest stables release from (http://apachemirror.wuchna.com/hive/hive-3.1.2/), for older versions visit (http://apachemirror.wuchna.com/hive/).

To download the release of your choice use the following commands. (Change the directory and download link according to your preference).

cd /usr/local
sudo wget
http://apachemirror.wuchna.com/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Extract the the hive file in same location.

sudo tar xvzf apache-hive-3.1.2-bin.tar.gz

Rename the extracted folder

sudo mv apache-hive-3.1.2-bin hive

Adding Hive Environment Variables

Adding hive path to the environment is necessary, else you would have to move to the hive directory to run commands.

Open the bashrc file by running

sudo nano ~/.bashrc

Add the following lines to the end of the bashrc file

# Set HIVE_HOME 
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

Now, load the hive environment variables by running the following commands

source ~/.bashrc

Creating directories …
Now we need to create Hive directories within HDFS.

hdfs dfs -mkdir /bigdata/tmp

Now for the hive to save table or other miscellaneous data we need to create another directory.

hdfs dfs -mkdir -p /bigdata/hive/warehouse

Adding Permissions …

hdfs dfs -chmod g+w /bigdata/tmp
hdfs dfs -chmod g+w /bigdata/hive/warehouse

Configuring Hive …

Change the working directory to hive configurations location

cd /usr/local/hive/conf
  • hive-env.sh
    Open the hive-env file by running the following command
sudo nano hive-env.sh

Add the following configurations to the end of the file (Change the paths according to your setup)

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Java Home
export JAVA_HOME=/usr

Metastore Configuration

Metastore is the central repository of Apache Hive metadata. It stores metadata for Hive tables (like their schema and location) and partitions in a relational database.

All Hive implementations need a metastore service, where it stores metadata. By default, Hive uses a built-in Derby SQL server. You can also choose MySQL, Postgres, Oracle, MS SQL Server as Hive Metastore.

We will be using MySQL for this configuration. Metastore configuration needs to be specified in the hive-site.xml file.

First things first, lets install the latest mysql version using aptitude. You can skip this step if the system has a mysql installation .

sudo apt-get update
sudo apt-get install mysql-server

If the secure installation utility does not launch automatically after the installation completes, enter the following command:

sudo mysql_secure_installation utility

This utility prompts you to define the mysql root password and other security-related options, including removing remote access to the root user and setting the root password.

sudo systemctl start mysql

This command starts the mysql service

sudo systemctl enable mysql

This command ensures that the database server launches after a reboot

Now after successful installation of mysql server, we need to install the mysql java connector. Run the following command to install the connector.

sudo apt-get install libmysql-java

For the hive to access the mysql connector, a soft link needs to be created for the connector in hive lib folder or the jar file should be copied to the hive lib folder.

sudo ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

Create the Initial database schema using the hive-schema-3.1.0.mysql.sql file ( or the file corresponding to your installed version of Hive) located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.

Login to mysql shell

mysql -u root -p

Create db for metastore

CREATE DATABASE metastore;USE metastore;SOURCE /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql;

For Hive to access the metastore a MySQL user account needs to be created. It is very important to prevent this user account from creating or altering tables in the metastore database schema. (Don’t forget the quotes)

CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';GRANT all on *.* to 'hiveuser'@localhost identified by 'hivepassword';flush privileges;

Now we’ve created the metastore and hive user in mysql. Lets define the metastore configuration in hive-site.xml.

  • hive-site.xml
    Open the hive-site file by running the following command
cd /usr/local/hive/conf
sudo nano hive-site.xml

Add the following configurations

<configuration> 
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>hivepassword for connecting to mysql server</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/bigdata/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>Thrift URI for the remote metastore.</description>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
</configuration>

All set… Now lets dive into the hive console. Type in the following command in your terminal and hit enter

hive

If you see any errors related to jdbc driver not found, check whether you have successfully linked or copied the driver to hive lib folder.

Starting Hive metastore…

Now let’s start our metastore service, run the following command in your terminal,

hive --service metastore

If you encounter any errors, please verify your metastore configuration.
Check your hive-site.xml config and check metastore db and user configured in mysql. For errors related to thrift, check the availability of the port used in your hive.metastore.uris property.

Note: If you want to run hive metastore as a system service, then do the following steps.

First, lets create a service file for starting our hive metastore

sudo nano /etc/systemd/system/hive-meta.service

Add the following lines to the service file

[Unit] 
Description=Hive metastore
After=network.target

[Service]
User=ubuntu
Group=www-data
ExecStart=/usr/local/hive/bin/hive --service metastore

[Install]
WantedBy=multi-user.target

Now lets start our hive-meta service by running the following command

sudo systemctl start hive-meta

If you need to start hive metastore at system boot, then create a symlink using the following command

sudo systemctl enable hive-meta

Extras …

If you were able to get into the hive console without any errors, you can verify your metastore configuration by following these steps.

Create a table in hive.

create table test(id int, name string);

Now exit from the hive console (Type exit and hit enter)

Let’s see if the table was successfully added in our metastore. Login to the mysql console using hive user credentials.

mysql -u root -p

Change the database and view tables.

use metastore;select * from TBLS;

If you see the test table listed in TBLS table, then your installation was successful. If not, check your mysql database and user configuration for hive metastore.

--

--

Adarsh Ms
Analytics Vidhya

A passionate engineer who thrives on using programming languages to converse with machines.