TIP: Automate Fetching HDFS Locations for Hive Tables

Amit
Nov 8 · 2 min read

Hi folks!

This article is aimed at newcomers taking their first steps into the Big Data world. Hive is a data warehouse system that provides a SQL querying interface over data stored in a distributed file system such as HDFS. There are two types of tables in Hive: managed and external.

Hive managed tables, also known as internal tables, are fully managed by Hive: their data files live under Hive's warehouse directory, inside the corresponding database directory. If you DROP such a table, the data inside it is deleted along with the metadata.

Hive external tables reference data at an HDFS location specified by the end user. Dropping an external table removes only its metadata; the files at that HDFS location are left untouched.
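To make the distinction concrete, here is a minimal HiveQL sketch; the table names and the HDFS path are illustrative, not from a real cluster:

```sql
-- Managed table: Hive owns the data files under its warehouse directory;
-- DROP TABLE removes both the metadata and the data files.
CREATE TABLE sales_managed (id INT, amount DOUBLE);

-- External table: data lives at a user-specified HDFS path;
-- DROP TABLE removes only the metadata, the files remain.
CREATE EXTERNAL TABLE sales_external (id INT, amount DOUBLE)
LOCATION '/data/landing/sales';
```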

To fetch the HDFS locations of all Hive tables, follow the steps below.

Log in to the Hive metastore database (commonly MySQL) and run the following query:

SELECT DBS.NAME AS DB_NAME, TBL_NAME, TBL_TYPE, LOCATION
FROM TBLS
INNER JOIN DBS ON TBLS.DB_ID = DBS.DB_ID
LEFT OUTER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID;
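If you only need the location of a single table, you can ask Hive directly instead of querying the metastore. The database and table names below are placeholders:

```sql
-- Run inside the Hive CLI or Beeline; the output includes a
-- "Location:" line with the table's HDFS path.
DESCRIBE FORMATTED my_db.my_table;
```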

To export this metadata to a file, you can also use a small shell script:

#!/bin/sh
# Export database name, table name, table type, and HDFS location
# from the Hive metastore (-D hive selects the metastore database;
# add -p if your MySQL root user has a password).
echo "SELECT DBS.NAME DB_NAME, '|', TBL_NAME, '|', TBL_TYPE, '|', LOCATION FROM TBLS INNER JOIN DBS ON TBLS.DB_ID = DBS.DB_ID LEFT OUTER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID;" | mysql -D hive -u root > hivelocations.txt
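Note that the mysql batch output above separates every column with a tab, so the literal '|' columns end up surrounded by tabs. A cleaner variant, assuming a MySQL metastore, is to build one pipe-delimited string per row with CONCAT_WS:

```sql
-- Alternative export query: one pipe-delimited string per row,
-- avoiding the stray tab separators in mysql batch output.
SELECT CONCAT_WS('|', DBS.NAME, TBL_NAME, TBL_TYPE, LOCATION)
FROM TBLS
INNER JOIN DBS ON TBLS.DB_ID = DBS.DB_ID
LEFT OUTER JOIN SDS ON TBLS.SD_ID = SDS.SD_ID;
```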

Feel free to share more insights on this topic. Until then, keep learning and keep sharing!
