Find all the tables by column name in HIVE

Rajnish Kumar Garg
2 min readApr 6, 2018

Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

Hive maintains all the table metadata in metastore db

It is a relational database repository that contains metadata about objects we create in hive or externally point to. It acts as a central repository that has data about your data. To process this data, we could either use HIVE’s own execution engines (MR, Tez, Spark, etc) or other processing tools like Spark, Presto etc.

Some times, we have the requirement to find all the tables that have some specific column name(s) i.e we want to do reverse lookup here.

For example, we want to find all the DB.TABLE_NAME where we have a column named “country”. By running the following command on HIVE metastore db, we can easily find all the database and tables.

SELECT c.column_name, 
t.tbl_name,
d.name AS DB_NAME
FROM SDS s
LEFT JOIN TBLS t
ON s.sd_id = t.sd_id
LEFT JOIN COLUMNS_V2 c
ON s.cd_id = c.cd_id
LEFT JOIN DBS d
ON t.db_id = d.db_id
WHERE t.tbl_name IS NOT NULL
AND d.name IS NOT NULL
AND c.column_name = 'country'
ORDER BY d.name,
t.tbl_name

Command line results:

(Optional)

If we have Superset or any other BI tool, we can easily connect HIVE meta store and build a nice search dashboard.

Example: Superset dashboard

Superset doesn’t support join, so we can create a view and then import the view.

Note: Hive metastore db is a critical component, so please make sure you are not running unnecessary heavy queries.

--

--