Migrate views from hive metastore to Unity Catalog

Viral Patel
3 min readJan 21, 2024

In this blog, I will walk you through how you can migrate existing views from hive metastore to Unity Catalog. we will use custom python function in order to migrate existing views from hive metastore or one database to another database.

Migrating views from Hive Metastore to Unity Catalog is a process of copying or recreating the definitions of the Hive Metastore views in the Unity Catalog database. This is a one-time operation that can be done using Databricks SQL CREATE statements followed by a software that transfers data, metadata, and schema definitions between systems. Once the views have been migrated, they can be accessed from Unity Catalog and benefit from improved features, such as ACID transactions, Delta caching mechanism, and other data management capabilities.

Pre-requisites

  1. Databricks Workspace and Unity Catalog enabled cluster
  2. All Privileges to the UC
  3. Knowledge of Views.
    https://learn.microsoft.com/en-gb/azure/databricks/data-governance/unity-catalog/create-views

Step 1. Register Custom Function

Function Name: migrate_views_from_hive_metastore_to_unity_catalog(view_source_ct_name: str, view_source_db_name: str, view_destination_ct_name: str, view_destination_db_name: str)

Description

The migrate_views_from_hive_metastore_unity_catalog function is a Python function that migrates or copies all views from a source Hive metastore to a destination Unity Catalog. This function is useful when upgrading or switching to the Unity Catalog, while retaining the old database schema.

Parameters

The function takes the following four string parameters:

view_source_ct_name : The name of the existing Catalog where the views are registered.

view_source_db_name : The name of the database containing views in the existing Catalog.

view_destination_ct_name : The name of the destination Catalog where views will be copied.

view_destination_db_name : The name of the destination database in the Catalog where views will be copied.


import concurrent.futures

def migrate_views_from_hive_metastore_to_unity_catalog(view_source_ct_name,view_source_db_name,view_destination_ct_name,view_destination_db_name):

"""
Function Name: migrate_views_from_hive_metastore_unity_catalog(view_source_ct_name: str, view_source_db_name: str, view_destination_ct_name: str, view_destination_db_name: str)

Description
The migrate_views_from_hive_metastore_to_unity_catalog function is a Python function that migrates or copies all views from a source Hive metastore to a destination Unity Catalog. This function is useful when upgrading or switching to the Unity Catalog, while retaining the old database schema.

Parameters
The function takes the following four string parameters:

view_source_ct_name - The name of the existing catalog where the views are registered.
view_source_db_name - The name of the database containing views in the existing catalog.
view_destination_ct_name - The name of the destination catalog where views will be copied.
view_destination_db_name - The name of the destination database in the catalog where views will be copied.

For Example 1

view_source_ct_name = 'source_catalog'
view_source_db_name = 'source_db'
view_destination_ct_name = 'destination_catalog'
view_destination_db_name = 'destination_db'

migrate_views_from_hive_metastore_unity_catalog(view_source_ct_name, view_source_db_name, view_destination_ct_name, view_destination_db_name)

Example 2
view_source_ct_name = 'hive_metastore'
view_source_db_name = 'gold_vv'
view_destination_ct_name = 'uc_te'
view_destination_db_name = 'gold_vv'

migrate_views_from_hive_metastore_unity_catalog(view_source_ct_name,view_source_db_name,view_destination_ct_name,view_destination_db_name)
"""

tables = spark.sql(f"SHOW TABLES IN {view_source_ct_name}.{view_source_db_name}").collect()

print(tables)

display(spark.sql(f"CREATE DATABASE IF NOT EXISTS {view_destination_ct_name}.{view_destination_db_name}"))
display(spark.sql(f"USE CATALOG {view_destination_ct_name}"))

def process_views(t):
# add code to process each database here
index = tables.index(t) + 1
print(f"Creating View No: {index}")
print("")
query = f"SHOW CREATE TABLE {view_source_ct_name}.{t.database}.{t.tableName}"
print(query)

df = spark.sql(query)

create_table_sql = df.select('createtab_stmt').collect()[0]['createtab_stmt']

query = f"DROP VIEW IF EXISTS {view_destination_ct_name}.{view_destination_db_name}.{t.tableName}"
display(spark.sql(query))

create_table_sql = create_table_sql.replace(f"{view_source_ct_name}",f"{view_destination_ct_name}")

print(create_table_sql)
display(spark.sql(create_table_sql))
print("----------------------------------------------------------------------------------------------------------------------------")
print("")

with concurrent.futures.ThreadPoolExecutor() as executor:
futures = [executor.submit(process_views, db) for db in tables]
# wait for all futures to finish
for future in concurrent.futures.as_completed(futures):
try:
# get the result of each future
result = future.result()
except Exception as e:
# handle any exceptions that occurred while processing the future
pass

Usage: Ensure that you have the appropriate access rights for both the source and destination Catalog to execute the required operations.

Call the migrate_views_from_hive_metastore_unity_catalog function with the required parameters.

The function will retrieve information about all views from the source Catalog and copy each view from the source to the destination Catalog by creating the same view with the same definition in the destination Catalog.

Note: The function does not copy the actual view data stored on the filesystem; therefore, you must copy the data manually using ETL pipelines or another suitable tool if applicable

# 01. Migrate hiive_metastore.gold_vv to gold_te.gold_vv

view_source_ct_name = 'hive_metastore'
view_source_db_name = 'gold_vv'
view_destination_ct_name = 'gold_te'
view_destination_db_name = 'gold_vv'

migrate_views_from_hive_metastore_to_unity_catalog(view_source_ct_name,view_source_db_name,view_destination_ct_name,view_destination_db_name)

In Conclusion, we have migrated all views from hive metastore to unity Catalog using custom defined python function. This function is not restricted to unity Catalog, You can migrate views from one database to another database.

Thanks

--

--

Viral Patel

Azure Morden Data Platform, Senior Solution Architect, Databricks Champion