How to create DDL scripts for existing tables in hive_metastore with the help of spark.Catalog in PySpark?
Most of the time we create tables in hive_metastore from PySpark DataFrames with saveAsTable, like this:
dataframe.write.mode("append").format("delta").option("mergeSchema", "true").saveAsTable("SchemaName.tableName")
Suppose you have many such tables, or tables with many columns, and you want to generate DDL scripts for them to store in your code repository. Writing those scripts by hand would be a tedious task.
Since Spark 3.4.0, the pyspark.sql.Catalog API has a getDatabase() method, which, together with listTables(), lets us generate the DDL automatically.
Code snippet for Spark 3.4.0:
%python
# Fetch the target database directly; getDatabase() is available from Spark 3.4.0.
get_database = spark.catalog.getDatabase("SchemaName")  # replace with your schema/database name
tables = spark.catalog.listTables(get_database.name)
for t in tables:
    create_script = spark.sql("SHOW CREATE TABLE {}.{}".format(get_database.name, t.name))
    print(create_script.first()[0])
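Since the point of generating these scripts is to store them in a code repository, here is a minimal sketch that writes each statement to its own .sql file instead of just printing it. The output folder ./ddl and the "database.table.sql" naming scheme are my own assumptions for illustration, not part of the original snippet:
%python
# Minimal sketch: write each table's DDL to its own .sql file so the scripts
# can be committed to a code repository. The "./ddl" folder and the
# "<database>.<table>.sql" naming scheme are assumptions for illustration.
import os

output_dir = "./ddl"
os.makedirs(output_dir, exist_ok=True)

db = spark.catalog.getDatabase("SchemaName")  # replace with your schema/database name
for t in spark.catalog.listTables(db.name):
    ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name)).first()[0]
    with open(os.path.join(output_dir, "{}.{}.sql".format(db.name, t.name)), "w") as f:
        f.write(ddl + "\n")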
Code snippet for Spark 3.2.0:
%python
# getDatabase() does not exist before Spark 3.4.0, so loop over all databases
# and match the target one by name.
listdbs = spark.catalog.listDatabases()
for mydb in listdbs:
    if mydb.name == "MyHiveMetadata.Schema":  # replace with your schema/database name
        tables = spark.catalog.listTables(mydb.name)
        for t in tables:
            create_script = spark.sql("SHOW CREATE TABLE {}.{}".format(mydb.name, t.name))
            print(create_script.first()[0])
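One caveat worth noting: listTables() also returns temporary views, and SHOW CREATE TABLE fails for those (and their DDL is not something you would commit anyway). A small sketch of filtering them out using the isTemporary field on the Table objects returned by the catalog:
%python
# Sketch: skip temporary views, which listTables() also returns and which
# SHOW CREATE TABLE cannot script. isTemporary is a field on the Table
# objects returned by spark.catalog.listTables().
db_name = "SchemaName"  # replace with your schema/database name
for t in spark.catalog.listTables(db_name):
    if t.isTemporary:
        continue
    create_script = spark.sql("SHOW CREATE TABLE {}.{}".format(db_name, t.name))
    print(create_script.first()[0])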