How to create DDL scripts for already existing tables in hive_metastore with the help of spark.catalog in PySpark?

Most of the time we create tables in hive_metastore from PySpark DataFrames with the help of saveAsTable, which looks like this:

dataframe.write.mode("append").format("delta").option("mergeSchema", "true").saveAsTable("SchemaName.tableName")
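For context, here is a minimal, self-contained sketch of creating such a table; the DataFrame contents and the "SchemaName.tableName" target below are only placeholders, not names from any real workspace:

%python

# Minimal sketch: build a small DataFrame and save it as a Delta table
# (columns and table name are hypothetical placeholders)
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("append").format("delta").option("mergeSchema", "true").saveAsTable("SchemaName.tableName")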

Suppose we have many tables, or tables with many columns, and we want to create the DDL scripts for those tables to store in our code repository; writing the DDL scripts by hand would be a tedious task.

Starting with Spark 3.4.0, we have the pyspark.sql.Catalog.getDatabase() method, which we can combine with pyspark.sql.Catalog.listTables() to do this.

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Catalog.getDatabase.html#pyspark.sql.Catalog.getDatabase

Code snippet for Spark 3.4.0:

%python

# Look up the database/schema (getDatabase is available from Spark 3.4.0 onwards)
get_database = spark.catalog.getDatabase("SchemaName")  # replace with your schema / database name

tables = spark.catalog.listTables(get_database.name)

for t in tables:
    create_script = spark.sql("SHOW CREATE TABLE {}.{}".format(get_database.name, t.name))
    print(create_script.first()[0])
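Since the goal is usually to check these scripts into a code repository, one option is to write each statement to its own .sql file instead of just printing it. A minimal sketch, assuming a local output directory on the driver (the ddl_output path and the "SchemaName" placeholder are just examples):

%python

import os

# Sketch: dump each table's DDL into its own .sql file
# (output directory and schema name are assumptions, adjust to your environment)
output_dir = "ddl_output"
os.makedirs(output_dir, exist_ok=True)

get_database = spark.catalog.getDatabase("SchemaName")  # your schema / database name
for t in spark.catalog.listTables(get_database.name):
    ddl = spark.sql("SHOW CREATE TABLE {}.{}".format(get_database.name, t.name)).first()[0]
    with open(os.path.join(output_dir, "{}.{}.sql".format(get_database.name, t.name)), "w") as f:
        f.write(ddl + "\n")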

Code snippet for Spark 3.2.0:

%python

# On Spark 3.2.0 there is no getDatabase(), so list all databases and filter by name
listdbs = spark.catalog.listDatabases()

for mydb in listdbs:
    if mydb.name == "MyHiveMetadata.Schema":
        tables = spark.catalog.listTables(mydb.name)
        for t in tables:
            create_script = spark.sql("SHOW CREATE TABLE {}.{}".format(mydb.name, t.name))
            print(create_script.first()[0])
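The only real difference between the two snippets is how the database is resolved: getDatabase() is only available from Spark 3.4.0 onwards, so on 3.2.0 we iterate over listDatabases() and filter on the name instead. The SHOW CREATE TABLE part, which produces the DDL itself, is identical in both versions.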