Delta Tables And Hive Metastore — Reverse Engineering

Bhavya Bordia
2 min read · Dec 12, 2022


The following findings/methods are valid as of 12th December, 2022. Delta Lake is a developing project, so new functions may be added or removed. Please refer to the official documentation at delta.io.

Creation of an External Delta Hive Table — New Way

Pre-requisite: the S3 location is already a Delta table.

Once you have created a Delta table using Spark (refer to this), you can notice a couple of things:

SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat

The SerDe library is LazySimpleSerDe, which is natively supported by Hive, and the input/output formats are the SequenceFile formats.

Another important observation:

Table Parameters:
spark.sql.sources.provider delta

In the table parameters, one can find a reference to Delta as the data source provider.
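
For context, here is a minimal sketch of how the pre-requisite Delta table might have been created via Spark SQL; the database, table name, columns, and S3 path are hypothetical, and a Spark session with the Delta Lake dependency is assumed.

-- create a Delta table at an S3 location (registers it in HMS as an external table)
CREATE TABLE IF NOT EXISTS my_db.my_delta_table (
  `id` BIGINT COMMENT 'primary identifier',
  `event_ts` TIMESTAMP COMMENT 'event timestamp'
)
USING DELTA
LOCATION 's3://my-bucket/path/to/delta-table';

-- inspect what Spark registered in HMS (SerDe, input/output formats, table parameters)
DESCRIBE FORMATTED my_db.my_delta_table;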

We then extracted the CREATE command of the table created via Spark, added the schema to it, and removed the extra properties.

CREATE EXTERNAL TABLE IF NOT EXISTS <hiveDb>.<tableName> ( 
`col1` `col1Datatype` <description-comment>,
`col2` `col2Datatype` <description-comment>,
....
`colN` `colNDatatype` <description-comment> )
ROW FORMAT SERDE  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'path'='<actual s3 path>' )
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION '<placeHolderLocation>'
TBLPROPERTIES (
'spark.sql.sources.provider'='delta'
)

You can run the above command via Hive or Hive JDBC to create and register the external table.
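
As a concrete illustration, the filled-in DDL might look like this (the database, table, columns, and S3 path are hypothetical):

CREATE EXTERNAL TABLE IF NOT EXISTS my_db.my_delta_table (
  `id` BIGINT COMMENT 'primary identifier',
  `event_ts` TIMESTAMP COMMENT 'event timestamp'
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'path'='s3://my-bucket/path/to/delta-table'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 's3://my-bucket/path/to/delta-table'
TBLPROPERTIES (
  'spark.sql.sources.provider'='delta'
)

Here LOCATION is pointed at the same S3 path as the 'path' serde property for simplicity; the template above keeps them as separate placeholders.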

How does the structure look in HMS?

hive> desc formatted <hiveDb>.<tableName>;
OK
# col_name data_type comment
`col1` `col1Datatype` <description-comment>,
`col2` `col2Datatype` <description-comment>,
....
`colN` `colNDatatype` <description-comment>
# Detailed Table Information
Database: uip_vep_delta
OwnerType: USER
Owner: hadoop
CreateTime: Thu Nov 17 17:01:43 UTC 2022
LastAccessTime: UNKNOWN
Retention: 0
Location: <placeholderlocation>
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
bucketing_version 2
numFiles 0
spark.sql.sources.provider delta
spark.sql.sources.schema.numParts 1
spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\":[]}
totalSize 0
transient_lastDdlTime 1668746389
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
path <actual location>
serialization.format 1

Advantages:

  • No need for an external jar in the Spark session
  • No impact on query speed via Spark
  • The full schema is present in HMS
  • Schema evolution is possible using the ALTER TABLE command (native storage handler), as sketched below
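
A minimal sketch of such schema evolution through the native ALTER TABLE path (the column name and type are hypothetical):

ALTER TABLE <hiveDb>.<tableName> ADD COLUMNS (
  `new_col` STRING COMMENT 'newly added column'
);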

Disadvantages:

  • Not queryable via Hive

Conclusion

In the last part, we saw a couple of different ways of creating Delta tables; one can determine which is best suited for them.

If you are not looking for a description of the data in HMS, you can go with creating Delta tables via Spark, with no hassle of evolving the schema.

If your use case is more about interacting with the Hive shell on a day-to-day basis, and a possible impact on execution speed is acceptable, then creating a table with the external Hive connector is best for you.

If your use case involves registering the correct metadata in a metastore like HMS, and you do not care much about querying via Hive, then you could use the approach mentioned in this article.
