Custom JDBC dialect for Hive

Liang-Chi Hsieh
Sep 27, 2017 · 2 min read

Recently I’ve answered a JIRA ticket SPARK-22113. It is reported an issue when using JDBC to connect Hive in SparkSQL.

Usually we will use a session with hive support to connect with Hive:

val session = SparkSession.builder().enableHiveSupport().getOrCreate()
session.sql("select dw_date from tfdw.dwd_dim_date limit 10").show()

With JDBC, in theory you are able to connect Hive as like:

val ds = session.read()
.format("jdbc")
.option("driver", "org.apache.hive.jdbc.HiveDriver")
.option("url", "jdbc:hive2://iZ11syxr6afZ:21050/;auth=noSasl")
.option("dbtable", "tfdw.dwd_dim_date")
.load()
ds.select("dw_date").limit(10).show()

However, the returned results are not correct and looks like you select a string literal.

After few tests, I found that default JDBC dialect in SparkSQL doesn’t work with Hive at all. The default way in the default dialect to quote identifiers are using double quotes. A SQL query like SELECT “dw_date” FROM table… will be parsed by Hive to select a string literal, instead of a column named “dw_date”. By replacing quotes with backticks, seems the issue is resolved. However, in my test, the column names get from Hive are all prefixed with the table name like table.dw_date. But you can’t directly wrap backticks around it like `table.dw_date`. Alternatively, we need to wrap each part individually:

private case object HiveDialect extends JdbcDialect {  override def canHandle(url : String): Boolean = url.startsWith("jdbc:hive2")  override def quoteIdentifier(colName: String): String = {
colName.split(‘.’).map(part => s”`$part`”).mkString(“.”)
}
}

After registering this Hive JDBC dialect with SparkSQL, you can now connect with Hive via JDBC.

Because JDBC seems not a recommended way to use Hive under SparkSQL, currently this patch isn’t submitted as a PR. But this can be a reference for adding custom JDBC dialect into SparkSQL.

Written by

Spark, Big-Data, Machine Learning, Deep Learning

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade