How to Read Data from Hive & Write to MS SQL Table Using Spark-Shell

Amar Gurung
Zaloni Engineering
Jan 13, 2022 · 2 min read

Blog Overview:

This blog walks readers through the prerequisites and steps needed to read data from Hive and write it to an MS SQL Server table using spark-shell.

Pre-requisites for Use Case:

  1. scala
  2. spark-shell
  3. beeline
  4. hive 3.1.0
  5. spark 2.3.2
  6. mssql-cli v1.0.0 ( Refer https://github.com/dbcli/mssql-cli/blob/master/doc/installation/linux.md )
  7. JDBC driver sqljdbc4-2.0.jar
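The driver jar and the connection string used later in step 4 have to agree with each other: the driver class `com.microsoft.sqlserver.jdbc.SQLServerDriver` ships inside sqljdbc4-2.0.jar, and the URL is assembled from the SQL Server host, port, and database name. A minimal sketch of how those pieces fit together (the host name below is a placeholder, not a value from this cluster):

```scala
object JdbcUrl {
  // Driver class shipped in sqljdbc4-2.0.jar; Spark loads it via the "driver" option.
  val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

  // Assemble the SQL Server JDBC URL in the same form used in step 4.
  def build(host: String, port: Int, database: String): String =
    s"jdbc:sqlserver://$host:$port;databaseName=$database"
}
```

For example, `JdbcUrl.build("sql-server-host", 1433, "orgdb")` produces a URL of the same shape as the one passed to Spark in step 4.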

Use Case Steps:

  1. Create Hive Table using Beeline
create database organization_db;
use organization_db;
create table employee(id int, name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

2. Create a CSV file & copy it to the HDFS path

[~]$ cat emp.csv
1,rohan
2,amar
3,jeet
[~]$ hadoop fs -put emp.csv hdfs://hdp314/warehouse/tablespace/managed/hive/organization_db.db/employee
[~]$ hadoop fs -cat hdfs://hdp314/warehouse/tablespace/managed/hive/organization_db.db/employee/emp.csv
1,rohan
2,amar
3,jeet
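Because the table is declared with `FIELDS TERMINATED BY ','` and `LINES TERMINATED BY '\n'`, Hive maps each line of emp.csv onto the `(id int, name string)` schema exactly as a plain comma split would. A hedged pure-Scala sketch of that mapping (an illustration only, not Hive's actual deserializer):

```scala
// Mimic how the ROW FORMAT DELIMITED declaration maps one CSV line
// onto the (id int, name string) schema of the employee table.
def parseEmployee(line: String): (Int, String) = {
  val Array(id, name) = line.split(",", 2) // FIELDS TERMINATED BY ','
  (id.trim.toInt, name.trim)
}

// The three lines of emp.csv from the step above.
val rows = Seq("1,rohan", "2,amar", "3,jeet").map(parseEmployee)
```

This is the row set that `spark.sql("select * from organization_db.employee")` returns in step 4.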

3. Create MS SQL Table

mssql-cli -U <username> -P <password> -S <sql server ip/hostname> -d <db_name>
db_name> create database orgdb;
Time: 1.005s (a second)
Commands completed successfully.
db_name> use orgdb;
Time: 0.502s
Commands completed successfully.
orgdb> create schema org;
Time: 0.453s
Commands completed successfully.
orgdb> create table org.employee(id int primary key, name varchar(1024));
Time: 0.453s
Commands completed successfully.

4. Scala Code

// Read the Hive table into a DataFrame
val hiveDf = spark.sql("select * from organization_db.employee")
hiveDf.show()

// Configure a JDBC writer; SaveMode.Overwrite drops and recreates the target table
val df = hiveDf.write.format("jdbc").mode(org.apache.spark.sql.SaveMode.Overwrite)
df.option("url", "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "org.employee")
  .option("user", "sa")
  .option("password", "xxxxx")
  .save()
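The same JDBC options can be gathered into a single map and passed to the writer's `options(...)` method, which keeps the connection details together in one place. A sketch under the same assumptions as step 4 (host and password are placeholders; the `save()` line is left as a comment because it needs a live SparkSession):

```scala
// Connection options for DataFrameWriter.options(...); values are placeholders.
val jdbcOptions = Map(
  "url"      -> "jdbc:sqlserver://<sql-server-ip>:1433;databaseName=orgdb",
  "driver"   -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "dbtable"  -> "org.employee",
  "user"     -> "sa",
  "password" -> "xxxxx"
)

// In spark-shell:
// hiveDf.write.format("jdbc").mode("overwrite").options(jdbcOptions).save()
```

In practice the credentials would come from a secure store rather than being hard-coded in the shell.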

5. Invoke spark-shell & paste the above code

spark-shell --master yarn --jars /<path-to-driver>/sqljdbc4-2.0.jar

6. Sample output of the spark-shell

[~]$ spark-shell --master yarn --jars <driver-path>/sqljdbc4-2.0.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://xxxxx:4041
Spark context available as 'sc' (master = yarn, app id = application_1630517835182_0012).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.4.0-315
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_271)

scala> val hiveDf = spark.sql("select * from organization_db.employee")
hiveDf: org.apache.spark.sql.DataFrame = [id: int, name: string]

scala> hiveDf.show()
+---+-----+
| id| name|
+---+-----+
|  1|rohan|
|  2| amar|
|  3| jeet|
+---+-----+

scala> val df = hiveDf.write.format("jdbc").mode(org.apache.spark.sql.SaveMode.Overwrite)
df: org.apache.spark.sql.DataFrameWriter[org.apache.spark.sql.Row] = org.apache.spark.sql.DataFrameWriter@21838ea3

scala> df.option("url","jdbc:sqlserver://xxxxx:1433;databaseName=orgdb").option("driver","com.microsoft.sqlserver.jdbc.SQLServerDriver").option("dbtable", "org.employee").option("user", "sa").option("password","xxxx").save()

7. Validate the data in MS SQL Table

[amar@crrt-edge ~]$ mssql-cli -U <user_name> -P <password> -S <sql server ip/hostname> -d orgdb
orgdb> select * from org."employee";
Time: 4.674s (4 seconds)
+------+--------+
| id   | name   |
|------+--------|
| 3    | jeet   |
| 1    | rohan  |
| 2    | amar   |
+------+--------+
(3 rows affected)
