Reading JSON messages from a Kafka topic, processing them with Spark Structured Streaming, and writing the results back to a file (Hive)

Spark Structured Streaming example

Jaya Aiyappan
2 min read · Aug 29, 2022

Below is the code that uses Spark Structured Streaming to read data from a Kafka topic, process it, and write the processed data as files to the location that the Hive table points at.

To make it work on your laptop, you have to

  1. have Hadoop and Spark installed on your laptop, with all the daemons running
  2. have Hive installed and running
  3. create the Hive table:
    CREATE EXTERNAL TABLE IF NOT EXISTS restaurants(
    res_id STRING,
    name STRING,
    cuisines STRING,
    zipcode STRING,
    rowid STRING,
    rating_text STRING,
    user_id STRING,
    rating STRING )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/hive/restaurants';
  4. have Kafka installed, with ZooKeeper and the Kafka broker running, the topic created, and a console producer started
    For a single-node Kafka installation and starting the servers, you can refer to
    https://medium.com/@jaya.aiyappan/install-single-node-kafka-cluster-on-windows-and-test-it-65e24e07d0aa
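For reference, on a default Windows Kafka install the topic and producer can be started roughly like this (the topic name `restaurant-ratings` is an example — use whatever name your Spark job subscribes to):

```shell
:: Run from the Kafka install directory; assumes a broker on localhost:9092.
:: The topic name "restaurant-ratings" is illustrative.
bin\windows\kafka-topics.bat --create --topic restaurant-ratings --bootstrap-server localhost:9092
bin\windows\kafka-console-producer.bat --topic restaurant-ratings --bootstrap-server localhost:9092
```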

  5. send the following message on the producer terminal
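The original sample message is not reproduced here; a hypothetical record whose fields match the `restaurants` table columns could look like this (all values are made up for illustration):

```python
import json

# Hypothetical producer message: the field names follow the restaurants table,
# the values are invented for illustration only.
message = ('{"res_id": "18423968", "name": "Pizza Corner", "cuisines": "Italian", '
           '"zipcode": "560001", "rowid": "1", "rating_text": "Very Good", '
           '"user_id": "u1001", "rating": "4.2"}')

# The streaming job parses this same structure with from_json on the Spark side.
record = json.loads(message)
```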

  6. start the Spark history server (optional)

  • Edit the %SPARK_HOME%/conf/spark-defaults.conf file to have

spark.eventLog.enabled true
spark.history.fs.logDirectory file:///C:/tmp/spark-events

  • run the following command from %SPARK_HOME%

bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

  7. run the program and verify that the processed files appear in the Hive table's location
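Once a few micro-batches have been written, a quick query from the Hive shell can confirm the files are readable through the table defined above:

```sql
-- Sanity check: the streamed records should show up as rows of the external table.
SELECT res_id, name, rating FROM restaurants LIMIT 5;
```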
