Reading JSON messages from a Kafka topic, processing them with Spark Structured Streaming, and writing the results back to a file (Hive)
A Spark Structured Streaming example
Aug 29, 2022
Below is the code that uses Spark Structured Streaming to read data from a Kafka topic, process it, and write the processed data as files to the location an external Hive table points at.
To make it work on your laptop, you have to
- have Hadoop and Spark installed on your laptop, with all the daemons running
- have Hive installed and running
- create the Hive table
CREATE EXTERNAL TABLE IF NOT EXISTS restaurants(
res_id STRING,
name STRING,
cuisines STRING,
zipcode STRING,
rowid STRING,
rating_text STRING,
user_id STRING,
rating STRING )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/restaurants';
- have Kafka installed, with ZooKeeper and the Kafka broker started, the topic created, and the producer started
For a single-node Kafka installation and starting its servers, you can refer to
https://medium.com/@jaya.aiyappan/install-single-node-kafka-cluster-on-windows-and-test-it-65e24e07d0aa
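Once the broker is up, the topic and console producer can be created from the Kafka home directory. A minimal sketch for Windows, assuming Kafka 2.x+ and a topic named `restaurants` (the article does not name the topic, so substitute your own):

```shell
:: Create the topic (name "restaurants" is an assumption)
bin\windows\kafka-topics.bat --create --topic restaurants ^
  --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

:: Start a console producer to type JSON messages into the topic
bin\windows\kafka-console-producer.bat --topic restaurants --bootstrap-server localhost:9092
```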
3. send the following message on the producer terminal
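The original sample message is not reproduced here; a plausible JSON record whose fields mirror the Hive table's columns (all values hypothetical) might look like the string below, which you would paste into the producer terminal:

```python
import json

# Hypothetical example message -- field names follow the Hive DDL above,
# the values are made up for illustration.
message = (
    '{"res_id": "18423", "name": "Bombay Grill", "cuisines": "Indian", '
    '"zipcode": "75201", "rowid": "1", "rating_text": "Very Good", '
    '"user_id": "u101", "rating": "4.4"}'
)

record = json.loads(message)
print(sorted(record.keys()))
```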
4. start the Spark history server (optional)
- Edit the %SPARK_HOME%/conf/spark-defaults.conf file to have
spark.eventLog.enabled true
spark.history.fs.logDirectory file:///C:/tmp/spark-events
(create the C:\tmp\spark-events directory first if it does not already exist, or the history server will fail to start)
- run the following command from %SPARK_HOME%
bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
5. run the program and you should see the processed records appear as files under the Hive table's location.