Reading JSON messages from a Kafka topic, processing them with Spark Structured Streaming, and writing the results back to a file (Hive)

Spark Structured Streaming example

Jaya Aiyappan
2 min read · Aug 29, 2022

Below is the code that uses Spark Structured Streaming to read data from a Kafka topic, process it, and write the processed data as files to the location that the Hive table points at.

To make it work on your laptop, you have to

  1. have Hadoop and Spark installed on your laptop, with all the daemons running
  2. have Hive installed and running
  3. create the Hive table:
    CREATE EXTERNAL TABLE IF NOT EXISTS restaurants(
    res_id STRING,
    name STRING,
    cuisines STRING,
    zipcode STRING,
    rowid STRING,
    rating_text STRING,
    user_id STRING,
    rating STRING )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/hive/restaurants';
  4. have Kafka installed, with ZooKeeper and the Kafka broker running, the topic created, and a console producer started
    For a single-node Kafka installation and starting the servers, you can refer to
    https://medium.com/@jaya.aiyappan/install-single-node-kafka-cluster-on-windows-and-test-it-65e24e07d0aa
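For reference, on a default Windows Kafka install the topic and producer can be started roughly like this (the topic name `restaurant-ratings` is an example — use whatever name your Spark job subscribes to):

```shell
:: Run from the Kafka install directory; assumes a broker on localhost:9092.
:: The topic name "restaurant-ratings" is illustrative.
bin\windows\kafka-topics.bat --create --topic restaurant-ratings --bootstrap-server localhost:9092
bin\windows\kafka-console-producer.bat --topic restaurant-ratings --bootstrap-server localhost:9092
```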

  5. send the following message on the producer terminal
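The original sample message is not reproduced here; a hypothetical record whose fields match the `restaurants` table columns could look like this (all values are made up for illustration):

```python
import json

# Hypothetical producer message: the field names follow the restaurants table,
# the values are invented for illustration only.
message = ('{"res_id": "18423968", "name": "Pizza Corner", "cuisines": "Italian", '
           '"zipcode": "560001", "rowid": "1", "rating_text": "Very Good", '
           '"user_id": "u1001", "rating": "4.2"}')

# The streaming job parses this same structure with from_json on the Spark side.
record = json.loads(message)
```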

  6. start the Spark history server (optional)

  • Edit the %SPARK_HOME%/conf/spark-defaults.conf file to have

spark.eventLog.enabled true
spark.history.fs.logDirectory file:///C:/tmp/spark-events

  • run the following command from %SPARK_HOME%

bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

  7. run the program and verify that the processed files appear in the Hive table's location
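Once a few micro-batches have been written, a quick query from the Hive shell can confirm the files are readable through the table defined above:

```sql
-- Sanity check: the streamed records should show up as rows of the external table.
SELECT res_id, name, rating FROM restaurants LIMIT 5;
```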
