Consuming from a secured Kafka with Spark streaming

We use Cloudera’s CDH for our Hadoop servers at work. Until recently, we had been using Spark 1.6, since that was the default Spark version included with CDH.

However, we recently had a need to read from a secured Kafka cluster (version 0.10, since prior Kafka versions do not support security) maintained by another team. Thus we had to install Cloudera’s Spark 2.1, which includes the Spark Streaming integration for Kafka 0.10. Here’s the official link comparing Spark Streaming with Kafka 0.8 vs. Kafka 0.10.

Without further ado, below is how we were able to read from a Kafka cluster that was authenticated using SASL/PLAIN over plaintext. We use Scala, and build our jar using sbt assembly.

  1. scala code here
  2. build.sbt here
  3. jaas.conf file should look like this:

KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="yourusername"
  password="yourpassword";
};
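
The Scala code from step 1 was originally embedded elsewhere; as a rough sketch (not the original), a minimal Spark Streaming consumer for a SASL-secured Kafka 0.10 might look like the following. The broker list, topic name, and group id are placeholders, and the JAAS file itself is passed in separately at submit time:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object Spark2Kafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark2Kafka")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",         // placeholder broker list
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark2-kafka-example",          // placeholder consumer group
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean),
      // The two settings that enable SASL/PLAIN; the credentials live in
      // jaas.conf, referenced via java.security.auth.login.config at submit time.
      "security.protocol" -> "SASL_PLAINTEXT",
      "sasl.mechanism" -> "PLAIN"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("your-topic"), kafkaParams)  // placeholder topic
    )

    // Print a sample of (key, value) pairs from each micro-batch.
    stream.map(record => (record.key, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Similarly, a plausible build.sbt for step 2 (exact versions are assumptions; adjust to match your cluster) would mark Spark itself as "provided" so only the Kafka integration is bundled into the assembly jar:

```scala
name := "Spark2Kafka"
version := "1.0"
scalaVersion := "2.11.8"

// Spark is supplied by the cluster at runtime, so only
// spark-streaming-kafka-0-10 ends up inside the assembly jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming"           % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0"
)
```

With this name and version, sbt assembly produces the Spark2Kafka-assembly-1.0.jar referenced in the submit command below.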

To run the spark2-submit job:

export SPARK_KAFKA_VERSION=0.10

spark2-submit --files jaas.conf --driver-java-options "-Djava.security.auth.login.config=./jaas.conf" --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf" --class Spark2Kafka Spark2Kafka-assembly-1.0.jar

The --files flag ships jaas.conf to each executor’s working directory, which is why both the driver and executor JVM options can reference it with the relative path ./jaas.conf.