Processing Streaming Twitter Data using Kafka and Spark — The Plan

Dhoomil Sheta
Nov 4, 2018 · 2 min read

What is Apache Kafka?

Apache Kafka is a publish/subscribe messaging system. It is often described as a “distributed commit log” or, more recently, as a “distributed streaming platform.” Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged streaming platform.

Source: https://kafka.apache.org/images/kafka_diagram.png
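
To make the “commit log” idea a bit more concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API and leaves out everything that makes Kafka distributed (partitions, replication, brokers). The `CommitLog` class and its `publish` and `poll` methods are hypothetical names made up for this illustration. It only shows the core idea: messages are appended to a log, and each subscriber keeps its own position (offset) in that log.

```python
from collections import defaultdict


class CommitLog:
    """Toy stand-in for a single topic: an append-only list of messages.

    Hypothetical illustration only; this is not how Kafka is implemented.
    """

    def __init__(self):
        self._messages = []
        # Maps subscriber name -> offset of the next message it should read.
        self._offsets = defaultdict(int)

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, subscriber):
        """Return the messages this subscriber has not seen yet."""
        start = self._offsets[subscriber]
        batch = self._messages[start:]
        # Advance the subscriber's offset past everything just returned.
        self._offsets[subscriber] = len(self._messages)
        return batch


log = CommitLog()
log.publish("tweet-1")
log.publish("tweet-2")

first_batch = log.poll("analytics")   # reads everything published so far
log.publish("tweet-3")
second_batch = log.poll("analytics")  # reads only what arrived since the last poll
late_batch = log.poll("archiver")     # a new subscriber starts from offset 0
```

Because messages are never removed when they are read, a subscriber that joins late (here `archiver`) can still replay the whole log, which is one way a log differs from a traditional queue.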

The Inspiration

I recently read the book Kafka: The Definitive Guide, whose authors include one of Kafka’s original creators. It is a wonderful book for anyone who wants to start developing applications with Kafka, as well as for anyone curious about the internals of a platform that is widely used across Fortune 500 companies.

The Plan

In this series, I’ll be exploring various aspects of Apache Kafka by implementing a cool data pipeline:

  1. We’ll start by setting up a Kafka cluster, either in the cloud or locally.

Let’s begin!

D.B.S

Specialist Programmer at Infosys. I like to learn and share

