Processing Streaming Twitter Data using Kafka and Spark — The Plan

Dhoomil Sheta
Nov 4, 2018 · 2 min read

What is Apache Kafka?

Apache Kafka is a publish/subscribe messaging system. It is often described as a “distributed commit log” or, more recently, as a “distributed streaming platform.” Since being created and open-sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged streaming platform.
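To make the “commit log” idea a little more concrete, here is a minimal toy sketch in plain Python. It is not Kafka's API and it skips everything that makes Kafka distributed (brokers, partitions, replication); the class name `ToyCommitLog`, its methods, and the subscriber names are all made up for illustration. It only shows the core idea: publishers append to a log, and each subscriber tracks its own offset into that log.

```python
from collections import defaultdict


class ToyCommitLog:
    """In-memory stand-in for a single topic (illustrative only, not Kafka)."""

    def __init__(self):
        self._records = []                # append-only list of messages
        self._offsets = defaultdict(int)  # next offset to read, per subscriber

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, subscriber):
        """Return the messages this subscriber has not read yet, then advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:]
        self._offsets[subscriber] = len(self._records)
        return batch


log = ToyCommitLog()
log.publish("tweet-1")
log.publish("tweet-2")

first_read = log.poll("analytics")   # reads everything published so far
log.publish("tweet-3")
second_read = log.poll("analytics")  # reads only what arrived since the last poll
late_reader = log.poll("archiver")   # a new subscriber starts from offset 0
```

Because messages are never removed when they are read, the hypothetical `archiver` subscriber can join late and still see the full history, while `analytics` only receives what is new to it.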


The Inspiration

I recently read the book Kafka: The Definitive Guide by the creators of Kafka. It is a wonderful book for anyone who wants to start developing applications with Kafka, as well as for anyone who wants to understand the internals of a platform that is used by most of the Fortune 500 companies.

The Plan

In this series, I’ll be exploring various aspects of Apache Kafka, all by implementing a cool data pipeline:

  1. We’ll start by setting up a Kafka cluster, either in the cloud or locally

Let’s begin!



Specialist Programmer at Infosys. I like to learn and share
