Starting with Apache Storm for Real-Time Data Processing

Published in: We’ve moved to freeCodeCamp.org/news

16 min read · May 23, 2018

Source: https://pxhere.com/en/photo/77064

Continuous data streams are ubiquitous, and the growing number of IoT devices is making them even more so. This means huge volumes of data have to be stored, processed, and analyzed to provide predictive, actionable results.

But petabytes of data take a long time to analyze, even with tools such as Hadoop (as good as MapReduce may be) or Spark (a remedy to the limitations of MapReduce).

Often, we don’t need to deduce patterns over long periods of time. Of the petabytes of incoming data collected over months, we might not need all of it at any given moment, just a real-time snapshot. Perhaps we don’t need to know the longest-trending hashtag over five years, just the one trending right now.

This is what Apache Storm is built for: to accept tons of data coming in extremely fast, possibly from various sources, analyze it, and publish real-time updates to a UI or some other place… without storing any actual data.

This article is not the ultimate guide to Apache Storm, nor is it meant to be. Storm’s pretty huge, and just one long read probably can’t do it justice anyway. Of course, any additions, feedback, or constructive criticism will be greatly appreciated.


Written by Usama Ashraf