Data Stream Processing for Newbies with Kafka, KSQL, and Postgres

A fully-reproducible, Dockerized, step-by-step tutorial for dipping your toes into the data stream.

Maria Patterson
High Alpha

--

This post on data stream processing was originally published on HighAlpha.com.

Let’s get streamin’.

With today’s terabyte- and petabyte-scale datasets and an uptick in demand for real-time analytics, traditional batch-oriented data processing no longer suffices. To keep up with the flow of incoming data, organizations are increasingly moving towards stream processing.

In batch processing, data are collected over a given period of time. These data are processed non-sequentially as a bounded unit, or batch, and pushed into an analytics system that executes periodically.
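
To make that concrete, a batch job might be a plain SQL aggregate run on a schedule (say, nightly by cron) against a Postgres table. This is only an illustrative sketch; the `clicks` table and its columns are made up for the example:

    -- Batch: executes periodically over a bounded set of rows.
    -- Hypothetical `clicks` table; imagine this running once per day.
    SELECT date_trunc('hour', clicked_at) AS hour,
           count(*)                       AS click_count
    FROM clicks
    WHERE clicked_at >= now() - interval '1 day'
    GROUP BY 1
    ORDER BY 1;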

In stream processing, data are processed continuously as new data become available. These data are processed sequentially as an unbounded stream and may be pulled in by a “listening” analytics system.
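
The streaming version of the same aggregation can be written once and left running. In KSQL (now ksqlDB), it looks roughly like the sketch below; the `clicks` stream, its Kafka topic, and its columns are hypothetical, and `EMIT CHANGES` applies to ksqlDB 5.4 and later:

    -- Stream: a continuous "push" query whose results update as events arrive.
    -- Hypothetical stream defined over a hypothetical Kafka topic.
    CREATE STREAM clicks (user_id VARCHAR, url VARCHAR)
      WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON');

    SELECT url, count(*) AS click_count
    FROM clicks
    WINDOW TUMBLING (SIZE 1 HOUR)
    GROUP BY url
    EMIT CHANGES;

Nothing polls or re-runs here: Kafka holds the unbounded stream, and the query “listens,” emitting an updated count as each new event lands.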

Though stream processing does not need to be real-time, it enables the data processing and analysis behind many real-time systems that must react quickly to incoming data: monitoring logs or clickstreams, analyzing financial trades on a trading floor, or responding to sensor data from an Internet of Things (IoT) device, for example.
