Understanding Batch and Stream Processing.

A Guide to Processing Data

Musili Adebayo
Art of Data Engineering
3 min read · Apr 17, 2024

Data-driven decision-making has become the heart of today’s businesses, because organizations depend on data to make informed decisions. But if these data are not collected, mined, and processed properly, they have no value. This is why data processing matters.

A rectangular illustration demonstrating batch and streaming data processing.
Batch vs Streaming Data Processing — by Musili Adebayo

Lately, data are made available within minutes, if not immediately, after they are produced. In this article, I explain the types of data processing, their use cases, and the differences between them.

So, let’s get started!

There are two ways in which data are processed.

1. Batch Processing.
2. Stream Processing.

What is Batch Data Processing?

Batch processing collects data over a period of time and then processes it together as a group. It is ideal for tasks that need to be logged or processed hourly, daily, weekly, or monthly. Traditional analytics, where the data being processed is hours, days, months, or even years old, typically falls in this category.
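
To make the idea concrete, here is a minimal sketch of a daily batch job. It assumes the records have already been collected as (timestamp, amount) pairs; the function name `daily_totals` and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(records):
    """Process a complete set of (timestamp, amount) records in one run."""
    totals = defaultdict(float)
    for timestamp, amount in records:
        # Bucket each record by calendar day. No result is reported
        # until every record in the batch has been read.
        totals[timestamp.date().isoformat()] += amount
    return dict(totals)


# Hypothetical records, collected earlier and processed together later.
records = [
    (datetime(2024, 4, 16, 9, 30), 20.0),
    (datetime(2024, 4, 16, 17, 5), 5.5),
    (datetime(2024, 4, 17, 8, 0), 12.0),
]

print(daily_totals(records))
```

The key property is that the job sees the whole dataset at once and only produces output when the run finishes.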

What is Stream Data Processing?

Stream processing handles data continuously, as each record arrives. It is ideal for time-critical analysis, where data must be ingested and processed in real time to gain insight. Such data typically comes from several sources and needs to be ingested immediately.
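
A rough sketch of the same idea in code: the function below consumes events one at a time and emits an updated result after each one. The `sensor_feed` generator is a hypothetical stand-in for a live source such as a message queue or a socket.

```python
def running_totals(events):
    """Consume events one at a time and yield the updated total after each."""
    total = 0
    for value in events:
        total += value
        # A fresh result is available immediately, without waiting
        # for the rest of the stream.
        yield total


def sensor_feed():
    # Hypothetical stand-in for a live source; a real feed would not end.
    for reading in [3, 1, 4]:
        yield reading


for total in running_totals(sensor_feed()):
    print(total)
```

Because both functions are generators, nothing is buffered: each reading is processed and then forgotten.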

Scenarios to use Batch and Stream Processing.

A rectangular shape image showing the flow of data ingestion between batch and stream processing.
Batch vs Stream Processing

Batch Processing Scenarios:

  1. Cars parked at rest: In the picture above, the cars are counted while they are at rest, based on a set tally count. This would be impractical for moving cars, as seen in the second picture.
  2. Payroll system: Works well for batch processing of monthly salaries of employees.
  3. Bank statements: These are also processed at the end of the month and can easily be printed sequentially.
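
As an illustration of the payroll scenario, the sketch below runs once over a full month of timesheets. The employee names, hours, and rates are invented, and `run_payroll` is a hypothetical helper, not a real payroll API.

```python
def run_payroll(timesheets, hourly_rates):
    """Compute gross monthly pay per employee from a full month of timesheets."""
    hours = {}
    for employee, worked in timesheets:
        hours[employee] = hours.get(employee, 0) + worked
    # Pay is calculated only after all timesheets for the month are in.
    return {
        employee: total * hourly_rates[employee]
        for employee, total in hours.items()
    }


# Hypothetical inputs for one month.
timesheets = [("ada", 80), ("bola", 75), ("ada", 82)]
hourly_rates = {"ada": 30, "bola": 28}

payroll = run_payroll(timesheets, hourly_rates)
print(payroll)
```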

Stream Processing Scenarios:

  1. Cars counted as they pass: In the picture above, cars are counted in real time as they pass.
  2. Fraud detection: Banks and credit card companies need real-time data to detect fraud attempts. Processing data instantly allows these institutions to spot suspicious activity in time, which can avert financial loss and stop fraudulent transactions before they complete.
  3. Medical sensor systems: Sensor devices like heart rate monitors produce a continuous stream of readings. Processing these readings in real time is crucial for determining a patient's condition.
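
The fraud scenario can be sketched as a simple "velocity check" that flags a card when it makes too many transactions in a short time. This is only one possible rule, not how any particular bank does it; the class name, the thresholds, and the assumption that timestamps arrive in order are all mine.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Flag a card that exceeds max_transactions within window_seconds."""

    def __init__(self, max_transactions=3, window_seconds=60):
        self.max_transactions = max_transactions
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # card_id -> recent timestamps

    def observe(self, card_id, timestamp):
        """Handle one transaction; return True if it looks suspicious."""
        window = self.recent[card_id]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the time window.
        # Assumes timestamps (in seconds) arrive in increasing order.
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_transactions


check = VelocityCheck()
# Hypothetical (card_id, timestamp_in_seconds) events.
events = [
    ("card-1", 0), ("card-1", 10), ("card-2", 15),
    ("card-1", 20), ("card-1", 25), ("card-1", 200),
]
flags = [check.observe(card, ts) for card, ts in events]
print(flags)
```

Each transaction is judged the moment it arrives, so the fourth rapid transaction on `card-1` is flagged before any end-of-day job would have run.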

The Differences between Batch and Stream Processing.

1. Data Size and Scope: Batch processing is best for handling large volumes of data, and it can work over the complete dataset. Stream processing handles individual records, or small groups of records, as they arrive, and it typically has access only to the most recent data.

2. Performance: Data latency is the time between when data is collected and when it is available for processing. Batch processing has higher latency, because data waits for the next scheduled run, which may be hours or days away. In contrast, stream processing typically makes data available within milliseconds to seconds.

3. Required Hardware: Batch processing needs considerable compute resources and a storage system to hold the large volumes of ingested data until they are processed. Stream processing handles data as it arrives, with little need to store it for later processing, so less powerful hardware can often do the job.

4. Data Analysis: Batch processing is typically used for complex analytical computation. Stream processing typically uses more straightforward computation, such as aggregation and reporting.
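
A small sketch can put numbers on points 1, 2, and 4. The readings, the 60-second batch schedule, the 50 ms stream delay, and the window size of 2 are all assumptions chosen to keep the arithmetic easy to follow.

```python
import statistics
from collections import deque

# Hypothetical (collected_at_ms, value) readings, in arrival order.
readings = [(5_000, 12), (20_000, 40), (35_000, 11), (50_000, 15)]

# Batch: one scheduled run over the complete dataset.
BATCH_RUN_AT_MS = 60_000  # assumed schedule: the job runs at the 60 s mark
# An exact median needs every value, so it suits a batch run.
batch_median = statistics.median(value for _, value in readings)
# Latency = time of the run minus time the record was collected.
batch_latency_ms = [BATCH_RUN_AT_MS - collected_at for collected_at, _ in readings]

# Stream: each record is handled shortly after it arrives.
STREAM_DELAY_MS = 50  # assumed per-record processing delay
window = deque(maxlen=2)  # only the most recent records are kept
stream_latency_ms = []
window_means = []
for collected_at, value in readings:
    processed_at = collected_at + STREAM_DELAY_MS
    stream_latency_ms.append(processed_at - collected_at)
    window.append(value)
    # A simple aggregate over the recent window, updated per record.
    window_means.append(sum(window) / len(window))

print(batch_median, batch_latency_ms)
print(window_means, stream_latency_ms)
```

The batch side sees all four values but the earliest record waits 55 seconds for its result, while the stream side answers in 50 ms per record but can only summarise what is still in its window.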

Conclusion.

Which data processing method is the best?
The right data processing method depends on the task. If your task requires data to be available in a continuous flow, then stream processing can do the job. On the other hand, if you are handling large volumes of data that can be processed together on a schedule, batch processing is the better fit. I trust that this article has given you insight into which data processing methodology best serves your needs.

Thank you for reading.

Remember to subscribe and follow me for more insightful articles. You can reach me via my socials below if you want to discuss further.
LinkedIn: Musili Adebayo
Twitter: Musili_Adebayo
