Unify Batch and Stream Processing with Grainite

Single data pipeline to handle both batch and stream processing

Kiran Rao
Grainite
3 min read · Feb 7, 2023

Businesses are upgrading their data pipelines to support real-time stream processing applications. This is required to drive improved decision-making and customer success. However, many data pipelines are still required to deal with legacy batch data in tandem with this streaming data.

A quick primer on batch and stream processing

Batch processing operates on large, bounded datasets and typically involves complex computation over a longer time frame. Applications built on batched data can tolerate latencies of minutes, hours, or even days.

Stream processing, on the other hand, operates on smaller units of data that arrive continuously from multiple sources. It may or may not require complex computation. Stream processing applications have much tighter latency requirements, usually measured in single-digit milliseconds.
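
To make the contrast concrete, here is a minimal sketch in plain Python. It does not use any Grainite API; the event shape `(customer_id, amount)` and the function names are assumptions made for illustration. The same aggregation is written once in batch style (read everything, then answer) and once in stream style (update state and emit a result per event).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # (customer_id, amount): a hypothetical event shape


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: consume the whole dataset, then return one final result."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)


def stream_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Stream style: update state per event and emit the new total right away."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield customer, totals[customer]


events = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
final_batch = batch_totals(events)      # available only after all input is read
updates = list(stream_totals(events))   # one updated total per incoming event
```

Both versions end in the same state. The difference is when results become available, which is why the latency expectations of the two styles differ so much.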

The inefficient approach of current architectures

Traditionally, engineering teams have built separate batch and stream processing data pipelines to meet business requirements. This leads to many issues, including:

  • Duplication of infrastructure resources
  • Introduction of data silos
  • Inability to meet SLA requirements
  • Added complexity in development and operations

Some commercial solutions try to unify batch and stream processing. The steps include collecting the data from structured and unstructured sources, preparing the data (enrichment, filtering, and fusion), processing the data (lookups, aggregations, workflows, and business rules), and writing the transformed state to downstream components such as databases, data warehouses, and data lakes, which then serve downstream applications.

However, these solutions introduce their own set of challenges:

  • They integrate many software components across multiple stages, which leads to operational inefficiencies.
  • They have limited data transformation capabilities.
  • They struggle to meet the latency, performance, and scalability requirements of modern real-time applications.

Grainite’s Approach — Convergence of capabilities

As a converged streaming application platform, Grainite integrates stream and batch data ingestion, data transformation (processing), state storage, and serving layers into a single Kubernetes-managed cluster. Developers can build applications that transform both batch and streaming data within the same data pipeline while easily meeting latency, scalability, and performance requirements.

Grainite supports gRPC and Kafka-compatible APIs, which allow any client to stream events directly into Grainite topics (push). Stream and batch data can also be ingested using Tasks, which poll various data sources and write what they read into Grainite topics (pull).
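
The push and pull paths can be sketched in plain Python as follows. This is not Grainite's API: `Topic` and `PollingTask` are hypothetical in-memory stand-ins, and the `fetch` callback is an assumed interface that returns the source records at or after a given offset.

```python
from typing import Callable, List


class Topic:
    """In-memory stand-in for a topic (hypothetical, not Grainite's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[dict] = []

    def append(self, event: dict) -> None:
        self.events.append(event)


class PollingTask:
    """Pull path: remembers an offset into a source and ingests only new records."""

    def __init__(self, fetch: Callable[[int], List[dict]], topic: Topic) -> None:
        self.fetch = fetch    # assumed: returns records at or after the given offset
        self.topic = topic
        self.offset = 0

    def poll_once(self) -> int:
        records = self.fetch(self.offset)
        for record in records:
            self.topic.append(record)
        self.offset += len(records)
        return len(records)


# A list stands in for a batch source such as a file or a database table.
source = [{"id": 1}, {"id": 2}]
orders = Topic("orders")
task = PollingTask(lambda offset: source[offset:], orders)

orders.append({"id": 0})     # push: a client writes an event directly
first = task.poll_once()     # pull: ingests the two existing records
source.append({"id": 3})     # the source grows later
second = task.poll_once()    # pull: ingests only the new record
```

Either way, the events land in the same topic, so downstream logic does not need to know whether a record arrived by push or by pull.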

Unlike other messaging platforms, Grainite automatically manages partitions on a per-key basis, which greatly simplifies both ingesting and consuming data. Events that belong to the same key are processed in order, while events that belong to different keys can be processed out of order with the highest level of parallelism. This makes it easier to meet the performance and latency goals of real-time applications.
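
The ordering guarantee described here (in order within a key, parallel across keys) can be illustrated with a small, generic sketch. It uses static hash sharding onto a fixed set of worker threads, which is a simplification chosen for the example and not a description of how Grainite manages partitions internally. `NUM_WORKERS` and `run` are hypothetical names.

```python
import queue
import threading
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_WORKERS = 4   # assumed degree of parallelism for this sketch
_STOP = object()  # sentinel that tells a worker to exit


def run(events: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Process (key, value) events: in order within a key, in parallel across keys."""
    lanes = [queue.Queue() for _ in range(NUM_WORKERS)]
    seen: Dict[str, List[int]] = defaultdict(list)
    lock = threading.Lock()  # guards the shared result dict

    def worker(lane: queue.Queue) -> None:
        while True:
            item = lane.get()
            if item is _STOP:
                return
            key, value = item
            with lock:
                seen[key].append(value)

    threads = [threading.Thread(target=worker, args=(lane,)) for lane in lanes]
    for t in threads:
        t.start()
    for key, value in events:
        # A stable hash sends every event for a key to the same FIFO lane,
        # so one worker sees that key's events in arrival order.
        lanes[zlib.crc32(key.encode()) % NUM_WORKERS].put((key, value))
    for lane in lanes:
        lane.put(_STOP)
    for t in threads:
        t.join()
    return dict(seen)


events = [(f"k{i % 5}", i) for i in range(100)]
result = run(events)
```

Because each key is pinned to one FIFO lane, the values recorded for a key always appear in arrival order, while different keys make progress independently of one another.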

To summarize:

  • Ingest batch and streaming data without the need for intermediary components, and develop the business logic that transforms ingested data using Grainite’s simplified APIs, available in Java and Python.
  • Store the state in Grainite and directly serve downstream applications without the need for additional databases, data warehouses, or data lakes.
  • Build greenfield applications or integrate Grainite into existing data pipelines to solve critical challenges.
  • Enable a cloud-agnostic deployment architecture to support your infrastructure across all major public clouds as well as private deployments.

Business outcomes with Grainite

  • Faster time to data: by enhancing existing applications with new use cases or developing new greenfield applications.
  • Reduced total cost of operations: in terms of both cloud resource costs and operational costs.
  • Deliver more with less: by improving the productivity of teams, freeing them to provide more value to your internal and external customers.

Learn more about Grainite technology on our website.
