What is the difference between Lambda and Kappa Architectures?

Frank Adams
Jul 1, 2023

Lambda and Kappa are two data processing architectures designed to move large volumes of data into systems that serve it reliably for online access.

Lambda has been the more popular architecture and continues to dominate. However, with the increasing accessibility of stream processing, Kappa is gaining attention and is expected to become more prevalent in the near future. Let’s explore their differences.

Lambda Architecture

  1. The ingestion layer collects raw data and duplicates it for separate real-time and batch processing.
  2. Beyond ingestion, the architecture consists of three main layers:
  • Speed or Stream: Raw data arrives in real time and is processed by a stream processing framework (e.g., Flink). The processed data is then passed to the serving layer to create real-time views, enabling low-latency access to near real-time data.
  • Batch: Batch ETL jobs, powered by batch processing frameworks (e.g., Spark), process the raw data to create reliable batch views for offline access to historical data.
  • Serving: This layer exposes the processed data to end users. Real-time views provide access to the latest data, and combining them with batch views gives a complete historical perspective.
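
To make the division of labor concrete, here is a minimal Python sketch that counts page views per user. It is a toy under stated assumptions: the event shape, the function names, and the `last_batch_ts` cutoff are hypothetical, and plain functions stand in for what would be a Spark job (batch) and a Flink job (speed) in practice.

```python
from collections import Counter

# Hypothetical event shape: (timestamp, user_id).
Event = tuple[int, str]


def batch_view(events: list[Event]) -> Counter:
    """Batch layer: recompute page views per user from the full history."""
    return Counter(user for _, user in events)


def realtime_view(events: list[Event], since: int) -> Counter:
    """Speed layer: count only events newer than the last batch run."""
    return Counter(user for ts, user in events if ts > since)


def serve(batch: Counter, realtime: Counter) -> Counter:
    """Serving layer: merge both views to answer a query."""
    return batch + realtime


# Raw events as delivered by the ingestion layer.
history: list[Event] = [(1, "alice"), (2, "bob"), (3, "alice")]
last_batch_ts = 3  # hypothetical: the last batch run covered ts <= 3
batch = batch_view(history)

all_events = history + [(4, "bob"), (5, "alice")]
realtime = realtime_view(all_events, since=last_batch_ts)

print(serve(batch, realtime))  # Counter({'alice': 3, 'bob': 2})
```

In this sketch the speed layer only covers events newer than the last batch run, so merging the two views never double-counts an event.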

Because the batch and speed layers run on different technologies, the same processing logic has to be written twice, and the two implementations can drift apart. Compute resources are duplicated as well, and two infrastructures must be operated. On the upside, the batch layer's distributed storage provides reliability and scalability and makes recovery after a system crash straightforward.
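
The divergence risk is easier to see with both copies side by side. In this hypothetical Python sketch, the rule "spend per customer, excluding refunds" is written once for the batch layer and once, incrementally, for the speed layer, and the streaming copy was never updated to exclude refunds. All names are illustrative, and the `reconcile` check is just one simple way to detect such drift, not part of Lambda itself.

```python
from typing import Iterable

# Hypothetical order events: (customer, amount, status).
Order = tuple[str, int, str]


def batch_totals(orders: Iterable[Order]) -> dict[str, int]:
    """Batch-layer copy of the logic: spend per customer, excluding refunds."""
    totals: dict[str, int] = {}
    for customer, amount, status in orders:
        if status != "refunded":
            totals[customer] = totals.get(customer, 0) + amount
    return totals


class StreamTotals:
    """Speed-layer copy of the same logic, rewritten for incremental updates.

    This copy was never updated to exclude refunds, so it silently diverges.
    """

    def __init__(self) -> None:
        self.totals: dict[str, int] = {}

    def on_event(self, order: Order) -> None:
        customer, amount, _status = order
        self.totals[customer] = self.totals.get(customer, 0) + amount


def reconcile(orders: list[Order]) -> dict[str, tuple[int, int]]:
    """Return {customer: (batch_total, stream_total)} where the copies disagree."""
    batch = batch_totals(orders)
    stream = StreamTotals()
    for order in orders:
        stream.on_event(order)
    customers = batch.keys() | stream.totals.keys()
    return {
        c: (batch.get(c, 0), stream.totals.get(c, 0))
        for c in customers
        if batch.get(c, 0) != stream.totals.get(c, 0)
    }


orders = [("alice", 30, "paid"), ("alice", 30, "refunded"), ("bob", 10, "paid")]
print(reconcile(orders))  # {'alice': (30, 60)}
```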

Kappa Architecture

Kappa treats both batch and real-time workloads as stream processing problems, using the speed layer alone to prepare data for both real-time and batch access. The architecture comprises two main layers:

  • Speed or Stream: Similar to Lambda's speed layer, but it often adds tiered storage: all incoming data is retained indefinitely across storage tiers, with hot data in on-disk logs and historical data in object stores such as S3 or GCS.
  • Serving: This layer plays the same role as in Lambda. The key difference is that transformations performed in the speed layer are not duplicated in a batch layer. Some expensive transformations, such as complex joins, may still be pushed down to the batch storage.
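
The tiered storage idea in the speed layer can be sketched as an append-only log whose oldest records move from a hot tier to a cold tier. This is a toy Python model, assuming a single partition and string records; `TieredLog`, `hot_capacity`, and `read_from` are hypothetical names, and both tiers are in-memory lists standing in for on-disk log segments and an object store such as S3 or GCS.

```python
from dataclasses import dataclass, field


@dataclass
class TieredLog:
    """Append-only log that keeps recent records hot and offloads older ones.

    Nothing is ever deleted, so a record's offset is its position across
    cold + hot combined.
    """

    hot_capacity: int
    cold: list[str] = field(default_factory=list)  # stands in for S3/GCS
    hot: list[str] = field(default_factory=list)   # stands in for on-disk logs

    def append(self, record: str) -> int:
        """Append a record and return its offset in the log."""
        self.hot.append(record)
        # Offload the oldest hot records once the hot tier is over capacity.
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))
        return len(self.cold) + len(self.hot) - 1

    def read_from(self, offset: int) -> list[str]:
        """Read every record from `offset` onward, spanning both tiers."""
        return (self.cold + self.hot)[offset:]


log = TieredLog(hot_capacity=2)
for record in ["a", "b", "c", "d"]:
    log.append(record)
print(log.cold, log.hot)  # ['a', 'b'] ['c', 'd']
print(log.read_from(1))   # ['b', 'c', 'd']
```

Readers see a single sequence of offsets regardless of which tier holds a record, which is what lets a consumer read the full history from one place.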

Because data is processed once by a single stream processing engine, there is no need to manage multiple infrastructures. The trade-off is that Kappa requires strong stream processing skills.
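
The idea that one engine and one copy of the logic serve both workloads can be sketched in a few lines of Python. This is a toy under stated assumptions: the event shape and the `process` function are hypothetical, and an in-memory list stands in for the retained log. The same `process` handles live events and, when a historical ("batch") result is needed, a replay of the log from the beginning.

```python
from collections import Counter

Event = tuple[str, str]  # hypothetical (action, page) pairs


def process(view: Counter, event: Event) -> None:
    """The single copy of the logic: count clicks per page."""
    action, page = event
    if action == "click":
        view[page] += 1


# Every event is appended to a retained log before it is processed.
event_log: list[Event] = []
live_view: Counter = Counter()

for event in [("click", "/home"), ("view", "/home"), ("click", "/docs")]:
    event_log.append(event)
    process(live_view, event)  # real-time path

# "Batch" work is the same code replaying the log from offset 0.
rebuilt_view: Counter = Counter()
for event in event_log:
    process(rebuilt_view, event)

print(live_view == rebuilt_view)  # True
```

Under this design, changing the logic would mean updating `process` once and replaying the log, rather than editing two separate codebases.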
