The Kappa Architecture

Devin Bost
Feb 22, 2019


What is the Kappa Architecture?

Kappa Architecture is an emerging, high-level software pattern for building applications on streaming data. It depends on event streams (flows of real-time user-interaction and device-generated data) to communicate changes to a centralized, immutable (unchangeable) log that serves as the single source of truth for all observed data. It uses stream processing engines, high-performance technologies that continuously process event-stream data in real time, to apply decision logic and deliver information to people and software applications. (See Figure 1.)
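The idea of an immutable log that downstream logic reads from can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular streaming platform is implemented; the names `Event`, `EventLog`, `append`, and `replay` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


@dataclass(frozen=True)
class Event:
    """One record in the log. Field names are hypothetical."""
    offset: int
    kind: str
    payload: Dict[str, Any]


class EventLog:
    """In-memory stand-in for a centralized, append-only log."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, payload: Dict[str, Any]) -> Event:
        # Records are only ever added at the end; nothing is updated in place.
        event = Event(offset=len(self._events), kind=kind, payload=dict(payload))
        self._events.append(event)
        return event

    def replay(self, from_offset: int = 0) -> Iterator[Event]:
        """Yield events in arrival order, starting at from_offset."""
        yield from self._events[from_offset:]


log = EventLog()
log.append("email_opened", {"user_id": "u1"})
log.append("purchase", {"user_id": "u2", "amount": 12.5})
log.append("email_opened", {"user_id": "u1"})

# A processor derives its state by reading the log, never by editing it.
opens_per_user: Dict[str, int] = {}
for event in log.replay():
    if event.kind == "email_opened":
        user = event.payload["user_id"]
        opens_per_user[user] = opens_per_user.get(user, 0) + 1
```

Because the log is never modified, any derived view (such as `opens_per_user`) can be rebuilt from scratch by replaying the log from offset 0.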

Why is Kappa Architecture important?

Kappa Architecture is important because it can substantially improve the maintainability and performance of software and web applications, especially those with complex data requirements.

What is event streaming / streaming data?

Event streaming is the practice of capturing a packet of context-rich, or self-descriptive, data every time a user takes an action or makes a choice. Examples of events are:

· A teacher purchasing school supplies from an e-commerce website

· A person opening an email in their web browser

· A student completing a graded activity in their educational software. (See Figure 2.)
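As a rough illustration of what "self-descriptive" can mean in practice, the sketch below builds one such event as a plain dictionary and serializes it to JSON. The field names (`event_type`, `user_id`, `occurred_at`, `details`) and the sample values are assumptions made for this example, not a standard schema.

```python
import json
from datetime import datetime, timezone


def make_event(event_type, user_id, details, occurred_at):
    """Bundle what happened, who did it, and when into one record."""
    return {
        "event_type": event_type,
        "user_id": user_id,
        "occurred_at": occurred_at.isoformat(),
        "details": details,
    }


event = make_event(
    event_type="graded_activity_completed",
    user_id="student-42",
    details={"activity_id": "quiz-7", "score": 0.9},
    occurred_at=datetime(2019, 2, 22, 15, 30, tzinfo=timezone.utc),
)

# The serialized form is what would be written to the event stream.
serialized = json.dumps(event, sort_keys=True)
```

A consumer that receives `serialized` can interpret it without looking anything else up, which is the point of making events context-rich.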

Why use a stream processing engine?

Stream processing engines are used to clean, enrich, transform, filter, and aggregate streaming events. This processing enables low-cost and high-performance data-driven software applications, visualizations, and reports.

Cleaning

Cleaning is used to remove malformed or undesirable data from the event stream. Cleaning may be required when low-performing networks or software bugs cause applications to send duplicate or malformed data to the event stream.
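A minimal sketch of a cleaning step, assuming each event is a dictionary carrying a unique `event_id` (a hypothetical field, as are the other names here):

```python
from typing import Any, Dict, Iterable, Iterator

REQUIRED_FIELDS = ("event_id", "user_id", "event_type")  # hypothetical schema


def clean(events: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Drop malformed events and duplicates, preserving arrival order."""
    seen_ids = set()
    for event in events:
        # Malformed: not a dictionary, or a required field is missing.
        if not isinstance(event, dict):
            continue
        if any(event.get(name) is None for name in REQUIRED_FIELDS):
            continue
        # Duplicate: an event with this event_id was already emitted.
        if event["event_id"] in seen_ids:
            continue
        seen_ids.add(event["event_id"])
        yield event


raw = [
    {"event_id": "e1", "user_id": "u1", "event_type": "email_opened"},
    {"event_id": "e1", "user_id": "u1", "event_type": "email_opened"},  # resent
    {"event_id": "e2", "event_type": "purchase"},  # user_id missing
    "not-an-event",  # garbage from a buggy client
    {"event_id": "e3", "user_id": "u2", "event_type": "purchase"},
]
cleaned = list(clean(raw))
```

In this sketch the set of seen identifiers grows without bound; a long-running processor would more likely remember identifiers only for a limited time window.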

Enrichment

Enrichment is used to add information, or context, to events coming through the event stream, typically to simplify analytics. Enrichment may be used, for example, to attach to the event additional information about the user, the user’s historical activity, or status updates that relate to the user.
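Enrichment can be pictured as a lookup performed as each event passes through. The sketch below assumes user profiles are available in a dictionary; `USER_PROFILES` and its fields are invented for the example, and a real system might use a cache or a state store instead.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical reference data keyed by user_id.
USER_PROFILES = {
    "u1": {"role": "teacher", "district": "D-12"},
    "u2": {"role": "student", "district": "D-07"},
}


def enrich(
    events: Iterable[Dict[str, Any]],
    profiles: Dict[str, Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Attach the sender's profile to each event without mutating the input."""
    for event in events:
        profile = profiles.get(event["user_id"], {})
        # Build a new dictionary so the original event stays unchanged.
        yield {**event, "user": dict(profile)}


incoming = [
    {"event_id": "e1", "user_id": "u1", "event_type": "purchase"},
    {"event_id": "e2", "user_id": "u9", "event_type": "email_opened"},
]
enriched = list(enrich(incoming, USER_PROFILES))
```

Downstream consumers can then group or filter by `user["role"]` or `user["district"]` without performing their own lookups.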

Transformation

Transformation may be used to apply simple calculations or restructure information to improve the ease, performance, or simplicity of analytics. Transformation may also be required to correct data formatting issues.
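As a small illustration, the sketch below restructures a purchase event, converts an epoch-millisecond timestamp to ISO 8601, and computes an order total. The input and output field names are assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize the timestamp format and precompute a simple calculation."""
    occurred_at = datetime.fromtimestamp(
        event["ts_epoch_ms"] / 1000, tz=timezone.utc
    )
    return {
        "event_id": event["event_id"],
        # Reformatting: epoch milliseconds become an ISO 8601 string.
        "occurred_at": occurred_at.isoformat(),
        # Simple calculation: total is computed once, here, not by every reader.
        "total": round(event["unit_price"] * event["quantity"], 2),
    }


raw_purchase = {
    "event_id": "e7",
    "ts_epoch_ms": 1550849400000,
    "unit_price": 2.5,
    "quantity": 4,
}
transformed = transform(raw_purchase)
```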

Filtering

Filtering is used to remove unnecessary data from a stream. Filtering is often used to improve performance of subsequent processing steps by reducing the volume of data that needs to be processed.
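A filtering step can be as simple as a predicate applied to each event. In this sketch the event types and the helper name `keep_only` are hypothetical:

```python
from typing import Any, Dict, Iterable, Iterator


def keep_only(
    events: Iterable[Dict[str, Any]],
    wanted_types: Iterable[str],
) -> Iterator[Dict[str, Any]]:
    """Pass through only the event types that later steps care about."""
    wanted = frozenset(wanted_types)
    return (event for event in events if event["event_type"] in wanted)


stream = [
    {"event_id": "e1", "event_type": "page_view"},
    {"event_id": "e2", "event_type": "purchase"},
    {"event_id": "e3", "event_type": "page_view"},
    {"event_id": "e4", "event_type": "purchase"},
]
purchases = list(keep_only(stream, ["purchase"]))
```

Placing a step like this early in a pipeline means that more expensive later steps see only the events they actually need.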

Aggregation

Aggregation is used to obtain summary information. Aggregating data with streaming analytics, rather than batch analytics, can yield considerable performance gains because it avoids recomputing the same results. For example, in the traditional approach of storing data in a relational database, aggregations may require expensive join and grouping operations every time the data is requested. In a high-performance client-facing application, such data could easily be requested thousands of times per second, resulting in enormous computational waste. When a stream processing engine performs the aggregation, the expensive computation needs to be performed only once, at the time the data is received. Then, when the client-facing application requests the data, the application responsible for storing the aggregate needs only to return the summary information that was already computed. This approach, known as preprocessing (computation that occurs as the data arrives rather than when it is requested), can improve performance by orders of magnitude, especially for computations that are typically very slow or expensive. (See Figure 3.)
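The difference between computing on arrival and computing on request can be sketched as follows. The class `RunningScoreStats` and the event fields are hypothetical; the point is only that each event updates the summary once, and each read is a cheap lookup.

```python
from collections import defaultdict
from typing import Any, Dict


class RunningScoreStats:
    """Keeps a per-classroom count and mean, updated as events arrive."""

    def __init__(self) -> None:
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def on_event(self, event: Dict[str, Any]) -> None:
        # Work is done here, once per event, at arrival time.
        key = event["classroom"]
        self._count[key] += 1
        self._total[key] += event["score"]

    def summary(self, classroom: str) -> Dict[str, Any]:
        # Reads do no joins or scans; they return what is already computed.
        count = self._count.get(classroom, 0)
        if count == 0:
            return {"count": 0, "mean": None}
        return {"count": count, "mean": self._total[classroom] / count}


stats = RunningScoreStats()
for incoming_event in [
    {"classroom": "A", "score": 80.0},
    {"classroom": "B", "score": 70.0},
    {"classroom": "A", "score": 90.0},
]:
    stats.on_event(incoming_event)

summary_a = stats.summary("A")
```

However many times `summary("A")` is requested, the cost of each request stays constant, whereas a query-time aggregation would repeat the grouping work on every call.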

Visualization

Kappa Architecture improves the performance of visualizations and shortens their development time by providing preprocessed data that has been curated, or prepared, for use in visualizations. When data must be grouped in different ways (e.g. by classroom, by school, by district, by state, etc.), the performance gains are multiplied (compared to batch analytics). These performance gains are also very useful when the end-user wants to see summary information over a custom date range. (See Figure 4.)
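One way to picture data curated for several groupings at once is a set of daily counters maintained per grouping level, so that a chart for any level and date range only has to add up small precomputed buckets. The sketch below is an assumption about how this could be done; `Rollups`, `LEVELS`, and the event fields are invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Any, Dict

LEVELS = ("classroom", "school", "district")  # hypothetical grouping keys


class Rollups:
    """Daily counts of completed activities for every grouping level."""

    def __init__(self) -> None:
        # (level, group value, day) -> number of events
        self._counts = defaultdict(int)

    def on_event(self, event: Dict[str, Any]) -> None:
        day = date.fromisoformat(event["day"])
        # One arriving event updates every grouping it belongs to.
        for level in LEVELS:
            self._counts[(level, event[level], day)] += 1

    def total(self, level: str, group: str, start: date, end: date) -> int:
        """Sum the daily buckets for an inclusive, custom date range."""
        day, result = start, 0
        while day <= end:
            result += self._counts.get((level, group, day), 0)
            day += timedelta(days=1)
        return result


rollups = Rollups()
for activity in [
    {"day": "2019-02-20", "classroom": "c1", "school": "s1", "district": "d1"},
    {"day": "2019-02-21", "classroom": "c2", "school": "s1", "district": "d1"},
    {"day": "2019-02-25", "classroom": "c1", "school": "s1", "district": "d1"},
]:
    rollups.on_event(activity)

school_total = rollups.total("school", "s1", date(2019, 2, 20), date(2019, 2, 21))
```

A dashboard could call `total` with whatever level and date range the end-user selects, without touching the raw events.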

Conclusion

Kappa Architecture[1] utilizes event streaming, stream processing, and stream analytics to replace traditional patterns of data access, data retrieval, and application design. Kappa Architecture eliminates many of the performance and maintainability problems associated with legacy data architectures, especially when analytics are involved.

Appendix

[1] For video examples and additional information about Kappa Architecture, please see: http://milinda.pathirage.org/kappa-architecture.com/


Devin Bost

Devin Bost is a data engineer in Orem, Utah, specializing in streaming analytics, machine learning, and artificial intelligence.