Apache Flink Windowing: The Ready to Use Time-Based Data Processing Template

Vineeth R
NeST Digital
Published in
6 min readAug 18, 2023

With the arrival of social media, the engineering behind technologies like big data processing and related fields got its attention from many tech giants. The developers were always struggled to handle and analyze data-streams in time-based manner. Apache Flink has emerged as a powerful and versatile framework for big data processing. It provides ready to use template named windowing which helps to process streams of data in time-based manner enabling efficient data processing and advanced analytics. It is becoming a crucial tool for real-time data processing and streaming applications in various domains such as finance, e-commerce, healthcare IoT and more. In this blog, we will dive little deep into Apache Flink Windowing and its concepts.

Flink Architecture

What is windowing?

Windowing is the process of dividing data streams into buckets of finite size known as windows based on specific criteria such as time or event count. This division allows computations and analyses to be performed on each window independently. A window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness. Each window will have ‘function’ which will contain computational logic to be applied on the elements of the window and a ‘trigger’ which tells the window when to start applying the function. There are ‘evictors’ which helps to remove elements from the window once the trigger fires and before/or after function is applied.

Types of Windows

  1. Tumbling windows: Tumbling windows divide the data stream into fixed-sized non-overlapping windows. Each window represents a distinct interval of time. It is useful when we need to analyze data in discrete time periods.

Java code snippet

DataStream<T> input = …;
// tumbling event-time windows
input
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
.<windowed transformation>(<window function>);
// tumbling processing-time windows
input
.keyBy(<key selector>)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.<windowed transformation>(<window function>);
// daily tumbling event-time windows offset by -8 hours.
input
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
.<windowed transformation>(<window function>);

2. Sliding windows: In contrast to tumbling windows, the elements in sliding window can overlap. As the name suggests, it slides over the data stream over regular intervals allowing the data to be part of multiple windows. This helps getting relationships and patterns that span multiple windows.

Java code snippet

DataStream<T> input = …;
// sliding event-time windows
input
.keyBy(<key selector>)
.window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
.<windowed transformation>(<window function>);
// sliding processing-time windows
input
.keyBy(<key selector>)
.window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
.<windowed transformation>(<window function>);
// sliding processing-time windows offset by -8 hours
input
.keyBy(<key selector>)
.window(SlidingProcessingTimeWindows.of(Time.hours(12), Time.hours(1), Time.hours(-8)))
.<windowed transformation>(<window function>);

3. Session windows: Session windows divides data from stream based on gaps between events or periods of inactivity. It is useful when sessions are important such as data between user login and logout.

Java code snippet

DataStream<T> input = …;
// event-time session windows with static gap
input
.keyBy(<key selector>)
.window(EventTimeSessionWindows.withGap(Time.minutes(10)))
.<windowed transformation>(<window function>);
// event-time session windows with dynamic gap
input
.keyBy(<key selector>)
.window(EventTimeSessionWindows.withDynamicGap((element) -> {
// determine and return session gap
}))
.<windowed transformation>(<window function>);
// processing-time session windows with static gap
input
.keyBy(<key selector>)
.window(ProcessingTimeSessionWindows.withGap(Time.minutes(10)))
.<windowed transformation>(<window function>);
// processing-time session windows with dynamic gap
input
.keyBy(<key selector>)
.window(ProcessingTimeSessionWindows.withDynamicGap((element) -> {
// determine and return session gap
}))
.<windowed transformation>(<window function>);

4. Global windows: Global windows groups data based on key making it not having a natural end. So, if you want to perform a computation on the data, we need to specify a trigger as well.

Java code snippet

DataStream<T> input = …;
input
.keyBy(<key selector>)
.window(GlobalWindows.create())
.<windowed transformation>(<window function>);

Sample input output data

Consider the example of a traffic sensor that counts every 15 seconds the number of vehicles passing a certain location. The resulting stream could look like:

Number of cars that pass the location every minute using tumbling windows.

Compute every thirty seconds the number of cars passed in the last minute using sliding windows

Number of cars passed on a particular direction when signal is green. Consider columns marked in red means that the signal is red and hence no cars are allowed to pass.

Some Use Cases

1. Real-time data analysis: Windowing can be used for real-time analysis on continuous streams of data. Windowing will divide the streams into windows on which we can perform analytical computations in real-time. This approach can be adapted where we need to check for the performance, detect behavior change by continuously monitoring data.

2. Fraud detection: With the help of sliding window which has access to historical data, it ill be easy to find pattern and behavioral changes in each window. Combining this with stateful operations within each window can help maintain fraud detection models.

3. Financial analysis: By processing time-series data with the help of windowing, we can make financial analytic applications like risk assessment, behavior change etc. lot easier. Again, since sliding window can access historical data, the computations performed on this type of window can provide accurate and up-to-date analysis.

4. IoT data processing: Windowing can be used for processing sensor data from IoT devices in real time. With the use of proper windowing model, we can process data from IoT devices for applications like data monitoring, anomaly detection etc.

Similarly, a number of use cases of windowing can be identified various domains like Healthcare, automotive, research, finance, market study, supply chain management, e-commerce, etc.

Benefits of Windowing in Apache Flink

1. Flink supports from simple analytics and calculations like sum, average, count etc. to complex operations like AI algorithms, pattern detection, etc.

2. Support for event time processing enables flink to handle out-of-order events and late arrivals which leads to accurate results.

3. Since flink performs computations on subsets of data, the processing load will be less which leads reduces resource usage and better processing times.

4. Stateful operations support from flink helps its users to perform complex calculations that require context and historical data.

5. Availability in multiple platforms like Java, Scala and Python.

6. Can be deployed on Kubernates or YARN or even stand-alone cluster with minimum infra.

7. Can be configured with high availability.

Alternatives

Conclusion

Apache flink windowing is a powerful tool which can be used for analyze and process data streams in time-based manner. It is flexible efficient tool that provides built in support with various data processing functions making it simple to use. By leveraging its advanced capabilities we can make it use for real-time data analysis and decision making in various domains.

References · https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/windows

--

--