Windowing in Flink

M Haseeb Asif
Big Data Processing
4 min readMar 4, 2020

--

Windowing is a crucial concept in stream processing frameworks or when we are dealing with an infinite amount of data. In batch processing, since we have finite data so we can apply the computation on it altogether but with stream processing incoming data is unbounded. Windowing is an approach to break the data stream into mini-batches or finite streams to apply different transformations on it.

Flink window opens when the first data element arrives and closes when it meets our criteria to close a window. It can be based on time, count of messages or a more complex condition. There are different types of windowing strategies — Tumbling, Sliding, Session and Global windows. Additionally, you can create your own complex implementation other than the predefined ones.

Before we write code for windowing, we need to tell Flink that what do we mean by time while we are defining windows. Is it based on the system time, actual event time or ingestion time. Setting it as processing time means we want to use the processing time of machine. Event time is the time when the event actually occurred and usually, it’s part of each data point. Finally, Ingestion time means the time when an event gets ingested or entered into the Flink processing system.

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

Now we will discuss the different type of windows with examples. We assume a data stream of string and Integer pairs e.g. (a,10), (b,20). We will apply different type of windows operation on our data…

--

--

M Haseeb Asif
Big Data Processing

Technical writer, teacher and passionate data engineer. Love to talk, write and code with Apache Spark, Flink or anything related to data