Apache Beam — windowing functions

Neeraj Sabharwal
Apr 22 · 2 min read
  • Fixed Time Windows

The simplest form of windowing is using fixed time windows: given a timestamped PCollection which might be continuously updating, each window might capture (for example) all elements with timestamps that fall into a five-minute interval.

  • Sliding Time Windows

A sliding time window also represents time intervals in the data stream; however, sliding time windows can overlap. For example, each window might capture five minutes worth of data, but a new window starts every ten seconds. The frequency with which sliding windows begin is called the period. Therefore, our example would have a window duration of five minutes and a period of ten seconds.

  • Per-Session Windows

A session window function defines windows that contain elements that are within a certain gap duration of another element. Session windowing applies on a per-key basis and is useful for data that is irregularly distributed with respect to time. For example, a data stream representing user mouse activity may have long periods of idle time interspersed with high concentrations of clicks. If data arrives after the minimum specified gap duration time, this initiates the start of a new window.

  • Single Global Window

By default, all data in a PCollection is assigned to the single global window, and late data is discarded. If your data set is of a fixed size, you can use the global window default for your PCollection.

  • Calendar-based Windows (not supported by the Beam SDK for Python)