Flink State Backends

Sruthi Sree Kumar
Big Data Processing
2 min readFeb 18, 2020

Apache Flink provides the following three backends to store application state information.

  1. Memory State Backend
  2. File System State Backend
  3. RocksDB State Backend

MemoryStateBackend

The memory state backend persists the data as objects in the task manager’s heap memory. Upon checkpointing, this state backend will snapshot the state and the snapshot is stored in the Job managers heap memory. This is the default backend used by Flink in case nothing is configured. This is the default state backend in Flink. The MemoryStateBackend can be configured to use asynchronous snapshots and it is enabled by default.

The MemoryStateBackend is mainly used for local development and debugging. A reason for it is that the snapshots are saved in the job manager which can create unnecessary overhead. It is encouraged to use this backend for the jobs which hold little state.

One of the limitations of MemoryStateBackend is the size of the state. By default, the size of the individual state is limited to 5 MB. This value can be increased in the constructor of the MemoryStateBackend. Irrespective of this configuration, maximum size cannot exceed the Akka (Flink uses Akka for RPC between components (JobManager/TaskManager/ResourceManager)) frame size. Another limitation is that the aggregate state must fit into the Job Manager memory.

FsStateBackend

Similar to MemoryStateBackend, FsStateBackend also persists the inflight data in task manager’s heap memory. Upon checkpointing, the snapshot data is saved to a persistent file system such as HDFS, S3 or task manager’s local filesystem. Minimal metadata is stored in the JobManager’s memory.

The persistent storage filesystem URL can be configured which instantiating the state backend. By default, FsStateBackend also uses asynchronous snapshots and can be disabled by passing a boolean flag in the constructor.

The FsStateBackend is mainly used in high availability systems and it is encouraged to use for jobs with large state, long windows, large key/value states.

RocksDBStateBackend

Unlike the other state backends, RocksDBStateBackend stores the inflight state information in a RocksDB database. RocksDB is an embeddable key-value store which offers ACID guarantees. Each task manager maintains its own Rocks DB file ( in the TaskManager data directories). Upon checkpointing, the checkpointed data will be saved into the configured persistent file system such as HDFS. Minimal metadata is stored in the JobManager’s memory.

The RocksDBStateBackend always performs asynchronous snapshots. And this is the only state backend which supports incremental checkpointing.

Unlike the other state backends, the amount of state that you can keep is only limited by the amount of disk space available. This allows you to keep a very large state. Hence, RocksDBStateBackend can be used for jobs with very large state, long windows, large key/value states. All the state objects stored in RocksDB are serialized which makes the state read/write operation expensive compared to a read/write from the heap. This will also reduce throughput. Another limitation is that the JNI bridge API of RocksDB is based on byte array and maximum supported size per key and per value is 2³¹ bytes (2 GB) each.

References:

https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html

--

--