Flink Architecture — Job Manager, Task Manager and Job Client

M Haseeb Asif
Big Data Processing
2 min read · Jul 13, 2020


Apache Flink is a distributed stream processing engine. It uses the Akka framework for its distributed communication. Akka implements the actor model, in which each actor is independent and actors communicate with each other asynchronously. Flink has three main distributed components: the Job Manager, the Task Manager, and the Job Client. The Job Client submits a Flink job to the Job Manager, which then orchestrates the job across the Task Managers and manages the resources. Task Managers are the actual worker nodes that perform the computations on the data and keep the Job Manager updated about their progress.
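To make this flow concrete, here is a minimal sketch of the kind of program a Job Client submits, written against Flink's DataStream API. The class name and job name are illustrative, not from the article:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordLengthJob {
    public static void main(String[] args) throws Exception {
        // Entry point of every Flink program; on submission, the client
        // translates this program into a job graph for the Job Manager.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink", "akka", "task")
           .map(new MapFunction<String, Integer>() {
               @Override
               public Integer map(String value) {
                   // Each parallel instance of this map runs as a task on a Task Manager.
                   return value.length();
               }
           })
           .print();

        // execute() triggers the actual submission to the Job Manager.
        env.execute("word-length-job");
    }
}
```

Packaging this program as a jar and handing it to the `flink run` command line tool is one common way the Job Client role is played in practice.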

Apache Flink follows a master-slave architecture: the Job Manager acts as the master while the Task Managers are the worker (slave) nodes. At startup, each Task Manager sends a registration message to the Job Manager and receives an acknowledgement in return. The Job Manager maintains information about the resources available on each Task Manager as well as the state of the distributed tasks that need to be executed, and it uses that information to allocate resources to individual tasks.
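This master-worker wiring shows up directly in Flink's configuration. The sketch below uses the classic flink-conf.yaml format; the host name and numbers are illustrative values, not defaults from the article:

```yaml
# Where the Job Manager (master) listens; Task Managers register here.
jobmanager.rpc.address: jobmanager-host
jobmanager.rpc.port: 6123

# Resources each Task Manager advertises to the Job Manager on registration.
taskmanager.numberOfTaskSlots: 4

# Default parallelism applied to submitted jobs.
parallelism.default: 2
```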

A Flink program, or Flink job, comprises multiple tasks. A task is the basic unit of execution in Apache Flink. Each operator, such as a map or a reduce, has multiple instances depending on its level of parallelism, and each instance is executed as a task on a Task Manager, as in the sketch below.
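Parallelism can be set per operator, which determines how many task instances of that operator the Job Manager will schedule. A minimal sketch, with illustrative parallelism values:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "bb", "ccc")
           .map(new MapFunction<String, Integer>() {
               @Override
               public Integer map(String value) {
                   return value.length();
               }
           })
           .setParallelism(4)   // four instances of the map operator -> four tasks
           .print()
           .setParallelism(1);  // a single sink instance -> one task

        env.execute("parallelism-example");
    }
}
```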

The Job Manager schedules these tasks on the different Task Managers. Execution resources are defined through task…
