YARN — Resource scheduler/Cluster Manager for Hadoop platform — in a nutshell

Meenakshi Sundaram Sekar
2 min read · Apr 22, 2018

YARN schedules and orchestrates applications and tasks on the Hadoop platform. When the tasks to be run need data from HDFS, YARN attempts to schedule them on the node where the data resides (applying the concept of data locality).
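
To make the idea of data locality concrete, here is a minimal Python sketch of a locality-aware placement decision. It is an illustration only, not YARN's actual scheduler code: the `Node` class, the `pick_node` function and the fallback order (same node, then same rack, then any free node) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str
    free_containers: int  # hypothetical count of free task containers


def pick_node(replica_hosts, nodes):
    """Prefer a node that holds the data, then its rack, then any free node."""
    free = [n for n in nodes if n.free_containers > 0]
    # 1. Node-local: a free node that stores a replica of the HDFS block.
    for n in free:
        if n.name in replica_hosts:
            return n.name, "node-local"
    # 2. Rack-local: a free node in the same rack as one of the replicas.
    replica_racks = {n.rack for n in nodes if n.name in replica_hosts}
    for n in free:
        if n.rack in replica_racks:
            return n.name, "rack-local"
    # 3. Off-rack: any free node; the data has to travel across racks.
    if free:
        return free[0].name, "off-rack"
    return None, "no-capacity"


nodes = [Node("n1", "r1", 0), Node("n2", "r1", 2), Node("n3", "r2", 1)]
print(pick_node({"n3"}, nodes))  # ('n3', 'node-local')
print(pick_node({"n1"}, nodes))  # ('n2', 'rack-local'), because n1 is full
```

The second call shows why locality is only attempted rather than guaranteed: when the node holding the data has no free capacity, the task runs somewhere nearby instead.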

YARN is Hadoop's second-generation data processing platform. The first generation is called MapReduce v1, or MR1. MR1 was a scheduling platform built for processing Hadoop MapReduce workloads. It was very effective at processing Map and Reduce workloads and at achieving data locality wherever possible.

MR1 had its shortcomings, however, and YARN was designed to solve them. Some of these shortcomings are explained below.

MR1 was not intended to schedule and manage non-MapReduce programs. It provided a strict framework for scheduling map/reduce jobs on a Hadoop cluster and could not manage other applications such as Spark, Impala, Tez and other SQL-on-Hadoop projects.

MR1 was not flexible enough to use all of a cluster's processing capacity. It defined a fixed number of processing slots for map operations and for reduce operations, so slots of one type could sit idle while tasks of the other type waited, leaving cluster resources underutilized.

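MR1 also had a practical upper limit to scalability, since a single JobTracker daemon handled both resource management and job tracking for the whole cluster.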
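
The fixed-slot shortcoming described above can be illustrated with a little arithmetic. The sketch below is a simplified model, assuming every task needs exactly one slot or one container of the same size; the function names and the numbers are made up for this example.

```python
def mr1_running_tasks(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MR1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def yarn_running_tasks(total_containers, pending_maps, pending_reduces):
    """YARN-style: any pending task may use any free container."""
    return min(total_containers, pending_maps + pending_reduces)


# A cluster with capacity for 12 tasks and a map-heavy phase:
# 12 map tasks are waiting and no reduce tasks are ready yet.
print(mr1_running_tasks(8, 4, 12, 0))  # 8  (the 4 reduce slots stay idle)
print(yarn_running_tasks(12, 12, 0))   # 12 (all capacity is used)
```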

Running an Application on YARN

YARN is designed to distribute an application's workload across multiple worker daemons, or processes, called NodeManagers.

NodeManagers are the agents on the worker nodes responsible for carrying out tasks; the complete set of tasks makes up an application.

A YARN daemon called the ResourceManager is responsible for assigning an ApplicationMaster, a delegate process that manages the status and execution of an application.

In addition, the ResourceManager monitors, governs and reserves the compute resources on the NodeManagers (CPU cores and memory). These CPU and memory resources are presented to applications in processing units called containers, in which task attempts run.
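
As a rough illustration of containers as units of CPU and memory, the sketch below estimates how many identical containers fit on a single NodeManager. It assumes that both memory and CPU cores are taken into account when sizing containers, which depends on how the scheduler is configured. The function name and the figures are hypothetical; on a real cluster the per-node totals would typically come from settings such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`.

```python
def containers_per_node(node_memory_mb, node_vcores,
                        container_memory_mb, container_vcores):
    """How many identical containers fit on one NodeManager."""
    by_memory = node_memory_mb // container_memory_mb
    by_cpu = node_vcores // container_vcores
    # The scarcer of the two resources decides how many containers fit.
    return min(by_memory, by_cpu)


# A node offering 64 GB of memory and 16 cores to YARN.
print(containers_per_node(65536, 16, 4096, 1))  # 16
print(containers_per_node(65536, 16, 4096, 4))  # 4 (CPU is the limit here)
```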

The ApplicationMaster determines the container requirements for an application and negotiates these resources with the ResourceManager, hence the name Yet Another Resource Negotiator, or YARN.

Below is the sequence of steps that takes place when a user submits an application on a Hadoop platform with YARN as the resource scheduler/cluster manager.

1. The client/user submits an application to the YARN ResourceManager.

2. The ResourceManager designates an ApplicationMaster on a NodeManager with sufficient capacity to be assigned this role.

3. The ApplicationMaster negotiates the task containers on the NodeManagers (including the NodeManager on which the ApplicationMaster itself is running).

Once the ResourceManager allocates the containers, the ApplicationMaster dispatches processing to the NodeManagers hosting the task containers for the application.

4. The NodeManagers report their task attempt status and progress to the ApplicationMaster.

5. The ApplicationMaster in turn reports the status and progress of the application to the ResourceManager.

6. The ResourceManager reports application progress, status and results to the client.
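
The steps above can be sketched as a small, self-contained Python simulation. This is a toy model under stated assumptions, not YARN's real API: the classes only borrow the names of the YARN components, every task needs exactly one container, all tasks succeed, and the state labels (`FINISHED`, `INCOMPLETE`, `NO_CAPACITY`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    name: str
    free_containers: int

    def run_task(self, task):
        # Step 4: run the task attempt, free its container, report status.
        self.free_containers += 1
        return {"task": task, "node": self.name, "status": "SUCCEEDED"}


class ApplicationMaster:
    def __init__(self, rm, tasks):
        self.rm, self.tasks = rm, tasks

    def run(self):
        # Step 3: negotiate one container per task with the ResourceManager.
        hosts = self.rm.allocate(len(self.tasks))
        # Dispatch each task to the NodeManager hosting its container and
        # collect the status reports (step 4). If fewer containers were
        # granted than requested, the remaining tasks are simply not run.
        reports = [nm.run_task(t) for nm, t in zip(hosts, self.tasks)]
        # Step 5: summarise progress for the ResourceManager.
        done = sum(r["status"] == "SUCCEEDED" for r in reports)
        return {"completed": done, "total": len(self.tasks), "reports": reports}


class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count):
        """Grant up to `count` containers; returns the hosting NodeManagers."""
        granted = []
        for node in self.nodes:
            while node.free_containers > 0 and len(granted) < count:
                node.free_containers -= 1
                granted.append(node)
        return granted

    def submit(self, app_name, tasks):
        # Step 1: the client submits the application to the ResourceManager.
        # Step 2: place the ApplicationMaster on a node with spare capacity.
        am_hosts = self.allocate(1)
        if not am_hosts:
            return {"app": app_name, "state": "NO_CAPACITY"}
        am_host = am_hosts[0]
        progress = ApplicationMaster(self, tasks).run()  # steps 3 to 5
        am_host.free_containers += 1  # release the ApplicationMaster's container
        # Step 6: report the outcome back to the client.
        finished = progress["completed"] == progress["total"]
        state = "FINISHED" if finished else "INCOMPLETE"
        return {"app": app_name, "am_node": am_host.name, "state": state, **progress}


rm = ResourceManager([NodeManager("nm1", 2), NodeManager("nm2", 2)])
result = rm.submit("wordcount", ["map-0", "map-1", "reduce-0"])
print(result["state"], result["am_node"])      # FINISHED nm1
print([r["node"] for r in result["reports"]])  # ['nm1', 'nm2', 'nm2']
```

In this run the first task lands on `nm1`, the same NodeManager that hosts the ApplicationMaster, which matches the note in step 3.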

Thanks for reading and for your time! Have a great day.

Meenakshi Sundaram Sekar

Senior AWS Cloud ETL Developer. Distributed data processing: Spark, MapReduce and Hadoop ecosystems.