Learn about Slurm Workload Manager (Part 1)

TinhTran
3 min read · May 25, 2023


What is a cluster?

A computer cluster connects many modestly configured computers together instead of relying on a single supercomputer. Each computer in the cluster is called a node, and there are two types of nodes: the head node and the compute nodes.

Generalized architecture of a typical server cluster

Terminology

Head Node: This is where users log in to the cluster, edit scripts, compile code, and submit jobs to the scheduler. The head node is shared by many users, so jobs should not be run on it (see the 10–10 rule below).

Compute Node: This is where jobs are executed. To run on these nodes, you go through the job scheduler by submitting a job; the scheduler automatically starts the job on a compute node once the requested resources become available.

Cores: The CPU cores of a node. They are the units that actually execute the computations of a job, so the number of cores you request determines how much work can run in parallel.
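If you want to see these pieces on a real cluster (assuming it is managed by Slurm, which is introduced later in this post), two standard commands give a quick overview from the head node:

$ sinfo (list the partitions and the compute nodes in each, with their state)
$ scontrol show node <nodename> (show the details of one node, including its CPU cores and memory)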

How Do HPC Clusters Work?

To run a program on a cluster, you need to prepare the following files on the head node:

1 The program code to run (e.g., a .py file)

2 A SLURM script file that specifies the required resources, for example memory, number of CPUs, number of nodes, etc.

When the SLURM script is submitted, the scheduler checks whether the requested resources are available. If they are, the job starts running; otherwise, it is placed in a queue until they become free.
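For instance, here is a compact sketch of such a Slurm script (the job and file names are placeholders; a fuller, commented version appears later in this post):

#!/bin/bash
#SBATCH --job-name=demo # short name for the job
#SBATCH --ntasks=1 # a single task
#SBATCH --time=00:05:00 # five minutes of run time
python myscript.py # the program prepared in step 1

Saving this as, say, job.slurm and submitting it hands the job to the scheduler, which starts or queues it as described above.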

Important Notes on Using HPC Clusters

The 10–10 Rule:

It is fine to run quick commands on the head node, but:

  • Not for more than 10 minutes.
  • Do not use more than 10% of cores and memory.

Exceeding these limits will affect others.

There are several commands for checking information about the head node, for example:

$ uname -a (display system information)
$ lscpu (display CPU information: number of cores, speed, etc.)
$ free -m (display memory usage)
$ df -h (check disk storage)
$ top (display running processes with their CPU and memory usage)

See each command's manual page (e.g., man lscpu) for more details.

No Internet Access on the Compute Nodes

For security reasons, some compute nodes have no internet access, so you cannot download data or packages from them. If your job has trouble downloading large data, you can modify your Slurm script so that the job runs on another node or partition that does have access, as sketched below.
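As a sketch (the partition and node names here are hypothetical; the right ones depend on your cluster, so check with your administrators), the relevant directives in the Slurm script would look like:

#SBATCH --partition=transfer # hypothetical partition whose nodes allow outbound network access
#SBATCH --nodelist=node042 # or pin the job to a specific (hypothetical) node known to have access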

Introducing Slurm

On clusters managed by Slurm, users run programs by submitting scripts to the Slurm job scheduler. A Slurm script must contain three things:

1 Specify resources for the job

2 Set the environment

3 Specify the work to be done as shell commands

Here is a sample Slurm script to run Python code using a Conda environment.

#!/bin/bash
#SBATCH --job-name=myjob # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G is default)
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --mail-type=begin # send email when job begins
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-user=<YourNetID>@gmail.com

module purge
module load anaconda3/2020.11
conda activate pytools-env
python myscript.py

The first line of the script specifies the Unix shell to use. The lines that follow start with #SBATCH and request the resources: the script above asks for 1 CPU core and 4 GB of memory with a run-time limit of 1 minute. The next three lines set up the required environment (clearing loaded modules, loading Anaconda, and activating the Conda environment). The last line runs myscript.py with Python.

If your job is still running when the specified time limit expires, it will be terminated to free up resources for other jobs. It is therefore advisable to estimate the runtime as accurately as you can and add a buffer of about 20%. A job script named job.slurm can be submitted to the Slurm scheduler using the sbatch command.

$ sbatch job.slurm

To check the status of your job, use the following command:

$ squeue -u <YourNetID>

To view the estimated start time of pending jobs, use:

$ squeue -u <YourNetID> --start
