Slurm Workload Manager: HPC on GCP

Jasbirs
Google Cloud - Community
4 min read · Jan 31, 2023

Slurm is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. As a cluster workload manager, Slurm has three key functions:

  1. It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
  2. It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
  3. Finally, it arbitrates contention for resources by managing a queue of pending work.
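
Day to day, these three functions are driven through a small set of client commands. Below is a minimal sketch of a typical user workflow; the script name and job ID are placeholders.

```bash
# Inspect the partitions (queues) and nodes the cluster exposes
sinfo

# Submit a batch script; Slurm queues it until resources become available
sbatch my_job.sh          # prints "Submitted batch job <jobid>"

# Monitor pending and running work for the current user
squeue -u $USER

# Launch a parallel task interactively inside a fresh allocation
srun --ntasks=4 hostname

# Cancel a job that is no longer needed (use the ID printed by sbatch)
scancel 12345
```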

Architecture

Slurm consists of a slurmd daemon running on each compute node and a central slurmctld daemon running on a management node (with an optional fail-over twin). The slurmd daemons provide fault-tolerant hierarchical communications.

The entities managed by these Slurm daemons include nodes (the compute resource in Slurm), partitions (which group nodes into logical sets), jobs (allocations of resources assigned to a user for a specified amount of time), and job steps (sets of, possibly parallel, tasks within a job). Partitions can be thought of as job queues, each with an assortment of constraints such as job size limit, job time limit, and the users permitted to use it. Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted. Once a job is assigned a set of nodes, the user can initiate parallel work in the form of job steps in any configuration within the allocation. For instance, a single job step may use all nodes allocated to the job, or several job steps may independently use a portion of the allocation.
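
The relationship between a job and its job steps is easiest to see in a batch script. The sketch below assumes a partition named debug exists and that the application binaries are already available on the compute nodes; all names and sizes are illustrative.

```bash
#!/bin/bash
#SBATCH --job-name=steps-demo
#SBATCH --partition=debug        # the partition (queue) to submit to; name is illustrative
#SBATCH --nodes=4                # the job's allocation: 4 nodes
#SBATCH --ntasks-per-node=8      # 8 tasks per node, 32 tasks total
#SBATCH --time=00:30:00          # job time limit, enforced against the partition's limit

# Job step 1: a single step that uses the entire allocation (4 nodes x 8 tasks)
srun --ntasks=32 ./preprocess_input

# Job steps 2 and 3: two steps that each use half of the allocation, running concurrently
srun --nodes=2 --ntasks=16 ./solver_a &
srun --nodes=2 --ntasks=16 ./solver_b &
wait
```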

Slurm on Google Cloud Platform

slurm-gcp is an open-source software solution that makes it easy to set up Slurm clusters on Google Cloud Platform. With it, you can create and manage Slurm cluster infrastructure in GCP in a variety of configurations.

Cluster Configurations

slurm-gcp can be deployed and used in different configurations and methods to meet your computing needs.

Cloud

All Slurm cluster resources will exist in the cloud. There are two deployment methods for cloud cluster management:

GCP Marketplace

This deployment method leverages the GCP Marketplace so you can set up a cluster without leaving your browser. While this method is simpler, it is also less flexible; it is a great way to explore what slurm-gcp can do!

Terraform

This deployment method leverages Terraform to deploy and manage cluster infrastructure. While this method can be more complex, it is a robust option. slurm-gcp provides Terraform modules that enable you to create a Slurm cluster with ease.
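
A Terraform-based deployment follows the standard init/plan/apply cycle against one of the project's examples. The repository layout, example path, and variable file name below are assumptions based on the slurm-gcp project at the time of writing and may differ between releases.

```bash
# Fetch the slurm-gcp Terraform modules and examples
git clone https://github.com/GoogleCloudPlatform/slurm-gcp.git
cd slurm-gcp/terraform/slurm_cluster/examples/slurm_cluster/cloud/basic  # example path; may differ per release

# Provide your project, region, and cluster settings
cp example.tfvars terraform.tfvars   # variable file name is an assumption
vi terraform.tfvars

# Standard Terraform workflow: download providers, preview, then create the cluster
terraform init
terraform plan
terraform apply
```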

Hybrid

Only the Slurm compute nodes will exist in the cloud; the Slurm controller and other Slurm components remain in the on-premises environment.

Multi-Cluster/Federation

Two or more clusters are connected, allowing jobs to be submitted from and run on different clusters. This can be a mix of on-premises and cloud clusters.
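
Slurm exposes multi-cluster operation through the -M/--clusters option on its client commands. A brief sketch, assuming two clusters registered in the same accounting database under the names onprem and cloud (the names are illustrative):

```bash
# Submit to whichever of the two clusters can start the job earliest
sbatch -M onprem,cloud my_job.sh

# Target the cloud cluster explicitly
sbatch --clusters=cloud my_job.sh

# View queued and running work across all clusters
squeue -M all
sinfo -M all
```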

HPC blueprints

An HPC blueprint is a YAML file that defines a reusable configuration and describes the specific HPC environment that you want to deploy using Cloud HPC Toolkit.

To configure your environment, you can either start with one of the example HPC blueprints and modify it, or create your own blueprint.

Example HPC blueprints

To get started, you can use one of the following example HPC blueprints.

  • Example 1: Deploys a basic HPC cluster with Slurm
  • Example 2: Deploys an HPC cluster with Slurm and a tiered filesystem

For a full list of example HPC blueprints, see the Cloud HPC Toolkit GitHub repository.

For a step-by-step guide to deploying HPC clusters with Slurm on Google Cloud Platform, see the quickstart: https://cloud.google.com/hpc-toolkit/docs/quickstarts/slurm-cluster
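
The quickstart boils down to building the Toolkit's ghpc binary, expanding a blueprint into a Terraform deployment folder, and applying it. The example blueprint file name and the deploy command below follow the Cloud HPC Toolkit documentation at the time of writing and may change between releases.

```bash
# Build the Cloud HPC Toolkit binary
git clone https://github.com/GoogleCloudPlatform/hpc-toolkit.git
cd hpc-toolkit && make

# Expand an example Slurm blueprint into a deployment folder
# (the example file name varies across Toolkit releases; substitute your own blueprint)
./ghpc create examples/hpc-slurm.yaml --vars project_id=<your-project-id>

# Deploy the generated Terraform (older Toolkit releases instead use
# terraform -chdir=<deployment>/primary init && terraform -chdir=<deployment>/primary apply)
./ghpc deploy hpc-slurm
```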

Best practices for running HPC workloads

Use the compute-optimized machine type

GCP recommends that you use the compute-optimized machine family (C2 or C2D). Virtual machine (VM) instances created with these machine types have a fixed virtual-to-physical core mapping and expose the NUMA architecture to the guest OS, both of which are critical for the performance of tightly coupled HPC applications.
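
As a point of reference, creating a standalone compute-optimized instance with gcloud looks like the following; the instance name, zone, and machine size are placeholders.

```bash
# Create a C2 (compute-optimized) VM; name, zone, and size are illustrative
gcloud compute instances create hpc-node-1 \
    --zone=us-central1-a \
    --machine-type=c2-standard-60
```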

Use compact placement policies

To reduce internode latency, use compact placement policies. VM instance placement policies give you control over where VMs are placed in Google Cloud data centers; compact placement policies keep VMs physically close together within a single zone, providing lower-latency communication between them.
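
A compact placement policy is created once and then attached to the instances that should be co-located. A minimal sketch; the policy name, region, and instance names are placeholders.

```bash
# Create a compact placement policy in the target region
gcloud compute resource-policies create group-placement hpc-placement \
    --collocation=collocated \
    --region=us-central1

# Attach the policy when creating instances so they land close together
# (placement policies require a TERMINATE maintenance policy)
gcloud compute instances create hpc-node-1 hpc-node-2 \
    --zone=us-central1-a \
    --machine-type=c2-standard-60 \
    --resource-policies=hpc-placement \
    --maintenance-policy=TERMINATE
```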

Use the HPC VM image

We recommend that you use the HPC VM image, which incorporates best practices for running HPC applications on Google Cloud. This image is based on CentOS 7.9 and is available at no additional cost through Google Cloud Marketplace.
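
Launching an instance from the HPC VM image is a matter of pointing gcloud at the image family. The image family and project names below are the ones documented for the CentOS-based image at the time of writing; the instance name and zone are placeholders.

```bash
# Launch a VM from the CentOS 7 based HPC VM image
gcloud compute instances create hpc-node-1 \
    --zone=us-central1-a \
    --machine-type=c2-standard-60 \
    --image-family=hpc-centos-7 \
    --image-project=cloud-hpc-image-public
```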

Configure file system tunings

The following are the primary storage choices for tightly coupled applications. Each choice has its own cost, performance profile, APIs, and consistency semantics.

  • NFS-based solutions such as Filestore and NetApp Cloud Volumes can be used to deploy shared storage. Both Filestore and NetApp Cloud Volumes are fully managed on Google Cloud, and we recommend them when your application does not have extreme I/O requirements against a single dataset (a Filestore provisioning sketch follows this list).
  • POSIX-based parallel file systems are more commonly used by MPI applications. POSIX-based options include open-source Lustre and the fully-supported Lustre offering, DDN Storage EXAScaler Cloud.
  • Intel DAOS is another option supported by the Cloud HPC Toolkit. DAOS is a performant option for an ephemeral scratch file system. This option requires some additional setup, including creating custom compute VM images.
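
For the Filestore option above, provisioning and mounting a share is straightforward. The instance name, tier, capacity, and mount point below are illustrative.

```bash
# Create a managed NFS share (tier and capacity are illustrative)
gcloud filestore instances create hpc-nfs \
    --zone=us-central1-a \
    --tier=BASIC_SSD \
    --file-share=name=home,capacity=2560GB \
    --network=name=default

# Look up the server IP, then mount the share on each compute node
gcloud filestore instances describe hpc-nfs --zone=us-central1-a \
    --format='value(networks[0].ipAddresses[0])'
sudo mkdir -p /mnt/home
sudo mount -t nfs <filestore-ip>:/home /mnt/home   # replace <filestore-ip> with the address printed above
```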
