I had a dream, I got every GPU I wanted

Gábor Samu
IBM Data Science in Practice
5 min read · Apr 16, 2021

It’s 2021. If you’re like me, you’ve had a very hard time finding a GPU for your latest computing project. Whether it’s for a gaming rig or a system for AI, we’ve all heard about the chip shortage in the news of late and the impact it’s having on industries worldwide. This situation has given me a new perspective on the value of our computing resources and the importance of using what we have effectively.

In the world of high-performance computing, large-scale clusters can contain thousands of servers, oftentimes with GPUs. These environments provide the horsepower to drive innovation for the benefit of humankind. As an example, high-performance computing has played a pivotal role in the rapid development of vaccines for COVID-19. As the pandemic has highlighted, rapid solutions require all hands on deck — both human and computing. When time to solution matters, organizations need to make certain they’re getting the most out of their compute infrastructure. Here, we’ll discuss methods to drive utilization of the latest GPUs for better return on investment.

Ask and you shall receive — a GPU

Job schedulers are an essential component of high-performance computing clusters. You can think of a job scheduler as traffic police for the work that users submit to the cluster. Traffic police keep traffic flowing when things are busy: they help make certain that you reach your destination as quickly as possible, and they redirect traffic when accidents occur. On those occasions when a VIP is in town, they ensure that the VIP’s travel is given priority. As the virtual traffic police, job schedulers dispatch work to the best server(s) in the HPC cluster based on current conditions and the user’s request. The job scheduler makes certain that users’ jobs don’t step on each other’s virtual toes. And much as with VIPs, job schedulers need to be able to provide priority access to resources when required for those very important jobs. Finally, in modern HPC environments, where heterogeneous computing has become the norm, job schedulers need to manage the flow of jobs to both CPUs and GPUs.

In the early days of GPU-accelerated computing, it was enough for a job scheduler to know whether a machine was equipped with a GPU. As time passed, systems with multiple GPUs, such as the IBM Power Systems AC922 with its 4 NVIDIA V100 GPUs, became commonplace. As the complexity of these systems increases, so do the demands on job schedulers: they not only have to keep track of which servers have GPUs, but also need to ensure that GPUs are not oversubscribed. The latest incarnation of a compute GPU from NVIDIA is the Ampere A100. Alongside the A100, NVIDIA introduced the Multi-Instance GPU (MIG) capability for the Ampere series GPUs. With MIG, the A100 can be securely partitioned into as many as seven smaller GPU instances. In simple terms, I equate MIG with the ability to order a slice of pizza at my local pizzeria rather than an entire pizza.
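
To make the pizza slicing concrete, here is a sketch of how an administrator might carve up an A100 by hand with `nvidia-smi`. This assumes a MIG-capable driver and root access; the chosen profile names are just an illustration.

```shell
# Enable MIG mode on GPU 0 (may require draining workloads and a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the A100 supports (1g.5gb through 7g.40gb)
nvidia-smi mig -lgip

# Illustrative example: carve GPU 0 into two 3g.20gb GPU instances,
# creating a default compute instance on each (-C)
sudo nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C

# Verify the resulting MIG devices
nvidia-smi -L
```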

And while the A100 provides tons of computing power in a single package, it’s a resource you want to ensure is used effectively. Going back to the pizza analogy, there’s no point in getting half a pizza when you only want to eat a single slice. MIG provides a programmatic means to partition the device so that each user gets a slice just big enough to satisfy their compute hunger. The MIG Partition Editor (nvidia-mig-parted) is a tool for managing MIG partitions. However, creating MIG partitions is not handled automatically by a job scheduler; it is a manual process that must be undertaken by an administrator. Just as in the real world, where one size doesn’t fit all, jobs arriving in an HPC cluster come in all shapes and sizes. Certainly, having an administrator on call to tune MIG partitions day and night is not a reasonable solution. Ideally, you’d want to right-size the MIG partitions seamlessly, according to the incoming jobs being submitted to the HPC cluster.
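
For reference, the MIG Partition Editor works from a declarative configuration. The sketch below follows the nvidia-mig-parted configuration format; the file name and configuration label are hypothetical.

```shell
# Hypothetical mig-parted configuration: split every MIG-capable GPU
# into seven 1g.5gb slices
cat > mig-config.yaml <<'EOF'
version: v1
mig-configs:
  all-1g.5gb:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.5gb": 7
EOF

# Apply the named configuration (requires root and the nvidia-mig-parted binary)
sudo nvidia-mig-parted apply -f mig-config.yaml -c all-1g.5gb
```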

[Diagram of the IBM Spectrum LSF Suite tiers — Suite for Workgroups (simplified installation and deployment, application-centric user portal, lightweight reporting and dashboards); Suite for HPC (intelligent data staging, workflow automation, hybrid cloud auto scaling); Suite for Enterprise (software license optimization, enterprise scalability)]
IBM Spectrum LSF Suites: complete workload management for demanding HPC environments

Dynamically speaking

With more than 28 years of HPC scheduling expertise under its belt, IBM Spectrum LSF has supported scheduling to GPU resources since 2007. IBM Spectrum LSF now supports dynamic or static scheduling to NVIDIA MIG slices, extending IBM’s long tradition of enhancements to NVIDIA GPU support. In the case where a static MIG configuration has been configured by an administrator, Spectrum LSF users can specify the number of MIG slices, or the amount of GPU memory required for their job. With dynamic MIG support enabled in Spectrum LSF, MIG slices are automatically created based upon the GPU resource requirement specified at job submission time. Users can request a number of GPUs and the amount of GPU memory required. Spectrum LSF will then dynamically create a MIG slice to accommodate the job. The dynamic MIG support in Spectrum LSF helps to ensure that GPUs are right-sized for the jobs, helping to eliminate waste of GPU resources.

Let’s take a quick look at the LSF job submission syntax for a GPU. The Spectrum LSF bsub command is used to submit work to LSF, including GPU workloads. It accepts the -gpu flag, which specifies GPU requirements including the number of GPUs required, the GPU mode, the GPU model, and the amount of GPU memory. To enable dynamic MIG scheduling in Spectrum LSF, the option LSF_MANAGE_MIG=Y must be set in lsf.conf. Consider a simple example: a user requires a single GPU with 32 GB of memory for their job. The user would issue the following command:

bsub -gpu "num=1:gmem=32G" ./gpu_command

The target GPU (an NVIDIA A100 with 40 GB of memory) would automatically be configured by Spectrum LSF with a single MIG instance large enough to satisfy the request. This happens seamlessly, with no action required from the user.
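
Other request shapes follow the same pattern. The commands below are a sketch based on the LSF -gpu option syntax; in particular, the mig=GI_size/CI_size form for static MIG requests is an assumption drawn from the LSF documentation rather than something shown above.

```shell
# Dynamic MIG: one GPU slice with 16 GB of GPU memory
bsub -gpu "num=1:gmem=16G" ./gpu_command

# Static MIG (slices pre-created by an administrator): request a MIG
# slice directly, here a 3-slice GPU instance with a 3-slice compute
# instance (mig=<GI_size>/<CI_size>)
bsub -gpu "num=1:mig=3/3" ./gpu_command

# Dynamic MIG scheduling is enabled cluster-wide in lsf.conf:
#   LSF_MANAGE_MIG=Y
```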

In this brief article, we’ve only scratched the surface of the GPU scheduling capabilities in IBM Spectrum LSF. Dynamic MIG support is the latest in a long line of NVIDIA GPU capabilities in IBM Spectrum LSF, including:

  • Automatic detection and configuration of GPUs
  • Automatic switching of GPU mode
  • Detailed GPU job accounting leveraging NVIDIA DCGM
  • Automatic creation of Linux control groups for GPU jobs
  • Fairshare and preemption scheduling support for GPU jobs
  • NVIDIA MPS support

By treating NVIDIA GPUs as first-class citizens, IBM Spectrum LSF helps organizations orchestrate workloads effectively across heterogeneous infrastructure, on premises and in the cloud. Learn more about dynamic MIG support in IBM Spectrum LSF in this NVIDIA GTC21 session recording: Maximizing Capacity: Workload Driven Dynamic Reconfiguration of NVIDIA MIG.


Senior Product Manager at IBM specializing in Spectrum Computing products. Over 20 years of experience in high-performance computing technology. Retro computing fan.