How do I love my HPC? Let me count the ways

Gábor Samu
IBM Data Science in Practice
Jan 29, 2021

A high performance computing (HPC) match made in heaven often comes down to far more than vital stats. We can all be smitten by core counts, petabytes, accelerators and that feeling of being on cloud nine. Back on earth, practicality is what counts. For end users, HPC environments must be easy to get along with while delivering results.

Depending on who you ask, there are many keys to a good working relationship. So, what are the keys to keeping your HPC relationship working?

Keeping it simple (KISS)

In the early days of HPC, the lines between administrators and users were blurred. Out of necessity, and on top of their day jobs, users frequently had to stand in as IT specialists or system administrators. Today, more than ever, users need to stay focused on their roles as scientists and engineers, not as IT staff. They need the simplest possible way to submit their work and get the results back. As HPC environments grow in scale and complexity, adhering to the KISS principle is more important than ever.

Embracing accelerators

It’s no secret that GPUs are the de facto way to spice up HPC performance. But much like the spices and seasoning in a meal, balance is everything: too much spoils the recipe and leaves you running for water, while too little leaves you underwhelmed. In the same way, GPU accelerators in your HPC environment need to be used in a balanced and effective way to deliver fantastic results.

Floating on a cloud

We’ve all felt it. The feeling of floating on a cloud as we become enamoured with someone. The massive computing capability that is available on tap in today’s public clouds is very attractive. Organizations need this computing capability to dream up world-changing products. Back on terra firma, organizations know all too well that cloud comes at a cost. Given this, they are looking for ways to stay grounded in their use of cloud, while enjoying the benefits it provides.

Smarts

Have you ever found yourself finishing your partner’s sentences? Done right, this is normally a good sign that you’re in sync with one another. This meshing of thoughts and ideas means that you’re on the same wavelength and able to anticipate what makes each other tick. As HPC environments grow, being able to understand and anticipate the demands of users as work is placed is crucial. Oftentimes, HPC users exhibit patterns, including bad ones such as routinely requesting more resources than their jobs actually use, which can hurt work throughput. With a myriad of variables in an HPC environment, organizations are seeking ways to provide a helping hand to users. By looking at user patterns, they are exploring ways to drive efficiencies in the environment for faster results.

Match made in heaven

With over 25 years of experience in HPC workload management, Spectrum LSF Suite makes the perfect companion for your HPC. Spectrum LSF makes it easy to submit, manage, and monitor jobs via a web browser, desktop and mobile clients, and RESTful APIs.

Learn more in this video: Simplifying HPC.

Spectrum LSF introduced support for NVIDIA GPUs way back in 2008. Since then, IBM has worked with NVIDIA to enhance this support and help organizations get the most out of these devices. IBM recently introduced support for the multi-instance GPU (MIG) capability of the NVIDIA A100, which allows Spectrum LSF to dynamically reconfigure MIG and right-size GPU instances to match the requirements of the workload for better GPU ROI. It also frees administrators from the manual, and potentially tedious, task of reconfiguring MIG as workload demands change. Learn more about this exciting capability here.
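To make "right-sizing" a little more concrete, here is a minimal Python sketch of the basic idea: given a job's GPU memory and compute needs, pick the smallest MIG profile that satisfies both. The profile table lists the standard MIG profiles of the 40 GB A100; the `rightsize` function and its smallest-fit rule are hypothetical illustrations, not how Spectrum LSF actually decides on MIG layouts. A real scheduler also has to respect the A100's placement rules for combining instances on one GPU, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigProfile:
    name: str
    compute_slices: int  # compute slices out of 7 on an A100
    memory_gb: int


# Standard MIG profiles of the A100 40GB, ordered from smallest to largest.
A100_40GB_PROFILES = [
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("4g.20gb", 4, 20),
    MigProfile("7g.40gb", 7, 40),
]


def rightsize(mem_gb: float, min_slices: int = 1) -> MigProfile:
    """Hypothetical smallest-fit rule: first profile meeting both requirements."""
    for profile in A100_40GB_PROFILES:
        if profile.memory_gb >= mem_gb and profile.compute_slices >= min_slices:
            return profile
    raise ValueError(f"no single MIG instance fits {mem_gb} GB / {min_slices} slices")


if __name__ == "__main__":
    for mem, slices in [(4, 1), (8, 1), (16, 4), (30, 1)]:
        print(f"{mem} GB, {slices} slice(s) -> {rightsize(mem, slices).name}")
```

Note that a job asking for 16 GB and 4 compute slices skips `3g.20gb` (enough memory, too little compute) and lands on `4g.20gb`, while small jobs stay on small slices and leave the rest of the GPU free for others.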

Hey, you, get onto my cloud! Hardly a day goes by without an announcement about organization X using cloud provider Y for HPC. In reality, many organizations have adopted a multi-cloud approach to augment on-prem resources. On the surface, not putting all of your eggs in one cloud provider’s basket means that you don’t have a single point of failure. However, differing workload schedulers and interfaces between clouds can turn this into an administrative nightmare. The Spectrum LSF resource connector enables organizations to intelligently burst workload from on-prem to any of the supported cloud providers. By using the cloud intelligently, Spectrum LSF helps control costs so that you pay only for what you really need.
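What does "paying only for what you need" look like in practice? The following is a minimal Python sketch of one possible bursting rule, assuming a simple policy: launch cloud instances only for the backlog that existing capacity cannot absorb, and never exceed a fixed instance cap. The function and parameter names are hypothetical; this is not the resource connector's actual logic or configuration.

```python
import math


def instances_to_request(pending_cores: int, free_cores: int,
                         cores_per_instance: int, running_instances: int,
                         max_instances: int) -> int:
    """How many new cloud instances to launch for the current backlog.

    pending_cores: cores requested by jobs waiting in the queue.
    free_cores: idle cores on-prem plus on cloud instances already running.
    max_instances: a hypothetical cap that keeps cloud spend bounded.
    """
    shortfall = pending_cores - free_cores
    if shortfall <= 0:
        return 0  # existing capacity can absorb the backlog; no new cloud spend
    needed = math.ceil(shortfall / cores_per_instance)
    headroom = max(0, max_instances - running_instances)
    return min(needed, headroom)


if __name__ == "__main__":
    # 100 pending cores with 36 idle leaves a 64-core shortfall, i.e. four
    # 16-core instances, but the cap of 5 with 2 already running allows only 3.
    print(instances_to_request(100, 36, 16, 2, 5))  # prints 3
```

The design choice here is to treat on-prem capacity as the default and the cloud as overflow, so cloud instances exist only while there is a genuine shortfall.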

As the saying goes, hindsight is 20/20. What if we could apply this to HPC? Well, we’ve done just that with Spectrum LSF Predictor. Spectrum LSF Predictor builds AI models from historical job information to help maximize cluster utilization by minimizing the unused or idle resources that result from inaccurate user requests. In effect, it acts as an invisible guiding hand that uses hindsight to learn from, and mitigate, user error. This recent blog post gives you the lowdown on Spectrum LSF Predictor.
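Spectrum LSF Predictor uses AI models, and the sketch below does not reproduce them. It is a much simpler statistical stand-in that shows the underlying idea of learning from hindsight: look at how much memory past jobs actually used, suggest a request per user and application, and estimate how much requested memory would otherwise sit idle. The job record fields (`user`, `app`, `req_mem_gb`, `max_mem_gb`) and the "95th percentile plus 10%" rule are assumptions made for illustration.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_memory(history, pct=95, margin=1.1):
    """Suggest a memory request (GB) per (user, app) from past peak usage."""
    peaks = defaultdict(list)
    for job in history:
        peaks[(job["user"], job["app"])].append(job["max_mem_gb"])
    return {key: round(nearest_rank_percentile(vals, pct) * margin, 1)
            for key, vals in peaks.items()}


def idle_memory_gb(history, recommendations):
    """Memory requested beyond the recommendation, summed over all jobs."""
    total = 0.0
    for job in history:
        suggested = recommendations[(job["user"], job["app"])]
        total += max(0.0, job["req_mem_gb"] - suggested)
    return total


if __name__ == "__main__":
    history = [
        {"user": "alice", "app": "cfd", "req_mem_gb": 64, "max_mem_gb": m}
        for m in (10, 12, 11, 14)
    ] + [
        {"user": "bob", "app": "md", "req_mem_gb": 32, "max_mem_gb": m}
        for m in (30, 31)
    ]
    recs = recommend_memory(history)
    print(recs)  # alice/cfd is heavily over-requested; bob/md is close to its peak
    print(f"{idle_memory_gb(history, recs):.1f} GB could have been freed")
```

Using a high percentile with a safety margin, rather than the average, keeps the suggestion conservative so that right-sizing requests does not cause jobs to run out of memory.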

Figure: IBM Spectrum LSF Predictor dashboard

That’s a quick look at some of the exciting capabilities of Spectrum LSF and how it can be your lifelong partner for HPC. Spectrum LSF: the match made in heaven for HPC users. Learn more here.


Senior Product Manager at IBM specializing in Spectrum Computing products, with over 20 years of experience in high performance computing technology. Retro computing fan.