Modernization from CUDA to IA for AI — Shanghai Jiao Tong University’s Paradigm Shift with New Supercomputer Pi 2.0

Ken Strandberg
Nov 6 · 5 min read

One of China’s leading national universities, Shanghai Jiao Tong University, brought its next-generation supercomputer online earlier this year. Built on Intel Xeon Gold 6248 processors, the new system, called Pi 2.0, will advance researchers’ work across the sciences and support new applications in artificial intelligence (AI) and machine learning (ML).

“At the University and HPC center, we consider ourselves solution innovators,” stated Dr. James Lin, Vice Director of Shanghai Jiao Tong University’s Center for High Performance Computing. “We are very open to being early adopters, always trying to use new technologies that are promising for our users’ applications.”

Pi 2.0 augments the University’s previous system, π, a 260-teraFLOPS heterogeneous system built on Intel Xeon E5 processors, NVIDIA GPUs, and an InfiniBand interconnect. Installed in 2013, π has served students and professors for the last six years, giving computational scientists a platform for running codes on both traditional and GPU architectures. But with Pi 2.0, built by Inspur, the Center for High Performance Computing has made a paradigm shift, opting to run only on the latest Intel architecture.

“Things have changed a lot over the last six years,” stated Lin. “As research at the university has addressed ever more complex and deeper problems and included new fields in machine learning and big data, more students have needed computing cycles that have not been available on our current machine. The queues for researchers’ jobs have gotten longer and longer, delaying important research work.”

Pi 2.0 is a 658-node system of two-socket Inspur servers built on 2nd Generation Intel Xeon Scalable processors, with a total of 26,320 compute cores. It is the largest computing cluster in China’s university system and one of the two fastest. The compute nodes are connected by an Intel Omni-Path Architecture fabric and supported by a Lustre scalable, parallel file system using Intel SSD Data Center series drives.

Successfully Supporting a Paradigm Shift in Parallel Programming

Besides needing more capacity to address long user queues, Lin and Stephen Wang, head of the Center for High Performance Computing’s technical support department, have seen changes in application needs that helped guide the type of new supercomputer they would deploy.

“With traditional HPC applications, like CFD, molecular dynamics, and bioinformatics, and in big data programs, developers increasingly are making use of AI,” added Wang. “Some are using AI to find new materials; in the life sciences, to detect cancers and other health conditions. Others just want to run more scalable codes on many, many more cores.”

But making an architectural jump that affects existing codes written for the CUDA model is no small change for the university’s computational scientists and researchers, especially for those who have been running codes on π’s GPUs. For programmers who have built their applications around CUDA, moving to the Message Passing Interface (MPI) and OpenMP on a large, multi-core, distributed machine like Pi 2.0 means porting code. And knowing when to use each model, OpenMP for shared-memory computing within a node and MPI for distributed computing across nodes, is where users often need guidance. Stephen Wang says they are ready to help users make the paradigm shift.
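For users facing that port, the sketch below gives a rough sense of the structural change involved. It is an illustrative example only, not taken from any of the university’s codes: a simple array update that might once have been written as a single CUDA kernel on one GPU is instead split across nodes with MPI and across each node’s cores with OpenMP. All names and sizes here are hypothetical.

```cpp
// Illustrative hybrid MPI + OpenMP sketch (hypothetical, not an SJTU code).
// Build with something like: mpicxx -fopenmp axpy_hybrid.cpp -o axpy_hybrid
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Distributed memory: each MPI rank owns a contiguous slice of the array.
    // (Assumes nprocs divides n_global evenly, for brevity.)
    const long n_global = 1048576;
    const long n_local  = n_global / nprocs;
    std::vector<double> x(n_local, 1.0), y(n_local, 2.0);
    const double a = 3.0;

    // Shared memory: OpenMP threads split this rank's slice across the node's
    // cores, much as CUDA threads once split the array across a GPU.
    #pragma omp parallel for
    for (long i = 0; i < n_local; ++i)
        y[i] = a * x[i] + y[i];

    // Combine per-rank partial sums across all nodes with an MPI reduction.
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n_local; ++i)
        local_sum += y[i];

    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

The support work Wang describes goes well beyond this, covering vectorization, memory layout, and node-level tuning, but the basic move from GPU kernel launches to OpenMP loops inside MPI ranks is the heart of such a port.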

“We have experience doing modernization from CUDA to IA,” said Wang. “So, if users want help, we can assist them in porting their code, for example, biology codes, from GPUs to the new machine. Even for some large in-house codes with high scalability, we can help them port to the new machine.”

Wang’s department provides multiple levels of porting support. Users can email questions about compiling, MPI, OpenMP, and so on. Or, especially for users with large in-house codes, Wang’s team can access their code and provide hands-on porting support if given appropriate permissions. The support group helps researchers port and optimize both open source codes and home-grown applications. Commercial software is typically run with the vendor’s latest updates for the new system.

Big users with highly scalable codes will be among the first to run their work on Pi 2.0. To begin with, some nodes will be set aside for them to port their existing codes. Many of these are traditional HPC applications, such as an in-house, highly scalable particle transport, acceleration, and radiation code used in laser research. Another is a large N-body astronomical code.

The new machine will not only offer university researchers a level of scalability they’ve never had; with its greater capacity, they will also be able to enhance their research by taking advantage of new technologies in the architecture.

“Users have not been able to test and implement optimizations on the existing platform,” said Lin, “because it’s been so busy. AI users have wanted to test their enhanced codes, but there was a long queue. The new machine is seven times larger than the current system, giving them more capacity to test and run optimized codes. Plus, π is six years old; it lacks features found in the latest CPUs. Pi 2.0 has new AI and ML technologies for accelerating deep learning and inferencing that users are excited to experiment with and implement.”

Dealing with Power and Storage

While Pi 2.0 is not a huge machine, power was a critical concern for the Center for High Performance Computing.

“We are required to support a power usage effectiveness (PUE) of 1.3,” commented Wang. “With 26,320 cores, Pi 2.0 will be seven to eight times larger than π. But the more efficient processor technology means Pi 2.0’s power demand will be only two to three times that of its predecessor,” concluded Wang.

The Lustre file system was another key area of concern. Shanghai Jiao Tong University hosts the Thousand Crop Genome Project, and genomics researchers are a big group of users of the Center for High Performance Computing’s supercomputers. Lustre was historically designed for serving very large datasets, while genomics assembly and analysis runs many jobs that make small data requests — as many as 1,000 at a time. With the number of genomics jobs on the rise, Lustre was becoming a bottleneck. In the new supercomputer, the Lustre file system includes SSD drives from Intel to accelerate I/O across the storage cluster.

With Pi 2.0, Shanghai Jiao Tong University boasts the largest supercomputer in China’s university system. With seven times the computing capacity of its previous supercomputer, researchers will be able to expand important work in the sciences and computational technologies. And, while the new system introduces a programming paradigm shift along with its added capacity, the Center for High Performance Computing is prepared to help users make the transition from GPUs to IA.

For more information, read the case study.

Written by

Ken Strandberg is a technical storyteller. He writes articles, white papers, seminars, web-based training, video and animation scripts, and technical marketing.
