By Soumik Sinharoy
We live in an on-demand society. Urbanites use ride-hailing apps to get around their cities; entrepreneurs use co-working spaces to quickly scale their office needs; and enterprise companies use cloud services to scale with their online storage needs. No one should be surprised that there are opportunities for on-demand supercomputing power as well.
Having worked with datacenter folks over the years, I have seen the pain that operational teams go through every time end users submit a change request, especially when it comes to GPU-centric computing platforms. Whenever a few GPUs need to be added or removed, the only option is to shut down the entire system, take it out of the rack, open the server, add or remove the GPUs, test the system for integrity, and then bring it back online for users. This unavoidable procedure causes unpleasant downtime for users, some of whom are responsible for mission-critical workloads.
I am always on the lookout for technology that lets us add or remove datacenter resources (GPUs, SSDs, network interfaces, and so on) at the click of a button and create new configurations at the hardware level, especially on the fly. The key hurdle that many companies try to overcome is: How do you extend the PCIe fabric from the host level to the rack (and eventually datacenter) level? Doing so would allow any host to access resources from any other host as if they were locally managed resources. If and when such a capability is provided, it will be a complete game-changer for IT.
That’s why I was proud to work with a team from Orange Silicon Valley and Liqid to deliver a prototype for a fully composable GPU supercomputing platform, which was on display this week at Supercomputing 2017 in Denver.
The prototype provides a model of how users who need GPU-centric compute can access composable platform services, with near-unlimited GPU capacity available for the datacenter’s most computationally demanding applications. Those potential applications may include artificial intelligence (AI) and machine learning, virtual and augmented reality (VR/AR), video rendering, DevOps, scientific discovery, and other high-value applications that we haven’t even considered.
In a world of evolving computing needs across internet-connected devices and apps, new use cases are emerging every day. Many of them demand a degree of flexibility that composable GPU solutions are better equipped to provide than traditional enterprise options.
We looked at this increasing demand for scalability and flexibility in GPU-centric infrastructure. In response, we envisioned a new way to introduce agility and composability by extending the low-latency PCIe fabric. Advanced GPU supercomputing platforms such as these can help us meet the demands of emerging applications. As those applications benefit from GPU scale-out, the possibilities become enticing. As a result, we’ll see a much wider variety of vertical market segments take advantage of this powerful, increasingly multi-purpose resource. That’s exciting for all of us.
Moreover, Orange Silicon Valley’s collaboration with Liqid on this project enabled a list of industry firsts, including the following:
- Adaptive GPU Supercomputer for artificial intelligence
- GPU scale out — clustering dozens of GPUs across PCIe
- Dynamic, bare-metal orchestration of GPUs to CPU in real time
- Hot-swappable GPU in Linux and Windows
- Peer-to-peer GPU communication via rack-level PCIe fabric
Hitting these milestones provided validation for our work together, so it was a pleasure to get the opportunity to share our results with the public at Supercomputing 2017. The most interesting achievement during our experiments with Liqid was that we could make peer-to-peer (P2P) communication work between GPUs in different physical hosts over an external PCIe fabric, which was (magically) managed and orchestrated by the Liqid OS and software stack. This means that when training deep neural networks, we can scale to hundreds of GPUs and make them work together as a single system, with extremely low latency, for near-linear scalability of DNN training workloads over massive volumes of training data. In fact, we can use consumer gaming GPUs with P2P capability and scale to petaflops of computation managed by a single host.
In previous work, Orange Silicon Valley deployed similar resources in its own infrastructure to overcome CPU performance bottlenecks and to perform multi-dimensional queries on massive unstructured datasets at speeds far faster than traditional high-performance computing configurations would allow. While going through that process, Orange Silicon Valley reduced its infrastructure costs significantly. However, we were limited by the number of GPUs we could pack into a single system: a total of 20, which was what the CocoLink system supported. Now, with access to an external PCIe fabric managed by Liqid OS, we can break that 20-GPU barrier and build large-scale supercomputing nodes at dramatically lower cost with consumer GPUs.
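For readers who think in code, here is a toy sketch of what composing GPUs to a host from a shared rack-level pool might look like conceptually. It is not Liqid's API or software stack; the `GpuPool` class, its methods, and the host and GPU names are all hypothetical and exist only to illustrate attaching and detaching devices without taking a host offline.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Hypothetical model of a rack-level GPU pool (illustration only)."""

    free: set[str]  # GPUs not currently assigned to any host
    attached: dict[str, set[str]] = field(default_factory=dict)  # host -> GPUs

    def attach(self, host: str, count: int) -> set[str]:
        """Assign `count` free GPUs to `host`; the host is never shut down."""
        if count > len(self.free):
            raise ValueError(f"only {len(self.free)} GPUs are free")
        picked = set(sorted(self.free)[:count])  # deterministic choice
        self.free -= picked
        self.attached.setdefault(host, set()).update(picked)
        return picked

    def detach(self, host: str, gpus: set[str]) -> None:
        """Return some of a host's GPUs to the shared pool."""
        owned = self.attached.get(host, set())
        missing = gpus - owned
        if missing:
            raise ValueError(f"{host} does not own {sorted(missing)}")
        owned -= gpus
        self.free |= gpus


# Usage: one host grows to 22 GPUs, then hands 4 back to a second host.
pool = GpuPool(free={f"gpu{i:02d}" for i in range(24)})
first = pool.attach("host-a", 22)
pool.detach("host-a", set(sorted(first)[:4]))
pool.attach("host-b", 6)
```

In this sketch the pool is just bookkeeping; on a real composable platform the fabric and its management software would handle the actual device assignment.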
Expanding our work out into the emerging world of composable GPU solutions took our expertise into a new realm where we were able to find success and learn from our partners at Liqid.
If you’re interested in learning more, I recommend Liqid’s full announcement, as well as Dean Takahashi’s article about us at VentureBeat. And as always, watch this Medium account and the blog at OrangeSV.com to keep up with all of our latest work.
Disclaimer: The views and opinions expressed in this article belong to the author and do not necessarily reflect the position or views of Orange or Orange Silicon Valley.