How Sunlight reinvented the HCI stack to run on ARM processors

Hannah Mellow
sunlight.io
May 14, 2021

An interview with Sunlight’s founder and CEO, Julian Chesterfield

Image showing ARM processor on development board. Photo taken by Vishnu Mohanan for Unsplash.

Sunlight was born out of a research project to design and build a next-generation server architecture running on many low-power ARM processors. Key to the project was a collaboration with a number of European partners, including Arm, the processor architects behind the silicon designs used in everything from smartphones to supercomputers, medical instruments to agricultural sensors, and base stations to servers. The project was tasked with designing a scale-out server architecture that leveraged low-cost, densely connected ARM processors with memory shared across a fast, cache-coherent interconnect. The challenge for the Sunlight team was to build a lightweight clustered hypervisor that could run efficiently and at large scale across the processors, while sharing centralised storage and network resources.

What we didn’t realise when we began the project was that we were actually creating a whole new hyperconverged infrastructure (HCI) architecture that was perfect for use cases at the Edge! The smallest footprint imaginable with baremetal performance and remote manageability turned out to be really important for remote edge installations.

Why is a new HCI server architecture needed?

Existing hyperconverged architectures were well established in the datacenter, but they were too ‘heavy’ and ‘bloated’ to operate on less powerful processors, such as the embedded Arm SoCs that were, at the time, used only in mobile phones. Mobile device processors vary in capability; however, they typically have as few as 8 cores (in a big.LITTLE arrangement) and as little as 4GB of physical memory per socket. The low cost of such embedded processors made it economically feasible to build very dense, multi-processor systems that ran independently from each other, so you could build out a farm of processors very affordably. The challenge, however, was to make the system’s own resource usage as lean as possible, leaving enough capacity to do useful work alongside the management tasks.

No off-the-shelf virtualisation stack was capable of driving this hardware efficiently, so we had to design a system that could. The target architecture provided multiple sockets on a resource-constrained compute node, each of which had to drive at least one NVMe device: that’s upwards of half a million IOPS per node, using limited memory and core resources. On a conventional HCI stack, over half of the system would have to be reserved for the control plane, leaving little for compute performance. We needed to improve efficiency by optimising the stack and pushing IO processing further down into the VMM layer. We achieved this by inventing a way to centralise resource management across all of the compute nodes.
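As a back-of-envelope check on that figure, here is a minimal sketch of the per-node IOPS budget. The socket count and per-device throughput are our assumptions for illustration (a four-socket node and a nominal ~125K IOPS per NVMe device are plausible for that generation of hardware, but neither number appears in the interview):

```python
# Back-of-envelope IOPS budget for one compute node.
# Assumptions (illustrative, not stated in the interview):
#   - 4 sockets per compute node, one NVMe device per socket
#   - ~125,000 random IOPS per NVMe device of that era
sockets_per_node = 4
iops_per_nvme = 125_000

node_iops = sockets_per_node * iops_per_nvme
print(node_iops)  # 500000 -- "upwards of half a million IOPS per node"
```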

How did the Sunlight HCI stack end up running on Samsung Galaxy mobile device processors?

A subsequent partner in the project referenced above, Kaleao Systems (now known as Bamboo Systems), developed a server with a similar architecture using 8-core Samsung Exynos ARM processors (which were widely deployed in Galaxy phones), with 4 CPUs per compute board, 4 compute boards per blade, and 12 blades per 2U server. This offered a significant amount of density in terms of processor cores and compute capacity: up to 1,500 cores, 1.5TB of RAM and 24 million storage IOPS per 2U chassis!
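The density figures above follow directly from the per-unit numbers. A quick sketch of the arithmetic (the per-board RAM is not stated in the interview; 32GB per board, i.e. 8GB per socket, is our inference because it matches the quoted 1.5TB total):

```python
# Chassis density from the per-unit figures quoted above.
cores_per_cpu = 8       # 8-core Exynos
cpus_per_board = 4
boards_per_blade = 4
blades_per_chassis = 12

boards = boards_per_blade * blades_per_chassis   # 48 compute boards per 2U
cores = boards * cpus_per_board * cores_per_cpu  # 1536 cores (~1,500)
ram_gb = boards * 32                             # assumed 32 GB/board -> 1536 GB (~1.5 TB)
iops = boards * 500_000                          # ~500K IOPS/board -> 24,000,000
print(cores, ram_gb, iops)
```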

Driven by this requirement to operate within a small footprint, workloads had to be managed in a centralised way. We couldn’t afford an individual control plane on every node, each running a complete stack and a full web-based UI. It just didn’t make sense.

We ripped out all of the typical control components that you’d find in a virtualisation stack to make it lightweight enough to run across hundreds of tiny nodes. This meant centralising all of the localised processing logic for managing workloads and minimising the number of operational tasks pushed to each compute node.

Our model, therefore, was to create a clustered hypervisor with a lightweight layer-2 Ethernet control interface on each cache-coherent compute instance, with management centralised across all of the compute instances. It was incredibly efficient and could drive close to 100% performance for workloads running on each Arm-based compute node.
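The split described above — a thin per-node component plus a centralised brain — can be sketched roughly as follows. This is purely illustrative: the class and method names are ours, the placement policy is a toy, and Sunlight’s real control path speaks a layer-2 protocol to the hypervisor rather than Python objects:

```python
# Illustrative sketch of a centralised control plane over thin node agents.
# All names and structures here are hypothetical, not Sunlight's actual API.
from dataclasses import dataclass, field

@dataclass
class NodeAgent:
    """Thin per-node agent: no scheduler, no UI, no cluster-wide state."""
    node_id: str
    running: set = field(default_factory=set)

    def handle(self, cmd: str, vm: str) -> str:
        # Only minimal verbs are ever pushed down to the compute node.
        if cmd == "start":
            self.running.add(vm)
        elif cmd == "stop":
            self.running.discard(vm)
        return "ok"

@dataclass
class Controller:
    """Centralised control plane: owns placement logic and cluster state."""
    agents: dict

    def place_vm(self, vm: str) -> str:
        # Toy placement policy: pick the least-loaded node.
        node = min(self.agents.values(), key=lambda a: len(a.running))
        node.handle("start", vm)
        return node.node_id

agents = {i: NodeAgent(f"node{i}") for i in range(3)}
ctl = Controller(agents)
print(ctl.place_vm("vm-a"), ctl.place_vm("vm-b"), ctl.place_vm("vm-c"))
```

The point of the sketch is the asymmetry: all decision-making lives in one place, so the per-node footprint stays small enough to leave nearly all of each node’s cores and memory for workloads.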

What does this mean for use cases at the Edge?

We realised we had developed a virtualisation stack that had a unique performance profile. Perfect for small, rugged and resource-constrained environments, like edge devices. And that’s how our initial prototype design for Sunlight was formed.

Despite the performance advantages and the ability to run workloads as close to baremetal performance as possible, there was initially no market for Sunlight on Arm: Arm servers weren’t being used in the datacenter, Arm hardware was manufacturer- and project-driven, and there was no general-purpose Arm compute market at that stage. However, we found the same exceptional results on more conventional commodity hardware, such as Intel and AMD processors.

Source: ESG, a division of TechTarget, Technical Review, Solving Hyperconvergence (HCI) at the Edge with Sunlight, May 2021.

Wind forward a few years, and advances in technology have led to an explosion of use cases requiring data processing close to the source, often in harsh and hazardous environments: in other words, at the Edge. Applications such as data analytics, the Internet of Things, artificial intelligence, machine learning and smart cities are changing the world around us, whether for working from home, self-service store checkouts, catching criminals, monitoring remote oil rigs, factory automation, maintaining crop yields or mapping the human genome. And this is only accelerating.

These use cases require reliable and real-time data processing close to the source, with high performance and simple management of all nodes and deployments. Conventional virtualisation stacks just don’t cut it. That’s why Sunlight’s edge-to-core hybrid infrastructure platform is best placed to support the processing and management of these application workloads running at the Edge.

Find out more about Sunlight’s solution for the Edge.
