Heterogeneous computing as a next-generation architecture for scaling data centers: trends, opportunities, solutions

Published in grovf, Aug 29, 2019

Introduction

Computing systems in today's data centers face major challenges, which this article outlines. Heterogeneous computing units built from CPU and FPGA pairs may be the winning platform, especially for database acceleration. In data centers, the IP developed by Grovf scales computing units vertically (adding more computing power to an existing machine) while preserving their horizontal scaling property (adding more machines to the pool of resources).

The key pain points in the computing industry

There is a new trend in high-performance computing: heterogeneous system architectures that combine CPU, GPU, and FPGA. The CPU performs serial tasks; the GPU handles mathematical operations (mostly on large sets of matrices); and the FPGA handles parallel computing such as algorithmic operations, low-latency math operations, and connectivity with the real world (e.g., network and storage).

Why now? Two facts answer this question. First, processor transistors are becoming very hard to shrink further (e.g., Intel postponed its 10nm chips), so there is less opportunity for the higher speed and lower energy consumption that smaller transistors provide. Second, data keeps growing due to IoT and other data-generating technologies (almost every device, sensor, and controller now has Ethernet connectivity and generates time-series data). The end customer, however, cares less about frequencies, performance, and other processor metrics than about the cost of achieving them. Thus, the most significant metric driving the industry is cost/performance, and processors alone can no longer deliver sufficient cost/performance improvements.

Who suffers from this? The data center industry, of course, because its sole business is to store, process, and supply data cheaply in the high-performance IoT era. The key criterion for data center efficiency is requests per second per watt: how many database requests can be handled per second for each watt of power consumed. This is a cornerstone metric for data centers. It is for exactly this reason that GPUs, although they may provide powerful acceleration for database operations, are generally not used for this type of acceleration: they are extremely power hungry.
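To make the metric concrete, here is a minimal sketch of how requests per second per watt could be computed. The function name and all the figures are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of any real server.

```python
def requests_per_second_per_watt(requests: int, seconds: float,
                                 avg_power_watts: float) -> float:
    """Requests handled per second for each watt of average power drawn."""
    if seconds <= 0 or avg_power_watts <= 0:
        raise ValueError("seconds and avg_power_watts must be positive")
    return requests / seconds / avg_power_watts


# Hypothetical figures: server B is faster in absolute terms,
# but draws far more power than server A.
server_a = requests_per_second_per_watt(
    requests=1_200_000, seconds=60, avg_power_watts=400)
server_b = requests_per_second_per_watt(
    requests=3_000_000, seconds=60, avg_power_watts=1_500)

print(f"server A: {server_a:.1f} req/s/W")
print(f"server B: {server_b:.1f} req/s/W")
```

With these made-up numbers, server B serves 2.5 times as many requests per second as server A, yet scores about 33 against 50 on the metric. That is the pattern the GPU argument above describes: raw speed does not help if the power draw grows faster.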

Driven by the proliferation of IoT devices, data centers' problems are becoming even worse:

High power consumption
Long delays in response
Large administrative expenses
Greater space requirements

Data centers face these problems today.

Existing processing power and generated data mismatch

According to Cisco and IBM predictions, 30–50 billion IoT devices will be connected in 2020, widening the existing gap.

FPGAs: the most efficient platform for optimizing parallel computing

Let's compare the parallel computing platforms GPU, FPGA, and ASIC using the following criteria.

Latency: FPGA and ASIC are the winners here because both operate on bare metal, and their timing characteristics are defined by the developer at the implementation stage. GPUs struggle when processing small amounts of data but scale nicely at higher throughput. This advantage of GPUs can be realized by accumulating data before processing; however, such a technique adds latency on the acceleration path.

Power: ASIC has the best power consumption characteristics, achieved because, unlike an FPGA, an ASIC implements fixed logic and cannot be reprogrammed with new firmware. The next best platform in terms of power consumption is the FPGA. GPUs are extremely power-hungry devices.

Flexibility: GPU is the best here because its algorithms can be fully implemented in software, and programming GPUs is simple compared with FPGA or ASIC. FPGAs sit in the middle: although they can be reprogrammed, their development flow is quite complex. ASIC has the worst flexibility. Once developed, an ASIC cannot be changed, as it becomes a custom chip with no possibility of further modification. Typical ASIC production takes 1–2 years and costs ~$10M.

Conclusion. Taking all of the aforementioned criteria into consideration and applying them to DB algorithms yields an unequivocal winner for DB: the FPGA. It is the only platform that provides a low-latency bare-metal chip with good power consumption characteristics while remaining flexible enough to be reprogrammed to meet the fast-changing needs of the DB industry.
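The latency point about accumulating data before processing can be illustrated with simple arithmetic. The sketch below assumes a simplified model that is not taken from any real GPU: requests arrive at a constant rate, and a batch is dispatched the moment its last request arrives. The function name and the numbers are hypothetical.

```python
from typing import Tuple


def batching_delay_seconds(batch_size: int,
                           arrival_rate_per_s: float) -> Tuple[float, float]:
    """Return (worst_case, average) time a request waits for its batch to fill.

    Simplified model: constant arrival rate, batch dispatched as soon as
    the last request of the batch arrives. Processing time is not included.
    """
    if batch_size < 1 or arrival_rate_per_s <= 0:
        raise ValueError("batch_size must be >= 1 and arrival rate positive")
    gap = 1.0 / arrival_rate_per_s      # time between consecutive requests
    worst = (batch_size - 1) * gap      # the first request waits for all others
    average = worst / 2                 # mean wait over the whole batch
    return worst, average


# Hypothetical load: 10,000 requests per second.
for size in (1, 32, 256):
    worst, avg = batching_delay_seconds(size, arrival_rate_per_s=10_000)
    print(f"batch={size:4d}  worst wait={worst * 1e3:.2f} ms  "
          f"average wait={avg * 1e3:.2f} ms")
```

Under this model a batch of one adds no waiting time, while a batch of 256 at 10,000 requests per second makes the first request wait about 25.5 ms before processing even starts. Larger batches improve throughput on the accelerator, but the wait grows linearly with batch size.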

FPGAs for DB acceleration: a quick overview

Is it doable at all? The first publication on using FPGAs for data center acceleration appeared in 2015, when we were already researching this topic. Facebook and Amazon have implemented FPGA acceleration in-house, and their solutions are not available for public use. The only existing solution with FPGA acceleration, to our knowledge, is implemented by Swarm64 for SQL databases. Dozens of companies claim to have FPGA acceleration solutions for SQL or NoSQL databases, but the reality is different: at most, these solutions are an offload engine for a single simple software function. The reason for the lack of fundamental solutions in the DB area is most likely complexity. FPGA hardware implementations of complicated algorithms such as DB algorithms are simply too hard and time-consuming.

The market for this product is quite impressive: it is now $9B and is growing rapidly, by 20–30%, because of the IoT trend. You might wonder how the world copes with this problem, given how huge data already is. The answer is simple: data centers physically multiply the number of their servers, and in 2020 they will need to multiply it by 8. This is where Grovf comes in to disrupt the industry.

Grovf IP and value proposition

After many years of research and development, Grovf's (patent pending) solution accelerates the data flow path from the network layer, through the algorithmic layer, and all the way to storage connectivity.

At Grovf we run the database on a new paradigm. The most critical software parts of the database algorithms are offloaded to the FPGA layer, ensuring the best performance/Watt metric.

The main components of the Grovf offload engine are:

FPGA-accelerated database engine
PCIe connectivity
TCP connectivity
DRAM connectivity
External memory connectivity

The more flexible database functionality runs on the CPU, while heavily loaded functions run on Grovf's IP implemented for FPGAs.

Database execution on a heterogeneous system

This approach eliminates the layers that exist on standard servers, where data passes back and forth until it finally settles in memory.

With Grovf's IP, GrovfDB passes data directly to the hardware layer, where, as you may have guessed, the FPGA rather than the CPU does all the processing in parallel. Only after the data has been fully processed is it sent from the FPGA chip to the CPU.
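The division of labor described above can be sketched in a few lines. This is purely an illustration in plain Python, not GrovfDB code or its API: the function names, the row layout, and the choice of a filter as the "heavy" step are all hypothetical, and the offload path is simulated in software rather than running on an FPGA.

```python
from typing import Dict, List

Row = Dict[str, int]


def offload_filter(rows: List[Row], threshold: int) -> List[Row]:
    """Stand-in for a data-parallel kernel (hypothetical).

    On real hardware each row could be tested independently and in parallel;
    here it is an ordinary list comprehension.
    """
    return [row for row in rows if row["value"] > threshold]


def cpu_format(rows: List[Row]) -> List[str]:
    """Flexible, low-volume work that stays on the CPU (hypothetical)."""
    return [f"{row['id']}:{row['value']}" for row in rows]


def run_query(rows: List[Row], threshold: int) -> List[str]:
    # The heavy step runs first on the (simulated) offload path, so only
    # the already-reduced result reaches the CPU-side code.
    survivors = offload_filter(rows, threshold)
    return cpu_format(survivors)


rows = [{"id": 1, "value": 5}, {"id": 2, "value": 50}, {"id": 3, "value": 12}]
print(run_query(rows, threshold=10))
```

The point of the structure is that the CPU-side function never sees the rows that were filtered out, which mirrors the idea of sending data to the CPU only after it has been fully processed.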

Conclusions

Old-fashioned computing units can no longer provide the required efficiency.
Heterogeneous systems are the reality and the future, providing new business opportunities for old problems.
GrovfDB runs on a heterogeneous system consisting of FPGA/CPU pairs and provides 10X faster DB transactions.
And about that metric: requests per second per watt is improved 30X compared to the CPU.
Thus, Grovf's solution increases the processing power of data centers vertically while keeping their horizontal scaling properties.