RDMA RoCE v2 IP by Grovf | How is it Unique?

Astghik Nalchajyan
Jan 12, 2022


In the digital age, the need for a faster, more efficient, and more scalable network has never been more vital. Traditional TCP/IP Ethernet connections, which are CPU-intensive and involve extra data processing and copying, can no longer keep up with modern network traffic. In this context, the world’s most powerful supercomputers and data centers rely on transformative networking technologies such as RDMA over Converged Ethernet (RoCE v2).

What is RDMA?

Remote direct memory access (RDMA) is a key technology at the core of the world’s most powerful supercomputers and largest data centers. In a nutshell, RDMA lets one server read from or write to the application memory of another server directly, bypassing the operating system kernel on both sides and without involving the remote machine’s CPU in the transfer. Offloading data movement from the CPU in this way lowers latency, raises throughput, and leaves more CPU cycles for the application.
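To make the idea of a one-sided transfer concrete, here is a minimal Python sketch, assuming a toy in-process model: the class `ToyRNIC` and its methods are hypothetical stand-ins for a real NIC and are not part of any RDMA library. The target application only registers a buffer once; afterwards the initiator's write lands directly in that buffer, with no receive call or callback running on the target side.

```python
from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """A buffer registered with the (toy) NIC for remote access."""
    buf: bytearray
    remote_write: bool  # permission granted at registration time


class ToyRNIC:
    """Hypothetical stand-in for an RDMA NIC. It keeps a table of registered
    regions and executes one-sided writes without running any code in the
    target application."""

    def __init__(self) -> None:
        self._regions = {}        # rkey -> MemoryRegion
        self._next_rkey = 0x1000

    def register(self, buf: bytearray, remote_write: bool = True) -> int:
        """Register an application buffer and return the remote key (rkey)
        that a peer must present to access it."""
        rkey = self._next_rkey
        self._next_rkey += 1
        self._regions[rkey] = MemoryRegion(buf, remote_write)
        return rkey

    def rdma_write(self, rkey: int, offset: int, data: bytes) -> None:
        """One-sided write: validate the key and bounds, then copy the
        payload straight into the registered buffer."""
        mr = self._regions.get(rkey)
        if mr is None or not mr.remote_write:
            raise PermissionError("invalid rkey or remote write not permitted")
        if offset < 0 or offset + len(data) > len(mr.buf):
            raise IndexError("write falls outside the registered region")
        mr.buf[offset:offset + len(data)] = data


# Usage: the "server" registers a buffer once; the "client" then writes into it.
server_buf = bytearray(16)
nic = ToyRNIC()
rkey = nic.register(server_buf)
nic.rdma_write(rkey, 4, b"RoCE")
print(bytes(server_buf[4:8]))  # b'RoCE'
```

In a real RNIC the key check and the copy are performed by hardware via DMA; the sketch only models the access-control logic that makes a CPU-free remote write safe.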

RDMA was first widely adopted in the High-Performance Computing (HPC) market through InfiniBand, but it is now also used in cloud, storage, and enterprise Ethernet networks in the form of RDMA over Converged Ethernet (RoCE). So what is RoCE? As the name implies, RoCE is a network protocol, specified by the InfiniBand Trade Association (IBTA), that enables remote direct memory access (RDMA) over an Ethernet network. Its second version, RoCE v2, carries the InfiniBand transport headers inside UDP/IP packets, which makes RoCE traffic routable across IP networks.
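RoCE v2 places the InfiniBand transport headers inside a UDP datagram addressed to destination port 4791, so ordinary IP routers can forward it. The sketch below packs the 12-byte InfiniBand Base Transport Header (BTH) that follows the UDP header. It is a simplified illustration: the function name `pack_bth` is our own, the migration and reserved bits are left at zero, and the invariant CRC (ICRC) that trails the payload is not computed.

```python
import struct

ROCEV2_UDP_DPORT = 4791        # IANA-assigned UDP destination port for RoCE v2
OP_RC_SEND_ONLY = 0x04         # Reliable Connected "SEND Only" opcode
OP_RC_RDMA_WRITE_ONLY = 0x0A   # Reliable Connected "RDMA WRITE Only" opcode


def pack_bth(opcode: int, dest_qp: int, psn: int, pkey: int = 0xFFFF,
             solicited: bool = False, ack_req: bool = False,
             pad_count: int = 0) -> bytes:
    """Pack a 12-byte BTH in network byte order.

    Layout: OpCode(8) | SE(1) M(1) PadCnt(2) TVer(4) | P_Key(16) |
            Reserved(8) DestQP(24) | AckReq(1) Reserved(7) PSN(24)
    """
    if not 0 <= dest_qp < 1 << 24 or not 0 <= psn < 1 << 24:
        raise ValueError("DestQP and PSN are 24-bit fields")
    if not 0 <= pad_count < 4:
        raise ValueError("PadCnt is a 2-bit field")
    flags = (int(solicited) << 7) | (pad_count << 4)  # M = 0, TVer = 0
    return struct.pack(">BBHII", opcode, flags, pkey,
                       dest_qp,                      # top byte reserved (0)
                       (int(ack_req) << 31) | psn)   # 7 reserved bits (0)


bth = pack_bth(OP_RC_RDMA_WRITE_ONLY, dest_qp=0x12, psn=5, ack_req=True)
print(bth.hex())  # 0a00ffff0000001280000005
```

An actual RDMA WRITE packet would also carry a 16-byte RDMA Extended Transport Header (RETH) with the remote virtual address, R_Key, and DMA length right after the BTH; it is omitted here to keep the sketch short.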

Put briefly, RoCE brings RDMA to hyper-converged data centers, cloud, storage, and virtualized systems, combining all of the benefits of RDMA with the familiarity of Ethernet.

What does GROVF offer with its RDMA RoCE v2 IP core?

As mentioned earlier, RDMA enables more direct and efficient data transfer into and out of a server. It does so by implementing the transport protocol in each communication device’s network interface card (NIC). For example, two networked computers equipped with NICs that support the RDMA over Converged Ethernet (RoCE) protocol can communicate with each other via RoCE.
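One piece of that NIC-resident transport is in-order delivery on a reliable connection: every packet carries a 24-bit Packet Sequence Number (PSN), and the responder accepts only the next expected one. The Python sketch below models that check under simplifying assumptions: each packet consumes exactly one PSN, and the class `ToyResponder` and its return strings are hypothetical labels, not real wire-level ACK/NAK encodings.

```python
PSN_MOD = 1 << 24  # PSNs are 24-bit counters that wrap around


def psn_diff(a: int, b: int) -> int:
    """Signed distance a - b using 24-bit serial-number arithmetic."""
    d = (a - b) % PSN_MOD
    return d - PSN_MOD if d >= PSN_MOD // 2 else d


class ToyResponder:
    """Simplified receive-side sequence check of a reliable transport."""

    def __init__(self, initial_psn: int) -> None:
        self.expected = initial_psn % PSN_MOD

    def receive(self, psn: int) -> str:
        d = psn_diff(psn, self.expected)
        if d == 0:
            # In order: execute the request and advance the expected PSN.
            self.expected = (self.expected + 1) % PSN_MOD
            return "ACK"
        if d < 0:
            # Already executed: acknowledge again but do not re-execute.
            return "DUPLICATE"
        # A gap means packets were lost: ask the sender to go back.
        return "NAK_SEQUENCE_ERROR"


r = ToyResponder(PSN_MOD - 1)
print([r.receive(p) for p in (PSN_MOD - 1, 0, 0, 5)])
# ['ACK', 'ACK', 'DUPLICATE', 'NAK_SEQUENCE_ERROR']
```

Note how the wrap from PSN 16777215 to 0 is treated as an ordinary in-order step; in an RNIC this bookkeeping and the resulting retransmissions are handled by the adapter rather than by host software.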

At the end of 2021, GROVF released its RDMA RoCE v2 FPGA IP core to democratize the market for RDMA-capable NICs (RNICs). It provides a RoCE v2 system implementation integrated with the standard Verbs API. Although various solutions in the industry offer RDMA capabilities, the GROVF RDMA RoCE v2 IP core and host drivers are built to deliver lower latency and higher data throughput across all message sizes.
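With the standard Verbs API, an application cannot send on a queue pair (QP) as soon as it creates it: it first steps the QP through a fixed sequence of states with `ibv_modify_qp()`. The Python sketch below is a toy model of that sequence, not libibverbs; it keeps only the states a reliable-connected QP passes through on its way to ready-to-send, and the helper names `modify_qp` and `bring_up` are hypothetical.

```python
from enum import Enum


class QPState(Enum):
    RESET = "RESET"
    INIT = "INIT"   # QP bound to a port, access flags set
    RTR = "RTR"     # Ready To Receive: remote QP number and receive PSN known
    RTS = "RTS"     # Ready To Send: the send queue is processed
    ERR = "ERR"


# Forward transitions modelled here (SQD and SQE states are omitted).
_NEXT = {
    QPState.RESET: {QPState.INIT},
    QPState.INIT: {QPState.INIT, QPState.RTR},
    QPState.RTR: {QPState.RTS},
    QPState.RTS: {QPState.RTS},
    QPState.ERR: set(),
}


def modify_qp(current: QPState, target: QPState) -> QPState:
    """Return the new state, or raise if the transition is not allowed.
    Any state may move to RESET or ERR."""
    if target in (QPState.RESET, QPState.ERR) or target in _NEXT[current]:
        return target
    raise ValueError(f"illegal QP transition {current.name} -> {target.name}")


def bring_up(state: QPState = QPState.RESET) -> QPState:
    """Walk the usual RESET -> INIT -> RTR -> RTS path."""
    for target in (QPState.INIT, QPState.RTR, QPState.RTS):
        state = modify_qp(state, target)
    return state


print(bring_up().name)  # RTS
```

In real code each transition passes a `struct ibv_qp_attr` carrying the attributes that state requires, and the `IBV_QPS_*` constants from `<infiniband/verbs.h>` name the same states; a driver stack that supports this standard flow is what lets existing Verbs applications run unchanged.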

Notably, this IP core was a game changer for FPGA-based NICs, which are indispensable in next-generation networks. Thanks to their high scalability, FPGA-based SmartNICs let communication service providers manage huge numbers of devices without substantially increasing latency or power consumption.

With low latency and high bandwidth in mind, GROVF developed an FPGA-based solution that does not compromise on latency or throughput compared with RNICs from industry-leading ASIC vendors.

When combined with FPGA in-line offload and acceleration capabilities, Grovf RDMA RoCE v2 IP enables applications such as HPC application offload, storage clustering and disaggregation offload, algorithmic trading, database memory pooling, and more.

High-Performance Computing

Although high-performance computing (HPC) was once reserved for large enterprises with the means to build such advanced hardware, its capacity to analyze huge volumes of unstructured data and generate significant business insights has made it appealing to a wide variety of sectors.

As more computing workloads migrate to cloud platforms and software systems become more standardized, businesses are looking for ways to incorporate HPC into their data operations.

RDMA has a wide range of applications in HPC, providing high performance and scalability, ultra-low latency, and little CPU overhead for data transfers in compute- and data-intensive applications.

The Grovf RoCE v2 networking card provides exceptionally low latency (2-microsecond round trip) and high bandwidth (100 Gbps), as well as integration with the standard Verbs API, on top of which MPI libraries can be used to build HPC, AI, and big data processing applications.
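A quick back-of-the-envelope calculation shows what the two quoted figures mean together. Assuming the 100 Gbps line rate and 2-microsecond round trip stated above, and ignoring header overhead, the sketch computes the bandwidth-delay product (how many bytes must be in flight to keep the link busy) and the serialization time of a single message.

```python
def bdp_bytes(link_gbps: float, rtt_us: float) -> float:
    """Bandwidth-delay product in bytes (1 Gbit/s x 1 us = 1000 bits)."""
    return link_gbps * rtt_us * 1000 / 8


def serialization_ns(msg_bytes: int, link_gbps: float) -> float:
    """Time to put msg_bytes on the wire (bits / (Gbit/s) = ns)."""
    return msg_bytes * 8 / link_gbps


print(bdp_bytes(100, 2))             # 25000.0
print(serialization_ns(4096, 100))   # 327.68
```

A 4 KiB message occupies the wire for only about 0.33 microseconds, far less than the 2-microsecond round trip. To approach 100 Gbps, a sender therefore needs roughly 25 KB (about six 4 KiB messages) outstanding at once, which is why RDMA applications keep many work requests in flight instead of waiting for each completion.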

Storage Disaggregation and clustering

Storage and computing disaggregation has become a widespread practice in many data centers and clouds. Disaggregation makes it simple to manage and scale both the storage and computing pools. Moreover, it consolidates storage resources and lowers their costs by allowing the storage pool to be shared among applications and users.

In disaggregated storage, the performance of local storage is paired with the flexibility of storage area networks. As the importance of data access performance and latency rises, modern storage clusters are evolving to match 100 Gbps network infrastructure.
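To put "matching 100 Gbps" in storage terms, the sketch below converts a line rate into the number of fixed-size block I/Os per second a storage node would have to sustain. It is plain arithmetic that ignores protocol overhead; the block sizes are illustrative assumptions, not figures from GROVF.

```python
def iops_to_saturate(link_gbps: float, block_bytes: int) -> float:
    """Block I/Os per second needed to fill the link (payload only)."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / block_bytes


for size in (4096, 65536):
    print(size, round(iops_to_saturate(100, size)))
# 4096 3051758
# 65536 190735
```

At small block sizes the required rate runs into millions of operations per second, so the per-operation CPU cost that RDMA offloads matters as much as raw bandwidth.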

High-performance connections between storage nodes are essential for building a disaggregated storage cluster. RoCE v2 meets this need: it enables data to flow directly between servers while providing flexibility and performance at scale.

Database node scaling and memory pooling

More and more databases and data stores, along with the applications that run on top of them, are shifting to in-memory processing. Sometimes, however, the memory capacity of a single node is insufficient, while the latencies across a cluster of smaller nodes are too high for adequate performance.

By bypassing the remote CPUs and using the RDMA capabilities of network interface cards, we can accelerate the processing of in-memory datasets, such as those commonly used by databases, and build a distributed memory pool that can be drawn on when needed.
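The memory-pool idea can be sketched as follows, assuming a toy in-process model: several nodes donate registered regions, the pool hands out (node, rkey, offset, length) handles, and reads and writes through a handle stand in for one-sided RDMA READ and WRITE operations, so the owning node's CPU does no work per access. `ToyMemoryPool`, `RemoteHandle`, and the bump-pointer allocator are hypothetical illustrations, not part of GROVF's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteHandle:
    """Where an allocation lives: owning node, region key, and byte range."""
    node: str
    rkey: int
    offset: int
    length: int


class ToyMemoryPool:
    """Hypothetical pool that aggregates memory donated by several nodes."""

    def __init__(self) -> None:
        self._regions = {}  # (node, rkey) -> bytearray standing in for remote memory
        self._cursors = []  # [node, rkey, next_free_offset] per donated region

    def donate(self, node: str, rkey: int, size: int) -> None:
        self._regions[(node, rkey)] = bytearray(size)
        self._cursors.append([node, rkey, 0])

    def alloc(self, length: int) -> RemoteHandle:
        """Bump-pointer allocation from the first region with enough room."""
        for cursor in self._cursors:
            node, rkey, off = cursor
            if off + length <= len(self._regions[(node, rkey)]):
                cursor[2] = off + length
                return RemoteHandle(node, rkey, off, length)
        raise MemoryError("memory pool exhausted")

    def write(self, h: RemoteHandle, data: bytes) -> None:
        """Stands in for a one-sided RDMA WRITE into the owner's memory."""
        if len(data) > h.length:
            raise ValueError("data larger than allocation")
        self._regions[(h.node, h.rkey)][h.offset:h.offset + len(data)] = data

    def read(self, h: RemoteHandle) -> bytes:
        """Stands in for a one-sided RDMA READ from the owner's memory."""
        return bytes(self._regions[(h.node, h.rkey)][h.offset:h.offset + h.length])


pool = ToyMemoryPool()
pool.donate("node-a", 1, 8)
pool.donate("node-b", 2, 64)
h1 = pool.alloc(8)    # fits in node-a
h2 = pool.alloc(16)   # node-a is full, so this comes from node-b
pool.write(h2, b"x" * 16)
print(h1.node, h2.node, pool.read(h2))
```

A production design would also need freeing, fragmentation handling, and coordination of the allocator metadata across nodes; the sketch only shows why data accesses themselves need no remote CPU.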

A low-latency RDMA network card is an excellent option for memory pooling designs aligned with modern memory-coherent interconnect technologies such as CXL and CCIX. An RDMA network card connected to the CPU’s memory-coherent bus on one side and to a RoCE v2 network on the other can provide low-latency, high-bandwidth data exchange, allowing memory pools to be built natively.

Financial Trading

The financial services sector has recently been buzzing about cloud migration. In the high-frequency trading market, where every microsecond matters, RDMA’s high performance and low latency have made it especially prominent among the world’s leading trading firms.

Final remarks

Opening new perspectives for FPGA-based RNIC makers, GROVF RDMA IP comes with a reference design that includes the RDMA IP subsystem itself, a 100G MAC IP subsystem, a DMA subsystem, host drivers, and a sample software application. The system drivers are compatible with well-known RNIC cards and applications and are integrated with the standard OFED Verbs API. The design also includes a low-latency FPGA implementation of RoCE v2 with a throughput of 100 Gbps.

Running RDMA in data centers offloads data transport from the CPU and frees CPU resources for the application. RoCE users can take advantage of RDMA’s features without changing their network architecture. By lowering Ethernet network latency and offloading CPU overhead, RoCE speeds up search, storage, database, financial, and other high-transaction-rate applications.