The Fastest PCRE Compatible Regular Expression IP Core on Xilinx® Alveo™ Accelerator Card
Fast analysis of unstructured textual data, such as system logs, network traffic, social media posts, emails, or news articles, is growing ever more important in technical and business data analytics applications. Nearly 85% of business data is in the form of unstructured textual logs. Rapidly extracting information from these text sources is critical for business decision making. GRegeX is an FPGA implementation of the standard regular expression algorithm that achieves 12.8 GB/s throughput with a single IP core. Its wide range of supported regular expression functions lets developers configure the rules they need, and the chip handles them without reducing throughput.
The solution consists of two parts: the Regular Expression IP core on the FPGA side and the drivers on the host side. Data can come from the server's NIC (through the Linux kernel or the DPDK library), from the network interface available directly on the acceleration card, or from any application running in the Linux environment that feeds data to the GRegeX drivers.
The design occupies about 1/3 of the resources of the Xilinx Alveo U200 FPGA card, leaving room for further scaling to reach greater throughput and support more than one 100G network interface with a single chip.
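As a back-of-the-envelope check of the scaling claim, the sketch below converts the quoted 12.8 GB/s into Gbit/s and estimates how many cores might fit on one card. It assumes decimal units (1 GB = 10^9 bytes), that every additional core costs the same 1/3 of the card, and that throughput scales linearly with core count. None of these assumptions are stated in the source, and a real design also needs resources for the platform shell and I/O, so treat the result as an idealized upper bound.

```python
# Back-of-the-envelope scaling estimate (assumptions, not vendor figures).
from fractions import Fraction

CORE_THROUGHPUT_GBYTES = Fraction("12.8")  # GB/s per core, quoted in the text
CORE_FOOTPRINT = Fraction(1, 3)            # share of U200 resources ("about 1/3")
LINK_GBITS = 100                           # one 100G network interface


def core_throughput_gbits() -> Fraction:
    """Per-core throughput in Gbit/s (8 bits per byte, decimal units)."""
    return CORE_THROUGHPUT_GBYTES * 8


def max_cores(footprint: Fraction = CORE_FOOTPRINT) -> int:
    """Idealized core count if the whole card were available to the design."""
    return int(1 / footprint)


def links_covered(cores: int) -> int:
    """Number of 100G links covered at line rate, assuming linear scaling."""
    return int(core_throughput_gbits() * cores // LINK_GBITS)


if __name__ == "__main__":
    print(float(core_throughput_gbits()))  # per-core Gbit/s
    print(max_cores())                     # idealized core count
    print(links_covered(max_cores()))      # 100G links at that core count
```

Under these assumptions a single core slightly exceeds the line rate of one 100G interface, which is consistent with the claim that additional cores could serve additional interfaces.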
Specification of GRegeX:
The real footprint on the FPGA chip:
A configuration file allows developers to customize the design settings and tune it for specific workloads at the expense of FPGA resource utilization. The host driver has built-in functionality to check the entered regular expression rules and report any errors in them.
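The rule check described above can be pictured with a small software stand-in. The GRegeX driver API is not shown in the source, so the sketch below uses Python's standard `re` module instead, and the function name `check_rules` is hypothetical. It only illustrates the idea of rejecting malformed rules before they are loaded. Python's syntax is close to PCRE but not identical.

```python
# Hypothetical pre-flight rule check. Python's `re` stands in for the
# GRegeX host driver, whose real API is not shown in the source.
import re
from typing import List, Tuple


def check_rules(rules: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, rule, error message) for every rule that fails to compile."""
    errors = []
    for index, rule in enumerate(rules):
        try:
            re.compile(rule)
        except re.error as exc:
            errors.append((index, rule, str(exc)))
    return errors


if __name__ == "__main__":
    # The second and third rules are deliberately malformed.
    for index, rule, message in check_rules(["a+b*", "(ab|cd", "[a-z"]):
        print(f"rule {index} ({rule!r}) rejected: {message}")
```

Reporting the index alongside the message makes it easy to point a user at the offending entry in a rule file.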
The current design can find sophisticated, variable-length regular expression patterns, including + and * (Kleene operations), | (alternation), ? (optional match), () (groups), and [a-z] (character classes). The design also resolves collisions that arise when two neighboring regex rules have overlapping matches.
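To make the operator list and the notion of overlapping matches concrete, here is a minimal software illustration that uses Python's `re` module as a stand-in for the FPGA core. The two rules, the sample text, and the helper names `find_matches` and `overlapping_pairs` are all made up for this example. The sketch only detects overlaps. How GRegeX actually resolves them is not described in the source.

```python
# Software illustration only: Python's `re` stands in for the FPGA core.
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, rule)


def find_matches(rules: List[str], text: str) -> List[Span]:
    """Collect every non-empty match of every rule, sorted by start offset."""
    spans = []
    for rule in rules:
        for m in re.finditer(rule, text):
            if m.end() > m.start():
                spans.append((m.start(), m.end(), rule))
    return sorted(spans)


def overlapping_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return pairs of matches from different rules whose ranges overlap."""
    pairs = []
    for i, a in enumerate(spans):
        for b in spans[i + 1:]:
            if b[0] >= a[1]:
                break  # spans are sorted by start, so nothing later overlaps a
            if a[2] != b[2]:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Together the two rules use |, (), [a-z], +, * and ?.
    rules = [r"(GET|POST) /[a-z]+", r"/[a-z]*(\.html)?"]
    spans = find_matches(rules, "GET /index.html")
    for a, b in overlapping_pairs(spans):
        print(f"{a[2]!r} at {a[0]}..{a[1]} overlaps {b[2]!r} at {b[0]}..{b[1]}")
```

In this sample the first rule matches `GET /index` and the second matches `/index.html`, so the two matches share the bytes of `/index`. This is the kind of collision a multi-rule matcher has to handle.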
- 12.8 GB/s throughput with a single core
- PCRE compatible
- Host drivers and reference examples for use in C and Java
- Supports cloud as well as on-premises cards
GRegeX achieves 12.8 GB/s throughput regardless of the regular expression rule set, whereas the speed of software implementations decreases with more complex regex rules, such as those using brackets and repetition symbols.
Note: Results shown above are for Xilinx® Alveo™ U200 card
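The software side of this comparison can be approximated with a simple timing harness. The sketch below is not the benchmark behind the results above. It times Python's `re` module on a synthetic log line, and both the patterns and the helper name `throughput_mb_s` are invented for illustration. Absolute numbers depend heavily on the machine and on the regex engine, so no expected output is given.

```python
# Rough software-side timing. Results vary by machine and regex engine.
import re
import time


def throughput_mb_s(pattern: str, data: str, repeats: int = 3) -> float:
    """Best-of-N scan throughput in MB/s for one pattern over `data`."""
    compiled = re.compile(pattern)
    size_mb = len(data.encode("utf-8")) / 1e6
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _match in compiled.finditer(data):
            pass  # consume all matches; only the scan time matters here
        best = min(best, time.perf_counter() - start)
    return size_mb / max(best, 1e-9)  # guard against a zero-length interval


if __name__ == "__main__":
    # Synthetic log data, made up for this example.
    log = "host42 sshd[311]: failed login for user admin\n" * 20000
    patterns = (
        r"failed",
        r"(failed|denied) login",
        r"[a-z]+[0-9]*\[[0-9]+\]: (failed|denied)?",
    )
    for pattern in patterns:
        print(f"{pattern!r}: {throughput_mb_s(pattern, log):.1f} MB/s")
```

Taking the best of several runs reduces noise from caching and scheduling, which matters when comparing patterns of different complexity.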
Regular expressions are commonly used in applications such as DNA analysis, content extraction, packet inspection, security and log text analysis, and many more. The nature of the regex algorithm does not allow its components to be parallelized in a way that powerful GPUs or multi-core processors can benefit from. Meanwhile, the flexibility of FPGAs in terms of parallelism, pipelining, and distributed memory architecture, combined with a well-designed algorithm, solves text processing problems and makes FPGAs an irreplaceable device for this kind of application.