The Fastest PCRE Compatible Regular Expression IP Core on Xilinx® Alveo™ Accelerator Card

INTRODUCTION

Fast analysis of unstructured textual data, such as system logs, network traffic, social media posts, emails, and news articles, is becoming ever more important in technical and business data analytics applications. Nearly 85% of business data is in the form of unstructured textual logs, and rapidly extracting information from these text sources is critical for business decision making. GRegeX is an FPGA implementation of the standard regular expression algorithm that achieves 12.8 GB/s throughput with a single IP core. Its wide range of supported regular expression functions lets developers configure the rules they need, and those rules are handled on the chip without reducing throughput.

SOLUTION OVERVIEW

The solution consists of two parts: the regular expression IP core on the FPGA side and the drivers on the host side. Data can come from the server's NIC (through the Linux kernel or the DPDK library), from the network interface available directly on the acceleration card, or from any application running in the Linux environment that feeds the GRegeX drivers.

FPGA acceleration architecture

SOLUTION DETAILS

The design occupies about 1/3 of the resources of the Xilinx Alveo U200 FPGA card, which leaves room to scale further and achieve greater throughput, so that a single chip can support more than one 100G network interface.
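As a rough back-of-the-envelope check of the scaling claim above, the sketch below converts the quoted 12.8 GB/s into a line rate and estimates an upper bound on the number of cores per card. The per-core throughput and the "about 1/3" footprint come from the article. Linear scaling with core count, and the idea that three cores fit with no overhead for the platform shell or routing, are assumptions made only for illustration.

```python
# Back-of-the-envelope scaling estimate.
# From the article: 12.8 GB/s per core, about 1/3 of the U200 per core.
# Assumed (not from the article): throughput scales linearly with cores and
# no resources are lost to the platform shell or routing.
CORE_THROUGHPUT_GBYTES = 12.8   # GB/s per IP core
CORE_FOOTPRINT = 1 / 3          # approximate share of card resources per core
LINK_RATE_GBITS = 100           # one 100G network interface


def line_rate_gbits(gbytes_per_s: float) -> float:
    """Convert GB/s to Gb/s (8 bits per byte, decimal units)."""
    return gbytes_per_s * 8


def max_cores(footprint: float) -> int:
    """Upper bound on cores that fit; the small epsilon guards float rounding."""
    return int(1 / footprint + 1e-9)


per_core_gbits = line_rate_gbits(CORE_THROUGHPUT_GBYTES)
cores = max_cores(CORE_FOOTPRINT)
links_covered = int(cores * per_core_gbits // LINK_RATE_GBITS)

print(f"one core: {per_core_gbits:.1f} Gb/s")
print(f"upper bound on cores per card: {cores}")
print(f"100G links covered at that bound: {links_covered}")
```

Under these assumptions a single core already slightly exceeds one 100G link, which is consistent with the claim that a scaled-up design could serve more than one such interface. The real number of cores that fit would have to be confirmed on the actual card.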

Specification of GRegeX:

GRegeX specification

The real footprint on the FPGA chip:

Design footprint on FPGA chip

A configuration file allows developers to customize the settings of the design and tune it for specific workloads at the expense of FPGA resource utilization. The host driver has built-in functionality to check the entered regular expression rules and report any errors in them.
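The GRegeX driver API is not shown in this article, so the following is only a software analogue of the rule-checking step. It uses Python's standard `re` module, whose syntax is close to PCRE but not identical, as a stand-in. The function name `check_rules` and the sample rules are hypothetical.

```python
import re


def check_rules(rules):
    """Return (index, rule, error message) for every rule that fails to compile.

    Python's `re` stands in for the driver's checker here; this is an
    assumption for illustration, not the GRegeX implementation.
    """
    problems = []
    for index, rule in enumerate(rules):
        try:
            re.compile(rule)
        except re.error as exc:
            problems.append((index, rule, str(exc)))
    return problems


# Hypothetical rule set: the last two rules are deliberately malformed.
rules = [r"error\s+[0-9]+", r"(GET|POST) /", r"[a-z", r"(unclosed"]

problems = check_rules(rules)
for index, rule, message in problems:
    print(f"rule {index} {rule!r}: {message}")
```

Checking rules on the host before they reach the card means a malformed rule is reported together with its position in the rule set, rather than surfacing later as a missing match.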

The current design can find sophisticated, variable-length regular expression patterns, including + and * (Kleene operations), | (alternation), ? (optional element), () (groups), and [a-z] (character classes). The design also resolves the collisions that arise when two neighboring regex rules produce overlapping matches.
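For readers less familiar with these constructs, here is a small software illustration of each one, again using Python's `re` module as a stand-in for a PCRE-style engine. The patterns and sample strings are made up. The last part shows one reading of "overlapping matches": two rules whose matches share bytes of the same input. How GRegeX resolves such a collision is not described in the article, so the sketch only reports both spans.

```python
import re

# (pattern, sample text, construct being illustrated); all values are made up.
EXAMPLES = [
    (r"ab+c", "abbbc", "+ : one or more"),
    (r"ab*c", "ac", "* : zero or more"),
    (r"cat|dog", "hotdog", "| : alternation"),
    (r"colou?r", "color", "? : optional element"),
    (r"(ab)+", "ababab", "() : group"),
    (r"[a-z]+[0-9]", "node7", "[a-z] : character class"),
]

matched = {}
for pattern, text, label in EXAMPLES:
    m = re.search(pattern, text)
    matched[pattern] = m.group(0) if m else None
    print(f"{label:26} {pattern!r:16} on {text!r:10} -> {matched[pattern]!r}")

# Two rules whose matches overlap in the same input (they share "bc").
overlap_text = "abcd"
spans = {
    rule: [m.span() for m in re.finditer(rule, overlap_text)]
    for rule in ("abc", "bcd")
}
print(spans)
```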

KEY BENEFITS

  1. 12.8 GB/s throughput with a single core
  2. PCRE compatible
  3. Customizable
  4. Host drivers and reference examples for use in C and Java
  5. Supports cloud as well as on-premises cards

RESULTS

GRegeX achieves 12.8 GB/s throughput regardless of the regular expression rule set, whereas the speed of a software implementation decreases as the rules become more complex, for example when they use brackets and repetition symbols.

Comparison with software implementation

Note: Results shown above are for Xilinx® Alveo™ U200 card
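To get a feel for the software side of this comparison on your own machine, a minimal timing sketch follows. It is not the benchmark behind the figure above: the input is synthetic, the patterns are made up, and Python's `re` module is used as the software engine, so absolute numbers will differ from any PCRE-based measurement.

```python
import re
import time


def throughput_mb_s(pattern: str, data: str, repeats: int = 3) -> float:
    """Best-of-N scan throughput in MB/s for one pattern over ASCII data."""
    compiled = re.compile(pattern)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in compiled.finditer(data):
            pass  # only the scan time matters, matches are discarded
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


# Synthetic log-like input (made up for illustration), roughly 0.8 MB.
line = "2020-01-01 12:00:00 host42 sshd[1234]: Failed password for root from 10.0.0.1\n"
data = line * 10_000

patterns = [
    r"Failed",                                          # plain literal
    r"(Failed|Accepted) password",                      # alternation
    r"[a-z]+\[[0-9]+\]: (Failed|Accepted) [a-z]+ for",  # classes and repetition
]

results = {p: throughput_mb_s(p, data) for p in patterns}
for pattern, rate in results.items():
    print(f"{rate:10.1f} MB/s  {pattern}")
```

The sketch prints one throughput figure per pattern, so the effect of rule complexity on a software engine can be observed directly.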

Regular expressions are commonly used in applications such as DNA analysis, content extraction, packet inspection, security, log text analysis, and many more. The nature of the regex algorithm does not allow its components to be parallelized in a way that powerful GPUs or multi-core processors can benefit from. Meanwhile, the flexibility of FPGAs in terms of parallelism, pipelining, and distributed memory architecture, combined with a well-designed algorithm, solves text processing problems and turns FPGAs into irreplaceable devices for this kind of application.
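The serial nature of matching can be seen in a textbook table-driven automaton. The sketch below hand-builds a small DFA for the pattern `ab+c`; it is a generic illustration and not a description of the GRegeX architecture. The state numbering and transition table are assumptions chosen for this example.

```python
# Hand-built DFA that searches for the pattern ab+c anywhere in the input.
# States: 0 = nothing matched, 1 = seen "a", 2 = seen "ab+", 3 = full match.
# Any character without an entry sends the automaton back to state 0.
TRANSITIONS = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 2, "c": 3},
}
ACCEPT = 3


def count_matches(text: str) -> int:
    """Count non-overlapping matches of ab+c with one pass over the text."""
    state = 0
    matches = 0
    for ch in text:
        # The next state depends on the state left by the previous character,
        # so this loop cannot simply be split across independent workers.
        state = TRANSITIONS[state].get(ch, 0)
        if state == ACCEPT:
            matches += 1
            state = 0
    return matches


total = count_matches("abc abbbc ac aabc")
print(total)
```

Each iteration needs the state produced by the previous one, which is the dependency that limits plain data-parallel hardware. A hardware pipeline can still process one such step per clock cycle, which is the kind of parallelism the paragraph above attributes to FPGAs.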

Learn more about the product by visiting Xilinx®

Learn more about Grovf, Inc.

Learn more about Xilinx Alveo Accelerator Cards

Artavazd Khachatryan

Written by

grovf


Grovf is an application performance acceleration company built around FPGA-CPU pairs. It focuses on developing basic programming algorithms on FPGAs and on creating a universal offloading platform at the application layer.
