Beeks Maximizes Storage Efficiency with Advanced Compression Techniques
Historical Approach
Beeks Analytics has traditionally used the AHA-37xx range of cards to compress capture data as it is written to disk (and decompress it when it is read back). These dedicated GZIP compression cards worked seamlessly with any disk type, providing a cost-effective solution that significantly increased storage capacity. An added benefit of compression was the reduction in I/O bandwidth, enabling support for higher input data rates.
Product Deprecation
However, in 2021 the vendor deprecated the AHA-37xx product line, leaving us to look around for alternative solutions.
We needed to find a replacement that could come close to matching this capability. Our target was 100 Gb/s of compression/decompression throughput at a compression rate of around 66%, while keeping hardware costs to a minimum; this would allow us to handle any current data stream in the industry. For comparison, our existing AHA-37xx solution would require 3 cards to reach that throughput.
Other factors we needed to consider with any card-based solution were card size and power consumption. We preferred half-height cards and wanted to keep power consumption as low as possible (the AHA cards drew less than 50W).
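To make the target concrete, here is a back-of-envelope sketch of what 100 Gb/s of capture traffic implies for post-compression disk write bandwidth, assuming that a "66% compression rate" means roughly two-thirds of the input data is removed:

```python
# Back-of-envelope sizing for the 100 Gb/s target.
# Assumption: a "66% compression rate" means the compressed output
# is roughly 34% of the original input size.

INPUT_RATE_GBPS = 100       # capture traffic, gigabits per second
COMPRESSION_RATE = 0.66     # fraction of input removed by compression

output_rate_gbps = INPUT_RATE_GBPS * (1 - COMPRESSION_RATE)
output_rate_gbytes = output_rate_gbps / 8

print(f"Disk write bandwidth after compression: {output_rate_gbps:.1f} Gb/s "
      f"(~{output_rate_gbytes:.2f} GB/s)")
# -> Disk write bandwidth after compression: 34.0 Gb/s (~4.25 GB/s)
```

This is the bandwidth-reduction benefit mentioned above: the disks only need to sustain around a third of the capture rate.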
Potential Solutions
We evaluated several candidate solutions against this 100 Gb/s target, weighing their advantages and disadvantages:
Software GZIP
Using the CPU to compress data with the gzip algorithm is cheap and easy to implement, but it is not efficient enough for the job. It could be used in systems with a low data throughput, but for the majority of our use cases it would not be sufficient.
A CPU running at 100% usage is also undesirable, as it stresses the thermal management of the server and causes performance throttling. In addition, CPU-based software compression uses a significant amount of memory bandwidth, which can affect all processes running on the server.
It is still useful to have a software GZIP implementation available on the system to fall back on temporarily in the event of a hardware failure.
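As a rough illustration of the throughput ceiling, the sketch below measures single-core gzip speed in Python; the buffer size and compression level are arbitrary example values, not our production settings.

```python
import gzip
import os
import time

# Quick single-core gzip throughput check. The 16 MiB random buffer and
# compression level 6 are arbitrary example values; random data is a worst
# case for compression ratio, but adequate for a rough speed check.
data = os.urandom(16 * 1024 * 1024)
level = 6

start = time.perf_counter()
compressed = gzip.compress(data, compresslevel=level)
elapsed = time.perf_counter() - start

gbits_in = len(data) * 8 / 1e9
print(f"{len(data)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(data):.0%} of original)")
print(f"single-core throughput: {gbits_in / elapsed:.2f} Gb/s")
# Typical single-core figures are far below a multi-Gb/s target, which is
# why software gzip is only a temporary fallback rather than a primary path.
```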
NVMe disks
Although not a compression option, NVMe disks had reached a price point at which it might be possible to use them and benefit from their improved I/O speeds.
We carried out tests on the newest NVMe disks and found that they could handle a high data throughput, although not up to 100 Gb/s. However, the real limit here would be capacity, as we would not be getting the benefit of compression.
The implication is that in some scenarios, where the requirement to maintain a PCAP store is limited, we could use NVMe-only systems without compression.
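To show why capacity rather than raw speed becomes the limiting factor without compression, here is a rough sizing sketch; the sustained rate and retention window are illustrative assumptions rather than measured figures.

```python
# Rough uncompressed-capacity estimate for an NVMe-only system.
# The sustained rate and retention window below are illustrative
# assumptions, not measured figures.

SUSTAINED_RATE_GBPS = 100   # gigabits per second of capture traffic
RETENTION_HOURS = 24        # how long the PCAP store must be kept

gbytes_per_hour = SUSTAINED_RATE_GBPS / 8 * 3600      # GB written per hour
total_tb = gbytes_per_hour * RETENTION_HOURS / 1000   # TB for the window

print(f"{gbytes_per_hour:,.0f} GB/hour -> {total_tb:,.0f} TB "
      f"for {RETENTION_HOURS} hours of retention")
# -> 45,000 GB/hour -> 1,080 TB for 24 hours of retention
```

At roughly 3x compression the same retention window would need about a third of that capacity, which is why compression remains central to most deployments.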
GPU-based compression
This looked like a viable candidate and potentially a cost-effective solution, as GPU cards are commonly available and open-source GPU compression code already existed.
We settled on experimenting with NVIDIA cards using the nvcomp library and ran a series of tests on an A2 card. Although we got compression/decompression working with this card and library, we found the following issues:
- On the original AHA cards, we had separate compression and decompression cores. With nvcomp, resources are shared, meaning if we maxed out the card with hardware compression, any hardware decompression would affect performance.
- The compression rates we achieved on the GPU card were not sufficient for our requirements. We would have had to use a significantly more powerful card than the A2, at a significantly higher cost, and to achieve our 100 Gb/s goal we would have needed more than one GPU card.
- The available GPU cards were all full-height, which would impact our server hardware architecture, and they would require 150W+ of power.
- With no load-balancing available, we would have had to invest time and effort to implement this ourselves for use in a commercial product offering (see the sketch below).
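To give a feel for that missing piece, here is a minimal sketch of the kind of round-robin dispatcher we would have had to build and harden ourselves; `CompressionCard` and its `submit` method are hypothetical stand-ins, not part of nvcomp or any vendor API.

```python
from itertools import cycle

# Minimal illustration of the load balancing we would have had to build
# ourselves for a multi-GPU setup. CompressionCard and submit() are
# hypothetical stand-ins, not part of nvcomp or any vendor API.

class CompressionCard:
    def __init__(self, device_id: int):
        self.device_id = device_id

    def submit(self, block: bytes) -> None:
        # In a real system this would enqueue the block on the card's
        # compression stream; here it is just a placeholder.
        print(f"card {self.device_id}: compressing {len(block)} bytes")

class RoundRobinBalancer:
    """Naive dispatcher: hand each capture block to the next card in turn."""
    def __init__(self, cards):
        self._cards = cycle(cards)

    def submit(self, block: bytes) -> None:
        next(self._cards).submit(block)

balancer = RoundRobinBalancer([CompressionCard(0), CompressionCard(1)])
for _ in range(4):
    balancer.submit(bytes(64 * 1024))   # pretend 64 KiB capture blocks
```

A production version would also need queue-depth awareness, failover and back-pressure handling, which is where the real time and effort would have gone.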
It was fairly obvious that the GPU approach was possible, but it would come at a significant cost and require significant infrastructure changes. Given this set of drawbacks, we moved on to looking at options for an FPGA-based solution.
FPGA-based compression
Initially we had not considered this option because of the specialist programming expertise it requires, but we knew that such a solution would be fast enough to do the job we wanted. Once we had discarded the GPU approach, we carried out more in-depth research into the options in this space and eventually partnered with Eideticom to use their NoLoad solution to deliver the compression we required.
We used an Alveo U50 card in our initial development work. This card fitted our physical requirements, being half-height and drawing a maximum of 75W, so it was already a considerable improvement over the GPU approach.
We then implemented compression using Eideticom’s NoLoad engine, initially configured with 6 compression accelerators and 8 decompression accelerators. We incorporated the compression by integrating our software with Eideticom’s userspace APIs (libnoload). These APIs provided a flexible approach to implementation, allowing us to build a solution without needing extensive FPGA knowledge, and any queries we had were quickly resolved by their development team. After extensive testing, we found that this initial configuration maxed out at ~45.9 Gb/s compression and ~72.5 Gb/s decompression for our test datasets. Without much optimization this was already on target for our requirements, so we continued with further development of the FPGA solution and focused our efforts on achieving our required results.
Our testing also found that this was the most power-efficient compression implementation we evaluated, using less than 30W, which is roughly the consumption of a set of fans in the server.
Results Summary
We continue to work with Eideticom to improve the solution for our requirements, but a brief summary of the results we have obtained to date is as follows:
- Compression maximum throughput on a single card peaks at around 50.5 Gb/s
- Decompression maximum throughput on a single card peaks at around 55.7 Gb/s
These values can be modified for specific use cases by adjusting the ratio of compression/decompression cores on the FPGA, meaning we can also adapt the solution in firmware to suit the expected load.
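As a rough guide to that trade-off, the sketch below derives per-accelerator figures from the initial 6+8 configuration described earlier, assuming throughput scales roughly linearly with accelerator count (an approximation, not a measured guarantee):

```python
# Per-accelerator estimates from the initial NoLoad configuration
# (6 compression + 8 decompression accelerators). Linear scaling with
# accelerator count is an approximation, not a measured guarantee.

COMP_ACCELS, COMP_GBPS = 6, 45.9
DECOMP_ACCELS, DECOMP_GBPS = 8, 72.5

comp_per_accel = COMP_GBPS / COMP_ACCELS        # ~7.7 Gb/s each
decomp_per_accel = DECOMP_GBPS / DECOMP_ACCELS  # ~9.1 Gb/s each

def estimate(comp_accels: int, decomp_accels: int) -> tuple[float, float]:
    """Estimate throughput for an alternative accelerator split."""
    return comp_accels * comp_per_accel, decomp_accels * decomp_per_accel

# Example: a compression-heavy firmware build with an 8:6 split.
comp, decomp = estimate(8, 6)
print(f"estimated: {comp:.1f} Gb/s compression, {decomp:.1f} Gb/s decompression")
# -> estimated: 61.2 Gb/s compression, 54.4 Gb/s decompression
```

These derived figures are only a first approximation for capacity planning; the measured single-card peaks above remain the authoritative numbers.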
The implication of these results is that for the majority of use cases, a single card for compression/decompression will suffice, but more processing power can be added when necessary to scale the performance.
For use cases such as handling OPRA, where peaks approaching 100 Gb/s are possible, a 2-card solution would be required. This is an improvement on our previous AHA-37xx solution, which would have required 3 cards. NoLoad provides load-balancing capabilities across multiple cards, making this an optimal solution for our requirements.
To improve effective disk capacity, we have the compression rate set to achieve around 63% compression. This implies we can store at least 3x as much data on the same size disk as we could without compression. This is equivalent to what we achieved with the AHA-37xx cards and can be improved further by adjusting the compression rate settings.
We started our development using the Alveo U50, an FPGA-based dedicated accelerator card from AMD available in a half-height form factor. We have also tested the Bittware IA-440i card, which is available in both half-height and full-height form factors. Being able to source these cards from multiple vendors is excellent, as it minimises lock-in to a specific hardware manufacturer.
In conclusion, we have found FPGA compression with NoLoad to be an excellent solution to the AHA-37xx deprecation problem. In addition, having an FPGA card as part of our solution opens up a range of opportunities for future improvements — keep your eye on the Beeks Technology Blog and Beeks website for further updates.