Who is the winner — Comparing Vector, Fluent Bit, Fluentd performance

Ajay Gupta
Published in IBM Cloud
Sep 9, 2021 · 5 min read

Co-authored with Eran Raichstein

“If you can’t measure it, you can’t improve it.” – Peter Drucker

The quote above is relevant in many situations including log collector performance benchmarking, which is the theme of this article.

Many modern applications are deployed as micro-services in cloud environments like OpenShift or Kubernetes. Organizations are seriously impacted when critical applications are unavailable; it becomes imperative to quickly resolve issues.

Observability is an area that helps us understand the internal workings of a system using logs, metrics, and traces. Logs are a primary data source that can be collected, transformed, and stored for observability. Log collection is expensive in terms of resources like CPU, memory, and storage, and it demands even more resources in cloud environments where many micro-services are deployed and constantly emit logs.

We will compare the performance of the log collectors Fluentd, Fluent Bit, and Vector based on log-collection rate, CPU consumption, and memory consumption.

What are Log Collectors?

As shown in Figure 1, log collectors consume logs from several sources (e.g. files, system, audit, statsd). Collectors may perform transformation operations, such as enriching entries with metadata, parsing, filtering, and more. The logs are then sent to one or more types of persistent storage.

Figure 1. Log Collector Architecture
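To make the flow in Figure 1 concrete, below is a minimal, illustrative Python sketch of the source → transform → sink stages a collector implements. The function names and metadata fields are hypothetical stand-ins for the configurable plugins that Fluentd, Fluent Bit, and Vector actually provide:

import json
import sys
from datetime import datetime, timezone

# Hypothetical, simplified collector pipeline: source -> transform -> sink.
# Real collectors do this with configurable plugins, buffering, and back-pressure.

def source(path):
    """Source stage: yield raw log lines from a file (a stand-in for tailing)."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def transform(raw, pod="demo-pod", namespace="demo-ns"):
    """Transform stage: enrich a raw line with metadata and emit structured JSON."""
    return {
        "message": raw,
        "kubernetes": {"pod": pod, "namespace": namespace},  # illustrative metadata
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }

def sink(record):
    """Sink stage: stand-in for persistent storage (Elasticsearch, S3, Loki, ...)."""
    sys.stdout.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    for raw in source("app.log"):  # assumes a local app.log exists
        sink(transform(raw))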

Log sources generate logs at different rates, and the cumulative volume is often higher than a collector's capacity to process them. In such cases, some percentage of the logs is missed. This scenario is known as log loss. Since logs are critical, it is very important for collectors to work as efficiently as possible. Let's consider the log collectors we are comparing:

Fluentd: This is an open-source data collector for the unified logging layer. Fluentd treats logs as JSON, a popular, machine-readable format. It is written primarily in C with a thin Ruby wrapper that gives users flexibility. Fluentd unifies data collection and consumption for better use and understanding of data. It collects log data at the cluster level from different nodes and can forward it to various types of sinks.

Fluent Bit: This is another open-source collector that gathers data such as metrics and logs from different sources, enriches them with filters, and sends them to multiple destinations. Fluent Bit is written in C and designed with performance in mind: high throughput with low CPU and memory usage.

Vector: Built in Rust, Vector is fast, memory-efficient, and designed to handle the most demanding workloads. Vector supports both logs and metrics, making it easy to collect and process all your observability data.

Experimentation and Results

We performed performance benchmarking experiments on bare metal (BM) servers. The bare metal platform consists of a cluster of 6 nodes and is used solely for this experiment. Each node has 8 CPUs and 64 GB of RAM. We developed a benchmarking framework that can generate various types of load based on a workload profile. Based on user preference, the framework can generate either random logs or realistic logs using a load generator. An example of each log type is shown below:

Random Log:

2021/03/30 15:51:21 goloader seq - 0000000000000000D34418F1172036F4 - 0000000044 - lkFklgcrEATMPbePsTgEOgkcDVyQIcwGuWRyZULhpUDBOpWSpBPjUGWicYGPSbnihsJsZtJokCFThVnRSvgmOQEEQqsdAtitFBvT

Realistic Log:

E0427 11:44:58.439709 1 memcache.go:206] couldn't get resource list for metrics.k8s.io/v1beta1: an error on the server
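As an illustration of how such a load generator might work, here is a small Python sketch that emits random log lines in the same shape as the random-log example above. This is only a sketch under our assumptions; it is not the actual benchmarking framework (the "goloader" prefix in the sample suggests the real generator is a Go tool):

import random
import string
import time
import uuid

def random_log_line(seq):
    """Build one line shaped like the random-log sample: timestamp, tag, hex id, sequence, payload."""
    ts = time.strftime("%Y/%m/%d %H:%M:%S")
    hex_id = uuid.uuid4().hex.upper()                      # 32 hex characters
    payload = "".join(random.choices(string.ascii_letters, k=100))
    return f"{ts} goloader seq - {hex_id} - {seq:010d} - {payload}"

def generate(lgps, duration_s):
    """Emit roughly `lgps` lines per second for `duration_s` seconds."""
    seq = 0
    for _ in range(duration_s):
        start = time.time()
        for _ in range(lgps):
            print(random_log_line(seq))
            seq += 1
        time.sleep(max(0.0, 1.0 - (time.time() - start)))  # crude pacing toward the target rate

if __name__ == "__main__":
    generate(lgps=1500, duration_s=60)                     # e.g. one low-stress container at 1,500 LGPS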

The framework supports multiple types of workloads, called profiles, as shown in Table 1, with load measured in Log Generation Per Second (LGPS). Each profile has a different number of low-stress and high-stress containers, which generate logs at a low or high LGPS rate respectively. To explain how a profile works, let's take the Heavy Loss profile as an example. This profile has 8 low-stress containers at 1,500 LGPS each and 2 high-stress containers at 20,000 LGPS each, which sums to 52,000 LGPS.

Table 1: Workload Profiles
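As a quick sanity check on the Heavy Loss numbers quoted above, the cumulative rate is simply the sum over the two container groups:

low_stress  = 8 * 1500      # 8 low-stress containers at 1,500 LGPS each -> 12,000
high_stress = 2 * 20000     # 2 high-stress containers at 20,000 LGPS each -> 40,000
total_lgps  = low_stress + high_stress
print(total_lgps)           # 52000, matching the profile's 52,000 LGPS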

When the framework starts to generate logs, it takes a few minutes for the log generation process to stabilize. We run the workload generation process for 30 minutes and take average values of each of the metrics, such as CPU and memory.
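A minimal sketch of how such steady-state averages can be computed, assuming the raw CPU or memory samples have already been scraped into a list (the warm-up cut-off and names below are our own illustration, not the framework's actual code):

def steady_state_average(samples, warmup_s=300, interval_s=10):
    """Average metric samples taken every `interval_s` seconds, skipping the first `warmup_s` seconds."""
    skip = warmup_s // interval_s                # number of warm-up samples to drop
    steady = samples[skip:]
    return sum(steady) / len(steady)

# e.g. 30 minutes of CPU samples (fraction of a core) at 10-second intervals
cpu_samples = [0.42] * 180                       # placeholder values, not measured data
print(steady_state_average(cpu_samples))         # -> 0.42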

Figure 2. LPS
Figure 3. Memory Performance

We define a metric called Logs Per Second (LPS), which denotes how many log lines a collector collected per second. Figures 2 and 3 show LPS and memory consumption for the different collectors. We observed that Vector's performance is the best, and the difference is more significant for heavy workload profiles. For example, its LPS is more than 2X that of the next best collector (Fluent Bit) for heavy workload profiles. Similarly, Vector's memory consumption is also the lowest, at almost 0.5X to 0.2X that of Fluent Bit for heavy workload profiles.

Figure 4. CPU Performance
Figure 5. LPS per CPU

Figures 4 and 5 show CPU consumption and LPS per CPU for the various collectors. We observed that for heavy load profiles Vector consumes almost 2x-3x more CPU than the other collectors. We also observed from Figure 2 that Vector's LPS is very high. This indicates that Vector is able to utilize the available CPU to scale up LPS while the other collectors are not. To compare CPU efficiency, we normalized by measuring LPS per CPU, as shown in Figure 5. We observed that the difference between the best-performing collector (Fluent Bit) and Vector is not significant, especially for the high workload profiles.

We also wanted to run the benchmarks on the AWS platform, but initial experiments showed a large variance in performance when experiments were run at different times. We used m4.2xlarge general-purpose instances with gp2 (general-purpose SSD) EBS volumes. We observed that gp2 IOPS throttling significantly affects the results, especially for the high-profile workload experiments. With gp2, a cluster's IOPS is limited because AWS wants to ensure fair usage for all Amazon EC2 customers. We plan to run these experiments on Provisioned IOPS-based storage, which is designed for IOPS- and throughput-intensive workloads that require low latency.

Conclusion

We observed that in most cases Vector's performance is the best in terms of LPS and LPS per CPU. It is highly scalable, and its memory consumption is also significantly lower than that of the other collectors. In terms of raw CPU consumption, Fluent Bit performs best.

Acknowledgment: We would like to thank our colleagues Pratibha Moogi, Pranjal Gupta, Jeff Cantrill, and Alan Convay for reviewing this blog and providing valuable feedback.
