Optimizing 700 CPUs Away With Rust
In Tenable.io, we are heavy users of Datadog custom metrics. Millions of metrics are sent through DogStatsD, providing deep insight into the complex platform. As the platform grew, we found that a significant number of metrics sent by legacy apps were obsolete. We tried to hunt down these obsolete metrics in the codebase, but modifying legacy applications was extremely time-consuming and risky.
To address this, we deployed a StatsD filter as a Datadog agent sidecar to filter out unnecessary metrics. The filter is a simple UDP datagram forwarder written in Node.js (sample, not actual code). We chose Node.js because, in our environment, its network performance outstripped that of other languages we could take to production just as quickly. We were able to implement, test and deploy this change across the entire T.io platform within a week.
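To make the filtering step concrete, here is a minimal sketch, written in Rust (the language we later moved to) rather than Node.js. It is not the production code. It assumes obsolete metrics are identified by a name prefix and that a datagram carries newline-separated DogStatsD lines of the form `name:value|type|...`. The function names and the prefixes are hypothetical.

```rust
/// Returns true when the metric name of a StatsD line starts with a blocked prefix.
/// Assumption for this sketch: obsolete metrics are identified by name prefix.
fn is_blocked(line: &str, blocked_prefixes: &[&str]) -> bool {
    // A StatsD line looks like `name:value|type|...`; the name precedes the first ':'.
    let name = line.split(':').next().unwrap_or("");
    blocked_prefixes.iter().any(|&prefix| name.starts_with(prefix))
}

/// Filters one datagram payload (newline-separated metric lines) and
/// returns only the lines that should be forwarded to the agent.
fn filter_datagram(payload: &str, blocked_prefixes: &[&str]) -> String {
    payload
        .lines()
        .filter(|line| !line.is_empty() && !is_blocked(line, blocked_prefixes))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Hypothetical prefixes and metrics, for illustration only.
    let blocked = ["legacy.", "old_app."];
    let payload = "legacy.requests:1|c\napi.latency:12|ms\nold_app.errors:3|c|#env:prod";
    println!("{}", filter_datagram(payload, &blocked));
}
```

Matching on the metric name alone keeps the filter independent of values and tags, so the legacy applications can keep emitting their metrics unchanged.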
While this worked for many months, performance issues began to crop up. As the platform continued to scale up, we were sending more and more metrics through the filter. During the first quarter of 2021, we added over 1.4 million new metrics in an effort to improve our observability. The filters needed more CPU resources to keep up with the new metrics. At this scale, even a minor inefficiency can lead to significant waste. Over time, we were consuming over 1000 CPUs and 400GB of memory on these filters. The overhead had become unacceptable.
We analyzed the performance metrics and decided to rewrite the filter in a more efficient language. We chose Rust for its high performance and safety characteristics. (See our other post on Rust evaluations.) The source code of the new Rust-based filter is available here.
The Rust-based filter is much more efficient than the original implementation. Because Rust lets us fully control heap allocations, the memory allocated to handle each datagram is kept to a minimum. This means that the Rust-based filter only needs a few MB of memory to operate. As a result, we saw a 75% reduction in CPU usage and a 95% reduction in memory usage in production.
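One way to keep per-datagram allocation to a minimum is to reuse the same receive and send buffers for every packet. The sketch below illustrates that idea with the standard library's `UdpSocket`. It is not the published filter, and it assumes prefix-based blocking. The function names, buffer size, and prefixes are hypothetical.

```rust
use std::io;
use std::net::UdpSocket;

/// Writes the allowed metric lines from `datagram` into `out`.
/// `out` is cleared but keeps its capacity, so steady-state calls do not allocate.
fn filter_into(datagram: &[u8], blocked_prefixes: &[&[u8]], out: &mut Vec<u8>) {
    out.clear();
    for line in datagram.split(|&b| b == b'\n') {
        // A line starts with the metric name, so a prefix match on the line suffices.
        if line.is_empty() || blocked_prefixes.iter().any(|&p| line.starts_with(p)) {
            continue;
        }
        if !out.is_empty() {
            out.push(b'\n');
        }
        out.extend_from_slice(line);
    }
}

/// Receives datagrams on `listen`, filters them, and forwards the rest to `upstream`.
/// Both buffers are created once and reused for every datagram.
#[allow(dead_code)]
fn serve(listen: &str, upstream: &str, blocked_prefixes: &[&[u8]]) -> io::Result<()> {
    let inbound = UdpSocket::bind(listen)?;
    let outbound = UdpSocket::bind("0.0.0.0:0")?;
    outbound.connect(upstream)?;

    let mut recv_buf = [0u8; 8192]; // assumed maximum datagram size
    let mut send_buf = Vec::with_capacity(recv_buf.len());
    loop {
        let (len, _peer) = inbound.recv_from(&mut recv_buf)?;
        filter_into(&recv_buf[..len], blocked_prefixes, &mut send_buf);
        if !send_buf.is_empty() {
            outbound.send(&send_buf)?;
        }
    }
}

fn main() {
    // Demonstrates the filtering step in memory; `serve` would run the network loop.
    let blocked: [&[u8]; 1] = [b"legacy."];
    let mut out = Vec::with_capacity(64);
    filter_into(b"legacy.requests:1|c\napi.latency:12|ms", &blocked, &mut out);
    println!("{}", String::from_utf8_lossy(&out));
}
```

In a sidecar deployment, `serve` would be called with the address the applications send to and the address the Datadog agent listens on; those values depend on the environment and are left out here.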
In addition to reclaiming compute resources, the latency per packet has also dropped by over 50%. While latency isn’t a key performance indicator for this application, it is rewarding to see that we are running twice as fast for a fraction of the resources.
With this small change, we were able to optimize away over 700 CPUs and 300GB of memory. This was all implemented, tested and deployed in a single sprint (two weeks). Once the new filter was deployed, we were able to confirm the resource reduction in Datadog metrics.
- We replaced the JS-based StatsD filter with a Rust-based one and saw a huge performance improvement.
- At scale, even a small optimization can have a huge impact.