Open sourcing Singer, Pinterest’s performant and reliable logging agent
Yu Yang | Software Engineer, Data Engineering
At Pinterest, we use data to guide product decisions and ultimately improve the Pinner experience. In an earlier post, we shared the design of Pinterest’s data ingestion framework. The first step of data ingestion and data-driven decision making is data collection.
At first glance, data collection seems simple: it’s just about uploading data from hosts to a central repository. However, in order to collect data from tens of thousands of hosts in various formats, uploading data reliably and efficiently with low latency at scale becomes a challenging problem. In 2014, we evaluated available open source logging agents and didn’t find any that met our needs. As a solution, we built a logging agent named “Singer”, which has been in production at Pinterest for years. Singer has been a critical component of our data infrastructure and streams over one trillion messages per day now. Today, we’re sharing Singer with the open source community. You can find its source code and design documentation on GitHub.
Singer supports the following features:
- Text log format and thrift log format out of the box: Thrift log format provides better throughput and efficiency. We have included thrift log client libraries in Python and Java in the Singer repository.
- At-least-once message delivery: Singer will retry when it fails to upload a batch of messages. For each log stream, Singer uses a watermark file to track its progress. When Singer restarts, it processes messages from the watermark position.
- Support for logging in Kubernetes as a side-car service: Running as a daemonset in Kubernetes, Singer can monitor and upload logs from the log directories of multiple Kubernetes pods.
- High-throughput writes: Singer uses a staged event-driven architecture and is capable of streaming thousands of log streams with high throughput (>100MB/s for thrift logs and >40MB/s for text logs).
- Low-latency logging: Singer supports configurable processing latency and batch sizes, and can achieve <5ms log uploading latency.
- Flexible message partitioning: Singer provides multiple partitioners and supports pluggable partitioners. We also support locality-aware partitioners, which can avoid producer traffic across availability zones and reduce data transfer costs.
- Monitoring: Singer can send heartbeat messages to a centralized message queue based on configuration. This allows users to set up centralized monitoring of Singer instances at scale.
- Extensible design: Singer can be easily extended to support data uploading to custom destinations.
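To make the locality-aware partitioning idea above more concrete, here is a minimal sketch in Python. It is not Singer's implementation: the `PartitionInfo` type, the `leader_zone` field, and the `locality_aware_partition` function are hypothetical names used only for illustration. The sketch assumes the producer knows its own availability zone and the zone of each partition's leader, prefers partitions in the local zone, and falls back to all partitions when none are local.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical description of one partition of a destination topic."""
    partition_id: int
    leader_zone: str  # assumed: availability zone hosting the partition leader


def locality_aware_partition(
    partitions: List[PartitionInfo],
    local_zone: str,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a partition id, preferring partitions led from the local zone."""
    if not partitions:
        raise ValueError("no partitions available")
    chooser = rng if rng is not None else random
    # Keep only partitions whose leader is in the producer's own zone.
    local = [p for p in partitions if p.leader_zone == local_zone]
    # Fall back to every partition so delivery still works without a local leader.
    candidates = local if local else partitions
    return chooser.choice(candidates).partition_id
```

Choosing randomly among the local candidates spreads load across them; a real partitioner might instead rotate through candidates or refresh the zone metadata periodically.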
Figure 1. Singer internals
In detail, services write logs to append-only log streams. Singer listens for file system events based on its configuration. Once it detects log file changes, Singer processes the log streams and sends data to a writer thread pool for uploading. After successfully uploading a batch of records, it stores the log stream's watermark file on disk. When restarted, it resumes processing each log stream from its watermark position.
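The watermark mechanism is what gives at-least-once delivery, and a small sketch may help show why. The Python code below is an illustration under stated assumptions, not Singer's actual code: the JSON watermark format, the function names, and the `upload` callback are all hypothetical. It reads complete lines from the saved offset, uploads them in batches with retries, and advances the watermark only after an upload succeeds, so a crash between upload and save causes a resend rather than a loss.

```python
import json
import os
from typing import Callable, List


def load_watermark(path: str) -> int:
    """Return the saved byte offset, or 0 if no watermark file exists yet."""
    try:
        with open(path) as f:
            return int(json.load(f)["offset"])
    except FileNotFoundError:
        return 0


def save_watermark(path: str, offset: int) -> None:
    """Write the offset to a temp file, then rename it over the old watermark."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def upload_with_retry(upload: Callable[[List[bytes]], None],
                      batch: List[bytes], max_retries: int) -> None:
    """Call upload until it succeeds; re-raise after the last failed attempt."""
    for attempt in range(max_retries):
        try:
            upload(batch)
            return
        except Exception:
            if attempt == max_retries - 1:
                raise


def process_log_stream(log_path: str, watermark_path: str,
                       upload: Callable[[List[bytes]], None],
                       batch_size: int = 2, max_retries: int = 3) -> int:
    """Upload complete lines past the watermark; return how many were sent."""
    offset = load_watermark(watermark_path)
    uploaded = 0
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            batch, end = [], offset
            while len(batch) < batch_size:
                line = log.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a line that is still being written
                batch.append(line[:-1])
                end += len(line)
            if not batch:
                return uploaded
            upload_with_retry(upload, batch, max_retries)
            # Advance the watermark only after the batch is safely uploaded.
            save_watermark(watermark_path, end)
            offset = end
            uploaded += len(batch)
            log.seek(offset)
```

If every retry fails, the exception propagates and the watermark stays where it was, so the next run resends the same batch. Downstream consumers therefore need to tolerate duplicates.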
Singer can automatically detect newly added configurations and process the related log streams. Running as a daemonset in a Kubernetes environment, Singer can query the kubelet API to detect live pods on each node, and process log streams on each pod based on the configuration. Please see the tutorial on how to run Singer.
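As a rough illustration of the pod-detection step, the Python sketch below reconciles the set of pods whose logs are already being tracked against the set of pods currently reported as live. Everything here is an assumption made for the example: the `reconcile_pod_streams` name, the one-directory-per-pod layout under `pod_log_root`, and the idea that the live pod list has already been fetched (for instance from the kubelet API) are not taken from Singer's code.

```python
import os
from typing import Dict, Iterable, Set, Tuple


def reconcile_pod_streams(
    tracked_pods: Set[str],
    live_pods: Iterable[str],
    pod_log_root: str,
) -> Tuple[Dict[str, str], Set[str]]:
    """Work out which pod log directories to start and stop monitoring.

    Returns (to_start, to_stop): to_start maps each newly seen pod to its
    assumed log directory; to_stop holds tracked pods that are no longer live.
    """
    live = set(live_pods)
    # Hypothetical layout: each pod writes logs under <pod_log_root>/<pod id>.
    to_start = {pod: os.path.join(pod_log_root, pod)
                for pod in sorted(live - tracked_pods)}
    to_stop = tracked_pods - live
    return to_start, to_stop
```

An agent would typically run a reconciliation like this on a timer, start monitors for the directories in `to_start`, and finish uploading any remaining data for pods in `to_stop` before dropping them.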
Open source is important not only for engineers at Pinterest, but also for companies like YouTube, Google, Tinder, Snap and others, who use our open source technologies to power app persistence, image downloading, and more. See opensource.pinterest.com and GitHub for our open source projects. Pinterest engineering has many interesting problems to solve. Check out our open engineering roles and join us!
Acknowledgments: Huge thanks to Ambud Sharma, Indy Prentice, Henry Cai, Shawn Nguyen, Mao Ye and Roger Wang for making significant contributions to Singer!