Today we’re excited to announce that Postmates is open-sourcing cernan! Cernan is a telemetry and logging aggregation server that sits quietly on your system and ships whatever you send it wherever, doing in-flight manipulation in the middle.
Cernan has minimal CPU and memory requirements and is intended to service bursty telemetry without load shedding. Cernan was born out of Postmates’ need to collapse the collection of multiple telemetry collection systems while also providing us flexibility in how, when and where information about our systems is sent. Cernan has significantly cut down on the complexity of our observation infrastructure.
Cernan has three main selling points that make it pretty cool from an operations standpoint:
- Cernan can ingest from many different ‘sources’. Cernan can pull in statsd, graphite, line-oriented log files and its own native protobuf-based protocol.
- Cernan can multiplex what it pulls in over many different ‘sinks’. As of this writing, cernan can push or expose what it’s collected to InfluxDB, Prometheus, Wavefront, Amazon Firehose and more. Cernan can trivially be extended to support more sinks.
- Cernan can manipulate or create in-flight data with lua programmable ‘filters’, allowing operation teams to create telemetry from log lines, construct meta-telemetry from in-flight streams or munge collected packets.
We’ve been using cernan internally at Postmates since July 2016 and have been obsessive about its impact on individual hosts. To that end, our wiki details our benchmarking regime. Along with cernan, we’re also releasing a load-generation tool, evans, which was developed to test cernan. Say you’ve built both cernan and evans and have them sitting conveniently on-disk. In the root of the cernan project you’ll find a file called examples/configs/quickstart.toml. In one terminal start cernan like so:
> cernan -vv --config examples/configs/quickstart.toml
This will start cernan listening on UDP:8125 for statsd traffic, TCP:2003 for graphite plaintext and cernan will aggregate anything it collects, printing it to stdout every 10 seconds. The configuration documentation is available in our wiki, if you’d like to learn more. Now, run evans like so:
> evans --statsd
And you’ll see something like this from cernan:
Flushing metrics: 2017-01-13T18:05:39.151+00:00
kas: min 0.20357829770940206
kas: max 0.20357829770940206
kas: 50 0.20357829770940206
kas: 90 0.20357829770940206
kas: 99 0.20357829770940206
kas: 999 0.20357829770940206
oau: min 0.8636699958757705
oau: max 0.8636699958757705
oau: 50 0.8636699958757705
oau: 90 0.8636699958757705
oau: 99 0.8636699958757705
oau: 999 0.8636699958757705
You know, give or take. Evans as above will generate random statsd payloads and ship them to cernan at 1 hertz. You can ship graphite traffic by giving evans the
--graphite flag and ramp up the load with
--hertz. We regularly run evans with
--hertz 100000 .
What are “sums”, “sets” and “summaries” above? In order to support so many ingestion protocols cernan is very careful to synthesize source semantics into cernan’s internal model so we preserve all the information without being beholden. We’ve documented the high level of that in our wiki’s data model page and have information on individual protocol’s mapping on the source pages for each protocol. For instance, statsd. Statsd timers and histograms become cernan “summaries”, counters “sums” and gauges become different things depending on how the payload value is signed. The gritty detail is in the wiki.
Give cernan a try! We’re proud of what’s been constructed and think it’s a unique improvement over the state of the art. We’ve been running cernan like an open-source project since day one so, if you’d like to get involved, please do!