Cloud logging & Tailf.io

Over the past decade, I’ve experienced many forms of log writing, collection, aggregation, and searching. Some companies or teams implement their own solutions, some have more recently come to rely on SaaS providers, and others have moved away from log streams to structured log events (the cloud version of Windows ETW).

Distributed Log Search

I spent a few years in BingAds, where we relied on a combination of distributed log searching (using internal tools) and log aggregation into Cosmos.

The internal search tools were generally effective, allowing filtering based on time spans, machine roles, and trace tokens. This technique did suffer from the following:

  1. Put load on production machines (the searched files were stored on prod machines)
  2. Good for after-the-fact searching, but not used for alerting
  3. Didn’t work if the production machine was offline
  4. Geared towards long-running processes

Log Aggregation

Log aggregation is where your logs are sent, either directly or eventually, to a centralized system. This is the role that Cosmos played within some teams; it is also the role that some SaaS companies now play. I currently use Papertrail for Breadboard.io.

SaaS log aggregators include Papertrail, Loggly, GrayLog, Sumo Logic, and Splunk.

Overall, log aggregators solve a few of the problems associated with distributed log search and open the door to alerting and log analysis. They still assume that we are monitoring long-running processes.

Structured Logs

Structured logs provide a clean mechanism for tracking events within the system without having to deal with raw log serialization/deserialization.

Google Analytics is one of the more successful options available, mainly used for tracking site visitors and user events. Similarly, you can use Dynatrace or Application Insights Analytics.

Structured log providers offer the greatest level of analytics, alerting, and anomaly detection. They no longer specifically assume a long-running process; instead, they look at the whole system as a series of events, where a machine is simply an attribute of the data.
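To make the difference concrete, here is a minimal Python sketch of the same event written as a raw log line and as a structured event. Everything in it is made up for illustration: the log line format, the field names, and the helper functions (`parse_raw`, `make_event`, `events_for_machine`) don’t come from any particular provider.

```python
import json
import re

# A raw log line, roughly as a long-running process might write it.
# The format and all values here are hypothetical.
RAW_LINE = "2017-03-01T12:00:05Z web-03 INFO checkout completed order=1234 ms=87"


def parse_raw(line):
    """Recover fields from a raw line; breaks as soon as the text format changes."""
    match = re.match(r"(\S+) (\S+) (\S+) (.*) order=(\d+) ms=(\d+)$", line)
    if match is None:
        raise ValueError("unrecognized log line: " + line)
    timestamp, machine, level, message, order, ms = match.groups()
    return {
        "timestamp": timestamp,
        "machine": machine,
        "level": level,
        "message": message,
        "order": int(order),
        "duration_ms": int(ms),
    }


def make_event(name, **attributes):
    """Build a structured event; the machine is just one more attribute."""
    return {"event": name, **attributes}


def events_for_machine(events, machine):
    """Filter events by attribute instead of by which machine holds a log file."""
    return [e for e in events if e.get("machine") == machine]


events = [
    make_event("checkout_completed", machine="web-03", order=1234, duration_ms=87),
    make_event("checkout_completed", machine="web-07", order=1235, duration_ms=112),
]

# Structured events round-trip through JSON without any custom parsing.
encoded = json.dumps(events[0], sort_keys=True)
decoded = json.loads(encoded)
```

The raw line needs a hand-written regular expression to get its fields back, while the structured event is already data, so filtering by machine, order, or duration becomes an ordinary query.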

Tailf.io

This leaves us with a gap… What if we want to monitor the raw logs of many small individual tasks? How about in real-time?

This clearly rules out distributed log search and log aggregation, both of which assume long-running processes.

It might be feasible with structured log providers, although they lose the details of the raw output.

This is where Tailf.io comes in. In the age of serverless computing, Tailf.io allows us to track, share, and monitor, in real time, the individual tasks that are being executed.

It also allows us to monitor the various longer-running jobs without being tied down to specific task runner implementations (I’m looking at you, Jenkins).

In the next few days I’ll continue to introduce Tailf.io and the role that it plays in Breadboard’s greater serverless ecosystem.