Over the past decade, I’ve experienced many forms of log writing, collection, aggregation, and searching. Some companies or teams implement their own solutions, some have more recently come to rely on SaaS providers, and others are moving away from log streams toward structured log events (the cloud version of Windows ETW).
Distributed Log Search
The internal search tools were generally effective, allowing filtering based on time spans, machine roles, and trace tokens. That said, this technique suffered from the following drawbacks:
- Put load on production machines (the searched files were stored on the prod machines themselves)
- Good for after-the-fact searching, not used for alerting
- Didn’t work if the production machine was offline
- Geared towards long-running processes
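To make the filtering concrete, here is a minimal sketch of what such a search does on each machine. It is not the internal tooling itself. The line layout (`ISO_TIMESTAMP ROLE TRACE_TOKEN MESSAGE`) and the names `LogLine`, `parse_line`, and `search` are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class LogLine:
    timestamp: datetime
    role: str
    trace_token: str
    message: str


def parse_line(raw: str) -> Optional[LogLine]:
    """Parse 'ISO_TIMESTAMP ROLE TRACE_TOKEN MESSAGE' (assumed layout).

    Returns None for lines that do not match the layout.
    """
    parts = raw.rstrip("\n").split(" ", 3)
    if len(parts) != 4:
        return None
    try:
        timestamp = datetime.fromisoformat(parts[0])
    except ValueError:
        return None
    return LogLine(timestamp, parts[1], parts[2], parts[3])


def search(
    lines: Iterable[str],
    start: datetime,
    end: datetime,
    role: Optional[str] = None,
    trace_token: Optional[str] = None,
) -> Iterator[LogLine]:
    """Yield entries inside [start, end) that match the optional filters."""
    for raw in lines:
        entry = parse_line(raw)
        if entry is None:
            continue  # skip lines that are not in the assumed layout
        if not (start <= entry.timestamp < end):
            continue
        if role is not None and entry.role != role:
            continue
        if trace_token is not None and entry.trace_token != trace_token:
            continue
        yield entry
```

Every query has to read the log files where they live, which is why this approach loads the production machines and fails when one of them is offline.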
Log aggregation is where your logs are either directly or eventually sent to a centralized system. This is the role that Cosmos played within some teams; it is also the role that some SaaS companies now play. I currently use Papertrail for Breadboard.io.
Overall, log aggregators solve a few of the problems associated with distributed log search and open the door to alerting and log analysis. They still assume that we are monitoring long-running processes.
Structured logs provide a clean mechanism for tracking events within the system without having to deal with raw log serialization/deserialization.
Structured log providers offer the greatest level of analytics, alerting, and anomaly detection. They no longer specifically assume a long-running process; instead, they look at the whole system as a series of events, where a machine is simply an attribute of the data.
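As a rough illustration of "a machine is simply an attribute", a structured event can be modelled as a plain dictionary serialized to one JSON line. This is a sketch using only the Python standard library, not any particular provider's API. The field names (`event`, `timestamp`, `machine`, `event_id`) and the helpers `make_event` and `serialize` are hypothetical.

```python
import json
import socket
import time
import uuid
from typing import Any, Dict


def make_event(name: str, **fields: Any) -> Dict[str, Any]:
    """Build a structured event; the machine is just one more attribute."""
    event: Dict[str, Any] = {
        "event": name,
        "timestamp": time.time(),          # seconds since the epoch
        "machine": socket.gethostname(),   # hypothetical field name
        "event_id": str(uuid.uuid4()),
    }
    event.update(fields)  # caller-supplied attributes, e.g. task_id
    return event


def serialize(event: Dict[str, Any]) -> str:
    """Serialize an event as a single JSON line, ready to ship or store."""
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    print(serialize(make_event("task_finished", task_id="task-42", exit_code=0)))
```

Because every event carries its own attributes, a query such as "all failed tasks in the last hour" no longer depends on which machine or process produced them.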
This leaves us with a gap… What if we want to monitor the raw logs of many small individual tasks? And what if we want to do it in real time?
This clearly rules out distributed log search and log aggregation.
It may be feasible with structured log providers, although they lose the details of the raw output.
It also allows us to monitor the various longer-running jobs without being tied down to specific task runner implementations (I’m looking at you, Jenkins).
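To make the gap concrete, here is a minimal sketch of what "raw logs of small tasks, in real time" could look like at its simplest: run a short-lived command and forward each raw output line, tagged with a task id, as soon as it appears. It is only an illustration of the requirement, not a description of any specific tool. The function `stream_task` and the task id format are hypothetical.

```python
import subprocess
import sys
from typing import Iterator, List, Tuple


def stream_task(task_id: str, command: List[str]) -> Iterator[Tuple[str, str]]:
    """Run a command and yield (task_id, raw_line) as each line is produced."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        bufsize=1,                 # line-buffered on our side of the pipe
    )
    assert process.stdout is not None
    try:
        for line in process.stdout:
            yield task_id, line.rstrip("\n")
    finally:
        process.stdout.close()
        process.wait()


if __name__ == "__main__":
    # A stand-in for a small task: a child Python process printing two lines.
    command = [sys.executable, "-c", "print('step 1'); print('step 2')"]
    for task_id, line in stream_task("task-42", command):
        print(f"[{task_id}] {line}")
```

The raw output is kept intact and each line is only labelled with the task that produced it. One caveat: the child process may buffer its own output, so how "real time" this is depends on the task as well as on the reader.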