How (Not) to Log

Mitko N
DraftKings Engineering

--

What logging is not

Logging is often mistakenly used to achieve one of the following:

  • Metrics: measuring the performance (time/quantity/rate, often as avg/p99) of certain operations, and normally setting alarms with thresholds on those
  • Audit: persisting every completed business operation for reporting or compliance reasons
  • Tracing: following long-running business transactions across multiple parts of the system
  • Debugging: inspecting how particular branches of your service behave

While it is possible to achieve any of the above with a logging system, it is far less suitable for them than a purpose-built tool.

What logging is

Logging is meant to announce some major events that are occurring in your system, coupled with valuable, structured data regarding those events.

Logging (almost always) does not make sense

Consider the following fairly common, classic example:

try {
    // "state" is an instance of SomeStateObject
    DoSomethingComplex(state);
} catch (Exception ex) {
    _logger.Error(ex, $"Error executing {nameof(DoSomethingComplex)}");
}

What do we see here? A complex operation fails and we log it. While at first glance this may seem like a good thing, one should ask several important questions.

The first question is: don’t we want to log parts of the state in order to be able to inspect the issue? But even better, can we make bad states for DoSomethingComplex unreachable? If the answer is yes, let's refactor our code, and then we won't need any exception handling or logging at all.
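One way to read "making bad states unreachable" is to validate once, at construction, so that later code cannot receive an invalid value. The snippet below is only a minimal sketch of that idea; Quantity, Orders and TotalCents are hypothetical names and not part of the original example.

```csharp
#nullable enable
using System;

// Hypothetical value type: a Quantity can only be created from a positive number,
// so any method that receives one never has to check (or log) an invalid value.
public sealed class Quantity
{
    public int Value { get; }

    private Quantity(int value) => Value = value;

    // The only way to obtain a Quantity. Invalid input is rejected here, once.
    public static Quantity? TryCreate(int raw) => raw > 0 ? new Quantity(raw) : null;
}

public static class Orders
{
    // No try/catch and no logging: a non-positive quantity cannot reach this method.
    public static int TotalCents(Quantity quantity, int unitPriceCents)
        => quantity.Value * unitPriceCents;
}
```

The caller deals with the null case at the boundary (for example by rejecting the request), and everything behind that boundary works only with valid data.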

The second question is: do we care if DoSomethingComplex fails? In a lot of cases we don't, because of retries, queues and similar mechanisms: for example background periodic cleaning jobs, jobs that can be put back in the queue that initiated them, or simple db calls that can be retried. One may care about the total number of failures or their ratio, and this is, you guessed it, a job for metrics, not logging. In other words, if we care about this error we should also get notified about it, and not just "silently" log it. I either care or I don't.
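As a rough sketch of counting failures instead of logging them, the snippet below uses the Meter and Counter types from System.Diagnostics.Metrics (available in .NET 6 and later). The meter name, the instrument names and the CleanupJob wrapper are assumptions made for illustration only.

```csharp
using System;
using System.Diagnostics.Metrics;

public static class CleanupJob
{
    // Hypothetical meter and instrument names.
    private static readonly Meter JobsMeter = new Meter("MyService.Jobs");
    private static readonly Counter<long> Runs = JobsMeter.CreateCounter<long>("cleanup_job_runs");
    private static readonly Counter<long> Failures = JobsMeter.CreateCounter<long>("cleanup_job_failures");

    // Runs the job once. A failure is counted, not logged; the caller can
    // retry or requeue based on the returned flag.
    public static bool TryRun(Action doSomethingComplex)
    {
        Runs.Add(1);
        try
        {
            doSomethingComplex();
            return true;
        }
        catch (Exception)
        {
            Failures.Add(1);
            return false;
        }
    }
}
```

An alert on the ratio of failures to runs then answers the "do we care?" question explicitly: either the threshold is crossed and someone is notified, or nothing needs attention.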

This leads us to the third question: can we apply business monitoring/metrics? We don’t really care about the exceptions and how they are handled. We care about the business value that our service provides. Sometimes your application does not raise exceptions but is still broken; examples could be wrong validations, or any other faulty business logic that breaks the application's functionality. In such cases business metrics are a lot more helpful. One could track whether new requests/operations keep arriving and how hit/miss ratios behave, and raise a rapid notification as soon as they stop looking healthy.

The last question is: do we expect DoSomethingComplex to fail? If we do, because of HTTP or database calls for example, the error can become an integral part of the return value, relying on concepts like Option or, more generally, monads. Then you clearly indicate that something might go wrong and act according to the requirements: failing fast, doing nothing, or handling the error in a centralized place are just some of the options.
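To illustrate an error that is part of the return value, here is a minimal, hand-rolled Result type. It is a sketch under stated assumptions rather than a recommendation of a specific library; Result, Match and PriceParser are hypothetical names, and parsing stands in for an HTTP or database call to keep the example self-contained.

```csharp
using System;

// Hypothetical result type: holds either a value or an error message.
public sealed class Result<T>
{
    private readonly T _value;
    private readonly string _error;

    public bool IsSuccess { get; }

    private Result(bool isSuccess, T value, string error)
    {
        IsSuccess = isSuccess;
        _value = value;
        _error = error;
    }

    public static Result<T> Ok(T value) => new Result<T>(true, value, null);

    public static Result<T> Fail(string error) => new Result<T>(false, default, error);

    // Forces the caller to decide what happens in both cases.
    public TOut Match<TOut>(Func<T, TOut> onSuccess, Func<string, TOut> onFailure)
        => IsSuccess ? onSuccess(_value) : onFailure(_error);
}

public static class PriceParser
{
    // The signature itself announces that this operation may fail.
    public static Result<int> ParseCents(string text)
        => int.TryParse(text, out var cents) && cents >= 0
            ? Result<int>.Ok(cents)
            : Result<int>.Fail($"'{text}' is not a valid non-negative amount");
}
```

A caller such as PriceParser.ParseCents(input).Match(cents => cents, error => 0) picks the "do nothing" policy in one visible place, with no exception handling and no log statement involved.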

Logging is a side effect

Compare two examples:

static int Add(int a, int b) {
    return a + b;
}

and:

static int Add(int a, int b, ILogger log) {
    log.Information("Adding {a} to {b}", a, b);
    return a + b;
}

The only difference between the two functions is that the first one is (perfectly) pure and the second one has side effects, meaning it operates on or modifies something outside its scope and may return different results or fail because of that operation.

What are the challenges with the second approach?

  1. One should make sure that a proper (or any) ILogger implementation is passed, and test it.
  2. Satisfying the first point usually means using Dependency Injection, and making sure we handle everything that can go wrong with it.
  3. We cannot really benchmark the performance of the function, as it depends on the ILogger instance passed to it and its side effects.
  4. The ILogger implementation may throw an exception, so we need to be ready to handle it.

This is a clear demonstration of how achieving proper abstraction and layering just to satisfy logging is a rather complex task by itself. Is it really needed? Are the costs (time/space/infrastructure) worth it?
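One possible way to limit these costs is to keep the computation pure and announce the event only at the outer edge of the application. The snippet below is a sketch of that idea, not a prescribed pattern; Calculator, AddEndpoint and the announce callback are hypothetical, with Action<string> standing in for whatever logger the application really uses.

```csharp
using System;

public static class Calculator
{
    // Pure: same inputs, same output, nothing to inject, easy to test and benchmark.
    public static int Add(int a, int b) => a + b;
}

public static class AddEndpoint
{
    // Hypothetical boundary of the application. The side effect lives here,
    // as an explicit part of the contract, and nowhere deeper in the call chain.
    public static int Handle(int a, int b, Action<string> announce)
    {
        int sum = Calculator.Add(a, b);
        announce($"Added {a} to {b}");
        return sum;
    }
}
```

With this split, only Handle needs a test double for the side effect, while Add can be tested and measured on its own.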

Logging is hard to manage

In the extremely rare cases in which logging does make sense, there are several challenges one should pay close attention to:

  1. Logs should have a very strict format. Keep in mind that we are probably going to store our logs in a NoSQL database, so they need to be indexable. The only way to achieve this is good design of the log messages and strict discipline among developers.
  2. Logs must not contain sensitive information. Do you want to expose database passwords or secret keys inside logs? Or even worse, personal information of your customers? Again, the only way to avoid this is good log message design, and very strict discipline and control over those who write them.
  3. Once the structure and the information in the logs are handled, log levels should also be taken into consideration. Developers tend to have their own ideas of what is critical and what isn’t. Again, probably every log message should be reviewed to see if its level makes sense. Otherwise you will bloat your service and database with unneeded information.
  4. Logging hurts performance, and not by a little. Logging in the critical path (even at a level that is not enabled) greatly hurts performance and makes it unpredictable. It creates unnecessary allocations and additional overhead.
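The last point can be softened, though not removed, by checking the level before building the message. The sketch below uses a hypothetical ILevelLogger interface so that it stays self-contained; Serilog and Microsoft.Extensions.Logging both expose a comparable IsEnabled check on their logger interfaces.

```csharp
using System.Collections.Generic;

// Hypothetical minimal logger abstraction, for illustration only.
public interface ILevelLogger
{
    bool IsDebugEnabled { get; }
    void Debug(string message);
}

// Simple in-memory implementation, handy for trying the example out.
public sealed class InMemoryLogger : ILevelLogger
{
    public InMemoryLogger(bool debugEnabled) => IsDebugEnabled = debugEnabled;

    public bool IsDebugEnabled { get; }

    public List<string> Messages { get; } = new List<string>();

    public void Debug(string message)
    {
        if (IsDebugEnabled) Messages.Add(message);
    }
}

public static class HotPath
{
    public static int Sum(int[] items, ILevelLogger log)
    {
        int sum = 0;
        foreach (var item in items)
        {
            sum += item;
            // Without this guard the interpolated string would be built (and allocated)
            // on every iteration, even when Debug output is thrown away.
            if (log.IsDebugEnabled)
            {
                log.Debug($"Processed item {item}, running sum {sum}");
            }
        }
        return sum;
    }
}
```

Even with the guard, the hot loop still pays for a branch and an interface call per item, which is why keeping logging out of the critical path remains the safer choice.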

What to do instead?

A quick recap:

  1. Logging does not make much sense in monitoring. Use better tools instead.
  2. Logging adds significant complexity to your architecture. And it requires more testing. Use architecture patterns that will make logging an explicit part of your contracts.
  3. Logging should be done right: in a way that is helpful and efficient when really needed, and at the same time does not hurt the core business of the application. And that is hard. You will have to use a lot of tooling, and you will have to mentor developers who are unaware of the problems we have just discussed.

Is logging worth it? In my opinion, only in very rare cases: when a major or fatal event occurs, and logging its exact occurrence together with all the relevant state data can help us inspect such events afterwards and prevent them in the future.

Please don't get me wrong. I understand that logging can be really useful (and sometimes even the only source of useful information), for example when an application is in its initial stages and not fully functioning yet. It would be hard to understand what is going on without logging. But in some cases we are facing an “over-logging” culture, in which logs are written for no good reason, because developers just do it without analyzing the costs and tradeoffs.

Want to learn more about DraftKings’ global Engineering team and culture? Check out our Engineer Spotlights and current openings!
