Logging levels explained

Michael Merkulov
7 min readOct 27, 2017

--

Good old logs remain a very important tool (often overlooked) for developers in troubleshooting applications. Sometimes it’s the only way to figure out production issue, yet developers sometimes are confused about what/when to log and at what level.

What should you log at Error/Warn level? What is the difference between Debug and Trace? Should you ever log something at Info level? While it’s all subjective, I compiled a list of tips and suggestions based on my experience on a number of projects. These suggestions are tool-agnostic, so it does not really matter if you use log4j, log4net, NLog or something else. In this particular post I’ll use NLog as an example.

Error

As its name suggests, you should use it when something goes wrong (or really wrong) in your application. If user reports a problem, ideally you should have at least some information in the log file with Error level describing that problem to some extent.
It’s the first immediate response to the problem when troubleshooting production app — what do we have in Error log? In many cases that should be enough to pinpoint the problem.
OK, but what exactly should go there? Most of the times it’s the exception you get.

Note how we added some context, in this case document number. That will help you match the exception to the specific problem you’re investigating. You might get several exceptions for several use cases, so it is always convenient to be able to quickly filter out not related messages.

Another good use-case is generic exception handlers. For example, for desktop projects:

It’s the last resort — if the exception is not handled in the code, generic exception handler should at least try to log it. You will have an error properly logged so you could investigate it later.

Note: not every exception should be logged as Error. That’s a common anti-pattern. If you use validation exceptions for your business logic, these should almost always logged as debug instead. For example:

If you do ErrorException() in this case, there would be way too much noise in your error files under load, and you’ll most likely miss really important error messages.

As an alternative, you might just log base exception for all business-related validation messages, like that:

That would be handy if logging is the only thing you do in case of validation issue.

Warn

Warning, by definition, is not an error in a sense that application can function properly, but it’s still something you might want to pay attention to in production. Usually in production you should have logging level set to Info, meaning Warn should be ‘ON’ by default. If somebody looks at the log in production and finds Warning message, they might want to take some action. Usually there are not many warning messages in the code base. One scenario in which Warn might be useful is configuration files:

Another use-case is security: if somebody tries to pass suspicious input in production, that deserves a Warning as well.

Other scenario is a signal for developers that they did something wrong, albeit working, like making blocking remote call from UI thread for desktop application:

The last scenario would potentially result in lots of logging messages in production (depending on how often the call would be made), but in this particular scenario it is justified, making developers to fix this instead of just ignoring it.

Info

We primarily use it for IT-level information, like starting/stopping services, # of messages in the queue etc. For example:

The idea is that if somebody looks at Info level logs, he/she will know when the service was started, was it successful, and maybe other context-specific information for that particular service.

The difference with Warn is that Info does not necessarily imply any action on DevOps side — it’s more used as an acknowledgement of some IT action.

Note that you should not use Info for debug-level information, like

The reason for this is that Info is usually reasonable default setting for production logs, meaning all Info, Warn and Error-level logs will always be written. It should not contain per-transaction information, this kind of information should go to Debug logs.

Debug

OK, finally we got to Debug logs. Obviously, we should put here information that will be helpful during troubleshooting issues, but… what information would be helpful? The irony is that usually we know specifics only after the fact while doing postmortem analysis. You might have experienced the feeling of “if only I knew what was the value of this variable…”

If we do not know what we do not know, can we come up with a reasonable set of rules for debug statements? In other words, what are the guidelines of good debug logs?

Let’s start with an example:

Debug Logs should be human readable.

You should be able to quickly skim through the logs to get a general idea what is going on. You should not log all variables, it will make your code less readable and it will only complicate your troubleshooting activities.

Sometimes you might have a difficult-to-reproduce scenario, or some intermittent behavior. If you’re stuck while troubleshooting specific problem, you might want to temporarily add very detailed logging to reproduce that scenario, but make sure to remove it afterwards.

Debug Logs should tell the story of your workflow.

In the example above, you could figure out what was happening just by looking at the log files. Good rule of thumb is that you can insert logging statement when you have a significant if-statement, that is, you’re branching your workflow, like this:

Logging statements like this do not add lots of complexity, do not really impact readability, but might help you in production.

Other scenario is wrapping your business operation in “start” and “end” statements:

First statement can provide you with details about specific entities that participate in business transaction, so you can use it as a reference for more detailed analysis. If you do not see the second statement in your log, that means that for that particular transaction something wrong happened, so again, you have details about which stocks were not allocated.

Do not put debug statements in getters

You should not do something like that:

If UserName getter is called multiple times, you will have a a lot of meaningless logging statements which do not add any value but only clutter you logs. Another subtle and less obvious problem with this code is that the context is lost — you can’t tell from where this getter would be called and whether we want to know that we use this getter or not. The whole idea of logs is to provide you with meaningful context to help debugging. A better way to handle this is to add logging statements where you know what the context is depending on your architecture — it might be Command, or Rich Domain method like

Contract.Approve();

or something like that. Usually it’s a place when you change the system state.

So now we have an idea of what should go into Debug logs, but what if it’s still not enough information to successfully troubleshoot the issue? What if we want very detailed logs? Let’s look at the Trace level.

Trace

Trace level provides you with very detailed information about the workflow. It might consist of:

  1. Full request input for the service — name of service operation/REST resource etc. with all the parameters, headers etc.
  2. Full request output for the service.
  3. SQL statements for the service operation if applicable.
  4. Detailed information about calls to 3rd party services.

These logs are auto-generated (developers don’t write them manually in code), often implemented as a hook into your processing pipeline. Usually Trace logs are turned on for a short period of time to troubleshoot specific issue (might be as well filtered by user name, order # etc.), and then turned off as soon as possible.

We had a case when Trace logs were accidentally turned on for automatic load testing, and got couple gigabytes of logs within several minutes. Even for one user, if left on it could eat up gigabytes of space over time, so make sure you do not configure your logs to have Trace level in production by default.

Also, these logs could be useful while development — for example, if your logging system allows you to split logs into different files per service operation/request, you can quickly see input/output and SQL statements without even debugging your code. It gives you faster feedback than debugging or unit/integration testing.

It could look like that:

The best thing about Trace logs is that you can get a lot of valuable diagnostic information for very little effort — you write the logging infrastructure once (could take you less than a day), but you get this information for every service operation / scenario.

Final notes

Here is some useful numbers on the percentage of statements per logging level. Numbers are made up and will differ from project to project, but it gives you a general perspective for the case when all logging levels are turned on:

  1. Error — less than 1%
  2. Warn — less than 1%
  3. Info — less than 1%
  4. Debug — about 10%
  5. Trace — 90%

It is very important to make sure you turn on Info level (or above) in production under normal circumstances.

--

--