It’s time to get good at logging
I’ve seen people log things in their code since I started programming, and I logged a lot myself. When pursuing my degree in CS, which mainly required implementing algorithms and finding the right algorithms for the right problems, I logged to make sure that my code worked. At that time nobody cared about tests or TDD, so this seemed like the quickest possible way to tell whether you were likely to arrive at a correct result.
Then I started my career and began writing things that didn’t have much in common with the algorithmic problems I was used to. At that point of my career, people on the team were using logging to be able to tell what went wrong after something had already gone wrong. Analyzing logs that came from a client’s system where an issue had occurred was one of the most tedious tasks one could get. Back then, logs were something… private. If anyone dealt with logs from our application, it was us. There was nothing wrong with some funny stuff in there, or cryptic messages only their author could understand. Support’s role was often limited to downloading the logs; analyzing them was our job.
Then the cloud came. With the rise of the DevOps movement, microservices architecture, central logging, log browsers like Kibana and all this modern stuff, logs stopped being private. They became something that other developers can easily see and search. They also advanced to being actionable information: you may have an alert that wakes up an on-call person if there are too many errors in the logs. So now logs are not only a matter of “how much time will we spend investigating a problem when one occurs?” but may be a matter of “is this worth waking a person up?” or “should somebody take a look?”.
In general, logs are becoming more than a side effect and a potential source of information after something bad has happened. They can be a reasonable basis for an actionable alert for the Ops team, a trigger for an automated process, or even a way to evaluate the success of your tests in a distributed environment (this happened to us with lambda functions, but that’s another story).
While we all aim for self-healing, monitored systems in which logs are no longer needed, logs may be your best friend in the transition period during which teams mature and learn new paradigms, architecture patterns, technologies and so on.
On the developer side, most of us are used to logs being clutter and noise spread all over the code. There is nothing preventing us from applying DI and IoC principles to logging code but, in practice, there has been little incentive to do so, apart from aesthetic reasons and a compulsive need for Clean Code. To be honest, when the primary way of interacting with logs was downloading a log file from some machine and analyzing it more or less manually (or with a homegrown solution), there was little chance we would ever be required to change the implementation of our logging, and therefore little motivation for nice design in this area.
With cloud and central logging, the situation is different. A “logger” may no longer be just a utility writing to a file or standard output, but a service from a cloud provider (AWS X-Ray, for example). That service may be available in one region but not in another, and what if we want our code to be “cloud agnostic”? Even when logging to standard output or a file, there may be a preference from the ops team that developers need to abide by, or a format that is easily searchable, or one with a known parser configuration, etc.
That’s why we should care about WHAT we log (what log level? is it actionable information?) and HOW we log (the implementation), and we no longer have excuses for being careless with logging code.
So here are some rules and approaches I learned so far:
WHAT (content)
This will be the shorter section, as the answer is: it depends. Especially when it comes to the content of the message, you need to figure out for yourself what information is needed.
For choosing the log level, as a rule, I like the approach I call “actionable”: only log things that require somebody’s action. I borrowed (some of) this approach from the excellent book by Michael T. Nygard, Release It! Design and Deploy Production-Ready Software (by the way, a second edition is coming soon!), which I highly recommend.
My take on it, based on the book and blended with my own experience (I am not an Ops person, so feel free to skip to the next section and just read the book instead):
- ERROR level should be reserved for things that require action right now, so they are worth bothering the person on-call, even if that means waking them up. One thing worth noting, however, is that in a distributed environment one instance of a service or application may not be capable of determining whether someone should react right now (one instance may be in trouble while the other two are providing the service and handling the current load). The advice then translates to: ERROR should be taken into account when issuing alerts. The alert need not be “wake her up whenever there is an error” but rather “wake her up when you see more than x errors per minute”.
- WARNING means that somebody should take action, but it can wait until they come to work at their regular hours.
- INFO should be used when logging is required as part of auditing requirements or security standards within the organization (and there is no dedicated service we should call with such information).
- DEBUG and TRACE should not be used in production. The reasoning is pretty simple: it’s unlikely that applications in production run at the DEBUG or TRACE log level. If you need DEBUG or TRACE logs, you need to reproduce the issue in another environment or with a higher log level, and in general, if you have the means of reproducing an issue, you are likely to have better means of debugging at your disposal than adding logs. I consider DEBUG and TRACE logs a debugging tool to be used only in the development phase, and only as a last resort. From my experience, the need for them signals a deeper problem: tests being run in an environment that is very different from production, and/or development done in such an environment, and/or with data that does not match “real” use-cases. TRACE and DEBUG in such cases are used as a means of finding out something about the environment or the data, and are often left in the code (we run the application at INFO level anyway, so we don’t care).
Also, logs are the most expensive way of signaling a problem in terms of the time, space and cognitive effort it takes to parse and understand them. Sometimes they cannot be avoided (for example, logging exceptions, especially those exposing flaws in application logic), but monitoring with wisely configured alerts and thresholds should take precedence for problems that can be anticipated (inability to connect to an external service, for example).
There are organizations that do not use any logging at all, reasoning that maintaining logs and the tooling around them is just too expensive, and that the potential advantages are not worth it.
HOW (form)
While the section above will become outdated as you read it (as we aim for solutions that make logs unnecessary, or used as rarely as possible), this section is more likely to stay relevant: it applies not only to logging in its traditional forms but also to other code that supports non-functional requirements, such as calls to an auditing service, incrementing metrics, benchmarking, etc. The considerations below are based on my experience with Java.
The “standard approach” to logging usually involves a popular library, such as slf4j, which exposes methods to log trace, debug, info and error messages. It provides a few overloaded versions, appropriate for logging texts, objects and exceptions. A logger is usually acquired inside every class and often kept as a static field:
class Example {
    private static final Logger logger = LoggerFactory.getLogger(Example.class);
    private final String name;
    …
    public void show() {
        try {
            …
            logger.info("Example is shown");
        } catch (Exception e) {
            logger.warn("Unable to show example with name " + name, e);
        }
    }
}
One of the drawbacks of this approach is that the String concatenation happens irrespective of the log level, but this can be handled by using the appropriate form of the logging method (in the case of slf4j, see https://www.slf4j.org/faq.html#logging_performance). While this exact problem may be specific to Java and slf4j, it comes down to the fact that you want the logging library to be used in a consistent way across the whole project, which may mean using only a few of the many methods the library provides. One approach is to track these things when reviewing pull requests; another is to create your own logger class, with a minimal number of methods that meet your specific requirements.
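To make the concatenation point concrete, here is a tiny stand-in logger. This is an illustration only, not the slf4j API; note also that with the parameterized form the argument is still evaluated at the call site, it is only the assembly of the message String that gets skipped when the level is disabled.

```java
import java.util.ArrayList;
import java.util.List;

// Tiny stand-in logger, NOT the slf4j API: just enough to show when
// the message String actually gets built.
class MiniLogger {
    private final boolean debugEnabled;
    final List<String> lines = new ArrayList<>();

    MiniLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Concatenation style: the caller builds the full String up front,
    // even when DEBUG is disabled and the line is thrown away.
    void debugConcat(String message) {
        if (debugEnabled) {
            lines.add(message);
        }
    }

    // Parameterized style: the template is merged with its argument
    // only when the level is actually enabled.
    void debug(String template, Object arg) {
        if (debugEnabled) {
            lines.add(template.replace("{}", String.valueOf(arg)));
        }
    }
}
```

Returning to the consistency point: a minimal custom logger class of the kind mentioned above might look like this.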
class CustomLogger {
    public void info(Class<?> callingClass, String message, Object param) {
        Logger logger = LoggerFactory.getLogger(callingClass);
        …
    }
    …
}
An additional advantage of this approach is that the dependency on the external logging library is now enclosed within this one class, which makes it much easier to replace, should such a replacement ever be needed.
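As a sketch of how small such a wrapper can be: the version below delegates to java.util.logging purely so that the example is self-contained; the method names and signatures are my assumption, not a standard, and the underlying library could just as well be slf4j or a cloud provider’s service.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: the only class in the project that knows which
// logging library sits underneath (java.util.logging here).
class CustomLogger {

    public void info(Class<?> callingClass, String message, Object param) {
        // java.util.logging uses {0}-style placeholders in the message
        Logger.getLogger(callingClass.getName())
              .log(Level.INFO, message, param);
    }

    public void warn(Class<?> callingClass, String message, Throwable cause) {
        Logger.getLogger(callingClass.getName())
              .log(Level.WARNING, message, cause);
    }
}
```

Swapping java.util.logging for something else now means changing only this one class.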
The next problem we are likely to encounter is the inability to inject the logger as a dependency, which is likely to be required by tests. In its simplest form, the solution may be extracting a logger interface and adding the logger as a parameter to the constructor of every logging class. This allows loggers to be mocked in tests:
class Example {
    private final ICustomLogger logger;
    public Example(ICustomLogger logger) {
        this.logger = logger;
    }
    …
}
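For example, a hand-rolled test double is enough, with no mocking framework required; the interface below is an assumed, minimal version of ICustomLogger, reduced to a single method for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed minimal logger interface for the sketch.
interface ICustomLogger {
    void info(String message);
}

// Business class receiving its logger through the constructor.
class Example {
    private final ICustomLogger logger;

    public Example(ICustomLogger logger) {
        this.logger = logger;
    }

    public void show() {
        // business logic would go here
        logger.info("Example is shown");
    }
}

// Hand-rolled test double: records every message so a test can
// assert on exactly what was logged.
class RecordingLogger implements ICustomLogger {
    final List<String> messages = new ArrayList<>();

    @Override
    public void info(String message) {
        messages.add(message);
    }
}
```

A test can now construct Example with a RecordingLogger and assert that show() logged the expected line exactly once.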
While this improves testability, it still doesn’t address the problem of cluttered code, with logging statements all over the business logic. At this stage, people tend to come up with solutions that group logging statements somehow, usually by wrapping them in private methods:
class Example {
    private ICustomLogger logger;
    public void show() {
        …
        exampleShown();
    }
    …
    private void exampleShown() {
        logger.info("Example no {} shown to {}", number, observer);
        logger.info("Counting displays: {}", counter);
    }
}
I would call this a slight improvement: grouping logs into named methods helps to clarify intent (why is this log here? what event did we want to log?) but doesn’t really help with readability. In some cases two or three lines may be replaced with one, but on the other hand, when reading the code, we may wonder which methods are logging-only and which are logic. That can be improved by prefixing those methods with “log”; that is, instead of “exampleShown()” above, we would have “logExampleShown()”.
Still, we have an additional parameter in every constructor, and we still need to mock loggers in tests.
Next iteration may be a hierarchy of classes:
class LoggingExample extends Example { ... }
The extending class has one job, and one job only: adding logs before and after calls to methods of the parent class:
class LoggingExample extends Example {
    private ICustomLogger logger = new CustomLogger();
    private void logExampleWillBeShown() {
        logger.info…
    }
    …
    public void show() {
        logExampleWillBeShown();
        super.show();
        logExampleShown();
    }
}
What we get is a clear division: logging only in the extending class, and the parent class containing business logic only (at least at first sight). Many people see it as a restriction that we can add logs only before or after a call to a parent-class method, but that seems possible to mitigate through inheritance: we can use tricks such as protected methods, and if we need to add logging in the middle of a public method, we’ll just split it into two protected methods, right? That’s a trap, for a few reasons. Firstly, your logging class starts to have logic (it calls two or more methods from the parent class in a given order). This is dangerous, because your tests exercise the parent class and its public API, and there is no test guaranteeing that the result of the calls in the logging extension (which is what will actually be used in production) is the same as calling the public method of the base class. What’s more, in my examples, for the sake of brevity, I passed no arguments to the logging methods, but in reality we would like the logging methods to know some data. With a construct like the one above, we are likely to reach for protected or package-private fields of the parent class. And in this way, while we have separated logging from business logic “visually”, we have coupled their implementations with the strongest possible relationship: inheritance. It is now impossible to change the implementation of Example without changing the implementation of LoggingExample.
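A compressed sketch of that trap, with illustrative names (the “steps” list is only there to make the call order observable): the subclass now owns the sequencing of the parent’s protected steps, and nothing ties it to Example.show().

```java
import java.util.ArrayList;
import java.util.List;

// Parent: business logic split into protected steps so the subclass
// can log "in the middle". The steps list records the call order.
class Example {
    protected final List<String> steps = new ArrayList<>();

    protected void prepare() { steps.add("prepare"); }
    protected void render()  { steps.add("render"); }

    // The public API that the parent's tests cover.
    public void show() {
        prepare();
        render();
    }
}

class LoggingExample extends Example {
    @Override
    public void show() {
        // Re-implements the parent's sequencing just to squeeze a log
        // in between the steps. If Example.show() ever changes, nothing
        // forces this method to follow suit: the classes are tightly
        // coupled yet free to drift apart.
        prepare();
        steps.add("log: prepared"); // stand-in for logger.info(...)
        render();
        steps.add("log: shown");
    }
}
```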
A way out of this trap may be using an interface:
interface IExample {
    void show();
    ...
}

class Example implements IExample {
    ...
}

class LoggingExample implements IExample {
    private IExample example;
    private ICustomLogger logger;
    public LoggingExample(ICustomLogger logger, IExample example) {
        ...
    }
    public void show() {
        logger.info("Example will be shown");
        example.show();
        logger.debug("Example was shown");
    }
}
This does limit logging to what’s exposed by the IExample interface, but that’s the price we need to pay if we want to keep the implementations loosely coupled. There may be further variations (for example, using another, extending interface for the object injected in the constructor), but they will all impose the restriction of being able to log only the results of public calls on the “business” class (at least in Java). While this is a restriction, I view it as one that helps maintain clean code. Another advantage is that the logging class can be easily tested: imagine it’s not only logging but auditing, and you need to make sure that certain data is being recorded.
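To illustrate that testability, here is the decorator with its elided parts filled in (ICustomLogger is again reduced to a single info method, an assumption made for the sketch):

```java
import java.util.ArrayList;
import java.util.List;

// Assumed minimal interfaces for the sketch.
interface ICustomLogger {
    void info(String message);
}

interface IExample {
    void show();
}

// Decorator: logging around the delegated call, no business logic.
class LoggingExample implements IExample {
    private final ICustomLogger logger;
    private final IExample example;

    public LoggingExample(ICustomLogger logger, IExample example) {
        this.logger = logger;
        this.example = example;
    }

    @Override
    public void show() {
        logger.info("Example will be shown");
        example.show();
        logger.info("Example was shown");
    }
}
```

Because both collaborators arrive through interfaces, a test can pass a recording logger and a fake delegate, then assert both that the call was forwarded and exactly what was logged, without any mocking framework.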
The last two approaches may be viewed as verbose (up to twice as many classes in the project), but in many cases you may find that you don’t really need a logging version of a class, and the clarity of the solution should be worth it, especially in complex cases.
Let me know if you know better ways of dealing with logging code in your Java applications!
