A needed first step to logging analytics
We all know that logging analytics is a big part of the current toolbox for modern software engineering. Logging analytics is not just adding tools to parse your log. If you design your logging to be used for a logging analytics tool it will not be as useful as you think.
The following are some tips to write a better application log and enable the use of Logging analytics tools such as ELK Stack or Graylog.
Do not write log by yourself
Never, ever use printf or write your log entries by yourself to files, or handle log rotation by yourself. Please do your ops guys a favor and use a standard library or system API call for this. This way, you’re sure the running application will play nicely with the other system components, will log to the right place or network services without special system configuration.
Log at the proper level
When you followed the 1st tip, then you can use a different log level per log statement in your application. One of the most difficult tasks is to find at what level this log entry should be logged.
I suggest the following pattern:
- TRACE level: this is a code smell if used in production. This should be used during development to track bugs, but never committed to your VCS.
- DEBUG level: log at this level about anything that happens in the program. This is mostly used during debugging, and I’d advocate trimming down the number of debugging statement before entering the production stage, so that only the most meaningful entries are left, and can be activated during troubleshooting.
- INFO level: log at this level all actions that are user-driven, or system specific (ie regularly scheduled operations…)
- WARN level: log at this level all event that could potentially become an error. For instance, if one database call took more than a predefined time, or if an in-memory cache is near capacity. This will allow proper automated alerting, and during troubleshooting will allow to better understand how the system was behaving before the failure.
- ERROR level: log every error conditions at this level. That can be API calls that return errors or internal error conditions.
- FATAL level: too bad it's doomsday. Use this very scarcely, this shouldn’t happen a lot in a real program. Usually logging at this level signifies the end of the program. For instance, if a network daemon can’t bind a network socket, log at this level and exit is the only sensible thing to do.
Write meaningful logs
There’s nothing worst than cryptic log entries assuming you have a deep understanding of the program internals.
When writing your log entries messages, always anticipate that there are emergency situations where the only thing you have is the log file, from which you have to understand what happened.
Also, do not log a message that depends on previous messages content. The reason is that those previous messages might not appear if they are logged in a different category or level, or worst can appear in a different place (or way before) in a multi-threaded or asynchronous context.
Do not use logs like these
“Transaction failed”
or
“User operation succeeds”
or in case of exceptions:
“java.lang.IndexOutOfBoundsException”
Without proper context, those messages are only noise, they don’t add value and consume space that could have been useful during troubleshooting. Prefer alternatives like:
“Transaction 2346432 failed: cc number checksum incorrect”
or
“User 54543 successfully registered e-mail user@domain.com”
or
“IndexOutOfBoundsException: index 12 is greater than collection size 10”
Prefer to write in English
English means your messages will be in logged with ASCII characters. This is particularly important because you can’t really know what will happen to the log message, nor what software layer or media it will cross before being archived somewhere. If your message uses a special charset or even UTF-8, it might not render correctly at the end, but worst it could be corrupted in transit and become unreadable. In case you need to log in any other language, please use only ASCII characters.
Do not log personally identifiable information
One common issue that happened with logging is the use of personally identifiable information (PII) or any other secret information like credit card number, SSN, etc. This is a big security issue. As an example, take the following logging line:
“User 54543 successfully registered e-mail user@domain.com”
Prefer to log the md5 hash code or any other hash:
“User 54543 successfully registered e-mail cd2bfcffe5fee4a1149d101994d0987f”
Use UTC for data logging
If you have ever analyzed logs in a real world you know that we need to get information from several sources (servers and services) and matching the event order is a big issue. To help in that matter please use only UTC when logging dates, if you follow the first tip, that normally is easy to do, as most frameworks offer a way to configure the output.
Create or use a transaction ID
A simple trick that will make your life very easy. In real production environment be able to search your logs fast is life changing. One major problem is to find all the logging pieces to understand the full context. Having a simple identifier for the whole transaction is a simple way to help you find, being that by using grep or Elasticsearch queries.
If you follow the first tip that is not hard to do, there are several frameworks that can generate that and save the info in a thread-local safe way for you to use on your logging framework.
if you do not have that option, a simple HTTP filter will do the trick, and the time invested in that will pay for itself very quickly.
These simple tips will help you with or without a fancy logging analytics tool.