Smart Application Logging For Production Grade Systems

Aurora Solutions
5 min read · Apr 10, 2019
Photo by Caterina Beleffi on Unsplash

Production grade is a generic term for software or hardware that is designed for regular and intensive use by real users. Such systems usually need to be fault tolerant, and zero downtime is preferred at the customer’s site. If downtime does occur, we want to know why it happened and learn from it so that the problem does not recur. Logging makes this possible.

Logging And Its Importance

Logging is the process of recording events that occur while software is running and saving them to a file for later reference. When logging is enabled, it helps the technical support team debug the software and understand the exact cause of problems. It is common practice to categorize log messages in order to make troubleshooting easier. These categories are called log levels.

Log Levels

Major logger implementations support the following standard log levels, so that users know where to look for which information.

OFF

This level tells the application to switch the logging mechanism off, so that nothing is logged.

INFO

These messages are logged purely for information. Most often they do not hint at an underlying problem but indicate normal behavior of the application.

WARN

Sometimes the application enters a state that could lead to errors. At such times, a log message at the WARN level is helpful. While troubleshooting, a WARN message can show where the chain of problems that led to the application failure began.

ERROR

If there is a serious issue with the application, the ERROR level can be used to denote it. It represents the failure of a major segment or portion of the application.

FATAL

If the application is about to shut down completely or enter a state from which recovery is not possible, that event can be logged at the FATAL level. Issues that produce FATAL messages are usually showstopper bugs at the customer site.

DEBUG

This level carries information that is diagnostically helpful to more people than just developers: customer support and system administrators benefit from it too. It is the right place for granular detail. Whenever a ticket is created for the application, the customer support representative usually changes the log level from INFO to DEBUG for detailed diagnosis.

TRACE

Used when “tracing” the code to find one specific part of a function. If the DEBUG log does not help, the customer support representative needs more diagnostic information, and the log level may be moved further to TRACE. TRACE is very verbose, so once this level is enabled, the application’s log messages alone can use a lot of storage space.

ALL

Setting the log level to ALL means that all log messages, regardless of the level they are categorized into, are captured. If the developer of the application has defined a custom log level, messages at that level are logged as well.
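
The levels above are listed by purpose rather than by severity. When a logger is configured with a level, it acts as a threshold: log4j orders the levels OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL, and a message is written only if it is at least as severe as the configured level. The following is a minimal sketch of that rule, assuming a hypothetical LogLevel enum of our own rather than log4j's Level class:

```java
// Hypothetical enum for illustration; declared from most to least restrictive.
enum LogLevel {
    OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL;

    /** Returns true when a message at the given level passes this threshold. */
    boolean allows(LogLevel message) {
        if (message == OFF || message == ALL) {
            return false; // OFF and ALL are thresholds, not message levels
        }
        // A message passes when it is at least as severe as the threshold.
        return message.ordinal() <= this.ordinal();
    }
}
```

With the threshold at INFO, ERROR and WARN messages pass while DEBUG and TRACE are dropped, which is why support staff lower the threshold to DEBUG when they need more detail.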

What Should And Shouldn’t Be Logged

When creating log messages, the data logged should be sufficient to allow a smooth troubleshooting process for the customer support team. Although it is desirable to log fine-grained information at the DEBUG or TRACE level, care should be taken not to overdo it. In addition, sensitive data should not be given out as part of the logging process. Confidential data that should not be logged includes critical information about the company, keys used for encryption or decryption, and user credentials.
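
One way to keep credentials out of the log is to pass messages through a masking step before they are written. The sketch below is only an illustration: the LogSanitizer class and its list of sensitive key names are assumptions, and a real application would need rules that match its own data.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the key names below are illustrative assumptions.
class LogSanitizer {
    // Matches key=value pairs whose key looks sensitive (case-insensitive).
    private static final Pattern SENSITIVE =
        Pattern.compile("(?i)\\b(password|secret|token|apikey)=([^\\s,;]+)");

    /** Replaces the value of every sensitive key=value pair with asterisks. */
    static String redact(String message) {
        return SENSITIVE.matcher(message).replaceAll("$1=****");
    }
}
```

Calling LogSanitizer.redact on a message before handing it to the logger keeps the key visible for troubleshooting while hiding its value.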

Serialization And Parsing

To ensure that log files are easy to decipher for anyone looking at them, logging libraries such as log4j allow developers to specify the format in which the log file is written. Once we know the pattern of the log file, finding the specific place we want to focus on while troubleshooting is as easy as running a ‘grep’ command (or any text search). The log4j logger implementation allows custom pattern definitions using PatternLayout.
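
A fixed pattern also makes the log easy to parse programmatically. The sketch below assumes, purely for illustration, that lines were written with a conversion pattern along the lines of `%d{ISO8601} %-5p [%t] %c - %m%n`, that is: timestamp, level, thread, logger name, message. The LogLineParser class is hypothetical and not part of log4j.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines shaped like:
// 2019-04-10 08:15:30,123 ERROR [main] com.example.billing.InvoiceService - Payment gateway timed out
class LogLineParser {
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+ \\S+)\\s+(\\w+)\\s+\\[([^\\]]+)\\]\\s+(\\S+) - (.*)$");

    /** Returns {timestamp, level, thread, logger, message}, or empty if the line does not match. */
    static Optional<String[]> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
        });
    }
}
```

Lines that do not follow the assumed layout, such as the continuation lines of a stack trace, simply yield an empty result.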

Serializing Dates And Timestamps

Using %d{dateformat}, a timestamp can be logged along with the actual log message. The timestamp can be in local time, but a better approach is to log timestamps in UTC, so that logs from machines in different time zones can be compared directly.
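
How the time zone is selected depends on the log4j version and layout in use, so it is worth checking the PatternLayout documentation for the %d options. Independent of log4j, the idea of a UTC timestamp can be sketched with the standard java.time API; the UtcTimestamps class and the chosen format are assumptions for illustration only.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that renders an instant in UTC regardless of the machine's local zone.
class UtcTimestamps {
    private static final DateTimeFormatter UTC_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")
                         .withZone(ZoneOffset.UTC);

    /** Formats the instant as an ISO-8601 style UTC timestamp with millisecond precision. */
    static String format(Instant instant) {
        return UTC_FORMAT.format(instant);
    }
}
```

Because the formatter is pinned to UTC, two nodes in different time zones produce identical timestamps for the same instant.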

Serializing IP

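To record which machine the application is running on, we can reference ${hostname} in the log4j properties file. Note that this resolves to the host name rather than the IP address, and that in log4j 1.x the value has to be supplied (for example as a system property), whereas Log4j 2 predefines it as ${hostName}. This is handy for logs created in clustered environments: by looking at the log files, one can determine which node faced the application error.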
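
If the host name has to be supplied by the application itself, it can be looked up with the standard java.net API. This is a minimal sketch; the NodeIdentity class and the "unknown-host" fallback are assumptions, not log4j features.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for tagging log output with the node it came from.
class NodeIdentity {
    /** Host name of the local machine, or "unknown-host" when it cannot be resolved. */
    static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    /** Prefixes a message with the node name so clustered logs can be told apart. */
    static String tag(String message) {
        return "[" + hostName() + "] " + message;
    }
}
```

The value returned by hostName() could also be set as the hostname system property at startup, before logging is initialized, so that the properties file can refer to it.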

Use Fully Qualified Name Of Class

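This is of utmost importance in a fairly large application. Looking at the fully qualified class name, one can tell which application module produced the error. To enable this, add %C to the conversion pattern in the properties file. The related %c prints the logger name, which by convention is usually the class name as well, while %l prints the full caller location (class, method, file and line). Both %C and %l are comparatively slow because they inspect the call stack, so use them with care.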

Logging Exceptions

It is advisable to use an appropriate exception handling mechanism, so that a person looking at the log file can understand where the exception occurred and get a hint of the underlying cause. Let us take a look at the try-catch block in Java to understand this better.

A common bad practice among novice Java developers is to handle exceptions in the following way:
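
A minimal sketch of such a handler, reconstructed as an assumption from the two problems listed next (the NaiveHandler class and the body of method() are hypothetical):

```java
import java.io.IOException;

// Deliberately poor example: shows the anti-pattern, not a recommendation.
class NaiveHandler {
    // Hypothetical operation that can fail in several different ways.
    static void method() throws Exception {
        throw new IOException("disk unavailable");
    }

    static void run() {
        try {
            method();
        } catch (Exception e) {
            // Anti-pattern: one generic catch, and the trace is dumped to the console.
            e.printStackTrace();
        }
    }
}
```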

The above piece of code has two main problems:

- Catching only the generic Exception for everything that method() could throw does not help identify the underlying cause.

- Printing the stack trace is of little use when handling an exception. The trace goes to the console (standard error) rather than to the log file, and it leaves the console cluttered.

A better way to handle exceptions is to catch the specific exception types and log a custom message that makes sense to the person reading the log. These messages should be logged at the ERROR level.
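
The following sketch shows one way this could look. It uses java.util.logging only so that it runs without extra dependencies; its SEVERE level plays the role of ERROR here, and with log4j the corresponding call would be logger.error(message, exception). The ReportLoader class and its readReport method are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class ReportLoader {
    // Naming the logger after the class makes the origin of each message visible.
    private static final Logger LOGGER = Logger.getLogger(ReportLoader.class.getName());

    // Hypothetical operation that may fail.
    static String readReport(String path) throws IOException {
        if (path == null || path.isEmpty()) {
            throw new FileNotFoundException("no report path given");
        }
        return "report:" + path;
    }

    /** Returns the report, or an empty string after logging why it could not be loaded. */
    static String loadOrDefault(String path) {
        try {
            return readReport(path);
        } catch (FileNotFoundException e) {
            // Specific exception, specific message; the exception is passed along
            // so the stack trace ends up in the log rather than on the console.
            LOGGER.log(Level.SEVERE, "Report file not found, path='" + path + "'", e);
            return "";
        } catch (IOException e) {
            LOGGER.log(Level.SEVERE, "Could not read report, path='" + path + "'", e);
            return "";
        }
    }
}
```

Each catch block names the failure in plain words and includes the input that triggered it, which is usually what a support engineer searches for first.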

Conclusion

In this article, we learnt about logging in applications for production grade systems. We saw the various log levels and how to use them. It is important to know what information should be logged and what one should refrain from logging. Having a standard pattern for logs ensures better readability and one would be able to skim through them easily while debugging the application.
