NanoLog: A Nanosecond Scale Logging System [ATC 2018]
Faster Logs == Faster Program == Happy Users
Our goal is to provide an easy, general and high-level context of this paper’s contributions and our take on the implications of the paper. Please refer to the actual paper for more details if this post interests you and please feel free to contact us for errors, changes and suggestions!
NanoLog [paper] is one of those works that can be applied immediately to your programs to boost their performance. The cool thing is that it doesn’t require any new specialized hardware to apply this idea, so the performance gains kind of comes for free. This paper proposes a way to optimize how your logs are written to the disk. Since, logs are an essential part of most programs, this paper aims to optimize all possible programs that logs anything. Let’s first cover some background knowledge required for this paper and then talk about the contributions of this paper.
Background
Logging System refers to a system that is responsible for keeping some record of events that occur within a set of programs. FIG 1. shows an example output from a logging system. It is a tool that a programmer learns to use from day 1 of programming. For example, print "Hello World!"
is a form of logging that allows the programmer to print out a log of the program’s execution. It is an important and an essential tool for learning about the program itself, as well as analyzing how users interact with you program. Some example of logging systems are log4j, glog, Boost log, and event tracing.
Logging systems generally lets the programmers to specify the level of the logging message. For example, within a log4j, a famous open-source logging system, a user can specify the logging level as follows (obtained from https://www.tutorialspoint.com/log4j/log4j_logging_levels.htm).
import org.apache.log4j.*;
public class LogClass {
private static Logger log = Logger.getLogger(LogClass.class);
public static void main(String[] args) {
log.setLevel(Level.WARN);
log.trace("Trace Message!");
log.debug("Debug Message!");
log.info("Info Message!");
log.warn("Warn Message!");
log.error("Error Message!");
log.fatal("Fatal Message!");
}
}
Given the logging level, the program is compiled to display only the logs that are above the certain level. From the example, this program will only display warn
, error
and fatal
messages. The main reason for having such a log level is to optimize the program’s performance. Each logging message is expensive. NanoLog tells us that formatting each log message takes around 1ms. So, more logs you have, slower your program. Thus, we, as a programmers, learn to be concise with our logs and display on the messages that are absolute necessary. In fact, many large companies spend hours deciding which log messages to keep or throw away!
Logs within a logging systems are usually written to disks. These disks can be in the same computer that the program is currently running, or can sit in a remote set of servers dedicated to store logs. In any case, disk I/O is slow and expensive. Each logs can by around 50~100 bytes long, thus an SSD with 250 Mbytes/sec bandwidth can only write a few million messages per second! So, we can see that how logging can easily be bottlenecked by the system’s hardware limitations.
Which Problems is this Paper Trying to Solve?
NanoLog’s goal is to make logging faster and scalable. As mentioned above, each log message is limited by the time it takes to format the message, as well as the I/O throughput. Often times, many systems spawn new process that is responsible for taking care of all of the logging operations. Although this method allows the main program to continue to run without waiting for each log message to be completely processed, this method’s performance is still limited by the disk I/O throughput. Thus, if the main program generates more log messages than the logging process can consume, this method can still result in blocking or loss of messages. Therefore, this paper aims to solve this issue by optimizing log generation, storage and query.
How is this Paper Solving these Problems?
The authors made two critical observations about logging. First is that the log messages generated necessarily not need to be human readable, since humans only read a portion of the logs. The logs can be stored in any format, and when the humans need it, there can be a program to make it into a human readable format. Secondly, log messages often contain repeated static messages with partially dynamic contents. So, it is possible to map the static portion into a smaller data format and reincorporate the static portion when the logs need to be read. These observations and proposed solution reduces time to process log messages and reduces the size of the log messages to be generated.
NanoLog has multiple parts. First, it contains a preprocessor, which replaces original logging statements with optimized code during compile time to avoid formatting of the log messages. This also results in a smaller log file. Secondly, it contains a runtime library, which buffers log messages from the main program and outputs the messages in a compacted format. Finally, it contains a decompressor, which transforms the compacted log messages into human readable messages.
What are the Results?
NanoLog is 1~2 orders of magnitude faster at logging than existing logging system. It achieves a throughput up to 80 million log messages per second, compared to 0.5~5 million log messages per second that can be achieved by many traditional logging system. Finally, each logging statement takes 8~18 nanoseconds to complete, compared to 1 millisecond it takes to format a log message in a traditional logging system. More detailed results are available in the paper.
My Take on this Paper
Every programmer is well aware that careful logging is important. Logging is a double-edged sword, as it can give you detailed information about how the program runs, but it can also significantly slow down your program if not used correctly. That’s why we learn to use various levels of logging at different places of the code and still go through hours of code reviews to add and/or remove critical logging messages. Although this paper does not solve this issue completely, this paper reduces the burden of logging on the program that we write. I think this is a large step forward in a more optimized logging.
I am very grateful to be able to review this paper. This paper comes from fellow Stanford Ph.D. students sitting next to my office on the 3rd floor of the Gates building. So, I can closely feel the hours of work and dedication that went into this paper.
We are looking for passionate writers from all fields of CS!
I believe that one of the main and most important habit that a grad student must have is to read papers at a regular fashion. I thought that being a paper reviewer with a light amount of responsibility gives incentives to read more papers. So, I personally started this blog for me to record what I was reading and I found it really really helpful. I hope that more can join my experience and have a great learning experience in the process! Feel free to email me at yo2seol@cs.stanford.edu if interest!