Bunyan Logging With Production Troubleshooting in Mind

Published in

The Startup

7 min readAug 13, 2020

I’ve been providing technical support 24/7 for most of the Node.js services I’ve worked on for quite some years now. During this time I’ve found myself many times wearing a detective hat, digging into thousands of log lines, when no other resource was available to help to troubleshoot an issue. Of course, in an ideal world, you have illuminating metrics, thorough crash reporting tools, and other similar allies to help you find the root cause of the troubles. In reality, log files are sometimes our last resort to find the culprit. I’d like to share some of the principles I’m sticking to in order to produce logs that will help me catch the error smoothly and as soon as possible.

Know the signature of the functions

It’s good to double-check the log method API of Bunyan. I’ve seen many occurrences of logs where the message is passed in the wrong argument of the function, which leads to a less readable output.

If we run the code above, you can see how the first log line outputs more sensible and readable data than the second:

The first log line has a msg property with value Something went wrong, as opposed to the second log line, whose msg contains an amalgam that mixes the message with the error stack trace. Should you have a log visualization tool like Kibana or Splunk, the first log line will be easier to search for if you perform a pseudo-query similar to msg="Something went wrong".

Moreover, this approach will allow you to benefit from bunyan’s CLI filtering capabilities. More about that later on.

First the fields, then the log message

An object literal can be passed to the bunyan log methods, but make sure this object is the first argument, and the second one is the log message. The output will not be so nice if you do it the other way around.

As you can see in the screenshot below, the first log line looks much better than the second.

Just like in the previous example, the first log line has a proper error message, and the object literal will be merged with the resulting JSON generated by bunyan, which will allow you to, again, perform fancy queries with your log visualization tool:

msg = "The player profile has been successfully stored in the database" AND player.name = "Dunkelheit" AND player.level > 10

Again, this is just a pseudo-query, and mileage might vary depending on which log visualization tool you are using.

The second log line will once more contain an amalgam which will make these feats impossible to achieve.

Use the standard bunyan serializers

Bunyan comes with a very convenient set of default serializers that will render readable JSON output for elements like errors, request and response objects, assuming that you use the log method API the right way, sticking to the convention stated in the bunyan documentation. During these past years I’ve seen these serializers not used in their full potential because of the misuse of the log methods.

Enabling these serializers is very easy, and you might have already seen how in the code snippets above. Let’s begin with the following example:

The convention is simple; the first argument of the logging method has to be an object literal containing one or more of the following properties:

err for errors; it will print the full stack trace of the error, its default properties (like name and message) and any other property in case you are extending the native Error object with any custom fields.
req for HTTP requests; this will print the following request information: method, URL, headers, remote address and remote port.
res for HTTP responses: status code and headers.

Be wary of potentially sensitive information being logged as part of the headers in req and res— think of tokens, passwords, personally identifiable information and such.

The output couldn’t be more beautiful!

“player” and “spell” are pretty-printed for your troubleshooting leisure

Child loggers for application sub-components

Child loggers are specialized loggers for sub-components of your application. They can be created with log.child(...), and they can receive extra properties that will be present in all the log lines created by the child logger. They can be useful, for example, when attached to incoming requests in your REST API — which is a feature that comes out of the box in modules like restify.

Consider the following code:

someMiddleware is a connect-compliant middleware that attaches a child logger to the incoming request, which could be used further down in the chain of middlewares and request handlers, so that all the log lines generated during the processing of this request have the property req_id. This will help you trace anything that took place during the request.

If you are using restify, this is an out-of-the-box feature which can be enabled with the following code:

server.use(restify.requestLogger())

With this, restify will add a child logger in req.log with a req_id field, exactly like we manually did in the code block above.

Filtering logs with the bunyan CLI

The bunyan command-line interface not only pretty-prints your logs; it also has an option to filter them based on conditions, which can sometimes be handy while digging through tens of thousands of log lines:

> cat server.logs | bunyan -c 'this.req_id = "abc-def-ghi"'

The code above will only display log lines that contain the property req_id with value abc-def-ghi. The condition passed to -c must be valid JavaScript, and this holds the log line.

You can also filter logs to just display lines with a certain level:

> cat server.logs | bunyan -l error

Removing sensitive information from your logs

Sometimes, when an error is encountered in your application, you might want to log as much information as possible to troubleshoot its route cause. Let’s assume you have a REST API that accepts a POST request, and for troubleshooting purposes you are catching errors on the API endpoint handler function and logging all available information: request body, headers, etc. This can be a double-edged sword, since you might be accidentally be logging sensitive information from the incoming request body, like passwords, tokens, email addresses, etc.

If you don’t want to cherry pick which body properties should be logged, you can build a bunyan serializer to mask a certain set of blacklisted properties:

This is a simple example that only masks root-level properties in the values of body and form.

Both “email” and “password” will be masked

To mask deeper properties you might want to consider functionalities like lodash’s _.get (docs) and _.set (docs).

Make sure to double check the requirements for serializer functions! It’s especially important to stress that serializers should never mutate the given object.

Log levels

What things should be logged in each of the available levels? Bunyan’s author proposes the following approach, which I’ve been using. The descriptions are quoted from the bunyan documentation, with some remarks added in italics:

fatal: The service/app is going to stop or become unusable now. An operator should definitely look into this soon.
error: Fatal for a particular request, but the service/app continues servicing other requests. An operator should look at this soon(ish). Include as much information as possible to assist in a postmortem analysis of the error, be mindful with sensitive data.
warn: A note on something that should probably be looked at by an operator eventually. Data inconsistencies, responses from 3rd party services that do not comply with their API specifications.
info: Detail on regular operation.
debug: Anything else, i.e. too verbose to be included in “info” level.
trace: Logging from external libraries used by your app or very detailed application logging.

What to log in case of an error?

My rule of thumb is:

Log all the information you require to be able to consistently reproduce the issue.

This becomes crucial when you have a component that interacts with 3rd party services and the error is happening on their side; you will need to provide them concise steps to reproduce the problem in order for it to be solved as quick as possible.

Assume we have a REST API that sends some analytics to a third party service. For whatever reason, this service is now failing; maybe it’s in an unhealthy state, or our authentication tokens are expired, or they changed something in their API specifications without letting us know. In our example, this failure won’t stop the flow of our endpoint, but we need to be able to address it with the developers of the analytics service:

In this example we are logging all the information that would allow us to manually reproduce the call to the analytics service, so that we can verify whether the issue is still occurring and, if so, file a bug report that includes steps to reproduce the issue. And just to keep things safe, we will also be masking any sensitive piece of information that might show up in the incoming request body and headers.

I’d be delighted to hear your opinions and experiences with troubleshooting-drive logging! Good luck out there and may PagerDuty let you sleep long and calm nights!