Structured Logging — Part 2: Usage
Originally published at blog.current.sh on September 23, 2015 by @evanphx.
This is Part 2 of a series; start with Part 1.
You want to get into using structured logging so you can finally start to make use of all that data. Now you have to figure out what to actually log and how to do it!
How to Log
We’ll just go straight to the elephant in the room: JSON. Most structured logging systems these days use JSON, and for good reason. It’s lightweight enough to stay out of the way and enforces just enough structure to make the data usable.
Thus, my recommendation is to start with JSON. You want to get to the point where you end up growing out of JSON, rather than start with a new, super awesome custom solution right out of the gate.
Why not build your own format? Two reasons: first, it’s bound to slow you down. Second, you’ll probably want to get the data out of those logs a few years from now, and with a custom format that’s much less likely to work.
Build your own format if, and only if, you have seen JSON cause significant issues for your use case.
“But Evan, can I just use a simple key=value format?”
While certainly tempting, my little squirrel, what you’ll effectively end up doing is writing a JSON parser to deal with the cases related to quoting, arrays, nesting, etc.
If you must use a key=value format, be ready to switch to JSON at the first sign of friction. Don’t spend a bunch of time shoring up your own format; it’s not worth your time.
In fact, if you’re using a key=value format, I HIGHLY suggest you convert it to JSON before storing and sending it off. That conversion process is easy and it means you get uniformity further down in your stack.
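If you do find yourself with key=value lines, the conversion really is small. Here’s a minimal sketch in Go (this post isn’t tied to any language, and parseKV is a made-up helper, not a real library) that turns a simple line into JSON. Note that it punts on quoted values containing spaces, which is exactly the kind of friction that should push you to emit JSON directly:

package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// parseKV converts a simple key=value log line into a map that can be
// re-emitted as JSON. It only handles whitespace-separated pairs with
// optional double quotes around single-word values.
func parseKV(line string) map[string]string {
    event := make(map[string]string)
    for _, pair := range strings.Fields(line) {
        key, value, ok := strings.Cut(pair, "=")
        if !ok {
            continue
        }
        event[key] = strings.Trim(value, `"`)
    }
    return event
}

func main() {
    line := `event=user_add user_id=31247 host=web1`
    out, _ := json.Marshal(parseKV(line))
    fmt.Println(string(out))
    // {"event":"user_add","host":"web1","user_id":"31247"}
}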
What to Log
The answer is always subjective and depends on a number of other factors, but there are some common elements that most events should include.
The Subject
Every message should include something that indicates its subject. The easiest way to do this is to use a key like event. It’s generic enough to easily encapsulate the full spectrum of values without being so generic as to be meaningless.
What really matters here is the values. A common (and, imho, the best) practice is to use simple identifying strings like user_add or request_sent. These are easy for a machine to handle and not cumbersome for the humans who also see them.
Additionally, use a simple scheme. The above values conform to a simple <noun>_<verb> scheme. The scheme exists to help all the humans name their identifiers in the same general way, which reduces confusion.
Tense
Something to consider with an identifier-naming scheme is the tense of the verb. For instance, if the action just happened, should it be user_added? And if the event is emitted before an action takes place, should it be adding_user? These problems are not specific to English, and they can be even worse in other languages.
Generally, the answer is to accept identifiers that are a little clunkier as written human language in exchange for making them more uniform. Since events are generally emitted after an action takes place, stick with the simple user_add; don’t clutter the identifier with the tense.
If you do need to note that a log is emitted before an action takes place (typically because it’s paired with a message emitted when the action finishes), go with user_add_start and user_add_end. This keeps the identifiers on the same prefix as the common case, which makes them easier to search for later on.
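As a sketch of what that pairing might look like in code (Go here, purely for illustration; emit and addUser are hypothetical helpers, not a real library):

package main

import (
    "encoding/json"
    "fmt"
)

// emit stamps the event identifier onto the fields and writes the result
// as one JSON object per line.
func emit(event string, fields map[string]any) {
    fields["event"] = event
    out, _ := json.Marshal(fields)
    fmt.Println(string(out))
}

func addUser(name string) {
    emit("user_add_start", map[string]any{"name": name})
    // ... the actual work of adding the user happens here ...
    emit("user_add_end", map[string]any{"name": name})
}

func main() {
    addUser("squirrel")
}

Searching for user_add later finds both the start and end events, because they share the same prefix.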
Human Messages
Some people prefer using a human message instead, such as “User added”. I don’t recommend this because it makes scheme enforcement much harder; but if you must, keep the message short. The longer the message, the harder it will be to search and categorize later.
The Context
One reason that log messages are so difficult to use well is that typically they lack context. We’ve all seen a message like Error in database query only to ask immediately: “But where?!”
For this reason, adding context to a message is extremely important. Many logging libraries have the ability to add the file and line in the code that emitted the log. This is a great place to start but it leaves a lot of information out.
First and foremost, the context should include session identifiers. These are invaluable for being able to piece together what happened along the way to a log being emitted. Here is a quick example:
{
  "event": "user_add_to_cart",
  "tags": {
    "host": "web1",
    "user_id": 31247
  },
  "item_id": 99124,
  "quantity": 2
}
Here we’ve used a tags convention to include context about the event. We now quickly know which user and which machine were performing the action. This tags convention is quite powerful because it clearly separates the context from the meat of the event and allows for layering in more context as it’s available.
Some examples of other context data are:
- Server that the event occurred on: web1
- User that performed the operation
- Component emitting the log
- Request/Session identifiers
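To make the tags idea concrete, here’s a minimal sketch (in Go, purely as an illustration; this post isn’t tied to any language or library, and all of these names are hypothetical) of a logger that carries a base set of tags and lets each layer add more as the context becomes known:

package main

import (
    "encoding/json"
    "fmt"
)

// Logger carries context tags that get attached to every event it emits.
type Logger struct {
    Tags map[string]any
}

// With layers additional tags on top of the existing ones, so deeper
// components can add context (host, user id, request id, ...) as it
// becomes known.
func (l *Logger) With(extra map[string]any) *Logger {
    tags := make(map[string]any, len(l.Tags)+len(extra))
    for k, v := range l.Tags {
        tags[k] = v
    }
    for k, v := range extra {
        tags[k] = v
    }
    return &Logger{Tags: tags}
}

// Emit writes the event body plus the accumulated tags as one JSON object.
func (l *Logger) Emit(event string, body map[string]any) {
    body["event"] = event
    body["tags"] = l.Tags
    out, _ := json.Marshal(body)
    fmt.Println(string(out))
}

func main() {
    base := &Logger{Tags: map[string]any{"host": "web1"}}
    perUser := base.With(map[string]any{"user_id": 31247})

    perUser.Emit("user_add_to_cart", map[string]any{
        "item_id":  99124,
        "quantity": 2,
    })
}

The output matches the event shown above: the body stays the meat of the event, and everything under tags is context that was layered in along the way.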
The Body
We now move into the meat of your event: the body. Clearly, this will vary wildly between events… and that’s the point. It’s just a bag of attributes that make sense in context, which is another reason why presenting that context well is so important.
Even though they’re fairly freeform, you should try to adhere to some rules to make using the data easier.
1. One kind of value per key (monomorphic types)
For example, if the key user always contains a numeric identifier for that user, then only use user in that way; don’t emit user with an email address in a different app. That will only make the data harder to use later. In this example, you might want to find everything a user did by searching for user=3342; events that put an email address in user instead won’t show up.
2. Use the same key for the same meaning
Consider these two events:
{ "recv_ip": "10.3.17.2" }
and
{ "recv_ip_addr": "10.3.17.2" }
It’s safe to assume that these two events are trying to convey that something happened with an IP address. Don’t use two different keys to say the same thing.
Generally, try to use as few keys as you can. The smaller the key space, the easier the data is to work with. One very easy way to manage this is to establish a “field dictionary.” This dictionary just provides a list of keys in use and their kinds of values. Treat it like a code style guide — something that the team is involved in and that grows with the team.
In fact, I’ve got one here to help you start out: Vektra Field Dictionary
Fork that and modify it as your team grows.
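If you want the dictionary to do more than sit in a document, it can also live next to the code. Here’s a small sketch (Go again, with made-up keys and descriptions, not anything from the Vektra dictionary) of checking emitted fields against the dictionary so drift in the key space shows up early:

package main

import "fmt"

// fieldDictionary maps each key the team has agreed on to a short note
// about the kind of value it carries. These entries are illustrative only.
var fieldDictionary = map[string]string{
    "event":   "string identifier, <noun>_<verb>",
    "user_id": "numeric database identifier",
    "recv_ip": "IP address as a string",
}

// checkFields warns about keys that aren't in the dictionary, so an
// accidental recv_ip_addr gets caught before it pollutes the key space.
func checkFields(fields map[string]any) {
    for k := range fields {
        if _, ok := fieldDictionary[k]; !ok {
            fmt.Printf("warning: field %q is not in the field dictionary\n", k)
        }
    }
}

func main() {
    checkFields(map[string]any{
        "event":        "user_add",
        "recv_ip_addr": "10.3.17.2", // warns: should be recv_ip
    })
}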
Another quick rule to apply is using key suffixes to help add understanding to the value. For instance, have the team use _id on the end of keys where the value is a numeric identifier for a database value. This can keep your key space clean and regular.
Sidenote: Something else to think about is how the naming of the values influences understanding. If the field in the snippet above had just been named ip, would the consumer of that event be confused about how the value relates to the event, since the key would really just be naming the type of the value? I don’t yet have any good advice on resolving this, but it’s definitely something to think about.
To recap:
- Use JSON until you have a good reason not to
- Every event should have a subject
- Give your events context — the more the better
- Keep your keys uniform across usages: one key => one value type
- Establish a field dictionary and grow it
Next time, in Part 3, we’ll discuss time!
— 😘