Log like a Boss
Logs are gold mines of information, with uses in debugging, growth-hacking, behaviour analysis, auditing, fraud-/intrusion-detection and much more. It’s likely that every single department in an organisation can benefit from information extracted from a single log:
- Finance want to know how many active users there are to calculate cost of acquisition;
- Marketing want to know which countries have the highest growth;
- Developers want to back-track through a series of data to debug an issue;
- Systems want to ensure that a component is working as expected.
I've spent a good deal of time over the past few years dealing with logs of all shapes and sizes, so here’s my short list of dos and don’ts.
- Try to make your log lines parsable so they can be read by a script, but also make them legible for humans. Use a simple key-value format or JSON, and avoid logging binary or encoded data.
- Try to stick to one log line per event. The chances are, at some point you will use standard UNIX tools to process your logs, which naturally process files line-by-line.
- Document your log formats. Will you remember the meaning of each column in six months’ time? It’s likely that someone else will end up processing your logs, so save yourselves the effort. As your software evolves it’s likely that your log format will change, so keep a record of your changes as you go.
- Give each log a unique name. I can’t stress this point enough. If you collect logs it’s likely that they will end up mixed with logs from different systems. If two completely different logs have the same name, how will you tell them apart?
- Don’t worry about including too much information in a log event: chances are you’re not logging enough anyway. Use log levels (info, debug, critical) to control how much you log.
- Likewise, logging more now can mean less work later. Joining logs against other data sources after the fact can be a pain in the bum. A log file is not a database, so you don’t have to normalise the data in it. If your log event is generated by a user, you might be tempted to just log the user’s ID; but if you have their email, language, country-code and account-number available, why not log those too? It could save you a lot of work when you need to trawl through all the events and cross-reference them with a database.
- Timestamp the right way. The timestamp should be the first thing on each log line. Start each log event with a timezone-aware, standardised timestamp and use the same format in all your logs. I use ISO 8601 combined date & time with timezone designator, like this:
“2014-08-04T13:28:09Z”. This means that I can extract a variety of different logs and easily grep then sort by time.
Take it from me: don’t use the RFC 2822 date format.
- Don’t roll your own logger. Use a language- or framework-specific logging library.
- Don’t log credit card numbers! I don’t believe you’d actually do this, but if you do, you should really learn about PCI DSS.
- If you are logging personal data (emails, addresses, social security numbers, names, birth dates), treat it with the required respect and security.
- Rotate your logs at least every day. Incidentally, the UNIX tool logrotate has a rather idiosyncratic way of renaming files each day like this:
x.log → x.log.1 → x.log.2.gz → x.log.3.gz → etc.
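This default method can be quite problematic if you use rsync, because every rotated file gets a new name each day and so looks like a new file to transfer. Personally I prefer to set up logrotate to include the date in the rotated filename (its dateext option does this).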
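If rotation is handled inside a Python application rather than by logrotate, the same "date in the filename" idea can be sketched with the standard library's TimedRotatingFileHandler. This is only an illustration under assumptions: the log name billing-api.log, the helper make_daily_handler and the 14-day retention are hypothetical and not from any particular setup.

```python
import logging
import logging.handlers
import os
import tempfile


def make_daily_handler(path, keep_days=14):
    """Rotate at midnight UTC and keep `keep_days` old files.

    With when="midnight" the handler appends a date suffix to rotated
    files (for example billing-api.log.2014-08-04), so a file's name
    does not change again once it has been rotated.
    """
    return logging.handlers.TimedRotatingFileHandler(
        path, when="midnight", utc=True, backupCount=keep_days
    )


# Hypothetical log location, created in a temp directory for the example.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "billing-api.log")

rotating_handler = make_daily_handler(log_path)
rotation_logger = logging.getLogger("billing-api")
rotation_logger.setLevel(logging.INFO)
rotation_logger.addHandler(rotating_handler)

rotation_logger.info("rotation configured")
```

Because each rotated file keeps a stable, dated name, a tool like rsync only has to copy it once.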
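Several of the earlier points (parsable lines, one line per event, a unique log name, extra user context, timestamp first) can be sketched together with Python's standard logging module. Treat this as one possible layout, not a standard: the JsonLineFormatter class, the "fields" convention for extra data, the checkout-service name and the sample user values are all assumptions made for the example.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each event as '<ISO 8601 UTC timestamp> <JSON object>' on one line."""

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        event = {
            "log": record.name,  # the unique log name
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
        }
        # Extra context passed via extra={"fields": {...}} (a convention
        # assumed for this sketch, not part of the logging module).
        event.update(getattr(record, "fields", {}))
        # json.dumps escapes newlines, so an event never spans two lines.
        payload = json.dumps(event, sort_keys=True)
        return f"{ts.strftime('%Y-%m-%dT%H:%M:%SZ')} {payload}"


# Write to an in-memory buffer here; a real setup would use a file handler.
buffer = io.StringIO()
stream_handler = logging.StreamHandler(buffer)
stream_handler.setFormatter(JsonLineFormatter())

event_logger = logging.getLogger("checkout-service")
event_logger.setLevel(logging.INFO)  # the log level controls how much is written
event_logger.propagate = False
event_logger.addHandler(stream_handler)

# Log more than just the user ID, so nothing has to be looked up later.
event_logger.info(
    "order placed",
    extra={"fields": {"user_id": 42, "language": "en", "country": "GB"}},
)
event_logger.debug("cart contents dumped")  # below INFO, so not written

print(buffer.getvalue(), end="")
```

With the timestamp as the first field in a fixed-width format, lines from different logs can be merged and ordered with a plain text sort, and the rest of each line can be handed to any JSON parser.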