Logs, Lager, Logstash, and 12 Factor App

koolak82
9 min read · Nov 6, 2019



Logs are an important part of any software application. A log, as I define it, is a single record of an event: what happened, with a timestamp.

There is a more elaborate definition of a log, expanding on the above. A log would ideally capture:

  • The what, when, and who of the event. The what is the message part of the log, and the most important piece of it. The when is, pretty obviously, the timestamp.
  • The who more often than not comes from the context. The context is the other piece of information that makes logs rich and useful.
  • The level of the log is the way to structure / categorise logs, and probably the only bit of categorisation we apply to them; logs are pretty free-form otherwise. Programmers have long been plagued by getting the level right: should it be DEBUG, INFO, WARN, or ERROR? And since this cuts across the organisation (should operators or managers be seeing the INFO logs, or what?), it is an absolutely critical piece to get right.
  • A slightly broader definition would also encompass multiple lines, if error stack traces need to be logged too. And they should be (no doubt), though that is often achieved by keeping the error id and stack trace linked with the root-cause exception id in language-specific error formats.
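Putting those pieces together, a hypothetical log line (the format and all the values are invented for illustration) could look like:

2019-11-06 07:00:30.123 [error] <0.312.0>@shop_login:handle/2 user_id=42 device=ios login failed: invalid password

The when is the timestamp, the level is error, the who comes from the context (pid, module / function, user, device), and the what is the message at the end.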

Logging is also one important part of Observability, alongside metrics and traces, and is thus both important and distinct from the other two.

What is it good for?

As can be seen from the three pillars of Observability, logs are the closest to events (they are not events, though; "event" is certainly a loaded term, and depends a lot on context). In more concrete words, logs (should) represent whatever important things are happening in the lifecycle of the application / software.

  • They contain a lot of information; certainly more than metrics or traces.
  • They represent a single occurrence of something in the application. A user can log in, add an item to the cart, favourite an item, make a purchase, and so on, many times over. A single log is one single login of one specific user (there would certainly be many) through a certain device (a hell of a lot of information) at one specific timestamp. Whether that login was an unusual failure could (or could not) make it an ERROR.
  • The more the information, the more useful the log. There is a trade-off; more doesn't mean infinite irrelevant, useless information. But the idea is to capture as much information as possible.
  • Also, precisely because of the above, you can / should be able to search for anything in the logs. Search is an important use case of logs. What use are logs if they are not searchable?
  • And thus you can sacrifice some other things in logs. Structured analytics over logs, for instance, can probably be sacrificed.

Logs are thus useful for many things:

  • For actual debugging of a specific occurrence, naturally.
  • For counting error occurrences in total, and probably finding some overall broken feature / code path in the system, maybe one depending on some third-party system.
  • Surprisingly, also for the normal well-being / health of the system. INFO has a big use case here.
  • For compliance, security, code optimisations (see the third-party library case above), etc.

Lager

Lager is described as a logging framework for Erlang/OTP. It is even recommended as a good app adhering to OTP principles and a sound example of the gen_event behaviour.

For the purpose of this discussion, though, lager provides a feature-rich logging app for Erlang/OTP, going well beyond simple console / shell logging. No doubt, no production app can survive on console / shell logs, and lager (or a similar framework, in Erlang/OTP and other languages) is the de facto standard. A glance back at the first part of this article will refresh how massive the logging requirements really are. Logs deceptively sound simple at the beginning of any application, but soon grow into a big feature set. Lager takes care of all this. To name a few features:

  • It supports all the levels.
  • It supports many backends, not just console or file.
  • It allows a nice optimisation: log calls are only evaluated when the level is actually enabled (see the sketch after this list).
  • It captures context information, like the module, function, arity, and pid of the Erlang process in the application.
  • When we speak of a high volume of logs with a file as the backend, lager supports time-, size-, or external-tool-based rotation too. One quickly realises how important this becomes.
  • And it supports numerous sinks (one can write a new sink too), custom dropping or non-dropping of log events, dropping (load shedding) from a misbehaving sink, etc.
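To illustrate the level-gating optimisation: because lager's parse transform compiles the level check into the call site, the arguments are, to my understanding, only evaluated when the message will actually be logged. A minimal sketch (expensive_report/0 is a hypothetical, costly function):

%% At the default info level, expensive_report/0 is never called,
%% so this debug line costs close to nothing in production.
lager:debug("full report: ~p", [expensive_report()]).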

Using lager is as simple as adding

-compile([{parse_transform, lager_transform}]).

And then doing

lager:start().

After that it’s as simple as using logs like

lager:error("some error").
lager:info("batch successful").
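Putting it together, a minimal sketch of a module using lager (the module name and the messages are made up; it assumes lager is a dependency on the code path):

%% demo.erl: a minimal, hypothetical module logging via lager.
-module(demo).
-compile([{parse_transform, lager_transform}]).
-export([run/0]).

run() ->
    lager:start(),                               %% start lager and its dependencies
    lager:info("batch successful"),              %% an info-level event
    lager:error("some error: ~p", [timeout]).    %% an error with a term in the message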

This is very simplistic, and defeats the whole purpose of this article. Because there has to be a timestamp, level, application, pid, module, context, and then the bookkeeping: cleaning up, rotating the logs, etc. Where is all that? Let's start with a more realistic log that the lines above would generate; that should clear it up.

2019 Nov 06 07:00:30 ip_address erlang: info | getTransportOrders:{ok,{{"HTTP/1.1",200,"OK"},[{"date","Wed, 06 Nov 2019 07:00:30 GMT"},{"server","Jetty(9.4.12.v20180830)"},{"content-length","67"},{"content-type","application/json;charset=utf-8"},{"access-control-allow-origin","*"}],"{\n \"pageNo\" : 1,\n \"pageSize\" : 0,\n \"page\" : [ ],\n \"total\" : 0\n}"}}

And that is more like it. The concepts involved here are:

  • Formatter
  • Handler
  • Sink

The example is more complicated than even required, because it is actually in logstash format. We will discuss that in the next section of the article, but at a high level the above three concepts still stand.

  • Formatter: is required by lager, and there is a default one even if we don't customise or change it. The default is lager_default_formatter, which can be configured to be as exotic as needed, like:
{lager, [
  {handlers, [
    {lager_console_backend, [{level, info}, {formatter, lager_default_formatter},
      {formatter_config, [time, " [", severity, "] ", message, "\n"]}]},
    {lager_file_backend, [{file, "error.log"}, {level, error}, {formatter, lager_default_formatter},
      {formatter_config, [date, " ", time, " [", severity, "] ", pid, " ", message, "\n"]}]},
    {lager_file_backend, [{file, "console.log"}, {level, info}]}
  ]}
]}.
  • Handler: the second part is also highlighted in the code above. A handler is what can be linked to the consumer side. A single log going through a single sink (more on that just next) can have more than one handler, like the console and file backends above.
  • Sink: last, and technically the crux of lager. Historically, lager had only one sink, so this abstraction didn't even matter; but now lager supports multiple sinks too (a sketch follows below).
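As a sketch of what multiple sinks look like, adapted from my reading of the lager README (the audit sink name and file are illustrative):

%% In the module: ask the parse transform to also generate an audit API.
-compile({parse_transform, lager_transform}).
-compile({lager_extra_sinks, [audit]}).

%% In sys.config: give the audit sink its own handlers.
{lager, [
  {extra_sinks, [
    {audit_lager_event, [
      {handlers, [
        {lager_file_backend, [{file, "audit.log"}, {level, info}]}
      ]}
    ]}
  ]}
]}.

%% Usage: audit events flow through their own sink and handlers.
audit:info("user 42 exported a report").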

lager is a state-of-the-art use of the gen_event pattern.

The above, I would say, are the minimum basic high-level abstractions of lager. There are advanced features and abstractions beyond that, like high-water marks for load shedding and how they work. The README.md of erlang-lager/lager is the best and most exhaustive source, and I can't cover it all here.
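For a taste of those load-shedding knobs, a sketch of the relevant config (option names as I recall them from the lager README; verify against your lager version):

{lager, [
  %% Switch from async to sync logging when the sink's mailbox grows
  %% past the threshold, and back once it drains below the window.
  {async_threshold, 20},
  {async_threshold_window, 5},
  %% Drop error_logger messages beyond this many per second.
  {error_logger_hwm, 50}
]}.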

Logstash

lager is cool, and similar frameworks exist for other languages too; logback for Java comes to mind. Logging can evolve from simply seeing the logs on the developer's screen, to a file, to dumping to RabbitMQ, or to dumping to logstash. There is another dimension to the problem which helps introduce logstash. Logs have been in use for as long as there have been computers, and not just in applications, Java, or Erlang/OTP: operating systems have logs too. They come in as many flavours as there are computer systems, and naturally one's application interacts with all these systems to deliver a solution to the customer. To bring the point home further: there are process logs on the OS, nginx logs on top of your application acting as a reverse proxy, your application logs, your application database's ops logs, and message queue logs. Each of these will be important at one point in time or another. And each of them is in a custom format, and what not. logstash takes this problem and tries to solve it in its opinionated way.

A simple pipeline of logstash would look like:

input {
  beats {
    port => "5044"
  }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
  stdout { codec => rubydebug }
}
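To try a pipeline like this, point logstash at the config file (the file name is illustrative):

bin/logstash -f first-pipeline.conf --config.reload.automatic

The reload flag lets you edit filters without restarting the process.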

The point is that there are hundreds of inputs, filters, and outputs. So if you are using nginx or Tomcat, like I said above, we will have those among the inputs. And the filter bit is as important as the input: because of so much variety in logs, any solution falls back on regexes and grokking. That's where the real value of community-tested, expertly created filters comes out.
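As a sketch, a filter for nginx-style access logs could lean on the stock grok patterns (the field names come from the pattern; treat the details as illustrative):

filter {
  grok {
    # COMBINEDAPACHELOG also matches nginx's default access log format.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Index by the timestamp in the log line, not the ingestion time.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}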

logstash is grokking on steroids.

There is another point to discuss: logstash is / can be used on two sides, the agent side as well as the centralised server side. What I have described so far (or had in mind) was generic across the two. So a logstash agent can push to a centralised logstash collector. The library is the same; the difference only comes from the consumer's point of view, in how it is used and in some tuning parameters.
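A sketch of that split, with the agent tailing a local file and shipping to the collector over TCP (host, port, and codec are assumptions, not recommendations):

# Agent side: tail local files, forward as JSON lines.
input {
  file { path => "/var/log/nginx/access.log" }
}
output {
  tcp {
    host => "logstash-collector.example.com"
    port => 5000
    codec => json_lines
  }
}

# Collector side (the mirror image):
# input { tcp { port => 5000 codec => json_lines } }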

The discussion would not be complete without mentioning an alternative to logstash: fluentd.

The descriptions and feature lists look quite close and matching if one digs in deep (and why wouldn't one). There are some differences, of course, which could be architectural (centralised vs distributed), in ease of use (more opinionated vs loosely defined, library-style approach), and in performance (stacks used, real numbers, etc.), but that would be some other discussion.

12 Factor App

So, now we are almost done; we have the full pipeline covered. Just to close it, we would have Elasticsearch to store the logs as well as index them for text search (remember the most important functional requirement of logs). And we would have Kibana (visualisation) on top of it too.

This is a bonus section of sorts. The 12 Factor App methodology was made popular by Heroku, and its 11th principle pertains to logs. I give you the link first: https://12factor.net/logs

And then my interpretation of the same. What I have described is closest to the XI tenet of the 12 factor app. I have largely described how an application need not concern itself with storing and managing logs, and should just present the event (that word again) stream. There are pros and cons to this and to the alternate approach. And:

  • I have stored logs at intermediate stages too.
  • I have done more work upfront in the application and on the logstash agent side, and less and less on the logstash collector side. This is another dimension; there could be alternate design trade-offs here too.

And, like a good architect, I am open to hearing both sides.

Quite naturally, Heroku then has Logplex. That could be a topic for some other day, but hopefully it is clear from the official documentation above, and to people with experience of the same. It's a sort of logstash to me, but across the Heroku platform (still only inside it, at the end of the day!). Heroku still covers a bigger area than a single app, though, since it's a PaaS. I have not used Heroku, so I could be wrong there.

Finally, winding up. A long post, phew! Logs are important, and one area I have worked on, on and off, in various companies. The end summary: logs are important, and give you the most detailed information about an event, which metrics and traces cannot.
