Structured Logging for Rails using ELK
This post describes the structured logging setup used at Marketplacer to publish information about each request to our Rails application to our ELK stack.
Our primary goal for our ELK platform was forensics & long term usage analysis of back-office functionality. For example, we wanted to know:
- has this route or controller been used at all in the last six months?
- how did this advert’s price get set to zero, and by who?
- we’re seeing a sudden uptick in transactions per second every day at 1 AM — is it one of our API users, or is it a bot trying to scrape us?
We needed to store everything we could about every request (including POST/PATCH parameters) and we needed to keep it for six months or longer. Storing only a fraction of the requests (as in an instrumentation platform like Google Analytics) or not storing the parameters of the request (as in traditional text log files) would not meet our goals.
We did not need to store detailed error information or debug output; we use Rollbar for that.
Our solution was to write a log subscriber that would sit at the same level as the normal Rails request logger and write to its own log file to be ingested by our ELK stack. This log file would not replace or combine with the usual production.log
or unicorn.stderr.log
produced by our application. filebeat (part of the ELK stack) would tail the file and send its contents to ur Elasticsearch logging server.
The solution is described below from bottom-up:
- how we obtain events & emit log entries in Rails
- how those events are sent to Elasticsearch via Filebeat
- how Elasticsearch indexes those events via ingest pipelines
- how we configure Kibana to display those events
Some of the code & Elasticsearch configuration used in the below has been made open source and is available in this gist.
Structured Output Format
Our log subscriber outputs one line of JSON per request. We use JSON as it can then be directly ingested in to Elasticsearch with little to no modification. This makes it much more reliable and flexible than a more traditional human readable log file. A reformatted example follows:
The only attribute outside of the rails
object is @timestamp
, which is used by the ELK stack to order events. We provider our own timestamp as we don't want filebeat to use the timestamp at which the log file was read, but the time at which the event occurred, as they may not be exactly the same.
Inside the rails
object:
params
contains all parameters that we can safely record or index, including parameters supplied as part of a PATCH/POST/PUT requestaction
andcontroller
are duplicated fromroute
to make it easier to search for them specifically within ELKsession_id
is recorded partly so we can attempt to replay a user's actions from within ELKoriginal_path
includes any query string parameters: this is primarily useful for discovering parameter decoding edge cases in our application or Rails itself
Structured Output Code
The code used to generate this output is split in to two main sections: the subscriber itself, and a routine to flatten the parameters.
Flattening Parameters
Parameters in Rails can be represented internally as objects containing objects, eg:
While we can insert this information directly into Elasticsearch, what we can’t then do is insert anything which contains the same key names but a different schema. For example, we can’t insert the above object and then insert another where advert
is a single string instead of an object.
We work around this by flattening the parameters into a single object with many keys, eg:
The flattener is also responsible for:
- trapping instances of
Rack::Test::UploadedFile
orRack::Multipart::UploadedFile
and representing them only as[FILE UPLOAD #{filename}]
- truncating any string longer than 120 characters
Log Subscriber
The log subscriber attaches itself to the ActionController
on the process_action
and redirect_to
methods. process_action
handles publishing the log event to ELK, whereas redirect_to
sets a thread local variable for process_action
to later pick up and register as a redirect.
process_action
does the following:
- runs the request’s parameters through the flattener
- attaches exception information, if present
- attaches relevant headers (eg
HTTP_HOST
,HTTP_USER_AGENT
, etc) - obtains a user ID and/or administrator ID from the session, if present
- attaches overall, database and view runtimes
- attaches
route
as a combination ofcontroller
andaction
to make searching easier in ELK - adds a
@timestamp
- encodes the result as JSON and emits it to the log
Each line is written to log/elk.json.log
and is picked up by filebeat running on the production system to be sent to our ELK stack.
Filebeat
Filebeat is part of the “beats” suite of software, published by Elasticsearch as part of their ELK stack. Filebeat reads logs files, applies some basic filtering, and sends the contents on to Elasticsearch.
It can be used as a lightweight replacement for a heavy logstash instance. It is an entire replacement for logstash-forwarder.
Each production server logs ELK events to its local filesystem, and thus one filebeat instance runs on each production server.
Our filebeat configuration contains entries for other things we ingest. I’ve replicated only the portions relevant to our Rails logging below:
Note the following:
- we tell Elasticsearch to use the ingest pipeline named
rails
on each event - the events from
elk.json.log
are taggedrails
, which means they're then sent to thelogs-rails-*
indices json.keys_under_root
means we treat each log entry as the entire event, rather than embedding its contents in ajson
object when we send it to Elasticsearch
Elasticsearch
Apart from regular Elasticsearch configuration, there are two main things we had to consider when setting up ES to handle Rails log data:
- Ingest Pipelines
- Index Templates
Ingest pipelines transform the data before it is indexed. They can be used as a replacement for logstash transforms, and they even have the capability to understand logstash “grok” rules.
Index templates describe defaults for how new indexes are created, and both the format and limitations of the data to be added to the index. Elasticsearch can guess this based on the data, but it’s much safer to specify it ahead of time.
Ingest Pipeline
We created a simple ingest pipeline named rails
. It uses the ingest-geoip
and ingest-useragent
plugins to find a location for the request's IP address and decode the HTTP User Agent respectively.
Index Template
Our index template for indices matching logs-*
is based heavily on the defaults that filebeat provides for its indices, with some modifications:
- we limit the index to one shard and zero replicas, to save space on our single machine Elasticsearch setup
- we copy the
geoip
anduser_agent
mapping definitions from thenginx-access
default mappings into ourrails
section - we specify dynamic mapping rules for our rails parameters:
Kibana
Once the data is in Elasticsearch we can use Kibana as a dashboard to analyze it. We found it useful to set up Kibana before setting up filebeat: its development tools allow for easy interaction with the Elasticsearch API.
Most of our interactions with Kibana are via the “Discover” tab. We use filters to identify long-term usage of site functionality or discover the series of events that led to an error. Right-clicking on a particular field also allows us to show a visualisation on that field: eg a breakdown of user agents or countries.
One big caveat with Kibana is that it can suffer when there are lots of different fields in your indexes, which is a problem we have. It’s occasionally unfortunately necessary to hard-restart Kibana as it enters a 100% CPU locked state.
Now that we have Kibana we can finally answer some of the questions raised above:
To find out if code is being used, we first add a filter against rails.route
or rails.controller
and then set the time range at the top of the page to one month.
To discover what happened to our bad advert, we can first try filtering rails.params.id
for its ID and rails.method
to limit results to POST
requests. We then set the time range to when we suspect the incident occurred. If we find the relevant request, we can finally filter on rails.session_id
to work out how they got to that point in our application.
To work out the origin of the extra transactions, we first set the time range to when those transactions are occurring. We can then create visualisations based on source IP address, destination controller, and GeoIP location to determine the nature of the traffic.
Originally published at mipearson.id.au.