Structured Logging for Rails using ELK

This post describes the structured logging setup used at Marketplacer to publish information about each request to our Rails application to our ELK stack.

Our primary goal for our ELK platform was forensics & long term usage analysis of back-office functionality. For example, we wanted to know:

  • has this route or controller been used at all in the last six months?
  • how did this advert’s price get set to zero, and by who?
  • we’re seeing a sudden uptick in transactions per second every day at 1 AM — is it one of our API users, or is it a bot trying to scrape us?

We needed to store everything we could about every request (including POST/PATCH parameters) and we needed to keep it for six months or longer. Storing only a fraction of the requests (as in an instrumentation platform like Google Analytics) or not storing the parameters of the request (as in traditional text log files) would not meet our goals.

We did not need to store detailed error information or debug output; we use Rollbar for that.

Our solution was to write a log subscriber that would sit at the same level as the normal Rails request logger and write to its own log file to be ingested by our ELK stack. This log file would not replace or combine with the usual production.log or unicorn.stderr.log produced by our application. filebeat (part of the ELK stack) would tail the file and send its contents to ur Elasticsearch logging server.

The solution is described below from bottom-up:

  • how we obtain events & emit log entries in Rails
  • how those events are sent to Elasticsearch via Filebeat
  • how Elasticsearch indexes those events via ingest pipelines
  • how we configure Kibana to display those events

Some of the code & Elasticsearch configuration used in the below has been made open source and is available in this gist.

Structured Output Format

The only attribute outside of the rails object is @timestamp, which is used by the ELK stack to order events. We provider our own timestamp as we don't want filebeat to use the timestamp at which the log file was read, but the time at which the event occurred, as they may not be exactly the same.

Inside the rails object:

  • params contains all parameters that we can safely record or index, including parameters supplied as part of a PATCH/POST/PUT request
  • action and controller are duplicated from route to make it easier to search for them specifically within ELK
  • session_id is recorded partly so we can attempt to replay a user's actions from within ELK
  • original_path includes any query string parameters: this is primarily useful for discovering parameter decoding edge cases in our application or Rails itself

Structured Output Code

Flattening Parameters

While we can insert this information directly into Elasticsearch, what we can’t then do is insert anything which contains the same key names but a different schema. For example, we can’t insert the above object and then insert another where advert is a single string instead of an object.

We work around this by flattening the parameters into a single object with many keys, eg:

The flattener is also responsible for:

  • trapping instances of Rack::Test::UploadedFile or Rack::Multipart::UploadedFile and representing them only as [FILE UPLOAD #{filename}]
  • truncating any string longer than 120 characters

Log Subscriber

process_action does the following:

  • runs the request’s parameters through the flattener
  • attaches exception information, if present
  • attaches relevant headers (eg HTTP_HOST, HTTP_USER_AGENT, etc)
  • obtains a user ID and/or administrator ID from the session, if present
  • attaches overall, database and view runtimes
  • attaches route as a combination of controller and action to make searching easier in ELK
  • adds a @timestamp
  • encodes the result as JSON and emits it to the log

Each line is written to log/elk.json.log and is picked up by filebeat running on the production system to be sent to our ELK stack.

Filebeat

It can be used as a lightweight replacement for a heavy logstash instance. It is an entire replacement for logstash-forwarder.

Each production server logs ELK events to its local filesystem, and thus one filebeat instance runs on each production server.

Our filebeat configuration contains entries for other things we ingest. I’ve replicated only the portions relevant to our Rails logging below:

Note the following:

  • we tell Elasticsearch to use the ingest pipeline named rails on each event
  • the events from elk.json.log are tagged rails, which means they're then sent to the logs-rails-* indices
  • json.keys_under_root means we treat each log entry as the entire event, rather than embedding its contents in a json object when we send it to Elasticsearch

Elasticsearch

  • Ingest Pipelines
  • Index Templates

Ingest pipelines transform the data before it is indexed. They can be used as a replacement for logstash transforms, and they even have the capability to understand logstash “grok” rules.

Index templates describe defaults for how new indexes are created, and both the format and limitations of the data to be added to the index. Elasticsearch can guess this based on the data, but it’s much safer to specify it ahead of time.

Ingest Pipeline

Index Template

  • we limit the index to one shard and zero replicas, to save space on our single machine Elasticsearch setup
  • we copy the geoip and user_agent mapping definitions from the nginx-access default mappings into our rails section
  • we specify dynamic mapping rules for our rails parameters:

Kibana

Most of our interactions with Kibana are via the “Discover” tab. We use filters to identify long-term usage of site functionality or discover the series of events that led to an error. Right-clicking on a particular field also allows us to show a visualisation on that field: eg a breakdown of user agents or countries.

One big caveat with Kibana is that it can suffer when there are lots of different fields in your indexes, which is a problem we have. It’s occasionally unfortunately necessary to hard-restart Kibana as it enters a 100% CPU locked state.

Now that we have Kibana we can finally answer some of the questions raised above:

To find out if code is being used, we first add a filter against rails.route or rails.controller and then set the time range at the top of the page to one month.

To discover what happened to our bad advert, we can first try filtering rails.params.id for its ID and rails.method to limit results to POST requests. We then set the time range to when we suspect the incident occurred. If we find the relevant request, we can finally filter on rails.session_id to work out how they got to that point in our application.

To work out the origin of the extra transactions, we first set the time range to when those transactions are occurring. We can then create visualisations based on source IP address, destination controller, and GeoIP location to determine the nature of the traffic.

Originally published at mipearson.id.au.

String Concatenator

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store