Introduction

As a fan of metrics and of acquiring as much useful information as possible my go-to tool is Statsd and Graphite but when I need to drill down for more information either to provide business information or just to track down a bug I need to get quick access to all the logs related with a specific query. I need these logs to be parsed, centralised, filtered and ready to go.

This tutorial explains how to get from a bare bones Vagrant image into something useful. Ideally I would do this through proper automated provisioning with Chef, Puppet or CFEngine but as a learning experience and a way to play around with configuration I’ll go through the steps one by one.

Vagrant

Vagrant is a quick way to make a portable separate environment in which to test new tools and applications — I used the Ubuntu Precise 32bit image to make this example.

vagrant init precise http://files.vagrantup.com/precise32.box

The only configuration requirements for your Vagrantfile needed to complete this are:

config.vm.forward_port 80, 8080 #for Kibana3
config.vm.forward_port 5601, 5601 #for Kibana2, if you prefer
config.vm.forward_port 9200, 9200 #for ElasticSearch Master

After that just run and SSH into the new image

vagrant up
vagrant ssh

I’d also suggest adding a few tools

sudo apt-get update
sudo apt-get install vim zip

Kibana 2 — you can skip this and go to Kibana 3

Once you’re in you can get a copy of Kibana 2 and install it as well as all tools required for it.

sudo apt-get install build-essential g++
wget https://github.com/rashidkpc/Kibana/archive/v0.2.0.zip
unzip v0.2.0.zip
cd Kibana-0.2.0
sudo gem install bundler
bundle install

The only configuration you’ll need at KibanaConfig.rb is to change the address Kibana listens on so it works with the port forwarding.

KibanaHost = ‘0.0.0.0'

You can run Kibana as any other ruby application. It will use the default port 5601 which is already forwarded.

ruby kibana.rb

Kibana 3

Kibana 3 only needs a web server to run so the quick and dirty (don’t do this even near Production!) way to get it up would be to install one and change the default website to show it.

sudo apt-get install nginx
wget https://github.com/elasticsearch/kibana/archive/master.zip
unzip master.zip
sudo cp -R ~/kibana-master /usr/share/nginx/kibana-master

On /etc/nginx/sites/available/default just move the default website root to Kibana

root /usr/share/nginx/kibana-master;

And reload the web server to apply the changes.

sudo service nginx restart

I’m assuming that either Kibana installation is looking into the default ElasticSearch port 9200.

Kibana3 looking very good

Elastic Search

Elastic Search will allow to query all the data from the logs, you’ll need to install Java.

sudo apt-get install openjdk-7-jre-headless

As well as change the cluster name on config/elasticsearch.yml in if you have other people testing it in the same network as it comes with auto-discovery on — this will allow you to start several instances of ES simultaneously to practice different combinations of number of indexes, shards, replicas and more.

cluster.name: goncalo-elasticsearch

To manually start one instance (or a new one) just use:

./bin/elasticsearch -f

This instance will automatically set itself on the default port and as a Master node.

A helpful plugin for ES is Head which will give you a graphical interface to add, change and delete indexes, browse data and more.

You can add it to your local development environment as your vagrant image will be forwarding the ES port.

git clone git://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
open index.html
ES Head plugin running on local box

The default set up will provide indexes with five shards and one replica.In the screenshot above I have two nodes (Blue Streak and Wizard), two indexes (logstash-2013.07.03 and logstash-2013.08.06) and each shard has one replica.

All master copies of each shard are in Blue Streak with the bold line.

Logstash

Logstash will give you a way to read, parse logs as well as push them into ES. Just get a new copy of Logstash and create a configuration file for it.

mkdir logstash
cd logstash
wget https://logstash.objects.dreamhost.com/release/logstash-1.1.13-flatjar.jar

Given a folder with logs being shipped from a CDN we can create an input filter in a new custom configuration file i.e. log.conf. More information can be seen in the Logstash tutorial

file { path => “/home/vagrant/logstash/*.log” type => “cdn” }

As well as several filters. Grok will apply a pattern to split all the fields, There is a good list of Grok patterns here and an online tool to test patterns here.

grok { type => “cdn” pattern => “%{IPORHOST:message_ip} % {IPORHOST:message_hostname} — %{HTTPDATE:message_date} \”%{WORD:message_httpmethod} %{URI:message_uri} HTTP/%{NUMBER:message_httpversion}\” %{NUMBER:message_responsecode}” }

KV will split the URL query string parameters into new fields, this will be useful for filtering logs.

kv { field_split => “&?” }

Mutate will change current fields in the pipeline, I’m using it here to remove the original unparsed message as well as other information generated by the input filter.

mutate { remove => [ “@message” ] remove => [ “@source” ] remove => [ “@source_path” ] remove => [ “@source_host” ] }

Date will parse and assign the log timestamp with the one from the CDN. By default this is the timestamp of the ingestion through the input filter.

date { match => [ “message_date”, “dd/MMM/YYYY:HH:mm:ss Z” ] }

As an ouput filter we’ll add a stdout so we can visualise what is being ingested. As well as the elasticsearch_http filter (I use this instead of the default ES one as it is version independent).

stdout { debug => true debug_format => “json”}

By default the flush_size will be 100 and the index name will generate per timestamp by as logstash-[current date]

elasticsearch_http { host => “localhost” port => “9200" }

A working copy of this filter (for my own logs) will be here

You should now be able to add or append new logging rows which will be ingested. You’ll be able to query the data on both the ES Head and on either of the Kibana interfaces.

There is a very good read about optimising index sizes and more here in the Logstash Github account.

Redis

There are a few reasons to add Redis, RabbitMQ or another way to queue entries. You’ll want to split the filtering from the ingestion in a centralised manner, add several Logstash shippers or just have a way to buffer new logs from any type of slower parsing.

There is some more information about how to do this in the Logstash advanced tutorial and in the Kibana advanced tutorial.

sudo apt-get install redis-server

Now you can split the original log.conf into two files depending if they’re reading into Redis or grabbing from the queue, parsing and sending to ES.

Depending if it’s the input file (example here) or the output file (example here) you’ll add the Redis filter in the correct place.

redis { host => “localhost” type => “cdn” data_type => “list” key => logstash” }

Now you just need to use Logstash for each config file or even add them to the same folder and run Logstash with both at the same time.

java -jar logstash-1.1.13-flatjar.jar agent -f redisconf/

You can now verify with any query that the data is still being received by ES and query the Redis CLI to check current queue length.

redis-cli

Conclusion

This group of tools work out of the box, I felt this was a good exercise to try out the latest versions of each and to have a better understanding to write configuration and automation when required.

After being through the individual tutorials I felt it lacked a unified tutorial that went through the problems I faced while going on each step.

Setting up these tools is only half the battle. The system needs to be configured and set-up in a way that the logs are filtered in a useful way as well as prepare all of these for the most performant/resilient combination possible.