Indexing log files to Solr using Logstash

Sreekanth T
Aug 26, 2017 · 4 min read

Recently I got a requirement to create a log dashboard for different Hadoop applications and components. The tool chosen for parsing the log files was Logstash. Usually people use Logstash together with Elasticsearch for storage and Kibana for visualization, but in our case there was a slight change: our team was already using Solr as the search engine, so we needed to keep that setup, and instead of Kibana we had to use Banana, a fork of Kibana.

Everything was fine when I started. Logstash has its own output plugin for indexing data to Solr, named logstash-output-solr_http. But it was not working; the logs were not getting indexed into the Solr collection. After hours of head scratching and numerous coffees I finally figured out the issue and successfully indexed the data into Solr.

In this tutorial I will explain the steps I followed to index log files into a Solr collection. I used Linux machines running Red Hat 7 for this tutorial.


Installing Logstash

Before installing Logstash, make sure that you have Java installed on your machine.

sudo yum install java-1.8.0-openjdk
curl -L -O https://artifacts.elastic.co/downloads/logstash/logstash-5.5.0.rpm
sudo rpm -i logstash-5.5.0.rpm
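
If the install went through, Logstash should report its version; the rpm package typically places it under /usr/share/logstash.

/usr/share/logstash/bin/logstash --version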

The official Elastic documentation explains Logstash installation for other platforms.

Installing Solr

The next step after installing Logstash is to install Solr. I installed Solr on a different server, but it can be installed on the same server as Logstash.

For installing Solr you have to create a user other than the root user.
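
Solr's start script will not run as root without being forced, so create a dedicated account and switch to it before downloading Solr. A minimal sketch, assuming a user named solr:

sudo useradd -m solr
sudo su - solr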

wget http://mirror.reverse.net/pub/apache/lucene/solr/6.6.0/solr-6.6.0.tgz
tar -xvzf solr-6.6.0.tgz

Installing the solr_http plugin

Logstash doesn't support indexing to Solr out of the box. You will have to install the official Logstash plugin for Solr, named logstash-output-solr_http. Logstash has its own utility for installing plugins. Go to the Logstash installation directory and run the following command.

bin/logstash-plugin install logstash-output-solr_http

The current version of this plugin has a few bugs. We will discuss how to fix those and make the plugin work.

Installing Rsolr 1.1.2

Logstash uses a Ruby library called RSolr to connect to Solr. Logstash installs the latest version of RSolr, but the latest RSolr has a bug with the timestamp datatype which prevents indexing to Solr. To work around this we have to manually install a working version of RSolr. Currently the latest working one is RSolr 1.1.2.

gem install rsolr -v 1.1.2

Updating the Logstash solr_http plugin

Once RSolr 1.1.2 is installed, make a note of the path where the gem got installed. We have to update this path in the solr_http plugin. Open the solr_http.rb file present in the following directory.

/[logstash_installation_dir]/vendor/bundle/jruby/1.9/gems/logstash-output-solr_http-3.0.1/lib/logstash/outputs

Open this file in a text editor, comment out the require "rsolr" line, and add the following line.

require "/[path/to]/gems/rsolr-1.1.2/lib/rsolr"

The default gem installation path can be obtained by the following command.

gem environment

On line 67 of the solr_http.rb file, comment out the .iso8601 part. Then comment out the line @solr.add(documents) and add the line below so that documents are committed after being added to the collection.

@solr.add(documents, :add_attributes => {:commitWithin=>1000})
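
Putting the two edits together, the changed lines in solr_http.rb end up looking roughly like this (the commented-out lines are the originals):

# At the top of the file, load the manually installed RSolr 1.1.2 instead of the bundled gem
# require "rsolr"
require "/[path/to]/gems/rsolr-1.1.2/lib/rsolr"

# Around line 67, drop the .iso8601 conversion and commit documents within one second of adding them
# @solr.add(documents)
@solr.add(documents, :add_attributes => {:commitWithin => 1000})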

A working version of the file can be accessed from the GitHub repo below.

Creating the Logstash config file

Now you can create the Logstash configuration file with the details of the input file directory and the output. Here we won't add any filter, since we want to index the complete log messages.
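
A minimal configuration along these lines should be enough to start with; the log directory, Solr host, and collection name are placeholders you will need to adjust (saved here as logstash_solr.conf):

input {
  file {
    path => "/var/log/hadoop/*.log"
    start_position => "beginning"
  }
}

output {
  solr_http {
    solr_url => "http://solrhost:8983/solr/collectionname"
  }
}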

Logz.io has a good tutorial on creating grok patterns to extract specific fields from a log.

Starting Solr nodes and creating a collection

Now we have completed the Logstash setup and created the conf file. The next step is to start the Solr nodes and then create an empty collection to index the log messages into. Make sure that the collection name mentioned in the Logstash conf file and the one you are going to create are the same.

bin/solr start -e cloud -noprompt

Now we can create a data-driven collection using the below command.

bin/solr create -c collectionname -d server/solr/configsets/data_driven_schema_configs/
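
You can check that the collection was created by hitting the Collections API on one of the Solr nodes (adjust the host name):

curl "http://localhost:8983/solr/admin/collections?action=LIST"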

Starting Logstash

It's time to start our Logstash service with our custom conf file.

logstash -f logstash_solr.conf

Logstash will start and read the log files present in the input directory. Each line will be indexed to the Solr collection with a number of metadata fields.
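
Once Logstash has been running for a little while, a quick query against the collection should return the indexed log lines (adjust the host and collection name):

curl "http://localhost:8983/solr/collectionname/select?q=*:*&rows=5&wt=json"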
