Indexing log files to Solr using Logstash

Recently I got a requirement to create a log dashboard for different Hadoop applications and components. The default tool for parsing the log files was Logstash. Usually people use Logstash together with Elasticsearch for storage and Kibana for visualization, but in our case there was a slight change: our team was already using Solr as the search engine, so we needed to reuse that setup, and instead of Kibana we had to use Banana, a fork of Kibana.
Everything went fine at the start. Logstash has its own output plugin for indexing data to Solr, named logstash-output-solr_http. But it was not working; the logs were not getting indexed into the Solr collection. After hours of head scratching and numerous coffees I finally figured out the issue and successfully indexed the data into Solr.
In this tutorial I will explain the steps I followed to index log files into a Solr collection. I have used Linux machines running Red Hat 7.
Installing Logstash
Before installing Logstash, make sure that you have Java installed on your machine.
sudo yum install java-1.8.0-openjdk
curl -L -O https://artifacts.elastic.co/downloads/logstash/logstash-5.5.0.rpm
sudo rpm -i logstash-5.5.0.rpm
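The RPM installs Logstash under /usr/share/logstash by default. A quick version check confirms that the installation worked.
/usr/share/logstash/bin/logstash --version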
The page below explains Logstash installation for other platforms.
Installing Solr
The next step after installing Logstash is to install Solr. I have installed Solr on a different server, but it can be installed on the same server as Logstash.
For installing Solr you have to create a user other than the root user, since Solr refuses to start as root unless forced.
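If you don't already have a suitable account, you can create a dedicated user and switch to it for the remaining Solr steps (the user name solr below is just an example).
sudo useradd -m solr
sudo passwd solr
su - solr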
wget http://mirror.reverse.net/pub/apache/lucene/solr/6.6.0/solr-6.6.0.tgz
tar -xvzf solr-6.6.0.tgz
Installing the solr_http plugin
Logstash doesn't support indexing to Solr out of the box. You will have to install the official Logstash output plugin for Solr, named logstash-output-solr_http. Logstash has its own utility for installing plugins. Go to the Logstash installation directory (/usr/share/logstash for an RPM install) and run the following command.
bin/logstash-plugin install logstash-output-solr_http
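You can confirm that the plugin was installed by listing the installed Logstash plugins.
bin/logstash-plugin list | grep solr_http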
The current version of this plugin has a few bugs. We will discuss how to fix those and make the plugin work.
Installing RSolr 1.1.2
Logstash uses a Ruby library called RSolr for connecting to Solr. Logstash will install the latest version of RSolr, but the latest RSolr has a bug with the timestamp datatype which prevents indexing to Solr. To avoid this we have to manually install a working version of RSolr. Currently the latest working version is RSolr 1.1.2.
gem install rsolr -v 1.1.2
Updating the Logstash solr_http plugin
Once RSolr 1.1.2 is installed, make a note of the path where the gem got installed. We have to update this path in the solr_http plugin. The solr_http.rb file is present in the following directory.
/[logstash_installation_dir]/vendor/bundle/jruby/1.9/gems/logstash-output-solr_http-3.0.1/lib/logstash/outputs
Open this file in a text editor, comment out the require "rsolr" line, and add the following line.
require "/[path/to]/gems/rsolr-1.1.2/lib/rsolr"
The default gem installation path can be obtained by the following command.
gem environment
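Look for the GEM PATHS section in the output; the rsolr-1.1.2 gem will be under the gems subdirectory of one of those paths. The grep below is just a convenience filter.
gem environment | grep -i -A 2 "gem paths"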
On the 67th line of the solr_http.rb file, comment out the .iso8601 part. Then comment out the line @solr.add(documents) and add the line below, so that documents are committed after they are added to the collection.
@solr.add(documents, :add_attributes => {:commitWithin=>1000})
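For reference, the two edited spots in solr_http.rb end up looking roughly like this; the exact line numbers and the surrounding code depend on the plugin version.
# require "rsolr"
require "/[path/to]/gems/rsolr-1.1.2/lib/rsolr"
# ...
# @solr.add(documents)
@solr.add(documents, :add_attributes => {:commitWithin=>1000})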
A working version of the file can be accessed from the GitHub repo below.
Creating the Logstash config file
Now you can create the Logstash configuration file with the details of the input file directory and the output. Here we won't add any filter, since we want to index the complete log messages.
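Below is a minimal sketch of what logstash_solr.conf could look like. The input path is only an example and should point to the directory holding your log files, and the Solr URL assumes Solr running on localhost on the default port 8983 with a collection named collectionname, matching the create command later in this post.
input {
  file {
    # directory containing the log files to be indexed (example path)
    path => "/var/log/hadoop/*.log"
    start_position => "beginning"
  }
}
output {
  solr_http {
    # Solr host and collection to index into
    solr_url => "http://localhost:8983/solr/collectionname"
    flush_size => 100
  }
}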
Logz.io has a good tutorial on creating grok patterns for fetching specific fields of a log.
Starting Solr nodes and creating a collection
Now we have completed the Logstash setup and created the conf file. The next step is to start the Solr nodes and then create an empty collection to index the log messages into. Make sure that the collection name mentioned in the Logstash conf file and the one you are going to create are the same.
bin/solr start -e cloud -noprompt
Now we can create a data-driven collection using the command below.
bin/solr create -c collectionname -d server/solr/configsets/data_driven_schema_configs/
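Assuming Solr is running on the default port 8983, you can confirm that the collection was created using the Collections API.
curl "http://localhost:8983/solr/admin/collections?action=LIST"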
Starting Logstash
It's time to start Logstash with our custom conf file.
bin/logstash -f logstash_solr.conf
Logstash will start and read the log files present in the input directory. Each line will be indexed into the Solr collection along with a number of metadata fields.
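A quick way to verify that documents are arriving, assuming the collection name used in the create command, is a simple Solr query.
curl "http://localhost:8983/solr/collectionname/select?q=*:*&rows=5"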
