Search and Intelligence — 2

Setting up Solr

In the previous story, I gave a brief, beginner-level introduction to Solr. The purpose of that story was to get you acquainted with NoSQL full-text search technologies. In this part, I am going to elaborate on a production setup of Solr.

Let’s download Solr and start it (make sure you have an appropriate version of Java installed).

$ # get solr from "http://lucene.apache.org/solr/mirrors-solr-latest-redir.html"
$ # unzip the compressed file
$ # simply run bin/solr start

It is good practice to set SOLR_HOME so that the server runs where appropriate disk space and permissions can be set. SOLR_HOME defines the directory where the core configurations and their data reside. It also helps if you decide to upgrade Solr, because the data, schema and configurations can be migrated very easily. To set SOLR_HOME, edit the bin/solr.in.sh file: find the SOLR_HOME variable and set it to /data/solr-home. Now create this directory with sudo mkdir /data/solr-home. We need to define solr.xml here; this file defines the default values the Solr server starts with. Now start the Solr server.
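The steps in this paragraph can be sketched as a small shell function. Treat it as a sketch under stated assumptions, not a definitive procedure: setup_solr_home is a made-up helper name, the install directory you pass in is whatever folder you unzipped Solr into, and it assumes the stock solr.xml shipped under server/solr/ in the distribution is a good enough starting point for your SOLR_HOME.

```shell
# Sketch: prepare a dedicated SOLR_HOME outside the Solr install directory.
# setup_solr_home is a hypothetical helper, not part of Solr.
#   $1 = directory where Solr was unzipped (assumed to contain bin/ and server/)
#   $2 = directory to use as SOLR_HOME, e.g. /data/solr-home
setup_solr_home() {
  local install_dir="$1" home_dir="$2"

  # Create the home directory (run with sudo if the parent directory needs it).
  mkdir -p "$home_dir" || return 1

  # Solr expects a solr.xml in SOLR_HOME; reuse the default one
  # that ships with the distribution.
  cp "$install_dir/server/solr/solr.xml" "$home_dir/solr.xml" || return 1

  # Point Solr at the new home by appending the setting to bin/solr.in.sh.
  printf 'SOLR_HOME="%s"\n' "$home_dir" >> "$install_dir/bin/solr.in.sh"
}
```

You would call it once, for example as setup_solr_home /path/to/solr /data/solr-home, and then run bin/solr start as usual. Appending to solr.in.sh rather than editing the commented-out line keeps the sketch short; editing the existing line by hand works just as well.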

$ bin/solr start
*** [WARN] *** Your open file limit is currently 4864.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] *** Your Max Processes Limit is currently 709.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983 [-]
Started Solr server on port 8983 (pid=13231). Happy searching!

This starts the Solr server in the background on port 8983 (to start on port 8080 instead, pass the port: bin/solr start -p 8080). After starting the Solr server we will create a core (we could have created the core earlier too; the point is that a core can be created without restarting the Solr server). We can create the core with bin/solr create -c songs, but it is also very convenient to use the Solr UI. We will discuss the Solr UI in a later part of this story.

In the previous story, I showed a diagram of the infrastructure setup for a production-grade Solr deployment. Here it is again.
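If you prefer the command line to the UI, the start and create steps can be followed by a quick check against Solr's CoreAdmin STATUS endpoint. The sketch below only builds the URL; core_status_url is a hypothetical helper, and the host, port and core name are the example values used in this story.

```shell
# Sketch: build the CoreAdmin STATUS URL for one core.
# core_status_url is a hypothetical helper, not part of Solr.
#   $1 = host, $2 = port, $3 = core name
core_status_url() {
  local host="$1" port="$2" core="$3"
  printf 'http://%s:%s/solr/admin/cores?action=STATUS&core=%s&wt=json\n' \
    "$host" "$port" "$core"
}

# Typical sequence (commented out here because it needs a Solr install):
#   bin/solr start -p 8080
#   bin/solr create -c songs
#   curl -s "$(core_status_url localhost 8080 songs)"
```

The JSON that comes back should contain an entry for the songs core once it has been created.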

Production setup of Solr

In Solr, the directory structure is as follows:

$ #let's say you have a disk mounted as "/mnt" and you downloaded and installed Solr in this disk
$ cd /mnt
$ # you might have a parent folder here, let's say "solr6"
$ cd solr6
$ pwd
/mnt/solr6
$ # you will have a folder here called "data" that has your cores
$ cd data
$ pwd
/mnt/solr6/data
$ # all your cores will be listed here (let's say songs and artists)
$ ls
songs artists
$ cd songs
$ pwd
/mnt/solr6/data/songs
$ # in this core you will have these 2 folders primarily:
$ # conf/ data/
$ # conf/ has all your configurations
$ # such as "solrconfig.xml" and "schema.xml"
$ # data/ has your actual indexes
$ # we will be concerned with conf/ folder only, usually.
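The layout above can also be checked from a script. This is a minimal sketch, assuming that a directory counts as a core when it contains conf/solrconfig.xml, as in the structure just described; list_cores is a hypothetical helper name, not a Solr command.

```shell
# Sketch: print the name of every core directory under a Solr data folder.
# list_cores is a hypothetical helper, not part of Solr.
# Assumption: a subdirectory is a core if it contains conf/solrconfig.xml.
#   $1 = folder that holds the cores, e.g. /mnt/solr6/data
list_cores() {
  local root="$1" dir
  for dir in "$root"/*/; do
    # Skip anything that does not look like a core directory.
    if [ -f "${dir}conf/solrconfig.xml" ]; then
      basename "$dir"
    fi
  done
}
```

Running list_cores /mnt/solr6/data on the example layout would list the core folders one per line, which is handy when you want to loop over all cores in a deployment script.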

All configuration pertaining to a Solr core is defined in solrconfig.xml. It is fairly straightforward to create a Solr master and slave. All we need to do is add the following to solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <!--
    For enabling Master just do the following. This specifies the replication to slaves.
    The files in "conf/" folder that will be replicated are mentioned in confFiles.
    This is important. Let's say we change our schema on master, we do not want to
    change schema on each of the slaves. New schema can be simply replicated.
  -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml</str>
  </lst>
  <!--
    For enabling Slave just uncomment the following and comment out the master block above.
    This slave syncs data from master every 1 hour.
  -->
  <!--
  <lst name="slave">
    <str name="masterUrl">http://solr_master:8080/solr/songs</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
  -->
</requestHandler>
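Once the handler is configured, replication can be inspected or triggered over HTTP through the same /replication endpoint. The sketch below builds those URLs; replication_url is a hypothetical helper, the base URL is the masterUrl from the config above, and indexversion, details and fetchindex are commands of Solr's ReplicationHandler.

```shell
# Sketch: build a ReplicationHandler URL for a core and a command.
# replication_url is a hypothetical helper, not part of Solr.
#   $1 = core URL, e.g. http://solr_master:8080/solr/songs
#   $2 = replication command, e.g. indexversion, details, fetchindex
replication_url() {
  local core_url="$1" cmd="$2"
  # ${core_url%/} drops a single trailing slash, if there is one.
  printf '%s/replication?command=%s&wt=json\n' "${core_url%/}" "$cmd"
}

# Examples (commented out because they need running servers):
#   curl -s "$(replication_url http://solr_master:8080/solr/songs indexversion)"
#   curl -s "$(replication_url http://solr_master:8080/solr/songs details)"
#   # On a slave: pull from the master now instead of waiting for pollInterval.
#   curl -s "$(replication_url http://localhost:8080/solr/songs fetchindex)"
```

The details command is the most useful one while debugging, since it reports the replication state of the core it is sent to.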

For the rest of the discussion on Solr setup and debugging queries, we will not need slave servers. Let’s have a look at the Solr UI, a powerful feature with admin capabilities and analysis tools. The UI is also where we can reload the new core. Go to http://solr_master:8080/solr, or http://localhost:8080/solr if you have a local setup. We will continue with the Solr UI in the next post.

To be continued…