Apache Solr Fundamentals

Manish Sharma
5 min read · Aug 13, 2023

Apache Solr is an open-source search platform, built on the Apache Lucene information retrieval library, capable of handling unstructured and semi-structured data.

It is used to search and analyze huge amounts of data in near real time.

Apart from full-text search, Solr also supports text analysis.

Solr is distributed by nature, which makes it easier to scale.

Apache Solr is truly open source and is governed by a community, the Apache Software Foundation. This is in contrast to Elasticsearch, which is managed by a single company, Elastic.

Solr supports JSON, XML, CSV and optimized binary response formats.

Solr Simplified Architecture

[Figure: Apache Solr simplified architecture]

As shown in the diagram, Solr has four main components:

Index

The Apache Solr index is the “database” for managing structured and unstructured data. It stores data in a way that makes analysis and full-text search easier.

Query Parser

All queries submitted by the client are handled by the query parser.

Response Handler

The response handler is responsible for generating the response in the appropriate format (JSON/XML/CSV) for the client.

Update Handler

It is used for indexing, i.e. insertion, update and deletion of data in the index. For example, if we want our MySQL data to stay in sync with Apache Solr, we have to set up an update handler responsible for the sync.

How Solr Stores Data

Apache Solr stores data in the index in an organized manner. Internally, data is organized as documents, where each document is a collection of fields. Each document type has a corresponding schema that stores details about the fields and field types.
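To make the relationship between documents, fields and schema concrete, here is a minimal Python sketch. It is only an in-memory illustration, not how Solr stores data internally. The SCHEMA mapping and the validate helper are hypothetical names, and the field names are borrowed from the employee example used later in this article.

```python
# Illustrative only: a plain-Python model of a "document" and a "schema".
# SCHEMA and validate() are hypothetical names, not part of any Solr API.

# A schema maps each field name to the Python type we expect for it.
SCHEMA = {
    "id": str,
    "name": str,
    "department": str,
    "experience": int,
}

# A document is simply a collection of named fields.
document = {
    "id": "emp-001",
    "name": "Chris Nathan",
    "department": "Dev",
    "experience": 7,
}


def validate(doc, schema):
    """Return a list of problems found when checking doc against schema."""
    problems = []
    for field, value in doc.items():
        if field not in schema:
            problems.append(f"unknown field: {field}")
        elif not isinstance(value, schema[field]):
            problems.append(f"wrong type for field: {field}")
    return problems


print(validate(document, SCHEMA))
```

An empty list means every field of the document is known to the schema and has the expected type.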

Ingestion Approaches

Ingestion is the process of importing external data into the Solr index. There are at least 3 ways of doing this:

For binary files (PDF, DOC etc.), we can use tools like Solr Cell.

For XML files, we can send HTTP requests.

For custom ingestion, we may write our own program using the Solr client API.
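For the custom route, the core of an ingestion program is turning source records into Solr documents. Below is a rough sketch, assuming the rows have already been fetched from a source such as MySQL as tuples. The rows_to_docs helper and the column order are hypothetical, and no Solr client library is used here, only the standard library to build the JSON body that would be sent over HTTP.

```python
import json

# Hypothetical column order of the rows fetched from the source system.
COLUMNS = ("id", "name", "department", "experience")


def rows_to_docs(rows, columns=COLUMNS):
    """Convert tuples from a source table into Solr-style documents (dicts)."""
    return [dict(zip(columns, row)) for row in rows]


# Sample rows standing in for the result of a database query.
rows = [
    ("emp-001", "Chris Nathan", "Dev", 7),
    ("emp-002", "Christina", "Dev", 3),
]

# JSON body that an ingestion program could POST to Solr's update endpoint.
payload = json.dumps(rows_to_docs(rows))
print(payload)
```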

Facets and Constraints

Solr supports faceting. Faceting means arranging search results into categories. A facet represents a search result category. Constraints are the facet values.

[Figure: Facets and constraints in Apache Solr]

As shown in the diagram, Shirts and Trousers are facets, while Full Sleeve, Half Sleeve etc. are constraints, each associated with a count.
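The idea of a constraint with a count can be sketched in a few lines of Python. This only imitates the concept on a small in-memory list. Solr computes such counts on the server when faceting is requested (for example with the facet=true and facet.field query parameters). The sample data and the facet_counts helper are made up for illustration.

```python
from collections import Counter

# Sample search results; the field names are made up for illustration.
results = [
    {"id": 1, "category": "Shirts", "sleeve": "Full Sleeve"},
    {"id": 2, "category": "Shirts", "sleeve": "Half Sleeve"},
    {"id": 3, "category": "Shirts", "sleeve": "Full Sleeve"},
    {"id": 4, "category": "Trousers"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value (constraint) of a field."""
    return Counter(doc[field] for doc in docs if field in doc)


print(facet_counts(results, "sleeve"))
```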

Let’s see Solr in action.

Download a binary release from https://solr.apache.org/downloads.html

Extract

tar -xvf solr-9.3.0.tgz

Start a node in cloud mode (-c)

cd solr-9.3.0
bin/solr start -c

Optionally, you may access the Solr Admin UI by navigating to http://localhost:8983/solr/#/

Step by Step

  1. Create a Collection
  2. Define Schema for Collection
  3. Populate Collection (Index some documents)
  4. Commit (make changes permanent)
  5. Execute Queries

In the real world we would create a Django or Laravel app to perform the operations mentioned in the steps above. For now we will use Postman for the purpose. You may download the Postman collection here.

Creating Collection

Make sure Apache Solr is up and running.

End Point: http://localhost:8983/api/collections

HTTP Verb: POST

Headers: Content-Type: application/json

Data:

{
  "name": "employee",
  "numShards": 1,
  "replicationFactor": 1
}

[Figure: Apache Solr, creating a collection using Postman]
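For readers who prefer a script over Postman, the same request can be sketched with Python's standard library. The endpoint and body are the ones given above. The build_create_request helper is a hypothetical name, and the sketch only builds the request, so it runs even without a Solr node.

```python
import json
import urllib.request

# Endpoint and body taken from the article; a local Solr node is assumed.
URL = "http://localhost:8983/api/collections"
BODY = {"name": "employee", "numShards": 1, "replicationFactor": 1}


def build_create_request(url=URL, body=BODY):
    """Build (but do not send) the POST request that creates the collection."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request()
print(request.get_method(), request.full_url)

# To actually send it against a running Solr node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```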

Defining Schema for Collection

End Point: http://localhost:8983/api/collections/employee/schema

HTTP Verb: POST

Headers: Content-Type: application/json

Data:

{
  "add-field": [
    {
      "name": "name",
      "type": "text_general",
      "multiValued": false
    },
    {
      "name": "department",
      "type": "string",
      "multiValued": false
    },
    {
      "name": "designation",
      "type": "string",
      "multiValued": false
    },
    {
      "name": "experience",
      "type": "pint"
    }
  ]
}

Note that the string type does not perform tokenization and is meant for fields used for faceting. The text type performs tokenization and thus provides powerful partial matching. To learn more, see the Solr documentation on field types.
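The difference between the two types can be illustrated with a deliberately simplified Python sketch. It is not Solr's analysis chain, which is configurable and does more than lowercasing and splitting on whitespace. The two helper functions are hypothetical and only mimic the matching behaviour described above.

```python
# Simplified illustration of string vs. text matching; real Solr analysis
# chains are configurable and do more than lowercase-and-split.

def matches_string_field(stored, query):
    """A string field is indexed as one untouched value: exact match only."""
    return stored == query


def matches_text_field(stored, query):
    """A text field is tokenized, so a single word can match."""
    tokens = stored.lower().split()
    return query.lower() in tokens


# The whole value "Chris Nathan" is not equal to "Chris".
print(matches_string_field("Chris Nathan", "Chris"))
# After tokenization, "chris" is one of the indexed tokens.
print(matches_text_field("Chris Nathan", "chris"))
```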

Populating Collection

End Point: http://localhost:8983/api/collections/employee/update

HTTP Verb: POST

Headers: Content-Type: application/json

Data:

[
  {
    "id": "emp-001",
    "name": "Chris Nathan",
    "department": "Dev",
    "designation": "Analyst",
    "experience": 7
  },
  {
    "id": "emp-002",
    "name": "Christina ",
    "department": "Dev",
    "designation": "Programmer",
    "experience": 3
  },
  {
    "id": "emp-003",
    "name": "Naresh",
    "department": "Marketing",
    "designation": "Executive",
    "experience": 2
  }
]

Committing the Changes

End Point: http://localhost:8983/api/collections/employee/config

HTTP Verb: POST

Headers: Content-Type: application/json

Data:

{
  "set-property": {
    "updateHandler.autoCommit.maxTime": 15000
  }
}
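
Note that the request above does not issue a single commit. It configures Solr to commit pending changes automatically, with maxTime given in milliseconds. A small sketch of building that body from an interval in seconds follows. The auto_commit_body helper is a hypothetical name used only for illustration.

```python
import json


def auto_commit_body(seconds):
    """Build the set-property body for a given auto-commit interval."""
    # Solr expects updateHandler.autoCommit.maxTime in milliseconds.
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": int(seconds * 1000)
        }
    }


body = auto_commit_body(15)
print(json.dumps(body, indent=2))
```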

Executing Queries

List all employees

http://localhost:8983/solr/employee/select?q=*

List all employees where department is ‘Dev’ (q=department:Dev)

http://localhost:8983/solr/employee/select?q=department:Dev

List only the name field (fl) of employees whose department is Dev

http://localhost:8983/solr/employee/select?q=department:Dev&fl=name

Output (other fields omitted for brevity):

{
  "numFound": 2,
  "start": 0,
  "numFoundExact": true,
  "docs": [
    {
      "name": "Chris Nathan"
    },
    {
      "name": "Christina "
    }
  ]
}
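
In an application we would usually parse this JSON rather than read it by eye. Here is a minimal sketch using a sample string that mirrors the output above. In a full Solr reply this object normally sits under a top-level "response" key. The extract_names helper is a hypothetical name.

```python
import json

# Sample text mirroring the output shown in the article.
raw = """
{
  "numFound": 2,
  "start": 0,
  "numFoundExact": true,
  "docs": [
    {"name": "Chris Nathan"},
    {"name": "Christina "}
  ]
}
"""


def extract_names(result):
    """Return the trimmed name of every document in a result object."""
    return [doc["name"].strip() for doc in result["docs"]]


result = json.loads(raw)
print(result["numFound"], extract_names(result))
```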

List only the name and id fields (fl) of employees whose department is Dev

http://localhost:8983/solr/employee/select?q=department:Dev&fl=name&fl=id

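List only the name, id and experience fields (fl) of employees whose department is Dev and whose experience is in the range 2 to 4 (experience:[2 TO 4]). Remember that Solr queries are case sensitive: range keywords such as TO must be uppercase, and values in string fields such as department must match exactly.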

http://localhost:8983/solr/employee/select?q=department:Dev&fl=name&fl=id&fl=experience&fq=experience:[2 TO 4]
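
The last URL contains spaces and square brackets. Browsers and Postman usually encode these for us, but a script should do so explicitly. Below is a minimal sketch using Python's standard library. The build_select_url helper is a hypothetical name, and it joins the fields into a single comma-separated fl parameter instead of repeating fl.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://localhost:8983/solr/employee/select"


def build_select_url(q, fl=None, fq=None, base=BASE):
    """Build a select URL, URL-encoding spaces, colons and brackets."""
    params = {"q": q}
    if fl:
        params["fl"] = ",".join(fl)  # fl accepts a comma-separated list
    if fq:
        params["fq"] = fq
    return base + "?" + urlencode(params)


url = build_select_url(
    "department:Dev",
    fl=["name", "id", "experience"],
    fq="experience:[2 TO 4]",
)
print(url)

# Decoding the query string again recovers the original parameter values.
print(parse_qs(urlsplit(url).query))
```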

Worth considering:

Read this article: How Proprietary Software Takes Away Your Freedom.

Excerpt from the article:

“On the Internet, proprietary software isn’t the only way to lose your computing freedom. Service as a Software Substitute, or SaaSS, is another way to give someone else power over your computing.

The basic point is, you can have control over a program someone else wrote (if it’s free), but you can never have control over a service someone else runs, so never use a service where in principle running a program would do.

SaaSS means using a service implemented by someone else as a substitute for running your copy of a program. The term is ours; articles and ads won’t use it, and they won’t tell you whether a service is SaaSS. Instead they will probably use the vague and distracting term “cloud,” which lumps SaaSS together with various other practices, some abusive and some ok. With the explanation and examples in this page, you can tell whether a service is SaaSS.”

Feel free to add a comment if you have any doubt, query or question. Any such discussion will help us grow together.

Happy Coding.
