If they match, I want them to be always first: Boosting documents in Apache Solr with the Boost Query parameter

Feb 21, 2018

A few days ago, a client asked me to give the highest possible priority to the subset of documents that match the restrictions of the context from which they are queried.

The process of giving some documents higher relevance than others is called boosting. Solr supports at least four ways of changing the boost factors of documents:

By boosting terms: q=black^2.0.
By boosting the fields where Solr searches for the terms: defType=edismax&q=black&qf=text name^2.0.
By using the boost query parameter: defType=edismax&q=black&bq=cat:electronics^2.0.
By using function queries with the boost functions parameter bf, which is essentially shorthand for bq={!func}.

The last three options are not supported by the Standard Query Parser; they require the DisMax or the eDisMax Query Parser.
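To make the difference between these options concrete, here is a minimal Python sketch that turns the parameter sets listed above into URL-encoded query strings. The dictionary keys (term_boost, field_boost, boost_query) and the helper name to_query_string are made up for illustration; only the Solr parameters themselves come from the list. The bf variant is left out because no concrete function is given for it.

```python
from urllib.parse import urlencode

# Parameter sets copied from the list above.
# The dictionary keys are illustrative labels, not Solr terminology.
BOOST_VARIANTS = {
    "term_boost": {"q": "black^2.0"},
    "field_boost": {"defType": "edismax", "q": "black", "qf": "text name^2.0"},
    "boost_query": {"defType": "edismax", "q": "black", "bq": "cat:electronics^2.0"},
}


def to_query_string(params):
    """URL-encode a parameter dict so it can be appended to a select URL."""
    return urlencode(params)


for label, params in BOOST_VARIANTS.items():
    print(label, "->", to_query_string(params))
```

Note that characters such as ^, : and the space in qf have to be URL-encoded when the request is sent outside a browser, which urlencode takes care of here.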

Hands-on experience

Let’s say that in our system we can create a folder, categorize it, and add documents from the Techproducts collection to it. The search engine behind our system should list the documents that share the containing folder’s categories first, followed by the other documents matching our query.

I’ll assume you’ve downloaded the Solr binary, started Solr in SolrCloud mode, and indexed the Techproducts example data successfully.

Let’s see what we get back from Solr when querying for documents with the word black.

Querying Techproducts documents with the term “black”
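For readers following along without the screenshot, the request can be sketched in Python roughly as below. This is an assumption-laden sketch: the base URL assumes Solr listening on its default port 8983 on localhost with a collection named techproducts, and the fl parameter (used to ask for the id, cat and score fields) is my addition to make the scores visible. The helper name select_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of the select handler: default port, local machine,
# collection named "techproducts". Adjust to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def select_url(params, base=SOLR_SELECT):
    """Build a full select URL from a dict of Solr request parameters."""
    return f"{base}?{urlencode(params)}"


# Plain query for the term "black" with the default query parser.
# fl is an assumption: it limits the output to id, cat and the score.
baseline_url = select_url({"q": "black", "fl": "id,cat,score"})
print(baseline_url)
```

The printed URL can be pasted into a browser or passed to any HTTP client while Solr is running.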

We see four documents: two of them are in the book category and two are in the electronics category.

Pay special attention to the score Lucene gave to each document. Scores are at the heart of how Lucene ranks results, and they will change when we use the boosting feature.

What should we do when searching for documents to add to a folder with the electronics category, given that they should appear first? We can use the boost query bq parameter to tell Solr to prioritize documents that match our context.

To use the bq parameter, we have to use either the DisMax or the eDisMax Query Parser. We’ll use the second one, which gives us more flexibility.

DisMax is designed to throw as few errors as possible, and eDisMax extends this design with support for the full syntax of Solr’s Standard Query Parser and many extra parameters.

In addition to supporting all the DisMax query parser parameters, Extended Dismax supports the full Lucene query parser syntax with the same enhancements as Solr’s standard query parser.
— The Extended DisMax (eDisMax) Query Parser

By adding two parameters to the URL we used before, we instruct Solr to use eDisMax (defType=edismax) instead of the Standard Query Parser, and we boost documents in the electronics category (bq=cat:electronics^2.0).

Techproducts matching the term “black” using eDisMax and boosting them with the bq parameter
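In the folder scenario, the boost query would be derived from the folder's categories rather than typed by hand. The sketch below shows one way this might look in Python. The helper names (boost_queries, boosted_url), the base URL, and the fl parameter are assumptions of mine; the sketch also assumes category names are single tokens without spaces, as electronics is, so no quoting is applied. Solr accepts the bq parameter more than once, so each category gets its own clause.

```python
from urllib.parse import urlencode

# Assumed select handler location (default port, local Solr, techproducts).
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def boost_queries(categories, boost=2.0):
    """One bq clause per folder category, e.g. cat:electronics^2.0.

    Assumes single-token category names; names with spaces would need quoting.
    """
    return [f"cat:{category}^{boost}" for category in categories]


def boosted_url(user_query, folder_categories, base=SOLR_SELECT):
    """Build an eDisMax request that boosts the folder's categories."""
    params = {
        "defType": "edismax",          # switch from the Standard Query Parser
        "q": user_query,               # the user's main query
        "bq": boost_queries(folder_categories),
        "fl": "id,cat,score",          # assumption: show scores in the output
    }
    # doseq=True emits one bq=... pair per list element.
    return f"{base}?{urlencode(params, doseq=True)}"


print(boosted_url("black", ["electronics"]))
```

Keeping the boost separate from q means the user's query text is never rewritten; the context only adds optional clauses on top of it.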

Notice how much higher the scores are now for the two documents in the electronics category. They now appear first in the list, just the way we want.
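Rather than eyeballing the result list, the ordering can be checked programmatically. The following sketch assumes the response has already been parsed from Solr's JSON format, where the documents live under response.docs, and that cat is returned as a list of values. The function name boosted_first is hypothetical, and the sample response contains invented placeholder documents and scores, not real output from the Techproducts collection.

```python
def boosted_first(response, category):
    """Return True if every doc in `category` precedes every doc outside it."""
    docs = response["response"]["docs"]
    # True for docs in the boosted category, False for the rest.
    flags = [category in doc.get("cat", []) for doc in docs]
    # The order is valid only if all True flags come before any False flag.
    return flags == sorted(flags, reverse=True)


# Invented placeholder data that only mimics the shape of a Solr response.
sample_response = {
    "response": {
        "docs": [
            {"id": "doc-a", "cat": ["electronics"], "score": 3.0},
            {"id": "doc-b", "cat": ["electronics"], "score": 2.5},
            {"id": "doc-c", "cat": ["book"], "score": 1.0},
        ]
    }
}

print(boosted_first(sample_response, "electronics"))
```

A check like this is worth keeping around because bq adds optional clauses that contribute to the score, so a boost factor that is large enough for one query may not be large enough for another.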

TL;DR

The DisMax and eDisMax Query Parsers give us a nice way to give higher relevance to documents that match our requirements. The Boost Query parameter bq allows us to boost documents with a query that can differ from the user’s main query (q or q.alt).

Written by Pablo Castelnovo