If they match, I want them to be always first: Boosting documents in Apache Solr with the Boost Query parameter
A few days ago, a client asked me to give the highest possible priority to the subset of documents that match the restrictions of the context from which they are queried.
The process of giving a set of documents higher relevance than others is called boosting. Solr supports at least four ways of changing the boost factors of documents:
- By boosting terms: q=black^2.0
- By boosting the fields where Solr searches for the terms: defType=edismax&q=black&qf=text name^2.0
- By using the boost query parameter: defType=edismax&q=black&bq=cat:electronics^2.0
- By using query functions with the boost functions parameter bf, which is essentially a shorthand for bq={!func}
The last three options are not supported by the Standard Query Parser; they require the DisMax or the eDisMax Query Parser.
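To make the four options easier to compare, here is a minimal Python sketch that writes each one as a set of request parameters and URL-encodes it. The first three parameter sets come straight from the list above; the bf value log(popularity) is only an illustrative assumption (popularity is a field in the Techproducts example schema), and the names BOOST_STYLES and to_query_string are hypothetical.

```python
from urllib.parse import urlencode

# One parameter set per boosting style from the list above.
# Assumption: the bf function below is illustrative, not from the article.
BOOST_STYLES = {
    "term": {"q": "black^2.0"},
    "field": {"defType": "edismax", "q": "black", "qf": "text name^2.0"},
    "boost_query": {"defType": "edismax", "q": "black", "bq": "cat:electronics^2.0"},
    "boost_function": {"defType": "edismax", "q": "black", "bf": "log(popularity)"},
}


def to_query_string(params):
    """URL-encode the parameters so they can be appended to a /select URL."""
    return urlencode(params)


for style, params in BOOST_STYLES.items():
    print(style, "->", to_query_string(params))
```

Only the first entry works without defType; the other three rely on parameters that the DisMax and eDisMax parsers define.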
Hands-on experience
Let’s say that in our system we can create a folder, categorize it, and add documents from the Techproducts collection to it. The search engine behind our system should list first the documents that share the categories of the containing folder, followed by the other documents matching our query.
I’ll assume you’ve downloaded the Solr binary, started Solr in SolrCloud mode, and indexed the Techproducts example data successfully.
Let’s see what we get back from Solr when querying for documents with the word black.
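As a rough sketch, the baseline request could be built like this in Python. The base URL is an assumption (a local Solr on the default port 8983 with a collection named techproducts), and build_select_url is a hypothetical helper. The fl parameter asks Solr to return the score pseudo-field alongside a few stored fields, so the scores can be compared later.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally on the default port with a "techproducts" collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/techproducts/select"


def build_select_url(query, fields="id,name,cat,score", base_url=SOLR_SELECT_URL):
    """Return a /select URL for a plain query that also returns each document's score."""
    params = {"q": query, "fl": fields}
    return f"{base_url}?{urlencode(params)}"


baseline_url = build_select_url("black")
print(baseline_url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching documents with their scores, provided Solr is running with the example data indexed.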
We see four documents: two in the book category and two in the electronics category.
Pay special attention to the score Lucene gave each document. Scores are at the heart of how Lucene ranks results, and they will change when we use the boosting feature.
How should we search for documents to add to a folder with the electronics category, given that they should appear first? We can use the boost query parameter bq to tell Solr to prioritize documents that match our context.
To use the bq parameter, we have to use either the DisMax or the eDisMax Query Parser. We’ll use the second one, which gives us more flexibility.
DisMax is designed to throw as few errors as possible. eDisMax keeps that design and, in addition to supporting all the DisMax query parser parameters, supports the full Lucene query parser syntax with the same enhancements as Solr’s Standard Query Parser, along with a lot of extra parameters.
By adding two parameters to the URL we used for the previous query, we instruct Solr to use eDisMax instead of the Standard Query Parser (defType=edismax) and to boost documents in the electronics category (bq=cat:electronics^2.0).
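Tying this back to the folder scenario, the following Python sketch derives the boost query from the categories of the containing folder. The helper name context_boost_params and its arguments are hypothetical. The sketch assumes category names are simple tokens that need no query-syntax escaping, and it relies on Solr accepting more than one bq parameter in a request.

```python
from urllib.parse import urlencode


def context_boost_params(user_query, folder_categories, boost=2.0):
    """Build eDisMax parameters that boost documents in the folder's categories.

    Assumption: category names contain no spaces or query-syntax characters.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # One bq clause per folder category.
        "bq": [f"cat:{category}^{boost}" for category in folder_categories],
        "fl": "id,name,cat,score",
    }


params = context_boost_params("black", ["electronics"])
# doseq=True emits a repeated bq parameter when the folder has several categories.
print(urlencode(params, doseq=True))
```

The user's query in q stays untouched; the context only contributes the optional bq clauses, so documents outside the folder's categories still match and simply rank lower.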
Notice how much higher the scores are now for the two documents in the electronics category. They now appear first in the list, just the way we want.
TL;DR
The DisMax and eDisMax Query Parsers give us a nice way to assign higher relevance to documents that match our requirements. The Boost Query parameter bq allows us to boost documents with a query that can differ from the user’s main query, q or q.alt.