Want to find what you’re looking for? Welcome to parser-dome, search warrior! For Solr to return the results you expect, you need your queries to have an understandable syntax. Depending on the parser you use to transform raw queries into Solr-compatible objects, those queries can be interpreted in different ways. Choosing the right parser is of the utmost importance. Parsers hold the power to determine whether your users will be able to find what they desire or whether they’ll be sent down a fury road of unexpected results.

So, how do you pick a parser? Survival in this world comes from understanding what’s going on under the hood when a human query is transformed into something Solr can interpret. For an example of how parsers interpret queries differently, let’s take a look at some code. Here’s how two of the most popular parsers, Lucene and eDisMax, handle the same query:

Query:

Lucene query parsing:

eDisMax query parsing:

In this instance, the Lucene parser throws an error because the query doesn’t match the required syntax. By contrast, eDisMax doesn’t complain, despite the query not fulfilling syntax requirements and one term being ignored. Want to understand what’s going on? Solr’s debug tools can help. You can find a quick guide for using these tools here.

It’s clear that Lucene and eDisMax behave differently, but which is better? There’s only one way to decide… PARSER-DOME!

Lucene vs eDisMax: Two parsers enter, one parser leaves

Lucene

Lucene is not user-friendly. It doesn’t know how to forgive syntax mistakes and it’s unable to search across multiple fields or specify different weightings for fields by default. Lucene is, however, great for built-in systems. It provides an intuitive syntax that, if well-written, offers a robust tool for creating structured queries that will be managed internally.

The Lucene parser is Solr’s Standard Query Parser, but Solr allows us to change this easily using its ‘defType’ parameter. Don’t get stuck using the same query parser just because you always have. Choosing the right parser for the right job is essential. If you don’t, you could be missing out on a whole range of features that would better suit your system’s needs.
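As a minimal sketch of switching parsers per request, the snippet below builds two select URLs that differ only in ‘defType’. The host, port and collection name (‘products’) are assumptions made for illustration, as is the sample query; ‘q’, ‘defType’ and ‘debugQuery’ are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port and collection name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_select_url(query: str, def_type: str) -> str:
    """Return a select URL asking Solr to parse `query` with `def_type`."""
    params = {
        "q": query,
        "defType": def_type,   # e.g. "lucene", "dismax" or "edismax"
        "debugQuery": "true",  # ask Solr to include the parsed query in the response
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


# Same raw query, two different parsers.
lucene_url = build_select_url("mad max", "lucene")
edismax_url = build_select_url("mad max", "edismax")
print(lucene_url)
print(edismax_url)
```

Sending both URLs to a running Solr instance and comparing the debug section of each response is one way to see how each parser interpreted the same input.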

eDisMax

On the other hand, the original DisMax query parser was created with one goal: to reduce the number of errors caused by direct user queries in user-facing systems with limited management control. The problems that usually arose in these kinds of systems were often down to the fact that they exposed technical inputs to non-technical people. Allowing users to freely input keyword queries would eventually bring about syntax errors. By providing a solution to this, the appearance of DisMax changed the world of search.

Why was DisMax useful?

  • It passed user queries to Solr directly. If a query didn’t follow the expected syntax, it treated the invalid input as a plain string instead of raising an error.
  • It enabled searching across multiple fields (‘qf’ parameter).
  • It offered an alternative relevancy calculation by only counting the relevancy of the top-scoring field in which a term matches.
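To make the ‘qf’ parameter more concrete, here is a small sketch that assembles a ‘qf’ value from per-field weights. The helper name `build_qf`, the field names and the weights are all hypothetical; the `field^weight` notation is the syntax ‘qf’ accepts.

```python
from typing import Dict


def build_qf(field_boosts: Dict[str, float]) -> str:
    """Join fields and weights into a qf value such as 'title^2.0 description^1.0'."""
    return " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())


# Hypothetical request parameters: the fields and weights are illustrative only.
params = {
    "q": "fury road",
    "defType": "dismax",
    "qf": build_qf({"title": 2.0, "description": 1.0}),
}
print(params["qf"])
```

Here a match in `title` would count twice as much as a match in `description`.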

DisMax became obsolete when eDisMax arrived on the scene. eDisMax included all of DisMax’s features along with some new ones, hence its full name, the Extended Disjunction Maximum Query Parser.

Among eDisMax’s main features is its ‘boost’ parameter. Boost raises the score of a query in a similar fashion to DisMax’s ‘bf’ parameter, but it’s multiplicative, not additive (if you are interested in knowing more, you can find other features here).
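The difference between additive and multiplicative boosting is easy to see with plain arithmetic. The sketch below is a simplified model, not Solr’s actual scoring code: it assumes a document’s base relevancy score and a boost function value are already known, and it ignores any normalisation Solr may apply. Both function names are hypothetical.

```python
def score_with_bf(base_score: float, function_value: float) -> float:
    """Additive model of 'bf': the function value is added to the base score."""
    return base_score + function_value


def score_with_boost(base_score: float, function_value: float) -> float:
    """Multiplicative model of 'boost': the base score is scaled by the function value."""
    return base_score * function_value


# Illustrative numbers only.
base = 2.0
function_value = 1.5
print(score_with_bf(base, function_value))     # additive
print(score_with_boost(base, function_value))  # multiplicative
```

One practical consequence of the multiplicative form is that the boost scales with the base score, so its relative effect stays the same for weakly and strongly matching documents.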

This new parser may seem very attractive and useful, but not all that’s shiny and chrome is gold. The key to picking the right parser is in knowing its limitations.

What’s Mediocre about eDisMax?

Performance

eDisMax’s ability to search across multiple fields does not always bring advantages. This is especially true when it comes to performance. From a time efficiency point of view, it makes sense that searching query terms across every specified field in the ‘qf’ parameter will be slower than searching just one field. However, eDisMax does give the option of using Lucene syntax, which can help speed things up.

Relevancy

The ‘DisMax’ in eDisMax describes the parser’s approach to computing a matched document’s score. When computing the relevancy of a term, it only takes into account the top-scoring field per document: among all the distinctly scored alternatives (the ‘disjunction’), it chooses the ‘maximum’.

We can use eDisMax’s tie breaker parameter (‘tie’) to control how much the lower-scoring fields contribute to a document’s total score. Values can be set between ‘0’ and ‘1’, where ‘0’ means taking only the maximum and ‘1’ means summing the scores of all fields (echoing Lucene behaviour).

This kind of relevancy approach only considers the best-scoring fields in each document, so the final relevancy score won’t be fully representative of how well the complete document matches. Lucene’s computation arguably makes a more accurate relevancy assessment by taking all field scores into account.
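The max-versus-sum behaviour of the tie breaker can be sketched in a few lines. This assumes the per-field scores for a term are already known; the function name `dismax_score` is hypothetical, and the formula (the maximum plus ‘tie’ times the sum of the remaining scores) follows how a disjunction-max query combines its alternatives.

```python
from typing import Sequence


def dismax_score(field_scores: Sequence[float], tie: float = 0.0) -> float:
    """Combine per-field scores: best score plus `tie` times the sum of the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


# Illustrative scores for one term matched in three fields.
scores = [3.0, 1.0, 0.5]
print(dismax_score(scores, tie=0.0))  # only the best field counts
print(dismax_score(scores, tie=0.5))  # other fields contribute at half weight
print(dismax_score(scores, tie=1.0))  # plain sum of all fields
```

Values in between let the best field dominate while still rewarding documents that match in several fields.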

Multiple Field Query Search

The multi-field search that eDisMax offers can also be approached with the Lucene parser. Instead of the ‘qf’ parameter, we can use boolean operators to specify each field we need to search. However, this approach can be tedious when we need to search across many fields.
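As a rough illustration of why the boolean approach gets tedious, the sketch below builds a Standard Query Parser string by repeating the term for every field and compares it with the equivalent eDisMax parameters. The helper name `multi_field_query` and the field names are hypothetical, and the helper assumes a single term with no special characters that would need escaping.

```python
from typing import Iterable


def multi_field_query(term: str, fields: Iterable[str]) -> str:
    """Build a Lucene-syntax query that ORs the same term across every field."""
    return " OR ".join(f"{field}:{term}" for field in fields)


fields = ["title", "description", "brand"]  # hypothetical field names

# Standard Query Parser: the field list is spelled out inside the query itself.
lucene_params = {"defType": "lucene", "q": multi_field_query("chrome", fields)}

# eDisMax: the query stays as typed and the fields move into 'qf'.
edismax_params = {"defType": "edismax", "q": "chrome", "qf": " ".join(fields)}

print(lucene_params["q"])
print(edismax_params["qf"])
```

With three fields the Lucene query is still readable, but every extra field or extra term multiplies the number of clauses that have to be generated.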

For example, the query below uses eDisMax as the query parser:

Once the query has been transformed into Lucene objects, we can see that a boolean operation has been applied:

This query uses Standard Query Parser syntax:

In this case, transformation into Lucene objects is far simpler:

Search Valhalla Awaits

There is no one true superior parser. The best parser to pick depends on your system specifications and on your expected parsing scenarios.

Your choice isn’t just limited to Lucene and eDisMax. There are parsers to manage complex phrases containing wildcards and boolean operators. There are parsers to interpret functions (“{!func}”) inside queries. There are even parsers that transform graph-type queries.

One thing that’s important to be clear on is that you should pick carefully. A change in parsing strategy down the line could cause queries to start returning zero results. Parse eternal, search warrior.

empathy.co

Helping brands provide irresistible search. Pairing software with interfaces to combine function and beauty in one. From mere results to meaningful relations and engaging interactions.

Written by Daniel González Gómez

Back-end engineer at EmpathyBroker, historical fencer and film lover
