How to implement faceted search the right way

Alejandro Pérez López
Empathy.co
Nov 30, 2018

Most products have multiple attributes that they could be filtered by during a search experience. Attributes like colour, brand, product type, and size can help refine a query. With this in mind, we can split queries into two main groups: specific and generic. Specific queries, like “red bikini” or “green jacket for men”, include terms related to these attributes and therefore reduce the number of results that are returned. They are more likely to show the user the products they expect to see. On the other hand, generic queries, such as “jeans”, “jackets”, and “men”, only relate to a type of product or category. They usually return a large number of results that will then need refining.

The kind of queries a user performs depends on many factors. For example, queries submitted from a mobile device are primarily generic, while those coming from a computer are often specific. However, most search sessions start out with a generic query and gradually become more specific.

Let’s suppose that a user types “jeans”. It’s likely that this will return a mixture of results for both men’s and women’s jeans in a variety of styles. Imagine that this user is a woman who loves skinny jeans but doesn’t like to spend much on clothes. She’s going to want to filter those results. You could leave her to enter a specific query or you could provide filtering options via faceting. Here’s my advice for anyone adding facets to their search experience.

Generic query “Jeans”
Filtering generic query results with a Category to retrieve specific results.

Select candidate fields to be configured as facets

At this point, you might think that it’s a good idea to have as many fields configured as facets as possible, but this isn’t true. Two of the most important index configuration tasks are selecting which fields will be facets and deciding which type of facet fits each one.

It’s a good idea to use facets for fields that help users distinguish a subgroup of documents within a large and varied result set. Fields like section, product type, brand, and price are always good facet candidates.

Avoid performance issues

Another thing to take into consideration is how facets will impact performance. Adding facets to a request increases the time it takes to complete. If your index contains a large number of documents and you’re faceting a field with a lot of different values, such as price, it’s essential that you think carefully. Generic queries in particular could take a lot more time than expected. The quality of your data doesn’t matter if the request takes too long to return the results.
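One way to get a feel for this cost is to run the same query with and without facets and compare the QTime value (in milliseconds) that Solr reports in the responseHeader of each response. The Python snippet below is only a sketch of that comparison: the helper name facet_overhead_ms is hypothetical and the two sample responses are made-up numbers, not measurements. Keep in mind that caching can make a single comparison misleading, so repeated runs are more informative.

```python
def facet_overhead_ms(plain_response, faceted_response):
    """Difference in Solr-reported QTime (milliseconds) between two responses."""
    plain = plain_response["responseHeader"]["QTime"]
    faceted = faceted_response["responseHeader"]["QTime"]
    return faceted - plain

# Made-up timings for illustration only; they are not measurements.
without_facets = {"responseHeader": {"status": 0, "QTime": 12}}
with_facets = {"responseHeader": {"status": 0, "QTime": 45}}

overhead = facet_overhead_ms(without_facets, with_facets)
print(overhead)
```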

Use the right facets for the right types of data

Different data requires different facets. Solr supports multiple types. According to its documentation, there are two main types of facet:

Facet queries:
These allow you to generate a facet count using a Lucene syntax query. They’re usually used for numeric ranges or for counting the documents that hold a certain value:

facet.query=price:[1 TO 100]
facet.query=status:in_stock

This would return two counts: the number of docs that have a price between 1 and 100, and the number of docs that have the value “in_stock” in the status field.
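As a minimal sketch of how those two parameters could be sent together, the Python snippet below builds the URL for a select request. The host, port, core name (products) and the helper name build_facet_query_url are assumptions made for illustration; q, facet and facet.query are the Solr parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"

def build_facet_query_url(user_query, facet_queries):
    """Return a select URL that asks Solr for one count per facet query."""
    params = [("q", user_query), ("facet", "true")]
    # facet.query may be repeated; each occurrence yields its own count.
    params += [("facet.query", fq) for fq in facet_queries]
    # urlencode escapes spaces, colons and brackets in the Lucene syntax.
    return SOLR_SELECT_URL + "?" + urlencode(params)

url = build_facet_query_url("jeans", ["price:[1 TO 100]", "status:in_stock"])
print(url)
```

Passing the parameters as a list of pairs rather than a dict keeps the repeated facet.query entries, which a plain dict would collapse into one.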

Field-based facets:

These return the available values inside a field.

facet.field=product_type

This would return the available values of the product_type field, each with the number of matching docs that contain it.
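To make the shape of that result concrete, here is a small Python sketch that reads the counts for one field out of a JSON response. It assumes the default JSON layout, in which each field under facet_counts.facet_fields is a flat list alternating value and count. The helper name facet_field_counts and the sample values are made up for illustration.

```python
def facet_field_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))

# Hand-written sample shaped like a JSON response; the values are made up.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "product_type": ["Jeans", 12, "Jackets", 7, "Shirts", 0]
        }
    }
}

counts = facet_field_counts(sample_response, "product_type")
print(counts)
```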

Based on this, we would use field-based facets for attributes that need to show a count for each available value, and facet queries for attributes that are either numeric or only require a certain value to be counted. For example, it would be a mistake to configure a price field as a field-based facet, as it would create an entry for every different price value. Configuring price ranges using facet queries is a better idea.
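The price-range idea could be sketched as follows: a small Python helper that turns a list of boundaries into one facet.query per bucket. The helper name price_range_facet_queries and the boundaries are hypothetical. The sketch relies on Solr's mixed range brackets, where [ includes a bound and } excludes it, so that neighbouring buckets do not count the same document twice.

```python
def price_range_facet_queries(field, bounds):
    """Build one facet.query string per consecutive pair of bounds."""
    queries = []
    for low, high in zip(bounds, bounds[1:]):
        # [low TO high} includes low and excludes high, so buckets do not overlap.
        queries.append(f"{field}:[{low} TO {high}}}")
    # Open-ended bucket for everything from the last bound upwards.
    queries.append(f"{field}:[{bounds[-1]} TO *]")
    return queries

# Hypothetical boundaries; pick ones that match your own price distribution.
price_queries = price_range_facet_queries("price", [0, 50, 100])
for query in price_queries:
    print("facet.query=" + query)
```

Three boundaries produce three counts, however many distinct prices the index holds, which is the point of preferring facet queries here.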

Use the right field type

Solr supports a lot of field types. Typically, you’ll want to use fields based on solr.TextField for searching and fields based on solr.StrField for data that should be returned as is.

For example, let’s suppose that you launch a facet operation over a field that’s marked as text with the following configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

And suppose that you’re using a field called product_type:

<field name="product_type" type="text" stored="true" indexed="true" multiValued="true" />

If the product_type field contains values like “Summer Jeans” or “Winter Jackets” and you apply a facet operation over this field by adding the facet=true and facet.field=product_type parameters to the Solr query, you would get:

“summer” -> with count 1
“jeans” -> with count 1
“winter” -> with count 1
“jackets” -> with count 1

As you probably noticed in this example, the values were converted to lowercase and split on whitespace. This is because of how the field was configured and the fact that facets use the indexed value as output. The index analyzer uses the StandardTokenizerFactory as well as the LowerCaseFilterFactory and therefore transforms the indexed terms accordingly.

On the other hand, if we configure it as a string by using the solr.StrField field type:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="product_type" type="string" stored="true" indexed="true" multiValued="true" />

we would get the following results:

“Summer Jeans” -> with count 1
“Winter Jackets” -> with count 1

The string field type is designed to keep values exactly as they are, without any modification (e.g. no splitting or text transformations).
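The difference between the two field types can be imitated outside Solr with a few lines of Python. This is only a rough simulation under stated assumptions: text_field_terms stands in for the tokenizer and lowercase filter by lowercasing and splitting on whitespace, the string case keeps each value whole, and both function names are hypothetical. It is not how Solr implements analysis.

```python
from collections import Counter

# The two example values used for the product_type field.
values = ["Summer Jeans", "Winter Jackets"]

def text_field_terms(value):
    """Rough stand-in for the tokenizer plus lowercase filter (not Solr code)."""
    return value.lower().split()

def facet_counts(docs, analyze):
    """Count how many docs contain each indexed term."""
    counts = Counter()
    for value in docs:
        # A term is counted once per document, however often it repeats.
        counts.update(set(analyze(value)))
    return dict(counts)

text_counts = facet_counts(values, text_field_terms)
string_counts = facet_counts(values, lambda value: [value])
print(text_counts)
print(string_counts)
```

The facet output follows whatever terms end up in the index, which is why the analysed field yields four single-word entries and the string field yields the two original values.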

Summary

In conclusion, I would offer three pieces of advice for anyone implementing facets to help filter and refine their site’s search experience:

  • When used properly, facets are extremely good at helping users to easily refine their search results. It’s crucial that you spend some time analysing your existing data to create a good facet configuration. Studying your users’ queries and identifying areas that create friction during their search and discovery journey is a great place to start.
  • Remember that different types of data require different facet configurations, so plan carefully. A good facet strategy will improve your users’ search experience by increasing findability. However, using facets incorrectly could just as easily damage the user experience.
  • Finally, be sure to keep an eye on query performance. It is one of the most important things to look out for in a search engine. You need to strike a balance between offering better findability through the use of facets and a smoother experience thanks to fast query performance.
