To Stem or not to Stem, that is the question!

Stemming is a method often used to optimise search engines: words are trimmed down to a common root and then grouped together under it. For example, the words gold, golden and goldeny would all belong to the group gold.

The advantage of this technique is being able to collect a wider range of queries under a single label. The disadvantage, however, is that it can match two labels that are distinctly different: red dress and red dresser are two dissimilar items that we probably don’t want appearing in the same search results.
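To make both the advantage and the disadvantage concrete, here is a minimal sketch of a suffix-stripping stemmer. The suffix list and the function names (toy_stem, group_by_stem) are made up for illustration; this is not the stemmer our search engine uses, and real stemmers apply far more careful rules.

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("eny", "en", "er", "y")

def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_by_stem(words):
    """Collect words under the root they share after stemming."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)

groups = group_by_stem(["gold", "golden", "goldeny", "dress", "dresser"])
print(groups)
```

With these rules gold, golden and goldeny land in one group, which is the intended effect, but dress and dresser also end up sharing a root, which is exactly the kind of unwanted match described above.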

In this article, I’m going to investigate stemming in more depth and look at whether it makes a meaningful difference, optimises our search engine and improves our clients’ site search.

To measure success, I’m considering the following values:

  • Findability: A formula developed by our team that gives an idea of how effectively customers are able to find what they’re looking for.
  • No Results: Queries which don’t find an exact match in our database.
  • Add2Cart: Items added to cart for purchase.

My investigation will include:

  • A long-term study: 6 months (3 months before and 3 months after stemming).
  • A short-term study: 2 weeks (1 week before and 1 week after stemming).
  • A comparison of the queries changed by stemming with those queries not changed.

It’s important to look at the data over both the long term and the short term. Long-term performance makes more sense statistically: there is more data available, so it is often less volatile. However, a longer time range is also more likely to include specific events or activities that drastically affect search behaviour, such as Christmas. Comparing the long-term and short-term views is therefore the best way to check for a consistent set of behaviours.

Simple observations

The preliminary analysis revealed interesting, and perhaps surprising, results. Where there was a variation once stemming had been applied, analysis concluded that the change could not be put down to stemming alone.

An example below shows two clients that appeared to have a change in behaviour after stemming had been implemented.

Queries dip after stemming (green box), No Results seems to increase and findability seems to decrease

However, we can’t put this change down to stemming alone because:

  1. The changes occurred 10 days after stemming was implemented.
  2. Queries made by users (the purple line) also dropped during the period. There is no reason why applying stemming would affect the number of users searching per day, which suggests the variation was driven by an external factor such as the end of a sales period. Further research confirmed this: the drop was due to an event unrelated to stemming.
  3. Findability and No Results seem linked to the change in query volume and after a short period they recover alongside the number of queries, even when stemming is still active.

Overall, therefore, it is unlikely that stemming caused the change in performance.

All other clients displayed regular behaviour throughout the period. The following example is typical of what most clients’ profiles looked like when graphed.

In most cases there is no observable evidence of a behaviour change after stemming has been applied.

One client even displayed a fall in performance after implementing stemming, as we can see below:

A decrease in performance happened after stemming, although not immediately after. Stemming was, however, withdrawn at a later date, and both queries and findability went back to normal instantly.

I’ll go on to look at this particular occurrence in more detail later. I will refer to this case as case 1.

Numerical analysis

The numerical statistics over these periods also show only minimal movements and changes in behaviour.

Variations over 6 months (3 months before and 3 months after stemming is active)

  • Findability: +0.0167846% per day
  • No Results: -0.00866285% per day
  • Add2Cart: -0.0102864% per day

These variations are tiny, a few hundredths of a percent per day at most, and fall within normal day-to-day fluctuation. The same is true for the shorter period.

Variations over 2 weeks (1 week before and 1 week after stemming)

  • Findability: -0.02308963158% per day
  • No Results: +0.06822526316% per day
  • Add2Cart: +0.007957894737% per day

Again, these results fall within normal variation and are close enough to 0 to conclude that, overall, there was no significant change in the success measures.
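One way to make "within normal variation" checkable is to compare the shift between the before and after windows with the day-to-day spread seen before stemming. The sketch below assumes that approach; it is not necessarily the formula behind the figures above, and the daily values and the name before_after_shift are invented for illustration.

```python
from statistics import mean, stdev

def before_after_shift(daily_values, switch_index):
    """Return (shift in window means, day-to-day std dev before the switch)."""
    before = daily_values[:switch_index]
    after = daily_values[switch_index:]
    return mean(after) - mean(before), stdev(before)

# Made-up daily findability percentages: 7 days before, 7 days after stemming.
findability = [71.2, 70.8, 71.5, 70.9, 71.1, 71.4, 70.7,
               71.0, 71.3, 70.6, 71.2, 70.9, 71.4, 71.1]

shift, noise = before_after_shift(findability, switch_index=7)

# The shift counts as "normal" here if it is no larger than the usual daily spread.
within_normal_range = abs(shift) <= noise
print(f"shift={shift:+.4f}, noise={noise:.4f}, normal={within_normal_range}")
```

The threshold of one standard deviation is a deliberately simple choice; a stricter study could use a proper significance test instead.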

Query by query analysis in case 1

I wanted to undertake a further study to look at the reported impact in search behaviour for this particular client. For this, I classified queries into two groups: those that were changed by stemming and those that weren’t. For example:

Query 1 (left) was changed by stemming, whereas query 2 (right) was not. Both dropped during the period when stemming was active. (The queries themselves are not shown, to protect our client’s data.)

This was carried out with 100 queries, 50 in each group.
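The classification itself can be sketched as follows: a query counts as "changed" if stemming rewrites at least one of its words. The stemmer here (plural_stem) is a hypothetical stand-in that only drops a trailing "s", and the example queries are invented, since the real ones can't be shown.

```python
def split_by_stemming_effect(queries, stem):
    """Partition queries into (changed, unchanged), depending on whether
    the stemmer rewrites at least one word of the query."""
    changed, unchanged = [], []
    for query in queries:
        words = query.lower().split()
        if any(stem(word) != word for word in words):
            changed.append(query)
        else:
            unchanged.append(query)
    return changed, unchanged

def plural_stem(word):
    """Stand-in stemmer: drop a single trailing 's' (but keep 'ss')."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

changed, unchanged = split_by_stemming_effect(
    ["red dress", "blue jeans", "gold rings", "leather sofa"], plural_stem
)
print(changed, unchanged)
```

The unchanged group acts as a control: stemming cannot have altered how those queries are matched, so any movement in their performance must come from somewhere else.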

What we saw was that all queries had dropped generally. If stemming had caused a drop in findability it would have affected the words changed by stemming but not the other queries. In case 1 we had an overall drop, with the following results:

  • Overall findability fell by 3.6354%
  • Findability for queries changed by stemming fell by 3.5337%
  • Findability for queries not changed by stemming fell by 3.7610%

If stemming were responsible, I would expect a fall in performance for the queries changed by stemming and no change for those it left untouched. Instead, the queries not affected by stemming fell by almost the same amount as those that were changed.

Whatever caused this drop in performance affected all queries roughly equally, so we can conclude that it is unlikely to have been stemming.
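The reasoning above boils down to a small piece of arithmetic: subtract the fall in the untouched (control) group from the fall in the stemmed group, and what remains is the part of the drop that could be attributed to stemming. A quick sketch using the figures reported above (the variable names are my own):

```python
# Percentage change in findability for case 1, as reported above.
fall_changed = -3.5337    # queries rewritten by stemming
fall_unchanged = -3.7610  # queries left untouched (the control group)

# If stemming were the cause, the changed group should fall noticeably
# further than the control group, giving a clearly negative gap.
gap = fall_changed - fall_unchanged

print(f"gap attributable to stemming: {gap:+.4f} percentage points")
```

The gap is about +0.23 percentage points: the stemmed queries actually fell slightly less than the untouched ones, which is the opposite of what we would see if stemming were to blame.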

In conclusion, we studied the clients that reported changes after stemming was implemented and, for case 1, compared the specific queries that changed with stemming against those that didn’t. This has given us a good understanding of the effects and impact of employing a stemming method.

We also studied all clients statistically both during long periods (6 months) and short periods (2 weeks) to ensure that we took into account any possible external factors that could have impacted a behaviour change during that period.

As we’ve seen, in all these scenarios, we can conclude that there is no evidence, when looking at our search engine and our clients, that stemming has any noticeable impact or makes a difference in search performance.