Stop tracking total hits

Improving retrieval times for Elasticsearch results


Elasticsearch 7 promises a new “magic WAND” that dramatically improves retrieval times. It’s not every day that you get to make things several times faster.

eBay Kleinanzeigen is a classifieds platform, and our search runs on Elasticsearch.

We thought “Wow, why not?” and decided to test it.

What would happen if we stopped tracking the total number of hits?

Our cluster for serving public searches currently contains more than 34 million documents. We run it on 250 nodes in two datacenters. At peak times, it answers more than 15,000 search requests per second.

We want to try this feature and quantify the savings. The track_total_hits parameter controls how hits are counted: unless it is set to true, Elasticsearch counts hits exactly only up to a given threshold and reports larger totals as a lower bound. The default threshold is 10,000, which is good for our test.
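As a rough illustration of the parameter, the sketch below builds search request bodies as plain Python dicts. The `build_search_body` helper and the `title` field are made up for this example and are not taken from our service; `track_total_hits` itself accepts `true`, `false`, or an integer threshold in Elasticsearch 7.

```python
import json


def build_search_body(query, track_total_hits=None, size=10):
    """Build an Elasticsearch search request body as a plain dict.

    track_total_hits may be True (always count exactly), False (do not
    count at all), an integer threshold, or None to keep the server
    default of 10,000.
    """
    body = {"query": query, "size": size}
    if track_total_hits is not None:
        body["track_total_hits"] = track_total_hits
    return body


# Hypothetical query; the field name "title" is an assumption.
query = {"match": {"title": "bicycle"}}

exact = build_search_body(query, track_total_hits=True)       # exact count
default = build_search_body(query)                            # exact up to 10,000
untracked = build_search_body(query, track_total_hits=False)  # no count at all

print(json.dumps(exact, indent=2))
```

In Elasticsearch 7 responses, `hits.total` is an object with a `value` and a `relation`. A relation of `gte` signals that the value is only a lower bound.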

Setup

For testing purposes, our search service includes a search amplifier. It was built to run tests against our cluster so that we can estimate cluster sizes based on real queries and real search behavior. The amplified results are not processed further, but we collect all metrics from them.

Search service setup with an amplifier for testing

We use the amplified search setup to set the track_total_hits parameter and record response-time metrics with and without it.

Amplified search request with track_total_hits enabled

To have a better picture, we track the response times in two groups, depending on the size of the results:

  • Group A: 10,000 or more results
  • Group B: fewer than 10,000 results
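The grouping could look roughly like the sketch below. It assumes each sample carries the `hits.total` object of an Elasticsearch 7 response plus a response time in milliseconds. The function names and the sample data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

THRESHOLD = 10_000  # boundary between Group A and Group B


def result_group(hits_total):
    """Return "A" for 10,000 or more results, otherwise "B"."""
    return "A" if hits_total["value"] >= THRESHOLD else "B"


def mean_time_by_group(samples):
    """Average response times per group.

    samples is an iterable of (hits_total, response_time_ms) pairs.
    """
    buckets = defaultdict(list)
    for hits_total, response_time_ms in samples:
        buckets[result_group(hits_total)].append(response_time_ms)
    return {group: mean(times) for group, times in buckets.items()}


# Made-up samples, not measured values.
samples = [
    ({"value": 10000, "relation": "gte"}, 30),
    ({"value": 10000, "relation": "gte"}, 20),
    ({"value": 42, "relation": "eq"}, 10),
]
print(mean_time_by_group(samples))
```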

The code is live and we amplify our search requests by an additional 10% for the test.
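A minimal sketch of such an amplifier, under some assumptions: it replays roughly 10% of incoming requests, once with `track_total_hits` set to true and once without the parameter, discards the responses, and records only the timings. The names `amplify`, `search_fn`, and `record` are hypothetical, and our actual implementation differs in detail.

```python
import random
import time

AMPLIFY_RATE = 0.10  # replay roughly 10% of incoming requests


def amplify(body, search_fn, record, rng=random):
    """Replay a search in two variants and record the timings.

    search_fn(body) executes a search; record(name, millis) stores a
    metric. Both are placeholders supplied by the caller.
    Returns True if the request was amplified.
    """
    if rng.random() >= AMPLIFY_RATE:
        return False
    variants = {
        # Assumption: compare exact counting with the server default.
        "exact_total": {**body, "track_total_hits": True},
        "default_total": {
            key: value for key, value in body.items()
            if key != "track_total_hits"
        },
    }
    for name, variant in variants.items():
        start = time.perf_counter()
        search_fn(variant)  # the response is intentionally discarded
        record(name, (time.perf_counter() - start) * 1000.0)
    return True
```

Passing the random source as a parameter keeps the sampling decision easy to test with a fixed value.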

Results

The raw response times already show a clear improvement. Even smaller requests (fewer than 10,000 results) benefit from the feature: the mean response time is 9 ms instead of 10 ms. The improvement is bigger for larger result sets, where the mean dropped by nearly 10 ms, from 30 ms to 20 ms.

Raw results, comparing the response times

Expressed as percentages, we see a 20% improvement overall, with 10% for smaller results and up to 30% for larger result sets.
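The percentages follow from the mean response times quoted above. A small sketch of the arithmetic, using the rounded values from the text (so the results are approximate):

```python
def relative_improvement(before_ms, after_ms):
    """Percentage by which the response time dropped."""
    return 100.0 * (before_ms - after_ms) / before_ms


# Rounded mean response times from the text, in milliseconds.
small = relative_improvement(10, 9)   # fewer than 10,000 results
large = relative_improvement(30, 20)  # 10,000 or more results

print(f"smaller result sets: {small:.0f}%")
print(f"larger result sets:  {large:.0f}%")
```

With the rounded inputs this gives 10% for the smaller result sets and roughly a third for the larger ones. The measured drop was slightly under 10 ms, which is why the text says "up to 30%".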

Absolute results, mean

We double-checked the results for p98. Here we observed an average improvement of 40%.
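For readers unfamiliar with the metric, p98 is the response time below which about 98% of requests fall. A minimal sketch of how it can be computed with the Python standard library; the sample values are invented, not our measurements:

```python
from statistics import quantiles


def p98(values):
    """98th percentile of a list of response times."""
    # quantiles(..., n=100) returns 99 cut points; index 97 is p98.
    return quantiles(values, n=100)[97]


# Invented response times in milliseconds, with one slow outlier.
times_ms = [9, 10, 10, 11, 12, 12, 13, 15, 18, 120]
print(p98(times_ms))
```

Unlike the mean, p98 is driven by the slowest requests, so it shows how the change affects the tail.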

Absolute results, p98

Conclusion — Goodbye total hits count!

In our case, “several times faster” does not quite hold true, but the change still saves us more than 20% of the previous response times.

The results are significant and we will use this feature for our public search.
