Stop tracking total hits
Improving retrieval times for Elasticsearch results
Elasticsearch 7 promises a new “magic WAND” to dramatically improve retrieval times: “It’s not every day that you get to make things several times faster.”
We thought “Wow — why not?”
eBay Kleinanzeigen is a classifieds platform, and we use Elasticsearch, so we decided to test it.
What would happen if we stopped tracking the total number of hits?
Our cluster for serving public searches currently contains more than 34 million documents. We run it on 250 nodes in two datacenters. At peak times, it answers more than 15,000 search requests per second.
We wanted to try this feature and quantify the savings. With the track_total_hits parameter, Elasticsearch counts hits exactly only up to a given threshold and reports just a lower bound beyond it. The default threshold is 10,000, which works well for our test.
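To make the parameter concrete, here is a minimal Python sketch that builds a search request body and interprets the hits.total object of an Elasticsearch 7 response. The field name title, the query text and the helper names search_body and describe_total are hypothetical; only track_total_hits and the value/relation shape of hits.total come from Elasticsearch itself.

```python
import json


def search_body(query_text, track_total_hits=10_000):
    """Build a hypothetical Elasticsearch 7 search request body.

    track_total_hits accepts True (always count exactly), False (do not
    count at all) or an integer threshold up to which hits are counted
    exactly. The field name "title" is a placeholder.
    """
    return {
        "query": {"match": {"title": query_text}},
        "track_total_hits": track_total_hits,
    }


def describe_total(response):
    """Turn the hits.total object of an Elasticsearch 7 response into text."""
    total = response["hits"].get("total")
    if total is None:
        # Counting was disabled (track_total_hits: false), so no total is returned.
        return "not counted"
    if total["relation"] == "gte":
        # The threshold was reached: value is only a lower bound.
        return f"{total['value']}+"
    return str(total["value"])  # relation == "eq": the count is exact


print(json.dumps(search_body("bicycle")))
# A made-up response fragment for a query with more than 10,000 matches:
print(describe_total({"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}))
```

A search page can then show "10000+ results" instead of an exact figure once the threshold is reached.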
Setup
For testing purposes, our search service includes a search amplifier. It was built to run tests against our cluster with real queries, so that we can estimate cluster sizes from real search behavior. The results of the amplified requests are not processed further, but we collect all metrics from them.
In the amplified requests, we set the track_total_hits parameter, which lets us record metrics with and without the parameter setting.
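The amplifier idea can be sketched in a few lines of Python. This is a hypothetical illustration, not our actual implementation: the names amplify, VARIANTS and AMPLIFICATION_RATE are made up, and we assume the two variants compared are exact counting (track_total_hits: true) versus counting capped at 10,000. The 10% rate is the one used in our test.

```python
import copy
import random

AMPLIFICATION_RATE = 0.10  # extra 10% of traffic, as in our test

# Hypothetical variants: exact counting vs. counting capped at 10,000 hits.
VARIANTS = {
    "tracked": True,
    "limited": 10_000,
}


def amplify(body, rng, rate=AMPLIFICATION_RATE):
    """Decide whether to send a shadow copy of a real search request.

    Returns None if the request is not sampled; otherwise a (variant, body)
    pair whose body is a deep copy with track_total_hits set for that variant.
    The caller sends the copy to the cluster, records only its latency and
    discards the result.
    """
    if rng.random() >= rate:
        return None
    variant = rng.choice(sorted(VARIANTS))
    shadow = copy.deepcopy(body)  # never modify the real user request
    shadow["track_total_hits"] = VARIANTS[variant]
    return variant, shadow


rng = random.Random(42)
real_request = {"query": {"match": {"title": "bicycle"}}}
sampled = [amplify(real_request, rng) for _ in range(10_000)]
shadows = [s for s in sampled if s is not None]
print(len(shadows))  # roughly 1,000 of 10,000 requests get a shadow copy
```

Copying the request keeps real traffic untouched, and choosing the variant at random spreads both variants evenly over the same query mix.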
To get a clearer picture, we track the response times in two groups, depending on the result size:
- Group A: 10,000 or more results
- Group B: fewer than 10,000 results
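A minimal sketch of how latency samples could be split into these two groups and summarized by mean and p98. The helper names are hypothetical, and percentile uses the nearest-rank definition, one of several common ones. With a 10,000 threshold, a capped total is reported as exactly 10,000, so such requests still land in group A.

```python
import math
from statistics import mean

THRESHOLD = 10_000  # group boundary used in the test


def group_of(total_hits):
    """Group A: 10,000 or more results; group B: fewer than 10,000."""
    return "A" if total_hits >= THRESHOLD else "B"


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    """samples: iterable of (total_hits, latency_ms) pairs for one variant."""
    groups = {}
    for total_hits, latency_ms in samples:
        groups.setdefault(group_of(total_hits), []).append(latency_ms)
    return {
        name: {"mean": mean(latencies), "p98": percentile(latencies, 98)}
        for name, latencies in sorted(groups.items())
    }


print(summarize([(10000, 30), (12000, 20), (9999, 10)]))
```

Running summarize once per variant gives directly comparable numbers for each group.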
The code is live, and for the test we amplify our search requests by an additional 10%.
Results
The raw response times already show a clear improvement. Requests with smaller result sets (fewer than 10,000 hits) benefit from the feature: their mean response time drops from 10 ms to 9 ms. The improvement is bigger for larger result sets: the mean of 30 ms drops by nearly 10 ms, to about 20 ms.
In relative terms, we see an overall improvement of 20%: 10% for smaller result sets and up to 30% for larger ones.
We double-checked the results at the 98th percentile (p98), where we observed an average improvement of 40%.
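The percentages above are relative changes, (before - after) / before, so a mean that falls from 10 ms to 9 ms is a 10% improvement. A small sketch of that calculation applied per group and metric; the function names and the nested-dict input shape are assumptions for illustration.

```python
def relative_improvement(before_ms, after_ms):
    """Share of the previous response time that is saved (0.1 means 10%)."""
    if before_ms <= 0:
        raise ValueError("baseline must be positive")
    return (before_ms - after_ms) / before_ms


def compare(baseline, candidate):
    """Relative improvement for every group/metric present in both summaries.

    Both arguments are nested dicts such as {"B": {"mean": 10.0, "p98": 25.0}}.
    """
    return {
        group: {
            metric: relative_improvement(value, candidate[group][metric])
            for metric, value in metrics.items()
            if metric in candidate[group]
        }
        for group, metrics in baseline.items()
        if group in candidate
    }


# Group B means from our test: 10 ms before, 9 ms after.
print(compare({"B": {"mean": 10.0}}, {"B": {"mean": 9.0}}))  # {'B': {'mean': 0.1}}
```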
Conclusion — Goodbye total hits count!
In our case, things are not “several times faster”, but the change still saves us more than 20% of the previous response times.
The results are significant and we will use this feature for our public search.