Debugging & Optimizing Elasticsearch Top Hits Aggregation

Praveen Kumar
Engineering @ Housing/Proptiger/Makaan
Feb 5, 2024

How we optimized legacy top hits aggregation query time from 500ms to ~50ms.

We use Elasticsearch on the search listings page to filter listings based on user filters and some complex aggregation logic.

Problem Statement:

Our legacy aggregation query was taking approximately 500ms for the city-level SRP (Search Results Page) and approximately 200-300ms for city locality pages.

Original Query:

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 50,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          { "_count": "desc" },
          { "_key": "asc" }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "certified_top_hits_score_aggs": {
              "top_hits": {
                "from": 0,
                "size": 9,
                "version": false,
                "explain": false,
                "_source": false,
                "sort": [
                  {
                    "_script": {
                      "script": {
                        "source": "miscellaneous equation of relevance score",
                        "lang": "painless"
                      },
                      "type": "number",
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}

Problem Identification

The first obvious issue was the script-based sorting.

Solution: Move the score equation out of the query and into a score precomputed at indexing time.

The impact of this change was very minor: the query only improved from 300ms to 280ms.
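The precompute step can be sketched in Python as below. This is only an illustration: the article does not give the real relevance equation, so `relevance_score` and the fields it reads (`photo_count`, `is_verified`) are hypothetical placeholders. The only name taken from the queries in this post is the target field `doc_relevance_score`.

```python
from typing import Any, Callable, Dict


def relevance_score(doc: Dict[str, Any]) -> float:
    # Hypothetical placeholder equation; the real one is not shown in this post.
    return 2.0 * doc.get("photo_count", 0) + 1.0 * doc.get("is_verified", 0)


def with_precomputed_score(
    doc: Dict[str, Any],
    score_fn: Callable[[Dict[str, Any]], float] = relevance_score,
) -> Dict[str, Any]:
    """Return a copy of the document with the score stored as a plain field.

    Storing the value at indexing time lets the query sort on
    `doc_relevance_score` directly instead of running a script per document.
    """
    enriched = dict(doc)  # shallow copy, the input document is left untouched
    enriched["doc_relevance_score"] = float(score_fn(doc))
    return enriched


# Example: enrich a listing right before it is sent to the indexer.
listing = {"id": 101, "photo_count": 3, "is_verified": 1}
indexed_listing = with_precomputed_score(listing)
```

The trade-off is that any change to the equation requires reindexing the affected documents.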

Second issue: Next, we profiled the query and saw that the top hits aggregation was the main culprit, consuming around 70% of the total time by itself.

Why top hits takes time: it keeps track of the most relevant matched documents and loads all docs from the parent bucket aggregation, so its cost also depends on the size of the parent terms aggregation.
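To reproduce this kind of breakdown, the search request can be sent with `"profile": true`, and the per-shard aggregation timings in the response can then be summed by aggregator type. The helper below is a minimal sketch that assumes the usual Profile API response layout (`profile.shards[].aggregations[]` with `type`, `time_in_nanos` and optional `children`). The function name is ours, and how parent and child timings overlap may vary by version, so treat the totals as a rough guide rather than an exact split.

```python
from collections import defaultdict
from typing import Any, Dict


def time_by_aggregator(profile_response: Dict[str, Any]) -> Dict[str, int]:
    """Sum `time_in_nanos` per aggregator type across all shards."""
    totals: Dict[str, int] = defaultdict(int)

    def walk(node: Dict[str, Any]) -> None:
        # Add this aggregator's own reported time, then visit sub-aggregations.
        totals[node["type"]] += node.get("time_in_nanos", 0)
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response.get("profile", {}).get("shards", []):
        for agg in shard.get("aggregations", []):
            walk(agg)
    return dict(totals)
```

Sorting the returned dictionary by value makes the slowest aggregator stand out quickly.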

Deeper Analysis …

Since we were only reading the id from the source in the top hits aggregation, we tried two solutions.

PS: Both solutions below need an additional query that contains only the doc ids, but that takes barely ~10ms, so the overall result is still very beneficial.

Solution 1) Keep source disabled in top hits. This reduced query time from 500ms to 100ms, as we still get the doc id from the _id meta field in the response.
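The follow-up query mentioned above could look roughly like the sketch below, which builds a request body using the standard `ids` query. The helper name is hypothetical, and it assumes the collected values are the documents' `_id`s; if they come from a regular `id` field instead, a `terms` query on that field would be the equivalent.

```python
from typing import Any, Dict, Iterable


def build_ids_query(doc_ids: Iterable[Any]) -> Dict[str, Any]:
    """Build a search body that fetches exactly the given documents by _id."""
    # Convert to strings and drop duplicates while keeping first-seen order.
    unique_ids = list(dict.fromkeys(str(doc_id) for doc_id in doc_ids))
    return {
        "size": len(unique_ids),  # ask for all requested documents in one page
        "query": {"ids": {"values": unique_ids}},
    }
```

Because this is a direct lookup by id with no scoring or aggregation work, it stays cheap even when the id list contains a few hundred entries.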

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 50,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          { "_count": "desc" },
          { "_key": "asc" }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "certified_top_hits_score_aggs": {
              "top_hits": {
                "from": 0,
                "size": 9,
                "version": false,
                "explain": false,
                "_source": false,
                "sort": [
                  {
                    "doc_relevance_score": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
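With `_source` disabled, each hit in the response still carries its `_id`, which is all we need for the follow-up lookup. A minimal sketch of reading those ids back, assuming the standard top hits response shape (`hits.hits[]` under the aggregation name) and the aggregation names from the query above; the function name is ours:

```python
from typing import Any, Dict, List


def top_hit_ids_by_poc(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each poc.uuid bucket key to the _ids of its top hits, in sort order."""
    result: Dict[str, List[str]] = {}
    buckets = response["aggregations"]["certified_profile_aggs"]["buckets"]
    for bucket in buckets:
        hits = (
            bucket["certified_reverse_nested_aggs"]
            ["certified_top_hits_score_aggs"]["hits"]["hits"]
        )
        # Only the _id meta field is read; no _source is present in these hits.
        result[bucket["key"]] = [hit["_id"] for hit in hits]
    return result
```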

Innovative & Most Optimal Solution: The magical idea is to get rid of the top hits aggregation altogether and get the top matching docs via two nested terms aggregations, one on doc_relevance_score and, inside it, one on id.

Solution 2) Replace top hits with the nested terms aggregations. This reduced query time from 500ms to ~70ms.

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 60,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          { "_count": "desc" },
          { "_key": "asc" }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "relevance_score_aggs": {
              "terms": {
                "field": "doc_relevance_score",
                "size": 9,
                "order": [
                  { "_key": "desc" }
                ]
              },
              "aggregations": {
                "inside_ids_aggs": {
                  "terms": {
                    "field": "id",
                    "size": 9,
                    "order": [
                      { "_key": "desc" }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
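Since this query returns buckets instead of hits, the top docs have to be rebuilt on the client side. The sketch below shows one way to do that under the standard terms aggregation response shape: it walks the score buckets in the order returned (highest score first, because of the `_key` desc ordering) and collects ids until the per-profile limit is reached. The function name and the `per_poc` parameter are ours; the aggregation names come from the query above.

```python
from typing import Any, Dict, List


def top_ids_by_poc(response: Dict[str, Any], per_poc: int = 9) -> Dict[str, List[Any]]:
    """Map each poc.uuid bucket key to at most `per_poc` doc ids, best score first."""
    result: Dict[str, List[Any]] = {}
    for poc in response["aggregations"]["certified_profile_aggs"]["buckets"]:
        ids: List[Any] = []
        score_buckets = (
            poc["certified_reverse_nested_aggs"]["relevance_score_aggs"]["buckets"]
        )
        for score_bucket in score_buckets:
            # Several docs can share one score, so each score bucket may hold many ids.
            for id_bucket in score_bucket["inside_ids_aggs"]["buckets"]:
                if len(ids) < per_poc:
                    ids.append(id_bucket["key"])
        result[poc["key"]] = ids
    return result
```

The collected ids can then be passed to the follow-up lookup query to fetch the actual documents.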

Summary Comparison for two query designs.
