Debugging & Optimizing Elasticsearch Top Hits Aggregation

Published in

Engineering @ Housing/Proptiger/Makaan

3 min read · Feb 5, 2024

How we optimized legacy top hits aggregation query time from 500ms to ~50ms.

We use Elasticsearch on the search listings page to filter listings based on user filters and to run some complex aggregation logic.

Problem Statement:

Our legacy aggregation query was taking approximately 500ms for the city-level SRP (Search Results Page) and approximately 200-300ms for city locality pages.

Original Query:

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 50,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "certified_top_hits_score_aggs": {
              "top_hits": {
                "from": 0,
                "size": 9,
                "version": false,
                "explain": false,
                "_source": false,
                "sort": [
                  {
                    "_script": {
                      "script": {
                        "source": "miscellaneous equation of relevance score",
                        "lang": "painless"
                      },
                      "type": "number",
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}

Problem Identification

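1st issue: The obvious first step was to remove the script-based sorting, since the Painless script runs at query time for the documents being sorted.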

Solution: Move the score equation out of the query and store its result as a precomputed score field (doc_relevance_score) at indexing time.
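As a rough sketch of that idea, the snippet below computes a score once per document before indexing and stores it in the doc_relevance_score field that the later queries sort on. The real relevance equation is not shown in this post, so the formula, the input fields (photo_count, is_verified, days_since_update) and the weights are purely hypothetical placeholders.

```python
def compute_relevance_score(listing: dict) -> float:
    """Hypothetical stand-in for the real relevance equation (not shown in this post)."""
    # All three inputs and their weights are illustrative assumptions.
    photo_part = min(listing.get("photo_count", 0), 10) * 0.5
    verified_part = 3.0 if listing.get("is_verified") else 0.0
    freshness_part = max(0.0, 2.0 - listing.get("days_since_update", 0) / 30)
    return round(photo_part + verified_part + freshness_part, 4)


def with_precomputed_score(listing: dict) -> dict:
    """Return a copy of the document with the score stored as a plain field."""
    doc = dict(listing)
    doc["doc_relevance_score"] = compute_relevance_score(listing)
    return doc


# Hypothetical listing document, enriched just before it is sent to the index.
listing = {"id": 101, "photo_count": 4, "is_verified": True, "days_since_update": 15}
doc = with_precomputed_score(listing)
print(doc["doc_relevance_score"])
```

With the score stored as a regular numeric field, the sort clause can reference the field directly instead of evaluating a script for each document.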

The impact of this change was very minor: the query improved only from about 300ms to 280ms.

2nd issue: Next, we profiled the query and saw that the top hits aggregation was the main culprit, consuming around 70% of the total time by itself.
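For reference, Elasticsearch returns profiling data when the search body contains "profile": true. The helper below is a minimal sketch of how the per-aggregation timings in such a response could be summed across shards. The sample response is trimmed and its numbers are invented for illustration; they are not our measurements.

```python
def aggregation_timings(profile_response: dict) -> dict:
    """Sum time_in_nanos per aggregation name across all shards."""
    totals: dict = {}

    def walk(node: dict) -> None:
        # "description" holds the aggregation name given in the request.
        name = node.get("description", "unknown")
        totals[name] = totals.get(name, 0) + node.get("time_in_nanos", 0)
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response.get("profile", {}).get("shards", []):
        for agg in shard.get("aggregations", []):
            walk(agg)
    return totals


def one_shard(top_hits_nanos: int) -> dict:
    """Build a trimmed, made-up profile entry for a single shard."""
    return {
        "aggregations": [{
            "description": "certified_profile_aggs",
            "time_in_nanos": 1000,
            "children": [{
                "description": "certified_reverse_nested_aggs",
                "time_in_nanos": 800,
                "children": [{
                    "description": "certified_top_hits_score_aggs",
                    "time_in_nanos": top_hits_nanos,
                }],
            }],
        }]
    }


sample = {"profile": {"shards": [one_shard(700), one_shard(650)]}}
timings = aggregation_timings(sample)
for name, nanos in sorted(timings.items(), key=lambda item: -item[1]):
    print(name, nanos)
```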

Why top hits takes time: it keeps track of the most relevant matched documents and loads the docs for every bucket of the parent aggregation, so its cost also depends on the size of the parent terms aggregation.

Deeper Analysis …

Since we were only reading the id from the source in the top hits aggregation, we tried two solutions.

PS: Both solutions below need an additional query that fetches the listings by doc id, but that takes hardly ~10ms, so the change is still very beneficial overall.
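That additional lookup could look roughly like the sketch below, which builds an ids query body from the doc ids collected out of the aggregation response. The helper name and the source_fields values are hypothetical, and the sketch assumes the collected ids correspond to the _id meta field of the listings.

```python
def build_ids_query(doc_ids: list, source_fields: list) -> dict:
    """Build a search body that fetches only the given documents by _id."""
    # Drop duplicates while keeping the original (relevance) order.
    unique_ids = list(dict.fromkeys(str(doc_id) for doc_id in doc_ids))
    return {
        "size": len(unique_ids),
        "query": {"ids": {"values": unique_ids}},
        "_source": source_fields,
    }


# Hypothetical ids and fields, for illustration only.
body = build_ids_query([12, 7, 12, 30], ["id", "title"])
print(body)
```

Note that the ids query does not guarantee the order of the returned hits, so the caller would reorder them to match the order produced by the aggregation.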

Solution 1) Keep source disabled in top hits ("_source": false). This reduced query time from 500ms to 100ms, and we could still read the doc id from the _id meta field in the response.

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 50,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "certified_top_hits_score_aggs": {
              "top_hits": {
                "from": 0,
                "size": 9,
                "version": false,
                "explain": false,
                "_source": false,
                "sort": [
                  {
                    "doc_relevance_score": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
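To illustrate how the doc ids are read from this query's response, here is a minimal sketch that walks the aggregation buckets and collects the _id of each top hit per profile. The sample response is trimmed and its values are made up.

```python
def top_hit_ids(response: dict) -> dict:
    """Map each poc.uuid bucket key to the _id values of its top hits."""
    result = {}
    buckets = response["aggregations"]["certified_profile_aggs"]["buckets"]
    for bucket in buckets:
        top_hits = bucket["certified_reverse_nested_aggs"]["certified_top_hits_score_aggs"]
        # With "_source": false each hit still carries its _id meta field.
        result[bucket["key"]] = [hit["_id"] for hit in top_hits["hits"]["hits"]]
    return result


# Trimmed, made-up response for a single profile bucket.
sample_response = {
    "aggregations": {
        "certified_profile_aggs": {
            "buckets": [{
                "key": "poc-1",
                "doc_count": 2,
                "certified_reverse_nested_aggs": {
                    "doc_count": 2,
                    "certified_top_hits_score_aggs": {
                        "hits": {"hits": [{"_id": "40"}, {"_id": "12"}]}
                    },
                },
            }]
        }
    }
}

ids_by_profile = top_hit_ids(sample_response)
print(ids_by_profile)
```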

Innovative & Most Optimal Solution: The magical idea is to get rid of the top hits aggregation altogether and get the top matching docs via two nested terms aggregations: one on the precomputed score and, inside it, one on the document id.

Solution 2) Replace top hits with the two nested terms aggregations. This reduced query time from 500ms to ~70ms.

{
  "aggregations": {
    "certified_profile_aggs": {
      "terms": {
        "field": "poc.uuid",
        "size": 60,
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "certified_reverse_nested_aggs": {
          "reverse_nested": {},
          "aggregations": {
            "relevance_score_aggs": {
              "terms": {
                "field": "doc_relevance_score",
                "size": 9,
                "order": [
                  {
                    "_key": "desc"
                  }
                ]
              },
              "aggregations": {
                "inside_ids_aggs": {
                  "terms": {
                    "field": "id",
                    "size": 9,
                    "order": [
                      {
                        "_key": "desc"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
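Because each score bucket can carry up to 9 ids, the response of this query may hold more than 9 ids per profile. The sketch below shows one way to flatten the score and id buckets and keep only the top ones. The limit parameter and the sample response are assumptions for illustration.

```python
def top_ids_per_profile(response: dict, limit: int = 9) -> dict:
    """Flatten score -> id buckets into the top `limit` ids per profile."""
    result = {}
    for profile in response["aggregations"]["certified_profile_aggs"]["buckets"]:
        score_buckets = profile["certified_reverse_nested_aggs"]["relevance_score_aggs"]["buckets"]
        ids = []
        # Score buckets arrive ordered by key descending, i.e. best score first.
        for score_bucket in score_buckets:
            for id_bucket in score_bucket["inside_ids_aggs"]["buckets"]:
                ids.append(id_bucket["key"])
        result[profile["key"]] = ids[:limit]
    return result


# Trimmed, made-up response: two listings share the best score, one has a lower score.
sample_response = {
    "aggregations": {
        "certified_profile_aggs": {
            "buckets": [{
                "key": "poc-1",
                "certified_reverse_nested_aggs": {
                    "relevance_score_aggs": {
                        "buckets": [
                            {"key": 9.5, "inside_ids_aggs": {"buckets": [{"key": 40}, {"key": 12}]}},
                            {"key": 8.0, "inside_ids_aggs": {"buckets": [{"key": 7}]}},
                        ]
                    }
                },
            }]
        }
    }
}

all_ids = top_ids_per_profile(sample_response)
top_two = top_ids_per_profile(sample_response, limit=2)
print(all_ids, top_two)
```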

Summary Comparison for two query designs.

Written by Praveen Kumar