Tuning Search Relevance with Elasticsearch Custom Scoring Techniques

Samarendra Kandala
Fission Labs

Imagine browsing Amazon for a product or LinkedIn for a job. You type a keyword and are instantly presented with highly relevant results tailored to your preferences and context.

Have you ever wondered how these results are ranked and displayed?

The search engine must understand the context of your search, prioritize certain factors over others, and efficiently handle large volumes of data to deliver the best possible results.

At Fission Labs, we partnered with a client who was struggling to enhance their job search efficiency and relevance. By leveraging Elasticsearch, we developed a custom matching engine that now drives one of the largest labour market analytics platforms, delivering highly accurate and relevant search results.

The Challenge with Traditional Databases
Using a traditional database for searching presents several issues:

  • Exact Matches and Inefficient Keyword Search: Traditional databases return exact matches for keywords, missing out on relevant results with slight variations in wording. For example, a search for “software developer” might not return results for “software engineer” or “developer”.
  • Lack of Context Consideration: The relevance of results is often not considered. A job posting containing the keyword “developer” multiple times is treated the same as one that mentions it only once.

Traditional database indexes (typically B-trees) are optimized for range queries and exact matches but are inefficient for full-text search, making it difficult to quickly and accurately retrieve documents based on keyword searches.

These challenges highlight the limitations of traditional databases and the need for more advanced search capabilities to improve relevance and context in search functionalities.

Client Challenges

  • The client faced challenges in performing searches on millions of records within acceptable API timeouts while maintaining relevance, scalability, and supporting pagination.
  • They also struggled with integrating data from multiple database instances and external sources effectively.

Client Initial Approach
To address these limitations, the client implemented a custom search job mechanism that loaded data into memory and performed searches directly in memory. Additionally, they used Redis to cache user-level searches, improving the speed of repeated queries. Initially, this solution seemed like it could scale by adding more instances to handle the load. However, it quickly ran into problems.

  • Performance dropped when multiple users accessed the system, causing slow response times. Horizontal scaling proved ineffective as the data volume increased.
  • The search relevance was limited because it relied on direct filters in the transactional database to limit the data loaded into memory.
  • Despite efforts to shard and scale the database, it still struggled with slow queries, putting extra pressure on the system.

Transition to Elasticsearch

After conducting a detailed Proof of Concept (PoC) and analysis, we decided to transition to Elasticsearch, a distributed search engine built on top of Apache Lucene. Elasticsearch offers several advantages over a traditional RDBMS:

  • Full-Text Search Capabilities: Unlike RDBMS, Elasticsearch is designed for full-text search, enabling more flexible and powerful query capabilities.
  • Relevance Scoring: Elasticsearch ranks results based on relevance, not just keyword matches.
  • Scalability: Elasticsearch can handle vast amounts of data and still return results quickly.

To understand how scoring works in Elasticsearch, we need to take a brief detour through the concepts of inverted indexing, the core calculations of the BM25 algorithm, field-level scoring, and boosting. If you are already familiar with these concepts, skip ahead to the Function Score section.

Inverted Index
An inverted index is a core data structure in Elasticsearch that enables rapid full-text searches. It maps terms to the documents containing them, much like an index at the end of a book.

Let’s take an example to understand how inverted indexes work. Suppose each document has a title field.
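For illustration, suppose the index holds two hypothetical documents: Document 1 titled “A powerful search engine for developers” and Document 2 titled “The most powerful search tool” (titles chosen to match the TF-IDF walkthrough later in the post). A minimal sketch of building an inverted index over them:

```python
from collections import defaultdict

# Hypothetical example documents (title field only)
documents = {
    1: "A powerful search engine for developers",
    2: "The most powerful search tool",
}

def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(documents)
print(index["powerful"])  # {1, 2}
print(index["engine"])    # {1}
```

Real Elasticsearch also stores positions, frequencies, and other statistics per term, but the term-to-documents mapping is the core idea.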

Inverted indexes are far better suited for full-text search than traditional B-tree indexes. Consider a search for “powerful search engine”, a query of three words: a B-tree index handles only exact matches or simple patterns, so a document whose title contains “search engine” but not the exact phrase “powerful search engine” will not show up in the results.

On the other hand, an inverted index stores individual words and maps them to the documents they appear in. For example, the words “powerful,” “search,” and “engine” would each be mapped to the documents containing them. When searching for “powerful search engine,” the inverted index retrieves all documents containing any combination of these words, ensuring that relevant documents are included in the search results, even if they are not an exact match.

Elasticsearch uses many additional techniques to optimize this index to support different use cases, including phrase queries (where the order of words also matters), but this high-level example illustrates the basic concept.

Field-Level Scoring
Elasticsearch allows scoring to be calculated at the field level. This means that different fields in a document can contribute differently to the overall relevance score.

Example: Suppose we want to search for the term “powerful” and want matches in the “title” field to be weighted more heavily than matches in the “category” field.

{
  "query": {
    "multi_match": {
      "query": "powerful",
      "fields": [
        "title^3",
        "category"
      ]
    }
  }
}

The ^3 indicates that matches in the “title” field are weighted three times more heavily than matches in the “category” field. This means that if the term “powerful” is found in the title, it will significantly boost the document’s relevance score compared to finding it in the category.

Boosting
Boosting is a technique used to increase the relevance score of certain documents or fields based on predefined criteria.

Example: Suppose we want to boost the relevance score of documents in the “search engine” category.

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "powerful"
          }
        },
        {
          "match": {
            "category": {
              "query": "search engine",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}

In this query, we have two match clauses. We want to give more weight to the second clause, which searches for documents where the category is “search engine” and applies a boost of 2 to increase their relevance score.

An inverted index helps retrieve all relevant documents quickly by mapping terms to the documents that contain them. Boosting and field-level scoring allow us to increase the importance of particular fields where more emphasis is needed. Finally, the BM25 algorithm, an evolution of TF-IDF, determines how relevant the search terms are to individual documents relative to the entire dataset, ensuring precise and meaningful search results. To build intuition for BM25, let’s first walk through TF-IDF.

TF-IDF
TF-IDF stands for Term Frequency-Inverse Document Frequency. It is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus).

Term Frequency (TF) : Term frequency measures how often a term appears in a document. The more frequently a term appears, the higher its importance.

Inverse Document Frequency (IDF) : Inverse Document Frequency (IDF) measures the uniqueness or rarity of a term across the entire document corpus. Rare terms receive higher IDF scores. The count of documents containing the term (t) and the total number of documents are derived from the field’s inverted index.

TF-IDF Calculation : TF-IDF is the product of term frequency and inverse document frequency.

Example: Let’s say we search for the term “powerful engine” in the title field of our example documents. Elasticsearch will split the query into its two terms, calculate the TF-IDF score for each term separately, and then combine the scores.

Calculating TF-IDF for “powerful”
Term Frequency (TF)
Document 1: The term “powerful” appears once. If the title has 6 words, TF is 1/6
Document 2: The term “powerful” appears once. If the title has 5 words, TF is 1/5

Inverse Document Frequency (IDF)
“powerful” appears in 2 out of 2 documents, so IDF is log(2/2) = log(1) = 0. Since the term appears in every document, its IDF score is zero.

TF-IDF Calculation
Document 1: TF-IDF = TF × IDF = (1/6) × 0 = 0
Document 2: TF-IDF = TF × IDF = (1/5) × 0 = 0

TF-IDF calculation for the term “engine”
Term Frequency (TF)
Document 1: The term “engine” appears once in a title with 6 words.
TF = ⅙ = 0.1667
Document 2: The term “engine” does not appear. TF = 0

Inverse Document Frequency (IDF)
The term “engine” appears in 1 out of 2 documents. IDF(engine)=log⁡(2/1)= 0.301

TF-IDF Calculation
Document 1: TF(Document 1) × IDF(engine) = 0.1667 × 0.301 ≈ 0.050
Document 2: TF(Document 2) × IDF(engine) = 0 × 0.301 = 0

Combined TF-IDF scores for each document
Document 1: Combined TF-IDF = 0+0.050 = 0.050
Document 2: Combined TF-IDF = 0+0 = 0

Document 1 has a higher relevance score for the search query “powerful engine” due to the presence of both terms in the title, while Document 2 has a relevance score of 0 because it does not contain “engine”.
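The arithmetic above can be reproduced in a few lines of Python (a simplified TF-IDF with a base-10 log, matching the example; production BM25 is more involved):

```python
import math

def tf_idf(term_count, doc_len, docs_with_term, total_docs):
    """Simplified TF-IDF: term frequency times base-10 inverse document frequency."""
    tf = term_count / doc_len
    idf = math.log10(total_docs / docs_with_term)
    return tf * idf

# Document 1: 6-word title containing "powerful" and "engine" once each
score_doc1 = tf_idf(1, 6, 2, 2) + tf_idf(1, 6, 1, 2)
# Document 2: 5-word title containing only "powerful"
score_doc2 = tf_idf(1, 5, 2, 2) + tf_idf(0, 5, 1, 2)

print(round(score_doc1, 3))  # 0.05
print(round(score_doc2, 3))  # 0.0
```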

TF-IDF provides the base relevance score; we can then apply additional relevance adjustments using other techniques, such as skill boosts, location-based decay, or custom script scoring.

While TF-IDF is useful, the BM25 algorithm improves upon it by addressing term frequency saturation and incorporating field length normalization, making it more effective for ranking search results in practical applications like Elasticsearch. Here is the link for more information. https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables

BM25 Formula
The standard BM25 score of a document D for a query Q with terms qᵢ is:

score(D, Q) = Σᵢ IDF(qᵢ) × f(qᵢ, D) × (k1 + 1) / ( f(qᵢ, D) + k1 × (1 − b + b × |D| / avgdl) )

where f(qᵢ, D) is the frequency of term qᵢ in D, |D| is the document length, avgdl is the average document length in the corpus, and k1 and b are tunable parameters (Elasticsearch defaults: k1 = 1.2, b = 0.75).

Customizing Job Search Relevance with Elasticsearch Function Score

The basic scoring in Elasticsearch offers flexibility and relevance for many use cases. However, our specific needs required more custom scoring logic. Fortunately, Elasticsearch provides the function_score query, allowing us to customize the relevance of job search results based on multiple factors, such as skill match and skill level. This approach combines exact skill ID matching with skill vector similarity to provide highly relevant search results.

While I can’t share the entire query and business logic, I can give you a glimpse of our customizations to help you understand the function_score query in general. Here is a sample of the data structure we have in Elasticsearch:

{
  "mappings": {
    "properties": {
      "job_title": {
        "type": "text"
      },
      "job_location": {
        "type": "geo_point"
      },
      "skills": {
        "type": "nested",
        "properties": {
          "skill_id": {
            "type": "keyword"
          },
          "skill_level": {
            "type": "integer"
          },
          "skill_vector": {
            "type": "dense_vector",
            "dims": 3
          }
        }
      }
    }
  }
}

Core Components of Our Custom Function Score
We utilized various scoring techniques, including location-based decay and script score, to enhance the relevance of job search results.

Location-Based Decay
We apply a decay function to prioritize jobs closer to the user’s location. This is particularly useful when users prefer jobs within a certain geographical radius.

Decay Function Types: gauss, exp, linear

These functions decrease the score of a document based on its distance from a given point. We have used the Exponential (exp) decay function to smoothly reduce the score as the distance increases.

{
  "exp": {
    "job_location": {
      "origin": "37.7749,-122.4194",
      "scale": "50km",
      "offset": "10km",
      "decay": 0.5
    }
  }
}

This configuration means that jobs within 10 km of the origin are unaffected by decay. Beyond 10 km, the score decreases, halving for every additional 50 km past the offset.
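The exp decay can be sketched as score = decay^(adjusted_distance / scale), with adjusted_distance = max(0, distance − offset). A minimal Python model of this behavior (distances here are plain km values; Elasticsearch itself computes the geo distance from the origin):

```python
import math

def exp_decay(distance_km, offset_km=10, scale_km=50, decay=0.5):
    """Exponential decay: the score reaches `decay` at `scale` beyond the offset."""
    adjusted = max(0.0, distance_km - offset_km)
    lam = math.log(decay) / scale_km
    return math.exp(lam * adjusted)

print(round(exp_decay(5), 4))    # 1.0  -> within the offset, no decay
print(round(exp_decay(60), 4))   # 0.5  -> exactly `scale` beyond the offset
print(round(exp_decay(110), 4))  # 0.25 -> halved again after another 50 km
```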

Field Value Factor
Adjusts the score based on the skill level, ensuring that jobs requiring higher skill levels are prioritized. If required, this can be applied to each skill individually.

{
  "field_value_factor": {
    "field": "skills.skill_level",
    "factor": 1.2,
    "modifier": "sqrt",
    "missing": 1
  }
}
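With the sqrt modifier, field_value_factor evaluates sqrt(factor × field_value), substituting missing when the field is absent. A quick sketch (the skill levels used here are hypothetical):

```python
import math

def field_value_factor(value, factor=1.2, missing=1):
    """Sketch of field_value_factor with the sqrt modifier: sqrt(factor * value)."""
    if value is None:   # absent field falls back to the "missing" value
        value = missing
    return math.sqrt(factor * value)

print(round(field_value_factor(8), 3))     # skill_level 8 -> 3.098
print(round(field_value_factor(None), 3))  # missing field -> 1.095
```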

Skill Vector Cosine Similarity Using Script Score
The script_score function enables you to write custom scripts to calculate the score of each document. The script can use document fields, parameters passed in the query, and even mathematical operations to generate a score. The script calculates the cosine similarity between the given skill vector and the job’s skill vector. The score is further adjusted by the skill level, ensuring higher skill levels contribute more to the relevance score.

{
  "nested": {
    "path": "skills",
    "query": {
      "script_score": {
        "script": {
          "source": """
            double[] jobVector = doc['skills.skill_vector'].value;
            double[] givenVector = params.skill_vector;
            double dotProduct = 0.0;
            double normJob = 0.0;
            double normGiven = 0.0;
            for (int j = 0; j < jobVector.length; j++) {
              dotProduct += jobVector[j] * givenVector[j];
              normJob += jobVector[j] * jobVector[j];
              normGiven += givenVector[j] * givenVector[j];
            }
            normJob = Math.sqrt(normJob);
            normGiven = Math.sqrt(normGiven);
            double cosineSimilarity = dotProduct / (normJob * normGiven);
            return cosineSimilarity * (doc['skills.skill_level'].value / 10.0);
          """,
          "params": {
            "skill_vector": [0.3, 0.2, 0.1]
          }
        }
      }
    },
    "score_mode": "max"
  }
}

Full Query
The bool query with should clauses lets optional scoring clauses (in our full version, both exact skill ID matches and vector similarity) contribute to the score. boost_mode: “multiply” combines the query score with the scores of the scoring functions to produce a final relevance score. This method allows us to calculate relevance scores efficiently across millions of records.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "job_title": "engineer"
          }
        }
      ],
      "should": [
        {
          "function_score": {
            "query": {
              "nested": {
                "path": "skills",
                "query": {
                  "script_score": {
                    "script": {
                      "source": """
                        double[] jobVector = doc['skills.skill_vector'].value;
                        double[] givenVector = params.skill_vector;
                        double dotProduct = 0.0;
                        double normJob = 0.0;
                        double normGiven = 0.0;
                        for (int j = 0; j < jobVector.length; j++) {
                          dotProduct += jobVector[j] * givenVector[j];
                          normJob += jobVector[j] * jobVector[j];
                          normGiven += givenVector[j] * givenVector[j];
                        }
                        normJob = Math.sqrt(normJob);
                        normGiven = Math.sqrt(normGiven);
                        double cosineSimilarity = dotProduct / (normJob * normGiven);
                        return cosineSimilarity * (doc['skills.skill_level'].value / 10.0);
                      """,
                      "params": {
                        "skill_vector": [0.3, 0.2, 0.1]
                      }
                    }
                  }
                },
                "score_mode": "max"
              }
            },
            "functions": [
              {
                "exp": {
                  "job_location": {
                    "origin": "37.7749,-122.4194",
                    "scale": "50km",
                    "offset": "10km",
                    "decay": 0.5
                  }
                }
              }
            ],
            "boost_mode": "multiply",
            "score_mode": "sum"
          }
        }
      ]
    }
  }
}

Score Calculation for this Query
Let’s take the example document below and calculate its score using the above query.

{
  "job_title": "Software Engineer",
  "job_location": "37.8749,-122.4194", // San Francisco
  "skills": [
    {
      "skill_id": "skill_1",
      "skill_level": 8,
      "skill_vector": [0.3, 0.2, 0.1]
    },
    {
      "skill_id": "skill_2",
      "skill_level": 5,
      "skill_vector": [0.1, 0.2, 0.4]
    }
  ]
}

Step 1: TF-IDF Calculation

Term Frequency (TF): The term “engineer” appears once in the two-word job_title "Software Engineer", so TF("engineer") = 1/2 = 0.5

Inverse Document Frequency (IDF): Assume the total number of documents is 10,000 and “engineer” appears in 1,000 of them. IDF(“engineer”)=log⁡(10000/1000)=log⁡(10)=1

TF-IDF Score: TF-IDF(“engineer”)=0.5×1=0.5

Step 2: Should Clauses — Script Score

The script calculates the cosine similarity between each of the job’s skill vectors and the given skill vector, scaled by skill level. Since the score_mode is max, the final score from the nested query is the maximum of the scores calculated for the skills:

  • Skill 1 Cosine Score: 0.8
  • Skill 2 Cosine Score: ≈ 0.321

Final Score from Nested Query: 0.8
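These numbers can be sanity-checked in plain Python using the sample document’s vectors and the query vector [0.3, 0.2, 0.1] (the skill-level adjustment divides by 10, mirroring the Painless script; minor rounding differences aside):

```python
import math

def skill_score(job_vec, query_vec, skill_level):
    """Cosine similarity of the two vectors, scaled by skill_level / 10."""
    dot = sum(j * q for j, q in zip(job_vec, query_vec))
    norm_job = math.sqrt(sum(j * j for j in job_vec))
    norm_query = math.sqrt(sum(q * q for q in query_vec))
    return (dot / (norm_job * norm_query)) * (skill_level / 10.0)

query = [0.3, 0.2, 0.1]
s1 = skill_score([0.3, 0.2, 0.1], query, 8)  # identical vectors -> cosine 1.0
s2 = skill_score([0.1, 0.2, 0.4], query, 5)

print(round(s1, 3))           # 0.8
print(round(s2, 3))           # 0.321
print(round(max(s1, s2), 3))  # score_mode "max" keeps 0.8
```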

Step 3: Applying Additional Functions

Location-Based Decay: Assume the distance between the job location and the origin point is 20 km. The query applies the exponential decay function to the document based on this distance. This calculation is simplified for illustration; actual results may vary.

Adjusted Distance=max(0,distance−offset)=max(0,20−10)=10 km
Decay Function Score = e^(−Adjusted Distance/Scale) = e^(−10/50) = e^(−0.2) = 0.8187

Step 4: Combining Scores

Within the function_score, score_mode: sum combines the scores of the functions array. The decay function is the only entry here, so the combined function score is 0.8187.

Boost Mode: Multiply. boost_mode: multiply combines the function_score query’s own score (the nested script score of 0.8) with the combined function score: 0.8 × 0.8187 ≈ 0.655.

Finally, the outer bool query sums the scores of its matching clauses, so the must clause’s TF-IDF score is added on top.

Final Score = TF-IDF Score + (Nested Query Score × Function Score) = 0.5 + 0.655 ≈ 1.155
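Putting the semantics together: score_mode acts on the functions array, boost_mode combines the query score with that result, and the bool query sums its clause scores. A quick Python check using the illustrative component scores from the steps above:

```python
# Illustrative component scores from the worked example
must_tfidf = 0.5     # bool "must" match on job_title
nested_query = 0.8   # nested script_score (score_mode "max" over skills)
decay = 0.8187       # simplified exp decay from Step 3

function_score = sum([decay])                        # score_mode "sum" over the functions array
query_and_functions = nested_query * function_score  # boost_mode "multiply"
final_score = must_tfidf + query_and_functions       # bool sums clause scores

print(round(final_score, 3))  # 1.155
```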

This calculation is performed on every document in the dataset. Documents with higher scores are deemed more relevant and are returned in descending order of their scores.

Conclusion

By leveraging Elasticsearch’s powerful function_score query and its core components such as decay functions, field value factors, and script scoring, we implemented a highly customized and effective relevance scoring system. This approach significantly improved the relevance and accuracy of our job search results.

Not only did we enhance the precision of our search results relevance, but we also reduced the search calculation time by over 60%, allowing for real-time query responses even under heavy loads.

Our implementation has scaled effortlessly to handle thousands of concurrent users, providing highly relevant job recommendations. I hope this provides insight into how we significantly enhanced the effectiveness of search, delivering exceptional value to our clients.
