Combining the Best of Both Worlds: Hybrid Search in Elasticsearch with BM25 and HNSW
When it comes to search algorithms, there is no one-size-fits-all solution. Different algorithms work better in different scenarios, and sometimes a combination of algorithms is needed to achieve the best results. In Elasticsearch, a popular way to combine them is hybrid search: pairing the BM25 algorithm for lexical (keyword) text search with the HNSW algorithm for approximate nearest neighbor search over vector embeddings. In this blog post, we’ll explore the benefits, challenges, and use cases of hybrid search in Elasticsearch.
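BM25 is a widely used text-ranking algorithm. It scores a document by how often each query term appears in it (term frequency), weights each term by how rare it is across the collection (inverse document frequency), and normalizes for document length so that long documents are not automatically favored. HNSW (Hierarchical Navigable Small World), as we saw in the previous blog post, is an algorithm for approximate nearest neighbor search that builds a layered small-world graph of interconnected vectors and navigates it greedily to find the vectors closest to a query. Combining the two lets a hybrid search match exact keywords, where BM25 excels, while also retrieving documents that are semantically similar but worded differently, where vector search excels.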
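To make the BM25 description concrete, here is a minimal sketch of the textbook formula over pre-tokenized documents. It assumes Lucene-style IDF and Elasticsearch's default parameters `k1=1.2` and `b=0.75`; the function name `bm25_scores` is hypothetical. Elasticsearch's reported scores will not match these numbers exactly (Lucene, for example, stores document lengths approximately), but the roles of term frequency, IDF, and length normalization are the same.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with textbook BM25.

    Hypothetical helper for illustration; not Elasticsearch's implementation.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            # Rare terms get a larger IDF (Lucene-style, always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["elastic", "search", "engine"],
    ["vector", "search"],
    ["bm25", "ranking", "function"],
]
scores = bm25_scores(["vector", "search"], docs)
```

The second document ranks first because it contains both query terms (including the rarer "vector") and is shorter than average; the third document scores zero because it shares no terms with the query.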
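One of the biggest challenges of hybrid search is balancing the two algorithms: deciding how much weight to give the BM25 score and how much to give the vector similarity score of the neighbors found through HNSW when combining them. This is tricky partly because the two scores live on different scales (BM25 scores are unbounded and depend on the query and the corpus, while vector similarity scores typically fall in a fixed range), and partly because the optimal weights may vary depending on the data and the specific search scenario.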
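A minimal sketch of one way to handle the weighting, assuming scores are combined on the client side: min-max normalize each result list to [0, 1] so the scales are comparable, then take a weighted sum controlled by a hypothetical parameter `alpha`. The helper names, the field names `title` and `title_vector`, and the `num_candidates` choice are all illustrative assumptions. For comparison, `es_hybrid_request` builds a request body in the shape that recent Elasticsearch 8.x releases accept for hybrid search, where a top-level `query` and `knn` run together and each hit's score is the sum of the two boosted scores; there the `boost` values play the role of the weights, but no normalization is applied.

```python
def es_hybrid_request(text, query_vector, text_weight=0.5, vector_weight=0.5, k=10):
    """Build a hybrid search body: BM25 via `match`, HNSW via `knn`.

    Field names are hypothetical; adapt them to your own mapping.
    """
    return {
        "query": {"match": {"title": {"query": text, "boost": text_weight}}},
        "knn": {
            "field": "title_vector",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,  # candidates explored per shard
            "boost": vector_weight,
        },
        "size": k,
    }


def min_max(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_ranking(bm25, vector, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights BM25, 1 - alpha weights vectors.

    A document missing from one result list gets 0 for that side.
    """
    nb, nv = min_max(bm25), min_max(vector)
    combined = {
        doc: alpha * nb.get(doc, 0.0) + (1 - alpha) * nv.get(doc, 0.0)
        for doc in set(nb) | set(nv)
    }
    return [doc for doc, _ in sorted(combined.items(), key=lambda kv: kv[1], reverse=True)]


bm25 = {"a": 12.0, "b": 7.5, "c": 3.0}    # raw, unbounded BM25 scores
vector = {"b": 0.92, "c": 0.88, "d": 0.75}  # similarity scores in a fixed range
balanced = hybrid_ranking(bm25, vector, alpha=0.5)
text_heavy = hybrid_ranking(bm25, vector, alpha=0.9)
body = es_hybrid_request("hybrid search", [0.1, 0.2, 0.3], text_weight=0.9, vector_weight=0.1)
```

With equal weights, document "b" ranks first because it scores well on both sides; shifting `alpha` toward BM25 promotes "a", the best keyword match. This is exactly the sensitivity described above: the right `alpha` (or `boost` pair) usually has to be tuned against relevance judgments for your own data.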