Searching across multiple indices in Elasticsearch

christian rojas trujillo
elasticsearch for dummies
Apr 20, 2023

The Basics

Elasticsearch is an open-source search and analytics engine based on the Apache Lucene library. It is designed to provide a distributed, multi-tenant capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is used in a wide range of applications, including website search, log analysis, and data analytics. It is known for its scalability, speed, and ease of use, and is widely used by developers and businesses around the world. Companies using this technology include Uber, Udemy, and Slack.

In this document I will present a real-world scenario: how to search documents across several indices (or schemas). I will discuss three possible approaches: using the multi search API, using _reindex to merge the indices, and creating a new index with homologous fields.

Let's say we have three indices, sedan, van, and suv, and we want to search by model and brand. However, each index exposes those attributes under a different field name (yes, this happens in the real world, because of business-layer needs and the cost of refactoring), which is a problem when building a unified query. A bool query combines clauses of four types: must, filter, should, and must_not. A strict query puts every condition in must, so a document matches only if it satisfies all of them. Since each document holds only one of the per-index fields, a query that targets vanModel, sedanModel, and suvModel at the same time, like the one below, can never match anything:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "vanModel": {
              "value": "audi"
            }
          }
        },
        {
          "term": {
            "sedanModel": {
              "value": "audi"
            }
          }
        },
        {
          "term": {
            "suvModel": {
              "value": "audi"
            }
          }
        }
      ]
    }
  }
}
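To see why this fails, here is a deliberately simplified Python model of bool/must semantics (not Elasticsearch itself): a term clause is reduced to exact equality on one field, and the three sample documents are hypothetical, one per index.

```python
# Simplified, in-memory model of bool/must (not Elasticsearch itself).
# Hypothetical sample documents: one per index, each with its own model field.
docs = [
    {"_index": "van", "vanModel": "audi"},
    {"_index": "sedan", "sedanModel": "audi"},
    {"_index": "suv", "suvModel": "audi"},
]

# The three term clauses from the must array above, as (field, value) pairs.
must_clauses = [("vanModel", "audi"), ("sedanModel", "audi"), ("suvModel", "audi")]


def matches_must(doc, clauses):
    """True only if every clause matches; a missing field never matches a term clause."""
    return all(doc.get(field) == value for field, value in clauses)


hits = [doc["_index"] for doc in docs if matches_must(doc, must_clauses)]
print(hits)  # [] : no document carries all three fields
```

Each document satisfies exactly one of the three clauses, and must requires all of them, so the result is empty regardless of the data.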

Options:

  • Use the multi search API
  • _reindex and merge
  • Create a new index with homologous fields

multi search API strategy

The Multi-search API (the _msearch endpoint) in Elasticsearch lets you execute multiple searches with a single API request. This is particularly useful when you need to retrieve data from multiple indices or schemas: rather than sending a separate request for each index, you bundle all the searches into one request, which saves network round trips and the overhead of handling each request separately.

It is also flexible: each search in the request is a complete search body, so it can use any query the regular search API accepts, including simple queries, compound (bool) queries, and aggregations. In our scenario, this means each index can be queried on its own field name (vanModel, sedanModel, or suvModel) within the same request.

Overall, the Multi-search API is a valuable tool in Elasticsearch that can help you streamline your data retrieval processes and improve the efficiency of your applications.
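A minimal sketch of such a request for our scenario, in Python using only the standard library. It only builds the newline-delimited (NDJSON) body that _msearch expects, a header line naming the index followed by a search body line, for each index. MODEL_FIELD_BY_INDEX and build_msearch_body are hypothetical names, and sending the request is left out.

```python
import json

# Hypothetical mapping: the field that holds the model in each index.
MODEL_FIELD_BY_INDEX = {"sedan": "sedanModel", "van": "vanModel", "suv": "suvModel"}


def build_msearch_body(model, size=10):
    """Build an NDJSON _msearch body: a header line plus a search body line per index."""
    lines = []
    for index, field in MODEL_FIELD_BY_INDEX.items():
        lines.append(json.dumps({"index": index}))  # header: which index to search
        lines.append(json.dumps({                   # body: a regular search request
            "size": size,
            "query": {"bool": {"must": {"term": {field: {"value": model}}}}},
        }))
    return "\n".join(lines) + "\n"  # _msearch requires a trailing newline


print(build_msearch_body("audi"))
```

The resulting string would be sent as the body of POST /_msearch with the Content-Type application/x-ndjson. The response holds one entry per search, in the same order, under responses.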

Problem:

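  • Pagination can mismatch: from and size apply to each search separately, so combining the results into one consistent page is left to the client.
  • The query cannot be strict across all indices: each search is evaluated on its own, so there is no single must clause that covers every index.
  • Scores vary from one index to another, because relevance statistics are computed per index, so hits from different indices are not directly comparable.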
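These problems surface as soon as the client has to combine the responses. Below is a naive merge, sketched in Python under the assumption that the parsed _msearch response is a dict with a responses list (one entry per search, each holding either hits.hits or an error). merge_page is a hypothetical helper, and its comments mark where the pagination and scoring problems appear.

```python
def merge_page(msearch_response, from_=0, size=10):
    """Naively merge the hits of every sub-search and cut one page out of them."""
    hits = []
    for item in msearch_response["responses"]:
        if "error" in item:
            continue  # a failed sub-search is reported per item; this sketch skips it
        hits.extend(item["hits"]["hits"])
    # Scoring problem: _score values come from different indices, so this
    # ordering is only an approximation of relevance across sedan, van, and suv.
    hits.sort(key=lambda hit: hit["_score"] or 0.0, reverse=True)
    # Pagination problem: this page is only correct if every sub-search returned
    # at least from_ + size hits; otherwise hits that belong here were never fetched.
    return hits[from_:from_ + size]
```

To serve a given page correctly, every sub-search has to fetch at least from_ + size hits, so deep pages get more expensive with each additional index.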

_reindex strategy

Reindex and merge is another strategy for dealing with multiple indices in Elasticsearch. It involves creating a new index and using the _reindex API to copy documents from the old indices into it. This is particularly useful when you want to merge multiple indices with different mappings into a single index. However, this approach can be resource-intensive, especially with large volumes of data.

The reindex and merge strategy has some advantages over the multi search API strategy. Above all, it simplifies data retrieval: all documents live in one index, so a single search returns one ranked, paginated result set instead of several that the client has to combine. Note, however, that copying the documents as they are does not unify their fields: the merged index still holds vanModel, sedanModel, and suvModel as separate fields.
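A sketch of the reindex body for this strategy, built in Python: _reindex accepts a list of source indices, so one request can copy sedan, van, and suv into a single destination. The destination name vehicles and the helper build_merge_reindex_body are hypothetical. Because no script renames anything, the documents keep their original fields.

```python
import json


def build_merge_reindex_body(source_indices, dest_index):
    """Body for POST _reindex: copy several indices into one, leaving fields unchanged."""
    return {
        "source": {"index": list(source_indices)},  # _reindex accepts a list of indices
        "dest": {"index": dest_index},               # hypothetical destination index
    }


body = build_merge_reindex_body(["sedan", "van", "suv"], "vehicles")
print(json.dumps(body, indent=2))
```

If vehicles does not exist beforehand, it is typically created with dynamic mappings, so vanModel, sedanModel, and suvModel end up side by side in the merged index.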

Problem:

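  • If the source indices don’t have homologous (standardized) field names, we cannot be sure the results will be optimal: a strict query on one field still misses documents that store the same attribute under another name.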

new index strategy

Creating a new index with homologous fields is the third approach for dealing with multiple indices in Elasticsearch. It involves creating a new index whose mapping defines a single shared field for each attribute (for example, one model field instead of vanModel, sedanModel, and suvModel), and then using the _reindex API to copy the data from the old indices into it, mapping each source field onto the shared one. This helps ensure that the search results are optimal, because the new fields are designed around the queries we need to run.

So only in this scenario can we run the query in strict mode (must) and get the results we want:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "model": {
            "value": "audi"
          }
        }
      }
    }
  }
}
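The two steps of this strategy can be sketched in Python as request bodies: create the new index with a shared model field, then run one _reindex per source index with a Painless script that moves that index's model field into model. The index name vehicles, the keyword type for model (chosen so the term query above matches exact values), and the helper names are assumptions for illustration.

```python
import json

# Hypothetical names: the unified index and the model field used by each source index.
DEST_INDEX = "vehicles"
MODEL_FIELD_BY_INDEX = {"sedan": "sedanModel", "van": "vanModel", "suv": "suvModel"}


def build_create_index_body():
    """Body for PUT /vehicles: one shared field, mapped as keyword for exact term matches."""
    return {"mappings": {"properties": {"model": {"type": "keyword"}}}}


def build_reindex_bodies():
    """One POST _reindex body per source index, renaming its model field to 'model'."""
    bodies = []
    for index, field in MODEL_FIELD_BY_INDEX.items():
        bodies.append({
            "source": {"index": index},
            "dest": {"index": DEST_INDEX},
            "script": {
                "lang": "painless",
                # Map.remove returns the removed value, so this moves it under 'model'.
                "source": "ctx._source.model = ctx._source.remove(params.field)",
                "params": {"field": field},
            },
        })
    return bodies


print(json.dumps(build_create_index_body(), indent=2))
for reindex_body in build_reindex_bodies():
    print(json.dumps(reindex_body, indent=2))
```

Passing the field name through params keeps the script source identical for all three requests. Other attributes, such as brand, would follow the same pattern.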

Overall, the choice between the multi search API, the reindex and merge strategy, and the new index strategy will depend on your specific needs and requirements. You may want to consider factors such as the size and complexity of your data, the types of queries you need to perform, and the resources available to you when making this decision.
