Elasticsearch Cross Cluster: Multi-Region Deployment

Pradnya
5 min read · Feb 21, 2021

Elasticsearch Search Engine

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.

In this post we will first look at how data is stored in Elasticsearch and at the concept of Cross Cluster Search, and then take a deep dive into how to set up Cross Cluster Search in Elasticsearch (ES).

Understanding the storage of Data in Elasticsearch

Data in ES is stored in an index, which can be viewed as a collection of documents in JSON format. An index uses a data structure called an inverted index, which is designed to allow very fast full-text searches.

An inverted index consists of the following two lists:

  • A list of all the unique words that appear in any document
  • For every word, a list of the documents in which it appears
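To make the two lists concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration: the sample documents are made up, and splitting on whitespace is a simplifying assumption, whereas Lucene runs text through configurable analyzers and uses far more compact structures.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization stands in for a real analyzer.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "quick brown fox",
    2: "quick red fox",
    3: "lazy brown dog",
}

inverted = build_inverted_index(docs)
print(sorted(inverted))  # the list of all unique words
print(inverted["fox"])   # the documents in which "fox" appears
```

Looking up a word is then a single dictionary access, which is the reason this structure suits full-text search.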

When a document is stored, it is indexed and becomes fully searchable in near real time, typically within 1 second.

Elasticsearch lets you define a mapping for the documents that are stored. Mapping is the process of defining how a document and the fields it contains are stored and indexed.

For example, a mapping can define:

  • which fields hold strings, dates, or numbers
  • which fields should be indexed for full-text search, and so on
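As a rough illustration of the points above, the following Python sketch builds a mapping body and prints it as JSON. The field names are hypothetical and chosen only for this example; `text`, `keyword`, `date`, and `integer` are standard Elasticsearch field types.

```python
import json

# Hypothetical fields for illustration only.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},       # analyzed, used for full-text search
            "status": {"type": "keyword"},   # string matched as an exact value
            "created_at": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}

# The printed JSON could serve as the body of a create-index request.
print(json.dumps(mapping, indent=2))
```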

Cross Cluster Search

The cross cluster search feature of Elasticsearch lets you run a single query against multiple clusters that are connected to each other as remote clusters. The query is sent to every remote cluster named in the request, and the combined result is returned. A remote connection between two Elasticsearch clusters is unidirectional, so to query in both directions a connection is added on each cluster, pointing at a seed host of the other cluster that is reachable on port 9300.

Let’s see how to set up this cross cluster configuration on two Elasticsearch clusters.

Cross Cluster Connections
  • Create two Elasticsearch clusters, one in the US region and the other in the APAC region. The design of the cross cluster connection remains the same even if both clusters are in the same region.
  • Expose a transport layer service on port 9300 in both clusters, such that each cluster can reach the other cluster's service.
  • A connection can be uni-directional or multi-directional. With a uni-directional connection you can run cross cluster queries from only one of the clusters. In the diagram above, the connection between Cluster: US and Cluster: EU is uni-directional, so cross cluster queries can only be run from Cluster: US, whereas the connection between Cluster: US and Cluster: APAC is multi-directional, so cross cluster queries can be run from both clusters.
  • Let us take a look at how to create a multi-directional cross cluster connection. If you use Kibana to monitor your Elasticsearch cluster, open the Kibana interface and click Stack Management in the Management section of the side navigation panel.
Stack Management
  • Now navigate to Remote Clusters and click the Add a Remote Cluster button.
  • Add the IP address and port of the transport service exposed on the remote cluster. Once the remote cluster is reachable, the Status changes to Connected.
Remote Cluster Created on First Cluster
  • Follow the same procedure on the second cluster, adding the IP address and port of the first cluster as its remote connection.
Remote Cluster Created on Second Cluster
  • The same settings can also be applied directly on each Elasticsearch cluster by running the requests below.
# On Cluster One
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster-two": {
          "skip_unavailable": false,
          "mode": "sniff",
          "proxy_address": null,
          "proxy_socket_connections": null,
          "server_name": null,
          "seeds": [
            "172.17.0.2:9300"
          ],
          "node_connections": 3
        }
      }
    }
  }
}
# On Cluster Two
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster-one": {
          "skip_unavailable": false,
          "mode": "sniff",
          "proxy_address": null,
          "proxy_socket_connections": null,
          "server_name": null,
          "seeds": [
            "10.128.0.8:9300"
          ],
          "node_connections": 3
        }
      }
    }
  }
}
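The two settings requests above differ only in the connection name and the seed address, so the body can be generated rather than written by hand. The sketch below is one possible way to do that in Python; the helper name `remote_cluster_settings` is hypothetical, and it leaves out the proxy-related keys that the requests above set to null, on the assumption that they are unset on a fresh cluster.

```python
import json


def remote_cluster_settings(alias, seeds, node_connections=3, skip_unavailable=False):
    """Build the body of a `PUT _cluster/settings` request for one sniff-mode remote."""
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    alias: {
                        "skip_unavailable": skip_unavailable,
                        "mode": "sniff",
                        "seeds": list(seeds),
                        "node_connections": node_connections,
                    }
                }
            }
        }
    }


# Same connection names and seed addresses as in the requests above.
on_cluster_one = remote_cluster_settings("cluster-two", ["172.17.0.2:9300"])
on_cluster_two = remote_cluster_settings("cluster-one", ["10.128.0.8:9300"])

print(json.dumps(on_cluster_one, indent=2))
print(json.dumps(on_cluster_two, indent=2))
```

Each printed body would be sent to the cluster named in its variable, which keeps the two sides of a multi-directional connection symmetric.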
  • Once both connections are established, you can run a cross cluster query as follows:
GET test-index,cluster-two:test-index/_search
{
  "query": {
    "match_all": {}
  }
}
Output:

{
  "took" : 302,
  "timed_out" : false,
  "num_reduce_phases" : 3,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

If you run the above query on cluster one, it searches test-index on both the local cluster and the remote cluster registered under the connection name cluster-two, and returns the combined results.
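The target of the query is simply a comma-separated list in which each remote index is prefixed with its connection name and a colon. As a small sketch of that convention, the hypothetical helper below (`search_path` is not part of any Elasticsearch client) assembles the request path in Python:

```python
def search_path(index, remote_aliases=(), include_local=True):
    """Return the request path for searching `index` locally and on the given remotes."""
    targets = [index] if include_local else []
    # A remote index is addressed as <connection name>:<index name>.
    targets += [f"{alias}:{index}" for alias in remote_aliases]
    if not targets:
        raise ValueError("at least one local or remote target is required")
    return ",".join(targets) + "/_search"


print(search_path("test-index", ["cluster-two"]))
# test-index,cluster-two:test-index/_search
```

Passing `include_local=False` would restrict the search to the remote clusters only.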

Note that an index with this name must exist on both clusters for the above cross cluster query to succeed.

Congratulations! You are now ready to run cross cluster queries on your Elasticsearch clusters.
