Replication with AWS Elasticsearch
AWS adapts Elastic’s open-source search solution to power its own hosted search engine: AWS Elasticsearch. Because it is an adaptation of Elastic’s product, some key features are missing, such as built-in replication. For a project at my job, we needed a replicated Elasticsearch domain. I came up with a solution that I would like to share with others who may have the same need.
AWS Resources:
S3, Lambda (I use Node.js for my Lambda functions), AWS Elasticsearch (now called Amazon OpenSearch Service)
Quick Summary:
- Create two S3 buckets and two Elasticsearch domains, with one bucket and one domain in each of two different AWS regions.
- Create a Lambda function that is triggered by an s3:ObjectCreated event. The function should use the AWS SDK and the Elasticsearch npm module to fetch your object from the bucket and add it to Elasticsearch.
- Add a replication rule to your host S3 bucket so that its objects are replicated to the bucket in the other region.
- Deploy your Lambda function in the replicated region and have it triggered by an s3:ObjectCreated event from the replicated bucket. This Lambda sends the documents to the Elasticsearch domain in the replicated region.
In Depth Solution
S3 Buckets and Replication
The S3 buckets are the key to my solution because they have replication built in. After creating the two buckets in different regions, go to your host bucket and click on the Management tab. Within that tab you will see the Replication Rules section.
Create a rule that replicates everything in your host bucket to your replicated bucket. Note that you need to choose an IAM Role which has permissions to perform replication. The role I used had full S3 permissions.
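The same rule can also be created from code instead of the console. Here is a minimal sketch, assuming the AWS SDK for JavaScript v3 (`@aws-sdk/client-s3`). The bucket names, role ARN, region, and rule ID are placeholders for illustration, not values from my project. Keep in mind that S3 replication requires versioning to be enabled on both buckets.

```javascript
// Build the parameters for S3's PutBucketReplication call.
// Every value passed in is a placeholder chosen for illustration.
function buildReplicationParams({ sourceBucket, destinationBucket, roleArn }) {
  return {
    Bucket: sourceBucket,
    ReplicationConfiguration: {
      Role: roleArn, // IAM role that S3 assumes to copy the objects
      Rules: [
        {
          ID: 'replicate-everything', // hypothetical rule name
          Status: 'Enabled',
          Priority: 1,
          Filter: { Prefix: '' }, // empty prefix matches every object
          DeleteMarkerReplication: { Status: 'Disabled' },
          Destination: { Bucket: `arn:aws:s3:::${destinationBucket}` },
        },
      ],
    },
  };
}

// Apply the rule to the host bucket. The SDK is required lazily so the
// builder above can be inspected without the package or AWS credentials.
async function enableReplication({ sourceRegion, ...options }) {
  const { S3Client, PutBucketReplicationCommand } = require('@aws-sdk/client-s3');
  const s3 = new S3Client({ region: sourceRegion });
  return s3.send(new PutBucketReplicationCommand(buildReplicationParams(options)));
}
```

Calling `enableReplication({ sourceRegion, sourceBucket, destinationBucket, roleArn })` once with your own values should have the same effect as creating the rule in the console.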
Now, every time you add a document to your host bucket, S3 copies it to your replicated bucket, which triggers the Lambda function in that region.
Lambda Functions
For this project, I used the Serverless Framework: https://www.serverless.com/
This framework is easy to set up, and it makes deploying your Lambda function to multiple regions straightforward. I won’t go through an entire tutorial on the Serverless Framework, but I will describe how I set up my environment variables to deploy my Lambda function to different regions. The general concepts of my solution can be replicated (haha) with your preferred Lambda process.
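Whatever deployment process you use, the function itself does the same job: read the object named in the S3 event and index it. Below is a rough sketch of that logic, not my exact code. It assumes each object is a single JSON document, that the object key is used as the document ID, and that you pass in an `indexDocument` function wrapping your Elasticsearch client's index call (request signing for the domain is left out). `getObjectText` assumes the AWS SDK for JavaScript v3.

```javascript
// Pull bucket/key pairs out of an S3 event notification.
function parseS3Records(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    // Keys arrive URL-encoded, with spaces encoded as '+'.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }));
}

// Build a Lambda handler from its dependencies so that the same code can
// run in either region and be exercised locally with fakes.
function createHandler({ getObjectText, indexDocument, index }) {
  return async function handler(event) {
    const records = parseS3Records(event);
    for (const { bucket, key } of records) {
      const text = await getObjectText(bucket, key);
      // Assumption: the object body is one JSON document.
      await indexDocument({ index, id: key, body: JSON.parse(text) });
    }
    return { indexed: records.length };
  };
}

// Read an object as a string. The SDK is required lazily so the functions
// above work without it being installed.
async function getObjectText(bucket, key) {
  const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
  const s3 = new S3Client({});
  const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  return response.Body.transformToString();
}
```

In the deployed file you would export something like `createHandler({ getObjectText, indexDocument, index: process.env.ES_INDEX })`, where `ES_INDEX` is a hypothetical environment variable holding the target index name.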
The config file which defines your lambda function is called serverless.yml. In this file, you can create stages for your deployment. Each stage will have its own set of environment variables.
Typical examples of stages are development, testing, and production. For my solution, I added a replication stage alongside prod.
Finally, you define your Lambda function to be triggered by an S3 event, using the bucket environment variable for that stage.
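To make the stage setup concrete, here is a sketch of what such a configuration could look like. Since my functions are in Node.js, it is written as a `serverless.js` file, which the framework accepts in place of `serverless.yml`; the same keys map one-to-one onto YAML. The service name, regions, bucket names, endpoints, runtime, and function name are all made up for illustration, and the IAM permissions the function needs are omitted.

```javascript
// Per-stage settings. Every value here is a placeholder.
const stages = {
  prod: {
    region: 'us-east-1',
    bucket: 'my-docs-host',
    esEndpoint: 'https://host-domain.example.com',
  },
  replication: {
    region: 'us-west-2',
    bucket: 'my-docs-replica',
    esEndpoint: 'https://replica-domain.example.com',
  },
};

// The '${...}' strings are Serverless variables, resolved by the framework
// at deploy time. They are plain strings, not JavaScript template literals.
const config = {
  service: 'es-replication',
  custom: { stages },
  provider: {
    name: 'aws',
    runtime: 'nodejs18.x',
    region: '${self:custom.stages.${sls:stage}.region}',
    environment: {
      BUCKET: '${self:custom.stages.${sls:stage}.bucket}',
      ES_ENDPOINT: '${self:custom.stages.${sls:stage}.esEndpoint}',
    },
  },
  functions: {
    indexDocument: {
      handler: 'handler.handler',
      events: [
        {
          s3: {
            bucket: '${self:custom.stages.${sls:stage}.bucket}',
            event: 's3:ObjectCreated:*',
            existing: true, // the buckets were created by hand beforehand
          },
        },
      ],
    },
  },
};

module.exports = config;
```

With this layout, the stage passed on the command line selects the region, the bucket that triggers the function, and the Elasticsearch endpoint the function writes to.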
Once you are ready to deploy your function, you can use the serverless deploy command:
sls deploy --stage prod
sls deploy --stage replication
Your Lambda function will now be created in both regions, and each deployment adds documents to the Elasticsearch domain in its own region!
Conclusion
Hopefully this article helps you replicate your AWS Elasticsearch instance. Please comment if you can improve my solution or have a better solution!