Index AWS DynamoDB Data on Elasticsearch/Kibana using AWS Lambda
Happy serverless
Background
DynamoDB
provides a highly scalable document database. It is similar in kind to Apache Cassandra
. The main querying capabilities of DynamoDB are centered around lookups using a primary key; it does not provide full-text search. We can index the DynamoDB data on Elasticsearch to get full-text search over that data. Amazon OpenSearch provides a fully managed, secure Elasticsearch service that is easy to deploy and operate. In this post I discuss indexing the content of a DynamoDB table into an Amazon Elasticsearch Service
, using DynamoDB Streams
with AWS Lambda
. Then I connected Kibana
to the Elasticsearch index for data visualization. The following figure shows the architecture of the system.
1. Create DynamoDB Table
First I created the DynamoDB table document-db
with primary key column id
. Then I enabled a stream on the table. It captures item-level changes in the table and pushes them to a DynamoDB Stream
. We can then access the change information through the DynamoDB Streams API
(e.g. capture changes via Lambda functions).
2. Create IAM Role and Policy
Then I have created an IAM role document-stream-role
that will be assigned to the Lambda function to read data from DynamoDB Streams and publish it to Elasticsearch. During role creation, you will be asked to create a policy for the role. The role must have basic Amazon ES, DynamoDB, and Lambda execution permissions. I have created a policy document-stream-policy
and given it the following permissions.
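A sketch of what the document-stream-policy document might contain. The exact actions are assumptions on my part; the post only states that the role needs basic Amazon ES, DynamoDB, and Lambda execution permissions, and the `Resource` entries should be narrowed to your own ARNs:

```python
import json

DOCUMENT_STREAM_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # read change records from the DynamoDB stream
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:ListStreams",
            ],
            "Resource": "*",
        },
        {   # write and delete documents in the Elasticsearch domain
            "Effect": "Allow",
            "Action": ["es:ESHttpPost", "es:ESHttpPut", "es:ESHttpDelete"],
            "Resource": "*",
        },
        {   # basic Lambda execution: write logs to CloudWatch
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "*",
        },
    ],
}

# the JSON form is what gets pasted into the IAM policy editor
print(json.dumps(DOCUMENT_STREAM_POLICY, indent=2))
```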
When creating the document-stream-role
role, I selected Lambda as the role type, then attached the document-stream-policy
to the document-stream-role
.
3. Create AWS Elasticsearch Service
An AWS Elasticsearch instance can be created from the AWS OpenSearch dashboard, which provides a create domain
option to create the Elasticsearch service. I named the domain document-index-domain
, selected Development and Testing
as the deployment type, and Elasticsearch 7.8
as the version. For the instance type I selected the smallest one, t3.small.search
(since this is a test deployment).
The next important thing is the access policy. I selected Public access
for the network (via the internet) since I'm doing a demo; the recommended approach is VPC access. The access policy is defined as a domain-level access policy with the following JSON configuration. Here I obtained my local machine's public IP ( 70.104.177.55
) and restricted access to the Elasticsearch service to that IP only. You can find the public IP of your local machine via this URL.
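The IP-restricted access policy described above might look like this; the region and account ID in the Resource ARN are placeholders to replace with your own:

```python
# Domain-level access policy: allow any principal, but only from the
# whitelisted public IP (70.104.177.55, from the post).
DOMAIN_ACCESS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            # placeholder region/account ID below
            "Resource": "arn:aws:es:us-east-1:123456789012:domain/document-index-domain/*",
            # only requests originating from this IP are allowed
            "Condition": {"IpAddress": {"aws:SourceIp": "70.104.177.55"}},
        }
    ],
}
```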
Once the domain is created, it exposes endpoints for the Elasticsearch service (Domain endpoint
) and the Kibana dashboard (Kibana URL
). The Elasticsearch service endpoint is used in the Lambda function to index the data.
4. Create AWS Lambda Function
Next I created an AWS Lambda function that takes the stream from the DynamoDB table and indexes the records in Elasticsearch. I named the function document-stream-lambda
and selected Python 3.8
as the language. Then I assigned the previously created document-stream-role
to the Lambda function.
After creating the Lambda function, I added a trigger to it from the DynamoDB table document-db
. This trigger invokes the Lambda function whenever data in the document-db
table is created, updated, or deleted.
Then on my local machine I created the Python Lambda function document-stream.py
to publish the data to Elasticsearch. The program defines the Elasticsearch host
, index name
and document type
. The handler function takes the data coming from the DynamoDB stream and publishes it to Elasticsearch. The program is written in a directory called document-stream
. Its dependencies ( boto3
, requests_aws4auth
and requests
) are installed with pip3
inside the same document-stream
folder.
Finally I zipped the document-stream
folder, uploaded it to the Lambda function in the AWS Lambda dashboard, and deployed the Lambda function.
The next step is to define the handler of the Lambda function. In the Runtime settings
of the Lambda function I defined the handler name. In this case, the handler is the handler
function of the document-stream.py
program.
5. Test Elasticsearch Data
Now everything is set up to stream the DynamoDB data into Elasticsearch. When data is added to the DynamoDB table document-db
, it should automatically be indexed in the Elasticsearch index documents
. I added a few records to the DynamoDB table from the AWS dashboard.
These records should now be indexed in Elasticsearch. The following is the way to search Elasticsearch and view the data.
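One way to query the index, assuming the same placeholder endpoint as before; no request signing is needed here because the domain access policy whitelists this machine's IP, and the query_string query is just one convenient choice for full-text search:

```python
import requests

HOST = "https://search-document-index-domain-xxxxxxxx.us-east-1.es.amazonaws.com"

def extract_hits(body):
    """Pull the matching documents out of an Elasticsearch search response."""
    return [hit["_source"] for hit in body["hits"]["hits"]]

def search_documents(query="*"):
    """Run a full-text query_string search against the documents index."""
    resp = requests.get(
        f"{HOST}/documents/_search",
        json={"query": {"query_string": {"query": query}}},
    )
    resp.raise_for_status()
    return extract_hits(resp.json())
```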
6. Setup Kibana
The final step is to configure Kibana with the Elasticsearch index. The Kibana URL is available in the AWS OpenSearch dashboard. I browsed to that URL, set up the index pattern ( documents
) and created some visualizations as below. Here Kibana uses open access; access to Kibana can be restricted with an AWS Cognito user pool. For demo purposes I used the Open access permission.