Index AWS DynamoDB Data on Elasticsearch/Kibana using AWS Lambda

Happy serverless

(λx.x)eranga
Effectz.AI
6 min read · Aug 17, 2022


Background

DynamoDB provides a highly scalable document database. It is similar in kind to Apache Cassandra. The main querying capabilities of DynamoDB are centered around lookups using a primary key; it does not provide full-text search. We can index the DynamoDB data on Elasticsearch to get full-text search over the DynamoDB data. Amazon OpenSearch provides a fully managed, secure Elasticsearch service that is easy to deploy and operate. In this post I’m gonna discuss indexing the content of a DynamoDB table into Amazon Elasticsearch Service, using DynamoDB Streams with AWS Lambda. Then I have connected Kibana to the Elasticsearch index to do data visualization. The following figure shows the architecture of the system.

1. Create DynamoDB Table

First I have created the DynamoDB table document-db with primary key column id. Then I enabled DynamoDB Streams on the table. This captures item-level changes in the table and pushes them to a DynamoDB Stream. We can then access the change information through the DynamoDB Streams API (e.g. capture changes via Lambda functions).
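The same table can be created programmatically. Below is a minimal boto3 sketch; only the table name document-db and the key column id come from the setup above, while the billing mode and stream view type are my assumptions:

```python
# Sketch of creating the document-db table with a DynamoDB Stream enabled.
# The boto3 call is commented out since it needs AWS credentials; billing
# mode and stream view type here are assumptions, not from the article.
table_spec = {
    "TableName": "document-db",
    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
    "StreamSpecification": {
        "StreamEnabled": True,
        # NEW_AND_OLD_IMAGES lets the Lambda see both the new and previous item
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}
# import boto3
# boto3.client("dynamodb").create_table(**table_spec)
```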

2. Create IAM Role and Policy

Then I have created an IAM role document-stream-role that will be assigned to the Lambda function to read the data from DynamoDB Streams and publish it to Elasticsearch. During role creation you will be asked to attach a policy to the role. The role must have basic Amazon ES, DynamoDB, and Lambda execution permissions. I have created a policy document-stream-policy and given it the following permissions.
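A sketch of the kind of policy JSON document-stream-policy needs is below. The resource ARNs are placeholder wildcards; in practice you should restrict them to your own table, stream, and domain:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut", "es:ESHttpDelete"],
      "Resource": "arn:aws:es:*:*:domain/document-index-domain/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/document-db/stream/*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```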

When creating the document-stream-role I selected the role type as Lambda, then attached the document-stream-policy to the role.

3. Create AWS Elasticsearch Service

An AWS Elasticsearch instance can be created from the AWS OpenSearch dashboard. In the OpenSearch dashboard there is an option called create domain to create an Elasticsearch Service. I have named the domain document-index-domain, selected the deployment type as Development and testing, and the version as Elasticsearch 7.8. For the instance type I have selected the smallest instance type, t3.small.search (since this is a test deployment).

The next important thing is the access policy. I have selected the Network as Public access (via the internet) since I’m doing a demo; the recommended way is to use VPC access. The access policy is defined as a Domain level access policy with the following JSON configuration. Here I have obtained my local computer’s public IP (70.104.177.55) and restricted access to the Elasticsearch service to that IP only. You can find the public IP of your local machine via this URL.
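The domain-level access policy described here looks roughly like the following. The region and account ID below are placeholders; only the domain name and the IP 70.104.177.55 come from the setup above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "*"},
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/document-index-domain/*",
      "Condition": {"IpAddress": {"aws:SourceIp": "70.104.177.55"}}
    }
  ]
}
```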

Once the domain is created, it exposes an endpoint for the Elasticsearch service (Domain endpoint) and for the Kibana dashboard (Kibana URL). This Elasticsearch service endpoint is used in the Lambda function to index the data.

4. Create AWS Lambda Function

Next I have created an AWS Lambda function to take the DynamoDB Stream from the document-db table and index the data on Elasticsearch. I have named the function document-stream-lambda and selected Python 3.8 as the language. Then I assigned the previously created document-stream-role to the Lambda function.

After creating the Lambda function, I have added a trigger to the function from the DynamoDB table document-db. This trigger will invoke the Lambda function whenever document-db table data gets created/updated/deleted.
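The same trigger can also be wired up programmatically with an event source mapping. A minimal sketch follows; the stream ARN is a placeholder (the real one is shown on the table's stream details), and the batch size and starting position are my assumptions:

```python
# Sketch of attaching the DynamoDB stream as a Lambda trigger.
# The stream ARN below is a placeholder for the real document-db stream ARN.
trigger_spec = {
    "EventSourceArn": "arn:aws:dynamodb:us-east-1:123456789012:table/document-db/stream/2022-08-17T00:00:00.000",
    "FunctionName": "document-stream-lambda",
    "StartingPosition": "LATEST",  # only process changes made after the trigger exists
    "BatchSize": 100,              # assumption: up to 100 stream records per invocation
}
# import boto3
# boto3.client("lambda").create_event_source_mapping(**trigger_spec)
```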

Then on my local machine I have created the Python Lambda function document-stream.py to publish the data into Elasticsearch. The program defines the Elasticsearch host, index name, and document type. The handler function takes the data coming from the DynamoDB stream and publishes it to Elasticsearch. This program is written in a directory called document-stream. The dependencies of the program (boto3, requests_aws4auth, requests) are installed with pip3 inside the same document-stream folder.
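The full document-stream.py is not reproduced here, but its core logic can be sketched as follows. The deserialize helper and the (action, id, document) return shape are my own illustration; the actual HTTP publishing (via requests and requests_aws4auth, as the dependency list suggests) is left as comments since it needs the real domain endpoint and credentials:

```python
# Sketch of the core of document-stream.py: turn DynamoDB stream records
# into Elasticsearch index/delete actions. The signed HTTP calls that the
# deployed function would make with requests + requests_aws4auth are
# shown as comments.

ES_HOST = "https://search-document-index-domain-xxxx.us-east-1.es.amazonaws.com"  # placeholder
INDEX = "documents"

def deserialize(attr):
    """Convert a DynamoDB AttributeValue, e.g. {"S": "foo"}, to a plain Python value."""
    ((tag, value),) = attr.items()
    if tag == "S":
        return value
    if tag == "N":  # DynamoDB streams send numbers as strings
        return float(value) if "." in value else int(value)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: deserialize(v) for k, v in value.items()}
    if tag == "L":
        return [deserialize(v) for v in value]
    return value  # BOOL, string/number sets etc. pass through as-is

def lambda_handler(event, context):
    actions = []
    for record in event.get("Records", []):
        doc_id = deserialize(record["dynamodb"]["Keys"]["id"])
        if record["eventName"] in ("INSERT", "MODIFY"):
            doc = {k: deserialize(v) for k, v in record["dynamodb"]["NewImage"].items()}
            actions.append(("index", doc_id, doc))
            # requests.put(f"{ES_HOST}/{INDEX}/_doc/{doc_id}", auth=awsauth, json=doc)
        elif record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
            # requests.delete(f"{ES_HOST}/{INDEX}/_doc/{doc_id}", auth=awsauth)
    return actions
```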

Finally I have zipped the document-stream folder, uploaded it to the Lambda function in the AWS Lambda dashboard, and deployed the Lambda function.

The next thing is to define the handler of the Lambda function. In the Runtime settings of the Lambda function I have defined the handler name; in this case, the Handler is the handler function of the document-stream.py program.

5. Test Elasticsearch Data

Now everything is set up to stream the DynamoDB data into Elasticsearch. When adding data to the DynamoDB table document-db, it should be automatically indexed in the Elasticsearch index documents. I have added a few records to the DynamoDB table from the AWS dashboard.

This data should now be indexed in Elasticsearch. The following is the way to search the Elasticsearch index and view the data.
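Since the domain uses the IP-restricted open access policy, no request signing is needed to query it from the allowed machine. A small sketch of building the search request (the endpoint below is a placeholder for the real Domain endpoint):

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real one is the "Domain endpoint" shown on the
# document-index-domain page in the OpenSearch dashboard.
ES_ENDPOINT = "https://search-document-index-domain-xxxx.us-east-1.es.amazonaws.com"

def search_url(index, query):
    """Build a Lucene query-string search URL against the given index."""
    return f"{ES_ENDPOINT}/{index}/_search?" + urlencode({"q": query, "pretty": "true"})

# Opening this URL in a browser (from the allowed IP) returns the matching hits, e.g.
# search_url("documents", "title:hello")
```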

6. Setup Kibana

The final step is to configure Kibana with the Elasticsearch index. The Kibana URL is available in the AWS OpenSearch dashboard. I have browsed to that URL, set up the index pattern (documents), and created some visualizations as below. Here Kibana uses open access; I can restrict access to Kibana using an AWS Cognito user pool. For demo purposes I have used the Open access permission.

Reference

  1. https://www.bogotobogo.com/DevOps/AWS/aws-Amazon-Loading-DynamoDB-Stream-to-ElasticSearch-with-Lambda.php
  2. https://medium.com/swlh/loading-data-to-aws-elasticsearch-with-dynamodb-streams-and-lambda-77e52b9c797
  3. https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/
  4. https://towardsaws.com/streaming-your-dynamodb-data-to-elasticsearch-for-enhanced-analytics-24862f67f09c
  5. https://www.cnblogs.com/Answer1215/p/14783470.html
