Migrating Data from MongoDB to AWS DocumentDB

Govind Kumar
Published in axcess.io
5 min read, Mar 21, 2023

Migrating data from MongoDB to AWS DocumentDB can be time-consuming and complex, but the right tools and a clear sequence of steps make it manageable. In this blog post, we will show you how to set up AWS DocumentDB using a CloudFormation template and how to migrate data from MongoDB to it with the AWS Database Migration Service (DMS), driven by AWS CLI commands.

Generating Sample Data in MongoDB

Before we can begin the migration process, we need some sample data in our MongoDB database. We will generate it with a Python script, using the following steps:

1. Install the pymongo Python package using the following command:
pip install pymongo

2. Copy the following Python code into a new file and save it as “generate_data.py”:

import random
import string
import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mytestdb"]
collection = db["mycollection"]

for i in range(1000):
    data = {}
    data['name'] = ''.join(random.choices(string.ascii_uppercase + string.digits, k=10))
    data['age'] = random.randint(1, 100)
    data['address'] = ''.join(random.choices(string.ascii_uppercase + string.digits, k=20))
    collection.insert_one(data)

print("Data generation complete")

This script will generate 1000 sample documents in a MongoDB collection named “mycollection” in the “mytestdb” database. The documents will contain three fields: “name”, “age”, and “address”.
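If you need many more documents, one insert per document becomes slow. The following is a minimal sketch, not part of the original script, that builds the same three-field documents in a helper and writes them in a single batch with pymongo's insert_many. The function names make_document and load_sample_data are illustrative, and the connection string assumes the same local MongoDB as above.

```python
import random
import string


def make_document():
    """Build one sample document with the same fields as generate_data.py."""
    alphabet = string.ascii_uppercase + string.digits
    return {
        "name": "".join(random.choices(alphabet, k=10)),
        "age": random.randint(1, 100),
        "address": "".join(random.choices(alphabet, k=20)),
    }


def load_sample_data(uri="mongodb://localhost:27017/", count=1000):
    """Insert `count` documents in one batch (assumes a reachable MongoDB)."""
    import pymongo  # imported here so make_document() works without pymongo

    client = pymongo.MongoClient(uri)
    collection = client["mytestdb"]["mycollection"]
    result = collection.insert_many([make_document() for _ in range(count)])
    return len(result.inserted_ids)
```

Calling load_sample_data() returns the number of documents that were inserted, which you can compare against the count you asked for.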

3. Run the following command to execute the Python script and generate sample data in MongoDB:

python generate_data.py

Create a DocumentDB Cluster using the following template:

Step 1: Create a CloudFormation Stack

First, we will create a CloudFormation stack to set up the DocumentDB cluster. You can use the following CloudFormation template to create the stack:

AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS CloudFormation Sample Template for Amazon DocumentDB'

Resources:
  DocumentDBSubnetGroup:
    Type: AWS::DocDB::DBSubnetGroup
    Properties:
      DBSubnetGroupName: DocDB-Subnet-Group
      SubnetIds:
        - <Subnet1-ID>
        - <Subnet2-ID>
      DBSubnetGroupDescription: "Subnet Group for Amazon DocumentDB"

  DocumentDBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: DocumentDB-SG
      GroupDescription: "Security Group for Amazon DocumentDB"
      VpcId: <VPC-ID>
      SecurityGroupIngress:
        # Outside of a test setup, restrict this range to trusted sources
        # (for example your VPC CIDR).
        - IpProtocol: tcp
          FromPort: 27017
          ToPort: 27017
          CidrIp: 0.0.0.0/0

  DocumentDBCluster:
    Type: AWS::DocDB::DBCluster
    Properties:
      AvailabilityZones:
        - <AZ1>
        - <AZ2>
      DBClusterIdentifier: my-docdb-cluster
      MasterUsername: <Username>
      MasterUserPassword: <Password>
      VpcSecurityGroupIds:
        - !Ref DocumentDBSecurityGroup
      DBSubnetGroupName: !Ref DocumentDBSubnetGroup

  # A cluster needs at least one instance before clients can connect.
  # The instance class is an example; choose one available in your Region.
  DocumentDBInstance:
    Type: AWS::DocDB::DBInstance
    Properties:
      DBClusterIdentifier: !Ref DocumentDBCluster
      DBInstanceClass: db.t3.medium

Replace <Subnet1-ID>, <Subnet2-ID>, <VPC-ID>, <AZ1>, <AZ2>, <Username>, and <Password> with the appropriate values.
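A placeholder that is left in place is easy to miss and will make the stack creation fail. As a small, optional safeguard, the sketch below scans template text for any remaining angle-bracket placeholders. The helper name find_placeholders and the placeholder pattern are assumptions made for this illustration.

```python
import re

# Matches angle-bracket placeholders such as <VPC-ID> or <Subnet1-ID>.
PLACEHOLDER = re.compile(r"<[A-Za-z][A-Za-z0-9_-]*>")


def find_placeholders(template_text):
    """Return the sorted, de-duplicated placeholders still present in the text."""
    return sorted(set(PLACEHOLDER.findall(template_text)))
```

Read your saved template file, pass its contents to find_placeholders, and fill in anything the function reports before you deploy.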

Save this template as documentdb-cfn.yml. Now we can create the stack using the AWS CLI command:

aws cloudformation create-stack --stack-name docdb-stack --template-body file://documentdb-cfn.yml

Step 2: Set up the AWS DMS Replication Instance

Next, we need to set up the AWS DMS replication instance. You can use the following AWS CLI commands to set up the instance:

aws dms create-replication-instance --replication-instance-identifier my-dms-instance --allocated-storage 50 --engine-version 3.3.2 --no-multi-az --replication-instance-class dms.t2.micro --vpc-security-group-ids <Security-Group-ID> --availability-zone <Availability-Zone> --no-publicly-accessible --no-auto-minor-version-upgrade

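Replace <Security-Group-ID> and <Availability-Zone> with the appropriate values. The engine version and instance class shown here may no longer be offered; running aws dms describe-orderable-replication-instances lists the combinations currently available in your Region.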

Step 3: Create the Source and Target Endpoints

Now we need to create the source and target endpoints. You can use the following AWS CLI commands to create the endpoints:

aws dms create-endpoint --endpoint-identifier mongodb-endpoint --endpoint-type source --engine-name mongodb --server-name <MongoDB-Server-Name> --port <MongoDB-Port> --database-name <MongoDB-Database-Name> --username <MongoDB-Username> --password <MongoDB-Password>

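aws dms create-endpoint --endpoint-identifier docdb-endpoint --endpoint-type target --engine-name docdb --server-name <DocumentDB-Cluster-Endpoint> --port 27017 --database-name <DocumentDB-Database-Name> --username <Username> --password <Password>

Replace the placeholders in both commands with the values for your MongoDB server and your DocumentDB cluster.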

Step 4: Start the DMS Replication Task

aws dms create-replication-task --replication-task-identifier <task-id> --source-endpoint-arn <source-endpoint-arn> --target-endpoint-arn <target-endpoint-arn> --replication-instance-arn <replication-instance-arn> --migration-type full-load-and-cdc --table-mappings file://mapping.json

aws dms start-replication-task --replication-task-arn <task-arn> --start-replication-task-type start-replication

Note: Replace <task-id>, <source-endpoint-arn>, <target-endpoint-arn>, <replication-instance-arn>, and <task-arn> with the appropriate values, and point --table-mappings at your own mapping file.
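The mapping.json file passed to --table-mappings tells DMS which objects to migrate. As a minimal sketch, assuming you want every collection in the sample "mytestdb" database, the Python below builds a single selection rule and writes it to disk. The function names and the rule name are illustrative; for a MongoDB source, DMS treats the database as the schema name and the collection as the table name.

```python
import json


def build_table_mappings(database="mytestdb", collection="%"):
    """Return a DMS selection rule that includes the matching collections.

    "%" is the DMS wildcard, so the default selects every collection
    in the given database.
    """
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-sample-collections",
                "object-locator": {
                    "schema-name": database,
                    "table-name": collection,
                },
                "rule-action": "include",
            }
        ]
    }


def write_table_mappings(path="mapping.json", **kwargs):
    """Write the mapping document to `path` and return it."""
    mappings = build_table_mappings(**kwargs)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mappings, handle, indent=2)
    return mappings
```

Calling write_table_mappings() before creating the replication task produces a mapping.json in the current directory; pass collection="mycollection" to restrict the task to the sample collection only.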

Step 5: Verify data integrity after migrating data from MongoDB to AWS DocumentDB:

First, count the documents in the source MongoDB collection:

mongo --host=<MongoDB endpoint> --ssl --sslAllowInvalidCertificates --authenticationDatabase=admin --username=<username> --password=<password> --quiet --eval "db.getSiblingDB('<database_name>').<collection_name>.count()"

Replace <MongoDB endpoint>, <username>, <password>, <database_name>, and <collection_name> with the appropriate values for your MongoDB instance. This command prints the number of documents in the specified collection.

To verify the data in AWS DocumentDB, first look up the endpoint of your DocumentDB instance:

aws docdb describe-db-instances --db-instance-identifier <db_instance_name> | grep Address

Replace <db_instance_name> with the name of your DocumentDB instance. This command will output the endpoint for your DocumentDB instance.

Then, run the following command to connect to the instance:

mongo --tls --host=<DocumentDB endpoint> --tlsCAFile=rds-combined-ca-bundle.pem --username=<username> --password=<password> --authenticationDatabase=admin --tlsAllowInvalidHostnames

Replace <DocumentDB endpoint>, <username>, and <password> with the appropriate values for your DocumentDB instance. This command will connect to the DocumentDB instance using SSL/TLS and the appropriate credentials.

Finally, run the following commands in the mongo shell to count the number of documents in the collection:

use <database_name>
db.<collection_name>.count()

Replace <database_name> and <collection_name> with the appropriate values for your DocumentDB instance. This command will count the number of documents in the specified collection in DocumentDB.

Compare the results from both the MongoDB and DocumentDB collections to ensure that the data was migrated correctly.
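With more than a handful of collections, comparing counts by hand gets tedious. The sketch below shows one way to automate the comparison with pymongo. It is an illustration rather than a complete integrity check, because matching counts do not prove that document contents match. The function names are hypothetical, and collection_counts expects an already connected pymongo.MongoClient for each side.

```python
def compare_counts(source_counts, target_counts):
    """Return {collection: (source, target)} for every collection whose counts differ.

    A collection that is missing on one side is reported with a count of None.
    """
    mismatches = {}
    for name in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(name)
        dst = target_counts.get(name)
        if src != dst:
            mismatches[name] = (src, dst)
    return mismatches


def collection_counts(client, database_name):
    """Count documents per collection; `client` is a pymongo.MongoClient."""
    db = client[database_name]
    return {name: db[name].count_documents({}) for name in db.list_collection_names()}
```

An empty result from compare_counts(collection_counts(source_client, "mytestdb"), collection_counts(target_client, "mytestdb")) means every collection has the same number of documents on both sides.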

Challenges:

While migrating from MongoDB to DocumentDB, there are a few key issues that you may face. Here are some of them:

  1. Differences in data types: DocumentDB is compatible with MongoDB, but there are some differences in data types. For example, DocumentDB does not support all BSON data types. It is important to check the data types before migrating to avoid data loss.
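  2. Differences in indexing: DocumentDB does not support every MongoDB index type or index option, and it enforces its own limits on indexes. If your MongoDB database relies on a large number of indexes or on specialized index types, check them against the DocumentDB documentation; you may need to rethink your indexing strategy before migrating.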
  3. Differences in query optimization: DocumentDB uses a different query optimizer than MongoDB. This means that queries that perform well in MongoDB may not perform as well in DocumentDB. It is important to test your queries in DocumentDB to ensure that they are optimized for performance.
  4. Differences in transaction support: DocumentDB supports multi-document transactions (starting with its MongoDB 4.0 compatibility), but there are some limitations, such as caps on how long a transaction may run and how large it may grow. If your MongoDB database has long-running or very large transactions, you may need to rethink your transaction strategy before migrating.
  5. Differences in scaling: DocumentDB uses a different scaling model than MongoDB. A standard (instance-based) DocumentDB cluster has one writer instance and up to 15 read replicas that share a single storage volume, rather than spreading data across shards. This means that if your MongoDB deployment is sharded, you may need to rethink your sharding strategy before migrating.
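For the first challenge, a rough inventory of the value types in your collections is a useful starting point. The sketch below is one possible approach and not an official compatibility check: it records the Python type name of every top-level field across a sample of documents. When documents are read with pymongo, BSON values arrive as Python objects (for example ObjectId, Decimal128, or datetime), so unusual types stand out in the summary. The function name summarize_field_types is an assumption made for this example.

```python
from collections import defaultdict


def summarize_field_types(documents):
    """Map each top-level field name to the set of Python type names seen."""
    seen = defaultdict(set)
    for document in documents:
        for field, value in document.items():
            seen[field].add(type(value).__name__)
    return dict(seen)
```

You could feed it a sample such as collection.find().limit(1000) and then check any unexpected type names, or fields with more than one type, against the DocumentDB documentation before migrating.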

Conclusion:

In conclusion, migrating data from MongoDB to AWS DocumentDB can be a complex process, but with the right tools and strategies, it can be accomplished smoothly and efficiently. The AWS Database Migration Service (DMS) provides a reliable and flexible solution for moving data from MongoDB to DocumentDB, while minimizing downtime and preserving data integrity.

However, there are several challenges to consider, such as schema and feature differences between MongoDB and DocumentDB, network latency, and data consistency. These challenges can be mitigated by using best practices such as testing, monitoring, and tuning the migration process, as well as selecting the appropriate migration strategy and tools.

In this blog, we have provided a step-by-step guide on how to migrate data from MongoDB to DocumentDB using the AWS Database Migration Service (DMS) and the AWS CLI. We have also discussed the key differences between MongoDB and DocumentDB, and the main challenges and considerations for a successful migration.

By following the guidelines and best practices presented in this blog, you can minimize the risks and maximize the benefits of migrating your data to AWS DocumentDB, a fully managed, scalable, and highly available document database service that can help you unlock new insights and value from your data.

Govind Kumar
axcess.io

Technology Evangelist | AWS Golden Jacket | Practice Lead Cloud Migration @Axcess IO | Cloud Arch. | RHC(SA/E) | AWS (DevOps/Sol. Arch) — Pro. | CCNA