AWS Data Pipeline for DynamoDB Backup to S3 — a Tiny Demonstration

AWS Data Pipeline is a web service that can process and move data between different AWS services and on-premises sources. Like the Linux cron system, a Data Pipeline job can be scheduled to run at set intervals.

In this demonstration we will use AWS Data Pipeline to copy AWS DynamoDB items to an S3 bucket.

Create a DynamoDB table called DataPipeLineDemo with Artist (a string) as its partition key.

region=us-east-1

aws dynamodb create-table --region $region \
--table-name DataPipeLineDemo \
--attribute-definitions AttributeName=Artist,AttributeType=S \
--key-schema AttributeName=Artist,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1

Wait for the table to become ready:

aws dynamodb wait table-exists --table-name DataPipeLineDemo --region $region

Now, let us add one item (record) to the DataPipeLineDemo table.

aws dynamodb put-item --region $region \
--table-name DataPipeLineDemo \
--item '{ "Artist": {"S": "Acme Band"}, "SongTitle": {"S": "Happy Day"} }' \
--return-consumed-capacity TOTAL
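
To confirm the write, a quick scan works well enough for a one-item demo table:

aws dynamodb scan --table-name DataPipeLineDemo --region $region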

Next, create an S3 bucket. The Data Pipeline job will copy the backed-up DynamoDB table into this bucket.

aws s3 mb s3://datapipelinedemo-sree --region $region
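
S3 bucket names are globally unique, so pick a different name if you are following along. A quick listing confirms the bucket exists (and is empty for now):

aws s3 ls s3://datapipelinedemo-sree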

Let us now head to the AWS Console and open the Data Pipeline service.

Create a new pipeline.

Choose the “Export DynamoDB table to S3” template from the drop-down.

Specify the output S3 bucket we just created and the source DynamoDB table.

Schedule the pipeline job to run every 15 minutes (the 2 minutes seen below did not work; Data Pipeline’s minimum scheduling period is 15 minutes).

Here is the visual representation of the pipeline definition.

I had to correct this error!

Click on the schedule item to expand it and change the frequency from 2 to 15 minutes.

This is how it looks now.
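
As a side note, the console is not the only way to do this; the same pipeline can be created and activated from the CLI. Here is a minimal sketch, assuming the definition has been saved to a local file named pipeline-definition.json (a hypothetical name) and that <pipeline-id> is the pipelineId returned by create-pipeline:

# Create an empty pipeline and note the pipelineId it returns
aws datapipeline create-pipeline --region $region \
--name DataPipeLineDemoExport --unique-id dpl-demo-01

# Attach the definition (e.g., one exported from the console)
aws datapipeline put-pipeline-definition --region $region \
--pipeline-id <pipeline-id> \
--pipeline-definition file://pipeline-definition.json

# Activate the pipeline so the schedule kicks in
aws datapipeline activate-pipeline --region $region \
--pipeline-id <pipeline-id>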

Now the status has changed to “Waiting for Runner”.

Job has been scheduled.

The job has completed and the status has become “Finished”.

If we look at the bucket contents now, we will see the DynamoDB data!
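
For example, a recursive listing should show the export files under a timestamped prefix:

aws s3 ls s3://datapipelinedemo-sree --recursive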

Some of the AWS CLI Data Pipeline commands:
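
For instance, these can be used to inspect our pipeline and its runs (replace <pipeline-id> with the id shown by list-pipelines):

# List all pipelines in the region along with their ids
aws datapipeline list-pipelines --region $region

# Show the details of a specific pipeline
aws datapipeline describe-pipelines --region $region --pipeline-ids <pipeline-id>

# Show the run history (scheduled, running, and finished attempts)
aws datapipeline list-runs --region $region --pipeline-id <pipeline-id>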

Thanks for your time. Do follow for such tiny demonstrations!

AWS Certified DevOps Engineer & Solutions Architect Professional — Docker | Kubernetes | DevOps — Trainer | Running | Swimming | Cycling
