AWS Data Pipeline for DynamoDB Backup to S3 — a Tiny Demonstration

AWS Data Pipeline is a web service that can process and move data between different AWS services and on-premises data sources. Like the Linux cron system, a Data Pipeline job can be scheduled to run at set intervals.

In this demonstration we will use AWS Data Pipeline to copy AWS DynamoDB items to an S3 bucket.


Step 1: Create a DynamoDB Table and Populate it

Create a DynamoDB table called DataPipeLineDemo with Artist as its hash key. (Only key attributes need to be declared up front; the items we insert will also carry a second attribute, SongTitle.)

region=us-east-1

aws dynamodb create-table --region $region \
--table-name DataPipeLineDemo \
--attribute-definitions \
AttributeName=Artist,AttributeType=S \
--key-schema AttributeName=Artist,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1

Wait for the table to be ready

aws dynamodb wait table-exists --region $region --table-name DataPipeLineDemo

Now, let us add one item (record) to the DataPipeLineDemo table.

aws dynamodb put-item --region $region \
--table-name DataPipeLineDemo \
--item '{ "Artist": {"S": "Acme Band"}, "SongTitle": {"S": "Happy Day"} }' \
--return-consumed-capacity TOTAL
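To confirm the write went through, we can read the item back by its hash key (a quick sanity check, reusing the same $region variable):

```shell
# Read the item back by its hash key
aws dynamodb get-item --region $region \
--table-name DataPipeLineDemo \
--key '{ "Artist": {"S": "Acme Band"} }'
```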

Step 2: Create an S3 Bucket

The Data Pipeline job will write the exported DynamoDB table into this bucket.

aws s3 mb s3://datapipelinedemo-sree --region $region

Let us now go to the AWS Console and open the Data Pipeline service.


Create a new pipeline.


Choose the “Export DynamoDB table to S3” template from the drop-down.


Specify the output S3 bucket we created and the source DynamoDB table.


Schedule the pipeline job to run every 15 minutes. (The 2-minute interval I first tried did not work: the minimum scheduling period is 15 minutes.)


Here is the visual representation of the pipeline definition.

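For reference, the pipeline definition behind this diagram boils down to a handful of objects. The sketch below is hand-written from the template's structure and is illustrative only — the real “Export DynamoDB table to S3” template adds EMR step arguments omitted here, and the pipeline ID is a placeholder:

```shell
# Abbreviated pipeline definition -- illustrative only
cat > definition.json <<'EOF'
{
  "objects": [
    { "id": "DefaultSchedule", "type": "Schedule",
      "period": "15 minutes", "startAt": "FIRST_ACTIVATION_DATE_TIME" },
    { "id": "DDBSourceTable", "type": "DynamoDBDataNode",
      "tableName": "DataPipeLineDemo" },
    { "id": "S3BackupLocation", "type": "S3DataNode",
      "directoryPath": "s3://datapipelinedemo-sree/#{@scheduledStartTime}" },
    { "id": "EmrClusterForBackup", "type": "EmrCluster" },
    { "id": "TableBackupActivity", "type": "EmrActivity",
      "input": { "ref": "DDBSourceTable" },
      "output": { "ref": "S3BackupLocation" },
      "runsOn": { "ref": "EmrClusterForBackup" } }
  ]
}
EOF

# Upload the definition to an existing pipeline (placeholder ID)
aws datapipeline put-pipeline-definition --region $region \
--pipeline-id df-EXAMPLE1234567 \
--pipeline-definition file://definition.json
```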

Validation flagged an error in the schedule, which I had to correct.


Click on the schedule item to expand it, and change the frequency from 2 to 15 minutes.


This is how it looks now.


Now the status has changed to “Waiting for Runner”.


Job has been scheduled.


Job has completed and the status has become “Finished”.


If we look at the bucket contents now, we will see the DynamoDB data!

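The same check can be done from the CLI; the export lands in a timestamped subfolder under the bucket:

```shell
# List everything the pipeline export wrote into the bucket
aws s3 ls s3://datapipelinedemo-sree --recursive --region $region
```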

Some of the AWS CLI Data Pipeline commands

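In text form, here are a few of the Data Pipeline CLI commands worth knowing (the pipeline ID is a placeholder — substitute the one printed by list-pipelines):

```shell
# List all pipelines in the region, with their IDs
aws datapipeline list-pipelines --region $region

# Show the stored definition of one pipeline
aws datapipeline get-pipeline-definition --region $region \
--pipeline-id df-EXAMPLE1234567

# Show recent runs and their statuses
aws datapipeline list-runs --region $region \
--pipeline-id df-EXAMPLE1234567

# Activate (start) and deactivate (pause) the pipeline
aws datapipeline activate-pipeline --region $region \
--pipeline-id df-EXAMPLE1234567
aws datapipeline deactivate-pipeline --region $region \
--pipeline-id df-EXAMPLE1234567
```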

Cleaning Up

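In text form, the cleanup amounts to deleting the three resources we created (the pipeline ID is again a placeholder):

```shell
# Delete the pipeline itself
aws datapipeline delete-pipeline --region $region \
--pipeline-id df-EXAMPLE1234567

# Delete the DynamoDB table
aws dynamodb delete-table --region $region \
--table-name DataPipeLineDemo

# Empty and remove the S3 bucket
aws s3 rb s3://datapipelinedemo-sree --force --region $region
```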

Thanks for your time. Do follow for such tiny demonstrations!
