AWS Data Pipeline: Copy from RDS MySQL to S3 and Back

Girish V P
ADIBI Technologies, Basavanagudi, Blore.
4 min read · Feb 27, 2019

Sometimes it is necessary to copy data between different AWS compute and storage services, or between AWS and on-premises data sources. The user may need to back up data to a durable and cost-effective medium like S3, or to copy text-based data from S3 into databases such as RDS, Redshift, or DynamoDB. AWS Data Pipeline is one of the solutions that can be used to automate the movement and transformation of data. During the process you can also see intermediate resources being created, such as an EMR cluster or EC2 instances launched on behalf of the pipeline by the AWS Data Pipeline components Task Runner and the pipeline definition.

Here we experiment with how AWS Data Pipeline can be used to copy an RDS MySQL table to S3 in .csv format and then copy the same .csv file back into another RDS MySQL instance. We will create two RDS MySQL instances, one named srv and the other dst. The srv RDS instance has a database mydb containing a table named emp, which has only two fields. For the purpose of the experiment, two records are inserted into this source table emp. The dst RDS instance has one database named bank (which is not the default database created at the time of launching the RDS instance; for this reason we do some additional configuration in the experiment, described in the steps below). Also, we are not using the default VPC, which again requires additional configuration, which is documented here as well.
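For reference, the two-field emp table and its sample records can be created with a short script like the one below. This is only a sketch: the endpoint, credentials, and the two sample rows are placeholders assumed for illustration, not values from the original setup.

```python
# Placeholder endpoint, credentials and sample rows; adjust to your own srv instance.
import pymysql

conn = pymysql.connect(
    host="srv.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com",  # srv endpoint (placeholder)
    user="admin",
    password="your-password",
    database="mydb",
)

with conn.cursor() as cur:
    # The two-column table used throughout this experiment.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS emp ("
        " empname VARCHAR(20),"
        " address VARCHAR(20))"
    )
    # Two sample records (any two rows will do).
    cur.execute("INSERT INTO emp (empname, address) VALUES (%s, %s)", ("emp1", "addr1"))
    cur.execute("INSERT INTO emp (empname, address) VALUES (%s, %s)", ("emp2", "addr2"))

conn.commit()
conn.close()
```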

The first Data Pipeline should pull the data from the emp table in the mydb database of the srv instance and store it in an S3 bucket (dpl11) as a .csv file. The second Data Pipeline should push the contents of this .csv file into the emp table in the bank database of the dst RDS instance.

AWS Region: N. Virginia
Number of AWS Data Pipelines: 2
Source RDS instance name: srv
Source database name (default): mydb
Source table name: emp
emp table structure: empname varchar(20), address varchar(20)
Number of records in emp: 2
Destination RDS instance name: dst
Destination database name (not default, already exists): bank
Destination table name (doesn't exist yet): emp
S3 bucket name: dpl11
S3 file format: .csv
VPC: not default

Configuration Steps (Copy RDS MySQL to S3)

1) In the AWS web console, search for AWS Data Pipeline.

2) Click Create a new pipeline. Enter the name of the pipeline and the other information regarding the source RDS MySQL instance, as shown below.
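If you prefer the API over the console, the same pipeline can be sketched with boto3's datapipeline client. The objects below are only a simplified approximation of what the console template builds for an RDS-to-S3 copy; the connection string, credentials, role names, instance type, and subnet ID are all assumptions to be replaced with your own values.

```python
# A rough boto3 equivalent of creating the pipeline in the console. All values
# below (connection string, credentials, subnet, instance type) are placeholders.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = dp.create_pipeline(name="RDS-To-S3", uniqueId="rds-to-s3-demo")["pipelineId"]

def F(d):
    """Turn a plain dict into a Data Pipeline field list; tuples become refValue fields."""
    return [
        {"key": k, "refValue": v[1]} if isinstance(v, tuple) else {"key": k, "stringValue": v}
        for k, v in d.items()
    ]

objects = [
    {"id": "Default", "name": "Default", "fields": F({
        "scheduleType": "ondemand",                 # i.e. "Run on pipeline activation"
        "failureAndRerunMode": "CASCADE",
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole"})},
    {"id": "SourceRDSTable", "name": "SourceRDSTable", "fields": F({
        "type": "MySqlDataNode",
        "connectionString": "jdbc:mysql://srv.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306/mydb",
        "username": "admin",
        "*password": "your-password",
        "table": "emp",
        "selectQuery": "select * from emp"})},
    {"id": "S3Output", "name": "S3Output", "fields": F({
        "type": "S3DataNode",
        "directoryPath": "s3://dpl11/"})},
    {"id": "CopyInstance", "name": "CopyInstance", "fields": F({
        "type": "Ec2Resource",
        "instanceType": "t2.micro",
        "subnetId": "subnet-0123456789abcdef0",     # needed because a non-default VPC is used
        "terminateAfter": "1 Hour"})},
    {"id": "RDSToS3Copy", "name": "RDSToS3Copy", "fields": F({
        "type": "CopyActivity",
        "input": ("ref", "SourceRDSTable"),
        "output": ("ref", "S3Output"),
        "runsOn": ("ref", "CopyInstance")})},
]

print(dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects))
dp.activate_pipeline(pipelineId=pipeline_id)
```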

3) We don't schedule the task now; select Run on pipeline activation.

4) Set Logging to Disabled for this basic configuration; you may enable it if you prefer.

5) Leave IAM Roles set to Default.

6) Since I don't use the default VPC, I have to do this step; otherwise it can be skipped. Click Edit in Architect, and under the Resources section click the Add an optional field dropdown list and enter a subnet ID of your VPC (for the intermediate EC2 instance responsible for executing the task of copying data from RDS to S3).

7) Click Save and then Activate (you may ignore the warning related to logging).

8) When the console takes you to the pipeline's Execution details page, wait until the status turns FINISHED.
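If you would rather not keep refreshing the console, the pipeline state can also be polled with boto3, as in this sketch; the pipeline ID is a placeholder for the ID shown in the console.

```python
# Poll the pipeline until it finishes. PIPELINE_ID is a placeholder.
import time
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")
PIPELINE_ID = "df-XXXXXXXXXXXXXXXXXXXX"

while True:
    desc = dp.describe_pipelines(pipelineIds=[PIPELINE_ID])["pipelineDescriptionList"][0]
    state = {f["key"]: f.get("stringValue") for f in desc["fields"]}
    print(state.get("@pipelineState"), state.get("@healthStatus"))
    if state.get("@pipelineState") == "FINISHED":
        break
    time.sleep(30)
```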

9) Testing: Now verify that the contents of the emp table have been copied to the S3 bucket dpl11 as a .csv file.
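A quick way to do this check without opening the S3 console is a couple of boto3 calls. Only the bucket name below comes from this setup; the object key is a placeholder, since Data Pipeline generates its own output file names.

```python
# List the exported objects and print the contents of one .csv file.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

for obj in s3.list_objects_v2(Bucket="dpl11").get("Contents", []):
    print(obj["Key"], obj["Size"])

# Replace the key with one of the names printed above.
body = s3.get_object(Bucket="dpl11", Key="<exported-file>.csv")["Body"].read().decode("utf-8")
print(body)
```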

Configuration Steps (Copy S3 to RDS MySQL)

10) Create a new Data Pipeline and enter the information as in the screenshot S3-To-RDS below. Now the source is the .csv file in the dpl11 S3 bucket created in the previous step. Repeat steps 3–5 above.

11) Since the database (bank) into which the emp table is to be copied was not created at the time the RDS instance was launched, I have to do the following step. Click Edit in Architect, and under the Others section click the Add an optional field dropdown list and enter the database name as shown below. Under the Resources section, click the Add an optional field dropdown list and enter a subnet ID in your VPC (refer to step 6 for more info). A sketch of how these two fields look in the pipeline definition follows the screenshot.

(Screenshot: S3-To-RDS)
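For readers building the pipeline through the API instead of the Architect view, the two edits in step 11 roughly correspond to adding the fields shown in this sketch; the object IDs, instance ID, credentials, and subnet ID are assumptions.

```python
# Roughly what the Architect edits add to the pipeline definition (placeholder values).
rds_database = {
    "id": "rds_mysql", "name": "rds_mysql", "fields": [
        {"key": "type", "stringValue": "RdsDatabase"},
        {"key": "rdsInstanceId", "stringValue": "dst"},
        {"key": "username", "stringValue": "admin"},
        {"key": "*password", "stringValue": "your-password"},
        # The optional field added under "Others": the non-default database name.
        {"key": "databaseName", "stringValue": "bank"},
    ],
}

ec2_resource = {
    "id": "Ec2Instance", "name": "Ec2Instance", "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        {"key": "instanceType", "stringValue": "t2.micro"},
        {"key": "terminateAfter", "stringValue": "1 Hour"},
        # The optional field added under "Resources": a subnet in the non-default VPC.
        {"key": "subnetId", "stringValue": "subnet-0123456789abcdef0"},
    ],
}
```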

12) Click Save and then Activate.

13) When the console takes you to the status page, wait until the status turns FINISHED.

14) Testing: Now log on to your destination MySQL RDS instance and verify that the emp table has been created in the bank database. The records should have the same values as those in the .csv file in the S3 bucket, and as the emp table in the source MySQL RDS instance srv.
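The same check can be scripted; the endpoint and credentials below are placeholders, as in the earlier snippet.

```python
# Read back the rows loaded into bank.emp on the dst instance.
import pymysql

conn = pymysql.connect(
    host="dst.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com",  # dst endpoint (placeholder)
    user="admin",
    password="your-password",
    database="bank",
)
with conn.cursor() as cur:
    cur.execute("SELECT empname, address FROM emp")
    for row in cur.fetchall():
        print(row)  # should match the two records exported from srv
conn.close()
```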

Disclaimer: This experiment was carried out in a test environment. You are requested to verify your setup thoroughly before you implement it in a production environment. Some of the screenshots may not look exactly like the original AWS web console because multiple screenshots were cropped and joined to form a single image. You may have to refer to the original AWS console.
