ELT using Fivetran: Load data from RDS Postgres to Snowflake using Fivetran.
This post is about the Extract and Load phases of the ELT process.
Basically, we shall try to see the options available in Fivetran and how easy is it to transfer data from the source to the destination.
Scenario:
We have our source Data in AWS RDS Postgres and the business requirement is to make the data available in a Data Warehouse (here Snowflake) to enable further Analysis.
Solution:
We shall use the Fivetran data integrator to Extract and Load data as-is from the source DB (Postgresql) to the destination DB (Snowflake).
We are not performing any data transformation operations here. It is a simple extract and load-as-is operation.
Extraction & Loading process flow diagram
Below is a representation of the Extraction and Loading process in ELT.
Note: Haven’t included the Transformation process to keep this simple. I will explain Transformation separately.
The organization environment consists of both source and destination. Fivetran Cloud is a separate entity that does the extraction of source data and loads it into the destination DB.
The major activity that Fivetran is being used here is the datatype conversion from Postgres to Snowflake, continuous sync of data, and in a secure way.
A Connector is the main component in Fivetran that does the above-mentioned activity.
About Source Data
Schema:
Below is the schema representation of the source.
The sample schema represents a relational structure with Primary and Foreign key constraints and one-to-many mappings among the tables.
Although the destination i.e. Snowflake doesn’t impose Primary and Foreign key constraints but maintains the constraints for perusal.
Below is the list of tables that we want to extract and load.
Step 1: Create Destination in Fivetran
Snowflake: Create a new DB in Snowflake.
create DATABASE demo_db_fivetran;
Login to Fivetran account,
Provide Destination details — In our case, the destination is Snowflake provide your Snowflake credentials to log in and the DB name where the data has to be loaded.
Step 2: Create a connector in Fivetran
Click ‘Add Connector’ and select ‘Postgres RDS’
Destination Prefix Schema:
This is the prefix using which a new schema gets created in destination (Snowflake). It’s a Constant and cannot be altered later.
I provided ‘PG_RDS_’ as the prefix.
Host: Provide the ‘Endpoint’ from your RDS account
Port: As in your AWS RDS.
Provide the remaining credential details.
Connection Method: Select ‘Connect Directly’
Update Method: Select ‘Detect Changes via XMIN’ (I will explain more while dealing with CDC — Change Data Capture)
Whitelist Fivetran’s IPs: In case of firewall is enabled in your environment, you should whitelist the provided IP addresses. This enables Fivetran to get connected with source and destination DB’s in your organization.
SAVE & TEST
Note: Please talk to your DBA and choose the appropriate TLS certificate for your organization.
Get Successful Connection message:
Click Continue
Fivetran fetches your Database tables information from Postgres RDS
Select the Tables & Columns for sync from the list.
Congratulations!
Your data is ready to sync!
Verify your Snowflake account before ‘Start Initial Sync’
Note: There are two default schemas in Snowflake DB.
Once I execute the ‘Inital Sync’, I expect a new Schema with prefix ‘pg_rds_’ to get automatically created.
Note: For demo purposes, we are using ‘Account Admin’ for all activities. In the prod environment, we have to follow the proper RBAC framework and assign restrictive roles to perform this creation action.
Fivetran dashboard:
Verify your Snowflake Account for the new Schema.
Setting Sync-Frequency:
Visit Setup and set the Sync-Frequency.
Verify the Table structure:
Check for Constraints — Primary and ForeignKey references,
Check the new columns that added for each table, and
Check the new table that got added to the Schema in addition to the tables from the source.
New COLUMNS in each Table
_FIVETRAN_DELETED BOOLEAN,
_FIVETRAN_SYNCED TIMESTAMP_TZ(9)
New Table
FIVETRAN_AUDIT
Summary:
Using Fivetran, the basic sync between Source and Destination is very easy. Data types & constraints get properly copied into the destination.
The new metadata columns and new table are for future periodic sync.