Import Data in Near Real-Time from Snowflake to Oracle Cloud Database using Airbyte

Shadab Mohammad
Aug 27 · 11 min read

Index

Introduction

Architecture

Solution Deployment

— A. Deploy VM’s for Nginx and Airbyte

Summary

References

Introduction

In this post we will use Airbyte, one of the most exciting Open source ELT tools in modern data engineering to create near real-time replication by fetching data from Snowflake to a 2-Node Oracle RAC Database on OCI. Please note Airbyte is a Extract-Load and Transform (ELT) tool and not a ETL tool like Airflow which can do DAG(Directed Acyclic Graphs) executions. There is a significant difference between ETL vs ELT.

Architecture

The stack has the below important components, it is a mix of Network and IaaS components on Oracle Cloud Infrastructure which will host Nginx and Airbyte. For the Nginx deployment we will use OCI ARM based A1 instances and for Airbyte we will use a AMD E4 Flex instance. Both these instances are available in the always free tier. Sink will be a 2-node Oracle RAC cluster on OCI fetching data from Snowflake as Source.

  1. OCI AMD E4 Flex Instance — Airbyte on Docker
  2. OCI DNS Public Zone which has Domain Management and the A-records are added here
  3. VCN (Virtual Cloud Network) — 2 Subnets , 1 Public subnet hosting the Nginx VM, 1 Private Subnet running the Airbyte Docker container
  4. Oracle RAC Database — 2- Node Oracle RAC DB deployed in the same Private Subnet
  5. Snowflake — A Snowflake 30-day Trial Account which auto-provisions a Demo DB
Snowflake to Oracle

Solution Deployment

A. Deploy the Virtual Machines for Nginx and Airbyte in Public and Private Subnet Respectively

  1. Deploy OCI ARM Instance in Public Subnet and install nginx on it. Ensure port 80 is allowed in security list of Public subnet as stateless rule
sudo setsebool -P httpd_can_network_connect 1
[nginx]
name=nginx repo
baseurl=https://nginx.org/packages/rhel/$releasever/$basearch/
gpgcheck=0
enabled=1
## Install nginxsudo yum install nginx
sudo systemctl start nginx
sudo systemctl status nginx
sudo systemctl enable nginx
## Whitelist HTTP Port 80 on the Instance for External Acessssudo firewall-cmd -zone=public -permanent -add-port=80/tcp
sudo firewall-cmd -zone=public -permanent -add-service=http
sudo firewall-cmd -reload
sudo firewall-cmd -zone=public -permanent -list-ports

B. Configure HTTP Basic Authentication and Install Nginx

1. Configure nginx to act as reverse proxy for Airbyte with basic http authentication

sudo mkdir -p /etc/apache2/
sudo htpasswd -c /etc/apache2/.htpasswd admin
sudo vim /etc/nginx/nginx.conf
user root;events {
worker_connections 4096; ## Default: 1024
}
http {
server {
listen 80;
listen [::]:80;
server_name 10.10.1.138;location / {
proxy_pass http://10.10.1.147:8000;
proxy_set_header X-Forwarded-User $http_authorization;
auth_basic “Administrator’s Area”;
auth_basic_user_file /etc/apache2/.htpasswd;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forward-For $proxy_add_x_forwarded_for;
proxy_pass_header Accept;
proxy_pass_header Server;
proxy_http_version 1.1;
proxy_set_header Authorization $http_authorization;
proxy_pass_header Authorization;
proxy_set_header ns_server-ui yes;
}
}
}
sudo nginx -t
sudo systemctl restart nginx
http://<public-ip>/
sudo tail -30f /var/log/nginx/error.log

C. Enable HTTPS and Acquire Lets Encrypt SSL certificates for your Domain/Sub-Domain

Before we get started add 2 A-records in your DNS domain Management. It could be done on any 3rd party provider like Crazy Domains or Go Daddy, but in my case I use OCI to manage my Domain. For eg : airbyte.yourdomainname.com and www.airbyte.yourdomain.com using the Public IP of your ARM instance

sudo yum install -y yum-utils
sudo yum-config-manager — enable ol7_optional_latest
sudo yum-config-manager — enable ol7_developer_EPEL
cd /tmp
sudo rpm -Uvh /tmp/epel-release-latest-7.noarch.rpm
cd /tmp
sudo wget https://dl.eff.org/certbot-auto
sudo yum install certbot
sudo yum install python-certbot-nginx
sudo vi /etc/nginx/nginx.conf
user root;events {
worker_connections 4096; ## Default: 1024
}
http {
server {
listen 80;
listen [::]:80;
server_name airbyte.yourdomain.com www.airbyte.yourdomain.com;location / {
proxy_pass http://10.10.1.147:8000;
proxy_set_header X-Forwarded-User $http_authorization;
auth_basic “Administrators Area”;
auth_basic_user_file /etc/apache2/.htpasswd;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forward-For $proxy_add_x_forwarded_for;
proxy_pass_header Accept;
proxy_pass_header Server;
proxy_http_version 1.1;
proxy_set_header Authorization $http_authorization;
proxy_pass_header Authorization;
proxy_set_header ns_server-ui yes;
}
}
}
sudo nginx -s reload
sudo certbot — nginx -d airbyte.yourdomain.com -d www.airbyte.yourdomain.com
sudo grep -r -P ‘[^\x00-\x7f]’ /etc/apache2 /etc/letsencrypt /etc/nginx
sudo certbot — nginx -d airbyte.yourdomain.com -d www.airbyte.yourdomain.com
sudo firewall-cmd -zone=public -permanent -add-port=443/tcp
sudo firewall-cmd -zone=public -permanent -add-service=https
sudo firewall-cmd -reload
sudo firewall-cmd -zone=public -permanent -list-ports

D. Create Replication from Snowflake to Oracle using Airbyte

Provision a 30-day free trial account on Snowflake. In my case I used AWS ap-southeast-2 region as the deployment.

  1. Create a Demo Database ‘Demo’ and Schema ‘Movie’ inside Snowflake
Schema
create or replace TABLE DEMO.MOVIE.MOVIE_SALES_FACT (
ORDER_NUM NUMBER(38,0) NOT NULL,
DAY DATE,
DAY_NUM_OF_WEEK NUMBER(38,0),
DAY_NAME VARCHAR(26),
MONTH VARCHAR(12),
MONTH_NUM_OF_YEAR NUMBER(38,0),
MONTH_NAME VARCHAR(26),
QUARTER_NAME VARCHAR(26),
QUARTER_NUM_OF_YEAR NUMBER(38,0),
YEAR NUMBER(38,0),
CUSTOMER_ID NUMBER(38,0),
USERNAME VARCHAR(26),
CUSTOMER_NAME VARCHAR(250),
STREET_ADDRESS VARCHAR(250),
POSTAL_CODE VARCHAR(26),
CITY_ID NUMBER(38,0),
CITY VARCHAR(128),
STATE_PROVINCE_ID NUMBER(38,0),
STATE_PROVINCE VARCHAR(128),
COUNTRY_ID NUMBER(38,0),
COUNTRY VARCHAR(126),
COUNTRY_CODE VARCHAR(26),
CONTINENT VARCHAR(128),
SEGMENT_NAME VARCHAR(26),
SEGMENT_DESCRIPTION VARCHAR(128),
CREDIT_BALANCE NUMBER(38,0),
EDUCATION VARCHAR(128),
EMAIL VARCHAR(128),
FULL_TIME VARCHAR(26),
GENDER VARCHAR(26),
HOUSEHOLD_SIZE NUMBER(38,0),
HOUSEHOLD_SIZE_BAND VARCHAR(26),
WORK_EXPERIENCE NUMBER(38,0),
WORK_EXPERIENCE_BAND VARCHAR(26),
INSUFF_FUNDS_INCIDENTS NUMBER(38,0),
JOB_TYPE VARCHAR(26),
LATE_MORT_RENT_PMTS NUMBER(38,0),
MARITAL_STATUS VARCHAR(26),
MORTGAGE_AMT NUMBER(38,0),
NUM_CARS NUMBER(38,0),
NUM_MORTGAGES NUMBER(38,0),
PET VARCHAR(26),
PROMOTION_RESPONSE NUMBER(38,0),
RENT_OWN VARCHAR(26),
YEARS_CURRENT_EMPLOYER NUMBER(38,0),
YEARS_CURRENT_EMPLOYER_BAND VARCHAR(26),
YEARS_CUSTOMER NUMBER(38,0),
YEARS_CUSTOMER_BAND VARCHAR(26),
YEARS_RESIDENCE NUMBER(38,0),
YEARS_RESIDENCE_BAND VARCHAR(26),
AGE NUMBER(38,0),
AGE_BAND VARCHAR(26),
COMMUTE_DISTANCE NUMBER(38,0),
COMMUTE_DISTANCE_BAND VARCHAR(26),
INCOME NUMBER(38,0),
INCOME_BAND VARCHAR(26),
MOVIE_ID NUMBER(38,0),
SEARCH_GENRE VARCHAR(26),
TITLE VARCHAR(4000),
GENRE VARCHAR(26),
SKU NUMBER(38,0),
LIST_PRICE NUMBER(38,2),
APP VARCHAR(26),
DEVICE VARCHAR(26),
OS VARCHAR(26),
PAYMENT_METHOD VARCHAR(26),
DISCOUNT_TYPE VARCHAR(26),
DISCOUNT_PERCENT NUMBER(38,1),
ACTUAL_PRICE NUMBER(38,2),
QUANTITY_SOLD NUMBER(38,0)
);
create or replace stage awssydney
url = 's3://dwhload/'
credentials = (aws_secret_key = 'K********m' aws_key_id = 'A*********S');
SELECT metadata$filename FROM @awssydney/;
COPY INTO DEMO.MOVIE.MOVIE_SALES_FACT FROM @awssydney/movie_sales_fact_m-201801_1.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
ON_ERROR=continue
;
select count(*) from DEMO.USER.MOVIE_SALES_FACT
sqlplus “/as sysdba”SQL> show pdbsSQL> alter session set container=PDB1;SQL> select TABLE_NAME,OWNER from DBA_TABLES where OWNER=’MOVIE’;TABLE_NAME
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
OWNER
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
AIRBYTE_TMP_MOVIE_SALES_FACT_26436382F498415DB280E318710585EF
MOVIE
AIRBYTE_RAW_MOVIE_SALES_FACT
MOVIE
SQL> select count(*) from MOVIE.AIRBYTE_RAW_MOVIE_SALES_FACT;COUNT(*)
— — — — —
24999
SQL> desc movie.AIRBYTE_RAW_MOVIE_SALES_FACT;
Name Null? Type
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
AIRBYTE_AB_ID NOT NULL VARCHAR2(64)
AIRBYTE_DATA NCLOB
AIRBYTE_EMITTED_AT TIMESTAMP(6) WITH TIME ZONE
set termout off
set verify off
set trimspool on
set linesize 200
set longchunksize 20000000
set long 20000000
set pages 0
column txt format a120
select
json_object
(
'body' value AIRBYTE_DATA
FORMAT JSON)
from
movie.AIRBYTE_RAW_MOVIE_SALES_FACT
Where rownum<=10;
-- Aggregated JSON --select json_arrayagg(AIRBYTE_DATA format json) from movie.AIRBYTE_RAW_MOVIE_SALES_FACT where rownum<=2;-- Pretty JSON from 19c Onwards --
select
json_serialize (AIRBYTE_DATA returning varchar2 pretty)
from movie.AIRBYTE_RAW_MOVIE_SALES_FACT where rownum<=2;
COPY INTO DEMO.MOVIE.MOVIE_SALES_FACT FROM @awssydney/movie_sales_fact_m-201801_2.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
ON_ERROR=continue
;
SQL> select count(*) from MOVIE.AIRBYTE_RAW_MOVIE_SALES_FACT;COUNT(*)
— — — — —
50000

Summary

We have seen how easy it is to create a production grade Airbyte deployment on Oracle Cloud Infrastructure and replicate data in near real-time from a Snowflake Cluster to Oracle RAC Cluster on OCI.

References:

[1] Oracle Cloud Compute E4 platform — https://blogs.oracle.com/cloud-infrastructure/post/announcing-oracle-cloud-compute-e4-platform-on-third-gen-amd-epyc-processors

Oracle Developers

A community for developers by developers.

Oracle Developers

Aggregation of articles from Oracle engineers, Groundbreaker Ambassadors, Oracle ACEs, and Java Champions on all things Oracle technology. The views expressed are those of the authors and not necessarily of Oracle.

Shadab Mohammad

Written by

Principal Cloud Solutions Architect at Oracle

Oracle Developers

Aggregation of articles from Oracle engineers, Groundbreaker Ambassadors, Oracle ACEs, and Java Champions on all things Oracle technology. The views expressed are those of the authors and not necessarily of Oracle.