Replication of Data from Oracle database to Cloud SQL for PostgreSQL or AlloyDB
Many customers run Oracle Database for their mission-critical systems. With recent advancements in open-source databases, PostgreSQL has become a popular choice among application developers, data management teams, and data analysts as a viable alternative to Oracle Database. There is also an increasing trend of using an object store, and even a PostgreSQL database, as an Operational Data Store (ODS) by replicating data from various operational systems and applications before eventually loading it into a data warehouse.
As a result of these patterns, we increasingly see data replication from Oracle databases to Object Store and then to a PostgreSQL database.
Google Cloud provides the tools to create a data replication pipeline that replicates data from an Oracle database to PostgreSQL (Cloud SQL for PostgreSQL or AlloyDB) running on Google Cloud.
With the hybrid transactional and analytical processing capabilities of AlloyDB and the highly performant Cloud SQL for PostgreSQL Enterprise Plus edition, creating an ODS to support real-time data aggregation is very simple.
Below are a few simple steps to replicate data from an Oracle database to Cloud SQL for PostgreSQL or AlloyDB.
Step 0 : Identify the Schema/Tables in the source Oracle database
In this scenario we have identified the sample HR schema and will start by migrating the data of the COUNTRIES table.
Step 1 : Load Data from Oracle Database to GCS using GCP Datastream service
a. Create a GCS Bucket
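The bucket can be created with the gcloud CLI; the bucket name and location below are placeholders taken from this example, so substitute your own:

```shell
# Create the destination bucket that will receive Datastream output.
# Bucket name and location are assumptions - replace with your own values.
gcloud storage buckets create gs://dsbucket-test \
    --location=us-central1 \
    --uniform-bucket-level-access
```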
b. Create a new topic with default subscription
Once the topic has been created, create a subscription for the topic
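These two steps can be done from the CLI as well; the topic and subscription names here simply mirror the bucket name used in this example:

```shell
# Create the notification topic and a pull subscription on it.
# Names are placeholders matching the example bucket.
gcloud pubsub topics create dsbucket-test
gcloud pubsub subscriptions create dsbucket-test-sub \
    --topic=dsbucket-test
```

The subscription is what the Dataflow job later watches to learn about new files landing in the bucket.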
c. Enable Pub/Sub notifications
Use the gcloud command to create a Pub/Sub notification:
gcloud storage buckets notifications create gs://BUCKET_NAME --topic=TOPIC_NAME
In this case → "gcloud storage buckets notifications create gs://dsbucket-test --topic=projects/my-base-project-333409/topics/dsbucket-test"
d. Create the equivalent table in Postgresql (Cloud SQL or AlloyDB)
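As a sketch, the target table can be created with psql; the column types below mirror the standard Oracle HR sample definition of COUNTRIES, while the connection parameters (host, user, database name) are placeholders:

```shell
# Create the target schema and table in PostgreSQL.
# Column types follow the Oracle HR.COUNTRIES sample table;
# connection details are placeholders.
psql "host=10.0.0.5 user=postgres dbname=hr" <<'SQL'
CREATE SCHEMA IF NOT EXISTS hr;
CREATE TABLE IF NOT EXISTS hr.countries (
    country_id   CHAR(2) PRIMARY KEY,
    country_name VARCHAR(40),
    region_id    NUMERIC
);
SQL
```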
Step 2 : Create a Datastream Job to move data from Oracle to GCS Bucket
Create a Datastream job to migrate data from the Oracle database tables to GCS. In this scenario we have chosen only one table as part of the data movement.
The data is written to GCS in a date-partitioned directory structure, in Avro format.
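Once the stream is running, the output can be inspected with gsutil. The path shown in the comment is illustrative of the per-table, timestamp-partitioned layout Datastream produces, not an exact file name:

```shell
# List the Datastream output recursively; files are grouped per source
# table and partitioned by event timestamp (path below is illustrative).
gsutil ls -r gs://dsbucket-test/**
# e.g. gs://dsbucket-test/HR_COUNTRIES/2024/01/15/10/05/<file>.avro
```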
Step 3 : Load Data from GCS Bucket to Cloud SQL PostgreSQL (or AlloyDB)
Enable a few APIs and permissions:
Enable the Pub/Sub and Compute Engine APIs, then grant the following roles to the Compute Engine default service account:
Dataflow > Worker, Dataflow > Admin, Cloud Storage > Storage Object Admin.
Grant the storage.buckets.update and storage.buckets.get permissions to ensure the Dataflow template can read the files from the GCS bucket.
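The API enablement and role grants above can be scripted; the project ID is a placeholder from this example:

```shell
# Enable the required APIs (project ID is a placeholder).
gcloud services enable pubsub.googleapis.com compute.googleapis.com \
    dataflow.googleapis.com datastream.googleapis.com

# Grant the Compute Engine default service account the roles listed above.
PROJECT_ID=my-base-project-333409
SA=$(gcloud projects describe "$PROJECT_ID" \
    --format='value(projectNumber)')-compute@developer.gserviceaccount.com
for ROLE in roles/dataflow.worker roles/dataflow.admin roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:$SA" --role="$ROLE"
done
```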
Once the APIs have been enabled, create a Dataflow pipeline using the Datastream to SQL template.
Create a job from the template which will load the data from GCS Bucket to Cloud SQL or AlloyDB table(s) or Schema.
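The job can also be launched from the CLI. The template location and parameter names below are my understanding of the Google-provided Datastream to SQL flex template and should be checked against the template's documentation; all values (project, region, host, password, subscription) are placeholders from this example:

```shell
# Launch the Datastream to SQL flex template (parameter names assumed
# from the Google-provided template; all values are placeholders).
gcloud dataflow flex-template run datastream-to-postgres \
    --project=my-base-project-333409 \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates/latest/flex/Cloud_Datastream_to_SQL \
    --parameters \
inputFilePattern=gs://dsbucket-test/,\
gcsPubSubSubscription=projects/my-base-project-333409/subscriptions/dsbucket-test-sub,\
databaseHost=10.0.0.5,\
databaseUser=postgres,\
databasePassword=secret,\
databaseName=hr
```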
Once the dataflow job starts to run, data is loaded from the Datastream files in GCS bucket to the PostgreSQL tables.
Using the above steps, and leveraging Datastream and the Dataflow template, multiple data pipelines can be created from various Oracle databases into a single PostgreSQL database to build an Operational Data Store.