Replication of Data from Oracle database to Cloud SQL for PostgreSQL or AlloyDB

Tarun Chatterjee
Google Cloud - Community
4 min read · Jan 1, 2024

Many customers use Oracle database for their mission-critical systems. With the recent advancements in open-source databases, PostgreSQL has become a popular choice for application developers, data management teams, and data analysts as a viable alternative to Oracle database. There is also an increasing trend to use an object store, and even a PostgreSQL database, as an Operational Data Store (ODS) by replicating data from various operational systems and applications before eventually loading it into a data warehouse.

Replicating Data from Oracle to PostgreSQL Database to create an Operational Data Store

As a result of these patterns, we increasingly see data replicated from Oracle databases to an object store and then to a PostgreSQL database.

Google Cloud provides the tools to create a data replication pipeline that replicates data from an Oracle database to PostgreSQL (Cloud SQL for PostgreSQL or AlloyDB) running on Google Cloud.

With the hybrid transactional and analytical processing capabilities of AlloyDB and the highly performant Cloud SQL for PostgreSQL Enterprise Plus database service in Google Cloud, creating an ODS to support real-time data aggregation is very simple.

Below are a few simple steps to replicate data from an Oracle database to Cloud SQL for PostgreSQL or AlloyDB.

Step 0: Identify the schema/tables in the source Oracle database

In the current scenario we have identified the HR schema and will start by migrating the data of the COUNTRIES table.
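
As a quick sanity check, you can list the candidate tables in the source schema. The sketch below assumes SQL*Plus connectivity to the source database; the host, service name, and password are placeholders.

sqlplus -s hr/<PASSWORD>@//<ORACLE_HOST>:1521/<SERVICE_NAME> <<'SQL'
-- List the tables owned by the HR schema
SELECT table_name FROM user_tables ORDER BY table_name;
EXIT;
SQL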

Step 1: Load data from Oracle database to GCS using the GCP Datastream service

a. Create a GCS Bucket

GCS Bucket Creation
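
If you prefer the CLI over the console, the bucket can also be created with gcloud. The location below is an assumption; choose the region where your Datastream and Dataflow jobs will run.

# Create the GCS bucket that Datastream will write to (region is a placeholder)
gcloud storage buckets create gs://dsbucket-test --location=us-central1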

b. Create a new topic with a default subscription

Create a Topic

Once the topic has been created, create a subscription for it.

Create a Subscription for the topic
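
The same can be done with gcloud; the subscription name below is illustrative.

# Create the topic and a subscription for bucket notifications
gcloud pubsub topics create dsbucket-test
gcloud pubsub subscriptions create dsbucket-test-sub --topic=dsbucket-test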

c. Enable Pub/Sub notifications

Use the following gcloud command to create a Pub/Sub notification for the bucket:

gcloud storage buckets notifications create gs://BUCKET_NAME --topic=TOPIC_NAME

In this case: gcloud alpha storage buckets notifications create gs://dsbucket-test --topic=projects/my-base-project-333409/topics/dsbucket-test
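
To confirm that the notification configuration was created, you can list it:

gcloud storage buckets notifications list gs://dsbucket-test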

d. Create the equivalent table in PostgreSQL (Cloud SQL or AlloyDB)

Create Table in PostgreSQL
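
As a sketch, a PostgreSQL equivalent of the Oracle HR.COUNTRIES table could look like the following, mapping the Oracle CHAR(2), VARCHAR2(40), and NUMBER columns to char(2), varchar(40), and numeric. The connection string is a placeholder; adjust it for your Cloud SQL or AlloyDB instance.

psql "host=<CLOUDSQL_OR_ALLOYDB_IP> dbname=hr user=postgres" <<'SQL'
-- PostgreSQL equivalent of the Oracle HR.COUNTRIES table (types are assumptions
-- based on the standard Oracle HR sample schema)
CREATE SCHEMA IF NOT EXISTS hr;
CREATE TABLE IF NOT EXISTS hr.countries (
  country_id   char(2)     PRIMARY KEY,
  country_name varchar(40),
  region_id    numeric
);
SQL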

Step 2: Create a Datastream job to move data from Oracle to the GCS bucket

Create a Datastream job to migrate data from the Oracle database tables to GCS. In the current scenario we have chosen only one table as part of the data movement.

Datastream Pipeline details
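
Once the stream is created, a quick way to confirm it is running from the CLI (the location below is an assumption):

gcloud datastream streams list --location=us-central1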

The data is stored in a directory structure and in Avro format.

Files created by Datastream in the GCS bucket from the Oracle database tables
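
To inspect the files Datastream has written, list the bucket recursively:

gcloud storage ls -r gs://dsbucket-test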

Step 3: Load data from the GCS bucket to Cloud SQL for PostgreSQL (or AlloyDB)

Enable a few APIs:

Enable the Pub/Sub and Compute Engine APIs, then add the following roles to the Compute Engine default service account:

Dataflow > Worker, Dataflow > Admin, Cloud Storage > Storage Object Admin.

Also grant the storage.buckets.update and storage.buckets.get permissions so the Dataflow template can read the files from the GCS bucket.
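
These steps can also be scripted with gcloud; the project ID and project number below are placeholders for your own values.

# Enable the required APIs
gcloud services enable pubsub.googleapis.com compute.googleapis.com dataflow.googleapis.com

# Grant roles to the Compute Engine default service account
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member=serviceAccount:<PROJECT_NUMBER>-compute@developer.gserviceaccount.com \
  --role=roles/dataflow.worker
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member=serviceAccount:<PROJECT_NUMBER>-compute@developer.gserviceaccount.com \
  --role=roles/dataflow.admin
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member=serviceAccount:<PROJECT_NUMBER>-compute@developer.gserviceaccount.com \
  --role=roles/storage.objectAdmin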

Once the APIs have been enabled, create a Dataflow pipeline with the Datastream to SQL template.

Create a job from the template; it will load the data from the GCS bucket into the Cloud SQL or AlloyDB table(s) or schema.

Dataflow job from a predefined template
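
For reference, here is a minimal sketch of launching the same job from the CLI. The template path follows the public Dataflow templates bucket convention, and the parameter names (inputFilePattern, gcsPubSubSubscription, databaseHost, databaseUser, databasePassword, databaseName) should be verified against the current Datastream to SQL template documentation; the host and password are placeholders.

gcloud dataflow flex-template run ds-oracle-to-postgres \
  --region=us-central1 \
  --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_to_SQL \
  --parameters inputFilePattern=gs://dsbucket-test/,gcsPubSubSubscription=projects/my-base-project-333409/subscriptions/dsbucket-test-sub,databaseHost=<POSTGRES_HOST>,databaseUser=postgres,databasePassword=<PASSWORD>,databaseName=hr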

Once the Dataflow job starts running, data is loaded from the Datastream files in the GCS bucket into the PostgreSQL tables.

Using the above steps, leveraging GCP Datastream and the Dataflow template, various data pipelines can be created from multiple Oracle databases to a single PostgreSQL database to build an Operational Data Store.
