Replicate CloudSQL(MySQL/PostgresSQL) data to BigQuery in real-time using Datastream

Vishal Bulbule
Google Cloud - Community
4 min readApr 9, 2023

Introduction

Hello All, Welcome Back !! I got multiple requests on my youtube channel to create a video/blog on replicating data from CloudSQL (MySQL/PostgreSQL) to BigQuery. So here I am with the most simple solution. In this article we will see how we can replicate data from cloudSQL to BigQuery in real-time also we will see how we can capture source database changes and replicate them in Destination BigQuery in real-time using Datastream.

Do not worry, I am not only sharing steps here but also sharing a youtube video where I am performing this setup and replicating data from MySQL to BigQuery in realtime. So Enjoy!

What is DataStream ?

Datastream is a data replication service provided by Google Cloud Platform that allows us to stream real-time changes from your database to BigQuery. Datastream is particularly useful for maintaining an up-to-date replica of your transactional database in BigQuery. You can analyze the data using tools such as BigQuery SQL or other third-party BI tools.

Datastream captures the changes (inserts, updates, and deletes) made to the source database and then sends these changes to BigQuery in real time. This ensures that your data is always up-to-date and that your analytical queries are based on the latest data.

Datastream supports a variety of databases, including MySQL, PostgreSQL, and Oracle, and it can be configured to work with databases running on-premises or in the cloud.

Steps

CloudSQL Setup

1. Create a Cloud SQL instance with MySQL or PostgreSQL.
2. Create a Sample database, and tables and insert sample data.
3. We have created a sample database appdb and sample table country

Create Stream

  1. Navigate to Datastream in the Google Cloud console.
  2. Click on Create Datastream

3. Enter the Required details and select the source and destination type from the dropdown and continue

4. Create a source connection profile by adding hostname & credentials and continue

5. Define the connectivity method from the dropdown. We are using “IP Allowlisting” for this demo.

6. Add all IP addresses in the cloud SQL Authorized Network.

7. Test the connection profile and make sure the test is passed.

8. Configure source objects to replicate from available options and continue.

9. Define Destination details for BigQuery

10. Set a staleness limit, if you set a staleness limit 15min then it might take up to 15min to replicate your data. If data freshness is too critical then set it to a minimum value so data can be replicated within a few seconds.

11. Review details and run validations.

12. Make sure all validations are successful.

13. Start stream.

14. Validate BigQuery if the dataset is created with the required table.

15. For testing make some changes in the source database and validate the changes reflected in BigQuery.

CloudSQL Data

BigQuery Data

Find complete video demo here

Conclusion

In summary, Datastream is a powerful tool that enables real-time data replication from your database to BigQuery, allowing you to perform real-time analytics on your data.

About Me

I’m an 8x Google Cloud certified Architect/Data Engineer with 6+ years of experience, with expertise in Google Cloud, Data Analytics and BI.

My passion for technology and innovation has led me to become a Google Cloud Architect and a Champion Innovator, always seeking new ways to leverage cloud technologies to deliver innovative solutions that make a difference.

You can reach me at Email — vishal.bulbule@techtrapture.com connect with me on LinkedIn — https://www.linkedin.com/in/vishal-bulbule/

Want to know more about me personally Connect with me on Instagram — https://www.instagram.com/vishal_bulbule/?hl=en

YouTube Channel
Check out my YouTube Channel https://www.youtube.com/@techtrapture for tutorials and demos on Google Cloud.

--

--