Stream your data: On-Prem MS-SQL to CloudSQL SQL Server to BigQuery (Part-1)

Ravish Garg
Google Cloud - Community
4 min readJun 8, 2021

--

Part1: Online replication from On-Prem MS-SQL to CloudSQL SQL Server

Why the need?

Legacy versions of enterprise relational databases have been an integral part of our application stack but as the organizations are transitioning into the digital era, they start realizing the value of untapped data and the plethora of opportunities they are missing. Thus, most organizations now looking for efficient ways to integrate their data-sources with next-gen cloud-native data warehouse platforms leveraging batch or streaming pipelines.

As per Harvard Business Review, Cross-industry studies show that on average, less than 50% of an organisation’s structured data is actively used in making decision.

In this multi-part blog, I’ll walk you through an end-to-end scenario of building a data-pipeline from a legacy SQL Server to BigQuery.

Scenario: A cost-sensitive customer, who has SQL Server 2014 standard edition and looking to build a CDC pipeline to their warehouse platform i.e. BigQuery.

By SELİM ARDA ERYILMAZ on Unsplash

Challenge: SQL Server 2014 Standard Edition does not support CDC by default and thus, either customer needs to upgrade to the 2016 standard edition or SQL Server 2014 Enterprise Edition. Later is not feasible since the customer is very cost-sensitive and the former option requires 1 to 2 months to fully test its application and that can lead to loss of potential opportunities due to lack of timely analytics.

Mitigation: Google CloudSQL for SQL Server supports transactional replication and thus, customers can sync all required tables/databases from legacy SQL Server databases to Google CloudSQL in real-time.

Implementation Plan: Microsoft SQL Server provides multiple replication options between SQL Server instances for use in distributed applications. For this scenario, we will implement Transactional Replication which makes the source database as publisher and all configured objects & data are applied to the Subscriber (Target) database in near real-time. The data changes are applied to the subscriber in the same order and within the same transaction boundaries as they occurred on the publisher.

Transactional Replication to CloudSQL SQL Server, By Author

To demonstrate, I have created a database named “demo” in a SQL Server 2014 Std. Edition and provisioned a Standard Edition Google CloudSQL SQL Server 2017 with database named “csqlawsub”.

Now, let’s setup a push based transactional replication by configuring distributor and publisher at source SQL Server.

Distributor Configuration at source SQL Server, By Author
Publisher Configuration, By Author

Created a table named “Leaderboard” in the “demo” database of source SQL Server and configure the publisher “demo_pub” to push all changed data of this table to subscriber @ CloudSQL instance.

Leaderboard Table DDL, By Author

Initiate the replication and test the data replication from source SQL Server to CloudSQL in near real-time state.

Replication Monitor: Publisher to Subscriber, By Author

Inserted 950 rows in “leaderboard” table in the “demo” database which got replicated to CloudSQL “csqlawsub” in near real-time.

“leaderboard” table @ source Database “Demo”, By Author
Rows replicated in “leaderboard” table @ “csqlawsub” database in CloudSQL SQL Server 2017, By Author

Without getting into SQL Server transactional replication explanation (for which you can refer to LINK). Here, I gave you a quick peek at how to set up an online replication to migrate user data from your existing SQL Server instance to Google CloudSQL in just few clicks.

Since Google CloudSQL SQL Server provides you 2017 version by default, you will get the CDC capability from standard edition itself which is a fully managed database service from Google. Now in Part-2, we will leverage on same & go through possible ways to construct our pipeline to BigQuery based on business requirements.

References:

--

--

Ravish Garg
Google Cloud - Community

Customer Engineer, Data Specialist @ Google Cloud. I assist customers transform & evolve their business via Google’s global network and software infrastructure.