Solution: Not able to connect to a Private Cloud SQL instance from a Private Data Fusion instance?

Megha Bedi
Google Cloud - Community
Dec 10, 2022

Why does a Cloud Data Fusion instance with a private IP not connect to a Cloud SQL instance with a private IP, even though both use the same VPC network?

I can access a public Cloud SQL instance from Cloud Data Fusion without additional setup, so why does it not work with a private IP?

Please read this article to discover the reason behind this challenge and get its solution.

Public Cloud SQL instances can be accessed from Cloud Data Fusion, but for security reasons, it is recommended that the private instances be used for Cloud SQL and Cloud Data Fusion.

Challenge

When you create a private Data Fusion instance and a private Cloud SQL instance in the same VPC and try to access the Cloud SQL instance (MySQL or SQL Server) from the Cloud Data Fusion instance, the connection cannot be established and you get the error below:

Error Message

Background

When you provision a Private Data Fusion instance, you will see a Private Service Connection created under VPC networks → Your VPC Network → Private Service Connection as below:

Behind the scenes, Cloud Data Fusion (CDF) runs its pipeline jobs on a Dataproc cluster. With a private CDF instance, that cluster has only a private IP, which is the whole idea behind creating a CDF instance with a private IP address. When you follow the documentation to provision a private CDF instance, it guides you to create a VPC peering between your current VPC network (say, the default network) and the Cloud Data Fusion tenant project network. The CDF instance itself runs in a tenant project.

The tenant project ID is the portion between the “at” symbol (@) and the following period (.). For example, if the service account value is
cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com, the tenant project ID is r8170c9b5e7699803-tp.
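
If you need the tenant project ID in a script, the rule above is easy to apply in code. The following is a minimal sketch; the function name `tenant_project_id` is made up for illustration, and the service account is the example value from this article.

```python
def tenant_project_id(service_account: str) -> str:
    """Return the text between '@' and the next '.' in a service account email."""
    _, at_sign, domain = service_account.partition("@")
    if not at_sign:
        raise ValueError(f"no '@' found in {service_account!r}")
    project, dot, _ = domain.partition(".")
    if not dot or not project:
        raise ValueError(f"no project segment found in {service_account!r}")
    return project


# Example value taken from the article text above.
sa = "cloud-datafusion-management-sa@r8170c9b5e7699803-tp.iam.gserviceaccount.com"
print(tenant_project_id(sa))  # r8170c9b5e7699803-tp
```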

VPC Peering with Tenant project

When you provision a Cloud SQL instance with only a private IP address, a VPC peering is created implicitly, behind the scenes, between your current VPC and the Cloud SQL specific peered VPC.

Reason for Connection Failure

Now we know what happens when both private instances are provisioned. Let's see why the connection is not established despite this setup.
Whenever a private Data Fusion instance is created, the user creates a peering between the current network and the tenant project network. Likewise, whenever a private Cloud SQL instance is created, a peering is set up between the current network and the Cloud SQL specific network.
Both peerings attach to the current network, which is your default network in this example. But the Cloud Data Fusion network and the Cloud SQL network are not peered with each other, which is why you cannot create a connection between the two private instances.

VPC Peering is not transitive

Although it looks like both instances are in the same network, if you drill down into the details, you’ll find that these are two different networks that do not peer; hence their internal IPs are not reachable.
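
To make the non-transitive behaviour concrete, here is a small toy model. It is only a sketch of the rule, not a call to any Google Cloud API, and the network names are illustrative labels rather than real resource names.

```python
# Each entry is one direct peering between two networks.
# The names are illustrative labels, not real resource names.
PEERINGS = {
    frozenset({"customer-vpc", "cdf-tenant-vpc"}),
    frozenset({"customer-vpc", "cloudsql-vpc"}),
}


def can_reach(a: str, b: str, peerings=PEERINGS) -> bool:
    """Networks can talk only if they are the same network or directly peered.

    There is deliberately no graph traversal here: a path through a
    shared peer does not count, because VPC peering is not transitive.
    """
    return a == b or frozenset({a, b}) in peerings


print(can_reach("customer-vpc", "cdf-tenant-vpc"))  # True
print(can_reach("customer-vpc", "cloudsql-vpc"))    # True
print(can_reach("cdf-tenant-vpc", "cloudsql-vpc"))  # False
```

The last line is the situation described in this article: both networks are peered with your VPC, but not with each other.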
Now let’s talk about the solution.

Solution

To connect to a Private Cloud SQL instance from a Private Cloud Data Fusion instance, use a proxy Compute Engine VM. A proxy is required because the Cloud SQL network is not directly peered with the Cloud Data Fusion network, and networks that are only transitively peered cannot communicate with each other. To learn more about VPC peering, see the VPC Network Peering documentation.

Proxy VM

Steps of the solution:

1) Create a Private IP Cloud SQL instance by following the official documentation. This involves two steps:

  • Setting up VPC
  • Allocating IP ranges

2) Create a Private Cloud Data Fusion instance by following the official documentation.

Note: Cloud SQL creates the VPC peering necessary to access the Private IP instance automatically, whereas Data Fusion does not. You have to create the VPC Peering yourself explicitly. Make sure to complete this step.

3) Once you complete the above steps, you should see 3 VPC peerings created in your console.

  • CloudSQL
  • Service Networking
  • Cloud Data Fusion
VPC Peering
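
If you prefer to check the peerings from a script rather than the console, a small helper like the one below can flag any peering that is not active yet. This is a sketch under assumptions: it expects a list of records with `name` and `state` fields (for example, parsed from a JSON export of your peerings), and the peering names in the sample data are hypothetical.

```python
def inactive_peerings(peerings):
    """Return the names of peerings whose state is anything other than ACTIVE."""
    return [p["name"] for p in peerings if p.get("state") != "ACTIVE"]


# Hypothetical sample data; your peering names will differ.
sample = [
    {"name": "cloudsql-peering", "state": "ACTIVE"},
    {"name": "servicenetworking-peering", "state": "ACTIVE"},
    {"name": "datafusion-peering", "state": "INACTIVE"},
]

print(inactive_peerings(sample))  # ['datafusion-peering']
```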

4) Create a Cloud SQL proxy on a Compute Engine VM by following the official documentation.

5) Create a firewall rule, or confirm that one already exists, that allows traffic from the Private CDF instance to your proxy VM on the required port. The IP address range of the CDF instance can be found on the CDF Instance Details page.
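
As a rough illustration of step 5, the sketch below assembles a `gcloud compute firewall-rules create` command as a string so you can review it before running it yourself. The rule name, network, and IP range are hypothetical placeholders; replace them with your own values, and use the port your database listens on (3306 is the MySQL default, 1433 the SQL Server default).

```python
import ipaddress
import shlex


def firewall_rule_command(rule_name, network, cdf_ip_range, port):
    """Build (but do not run) a gcloud command allowing CDF traffic to the proxy VM."""
    # Fail early if the source range is not a valid CIDR block.
    ipaddress.ip_network(cdf_ip_range)
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    args = [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        "--direction=INGRESS",
        "--action=ALLOW",
        f"--rules=tcp:{port}",
        f"--source-ranges={cdf_ip_range}",
    ]
    return shlex.join(args)


# Hypothetical values: copy the real range from the CDF Instance Details page.
cmd = firewall_rule_command("allow-cdf-to-sql-proxy", "default", "10.124.40.0/22", 3306)
print(cmd)
```

Building the command as a string keeps the example free of side effects; nothing is created until you paste the output into a shell.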

Firewall Rule

Accessing the Cloud SQL instance from Cloud Data Fusion instance

6) Create a database connection in Cloud Data Fusion

  • Go to the CDF Console (external URL)
  • From the Home screen, select Wrangler
  • From the Connections screen, select Add A Connection
  • Then select Database
Database connection in Cloud Data Fusion

7) Before you begin accessing the Cloud SQL instance from the Cloud Data Fusion instance, make sure you have installed the MySQL or SQL Server JDBC driver from the Hub.

Hub

8) Complete the database connection in Cloud Data Fusion. You should now be able to connect to the Private Cloud SQL instance from the Private Cloud Data Fusion instance.
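
When filling in the connection, the connection string would typically point at the proxy VM rather than at the Cloud SQL instance itself. The helper below is a sketch under that assumption: it builds standard JDBC URLs for MySQL and SQL Server, and the IP address and database name are hypothetical placeholders.

```python
DEFAULT_PORTS = {"mysql": 3306, "sqlserver": 1433}


def jdbc_url(engine, proxy_host, database, port=None):
    """Build a JDBC connection string that targets the proxy VM."""
    if engine not in DEFAULT_PORTS:
        raise ValueError(f"unsupported engine: {engine!r}")
    port = port or DEFAULT_PORTS[engine]
    if engine == "mysql":
        return f"jdbc:mysql://{proxy_host}:{port}/{database}"
    # SQL Server passes the database name as a connection property.
    return f"jdbc:sqlserver://{proxy_host}:{port};databaseName={database}"


# Hypothetical internal IP of the proxy VM and a hypothetical database name.
print(jdbc_url("mysql", "10.128.0.5", "mydb"))      # jdbc:mysql://10.128.0.5:3306/mydb
print(jdbc_url("sqlserver", "10.128.0.5", "mydb"))  # jdbc:sqlserver://10.128.0.5:1433;databaseName=mydb
```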

Successful Connection

Hope you found this article helpful. You can reach out to me on LinkedIn.

Engineer@Google ; Thrive with knowledge, skills and mindset