CloudSQL to BigQuery — Federated Queries

SP Kumar Rachumallu
Jun 18 · 3 min read

Whether you are developing BI dashboards or analyzing data to make critical business decisions, you need to access data from various data marts, data warehouses or transactional databases in order to perform aggregation or derive statistics as necessary for the report.

If you are working with BigQuery as data warehouse and CloudSQL as your RDBMS, you often need to identify a way to access the data for your analytical needs.

One way of doing it is by writing Federated Queries from BigQuery where CloudSQL is treated as an External Source of data and you enclose your query within a function EXTERNAL_QUERY.

Here’s how a sample federated query looks like:

SELECT * FROM EXTERNAL_QUERY 
("test-fedquery-mysql", "SELECT customer_id, MIN(order date) AS first_order_date
FROM orders
GROUP BY customer_id;");
  • I enclosed my query within the function EXTERNAL_QUERY
  • test-fedquery-mysql is the connection_id (or) connection_name we create in BigQuery to access CloudSQL.

Before you create the connection:

  1. Ensure you enable BigQuery Connection API.
  2. Configure Public IP connectivity for your CloudSQL instance. This enables BigQuery to access the instance and query data as required.

Create the connection from BigQuery using Console:

  1. Navigate to BigQuery, click “Add Data” and select “External Data Source”

2. Enter the right information within appropriate fields.

You can copy Cloud SQL Instance ID from the SQL instance page. Look for Connection name.

Once you create the instance, it appears under external connections within the pinned projects.

Voila !! You can now run federated queries on Cloud SQL from BigQuery.

Creating connection using CLI

bq mk \
— connection \
— connection_type=’CLOUD_SQL’ \
— properties=’{“instanceId”:”famous-vista-310604:us-central1:trans-int-sub-us1b”,”database”:”transactiondb”,”type”:”MYSQL”}’ \
— connection_credential=’{“username”:”trans-db-analytical”, “password”:”*******”}’ \
— project_id=famous-vista-310604 \
— location=US \
test-fedquery-mysql

Here’s an example to list all the tables in database transactiondb

SELECT * FROM EXTERNAL_QUERY
(“test-fedquery-mysql”,”select * from information_schema.tables;”);

Remember:

  1. Federated queries can only be executed in regions that support both Cloud SQL and BigQuery.
  2. Access to connection can be restricted by granting permissions to specific users.
  3. BigQuery usually predicts the amount of data processed when a query is written. This won’t be available when querying CloudSQL.
  4. Federated queries are billed the same way native queries are billed (On-demand pricing or Pay as much as you use).
  5. You can schedule federated queries the same way you schedule native queries.
Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

SP Kumar Rachumallu

Written by

Lead Programmer at Novartis Healthcare Pvt Ltd.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com