BigQuery Federated Queries with Oracle

Javier Garcia Puga
Google Cloud - Community
7 min readJun 14, 2024

BigQuery is a powerful tool for analyzing large datasets, but what if your data is stored in external sources like Oracle databases? While data migration or replication tools are often the go-to solutions, BigQuery’s federated queries offer an alternative when these options aren’t possible.

While BigQuery doesn’t natively support direct federation with Oracle databases, there’s a workaround we can leverage. By combining the power of Cloud SQL PostgreSQL and the oracle_fdw extension, we can unlock seamless access to your Oracle data directly within BigQuery.

In the following sections of this article, we’ll walk you through this solution step by step, demystifying the process and empowering you to tap into the full potential of your Oracle data within the BigQuery ecosystem.

Understanding BigQuery Federation

BigQuery federated queries is a mechanism that allows you to query data residing in external databases without the need to physically move the data into BigQuery. This is achieved by sending query statements to the external database and retrieving the results as a temporary table within BigQuery.

Advantages of BigQuery federated queries

  • No Data Movement: Eliminates the need to transfer data, saving time and resources.
  • Real-Time Access: Enables querying data from Cloud SQL (PostgreSQL and MySQL 2nd generation) and other data sources in real time.
  • Diverse Data Source Integration: Combines data from various sources, including Cloud SQL tables, BigQuery tables, and external files.

Drawbacks and Limitations

  • Performance Considerations: Federated queries might be slower compared to querying data solely within BigQuery. The performance depends on the external database’s optimization for analytical queries.
  • Read-Only Operations: Federated queries support only read operations; DML or DDL statements are not allowed.
  • Unsupported Data Types: Queries fail if the external data source contains unsupported data types in BigQuery. Casting to supported types might be necessary.

Proposed Architecture: BigQuery -> Cloud SQL with oracle_fdw

A typical architecture involves using Cloud SQL as an intermediary between BigQuery and the Oracle database. To facilitate interaction with Oracle databases, PostgreSQL Foreign Data Wrappers (FDWs) can be employed. For Oracle databases, the oracle_fdw extension is available within Cloud SQL Postgres and AlloyDB

PostgreSQL Foreign Data Wrappers

The following image illustrates a solution for querying an on-premises Oracle database from Google Cloud’s BigQuery using Cloud SQL (PostgreSQL) as a bridge.

Key Components:

  1. Oracle Database (On-Premises): This is your existing Oracle database, residing outside of Google Cloud. The goal is to access data within this database from BigQuery.
  2. Cloud SQL (PostgreSQL): A managed PostgreSQL instance running on Google Cloud Platform (GCP). This acts as an intermediary between the Oracle database and BigQuery.
  3. Foreign Data Wrapper (oracle_fdw): An extension installed on the Cloud SQL instance that enables it to communicate with and understand Oracle databases.
  4. Server Configuration: Within PostgreSQL, a server connection to the Oracle database is configured. This involves specifying the Oracle server address, port, and service name.
  5. User Mapping: A mapping between the Cloud SQL (PostgreSQL) user and the Oracle database user is established. This ensures that the Cloud SQL instance can access Oracle data with the correct permissions.
  6. Foreign Tables: In PostgreSQL, foreign tables are created to represent specific tables or views within the Oracle database. This allows you to query Oracle data as if it were native to PostgreSQL.
  7. BigQuery Cloud SQL Connection: A connection is established between BigQuery and the Cloud SQL instance. This connection is used to send SQL queries from BigQuery to the Cloud SQL instance.
  8. View: A view is created in BigQuery that references the foreign tables in Cloud SQL. This enables users to query the Oracle data through BigQuery as if it were stored directly in BigQuery.

From the point of view of IAM, it is required two different set of permissions:

  • The user running the query will need the roles/bigquery.connectionUser on the CloudSQL connection and roles/bigquery.dataViewer on the required BigQuery native tables.
  • The service account that is automatically created when a connection to Cloud SQL is created has the Cloud SQL Client roles/cloudsql.client role on the Cloud SQL instance.

End 2 end example

Let’s suppose we have this table in our on-premises Oracle database with some example records.

CREATE TABLE oratab (
ID NUMBER(5) PRIMARY KEY,
TEXT VARCHAR2(30),
FLOATING NUMBER(7,2) NOT NULL
);

First, create a CloudSQL Postgres instance. It is recommended to create this instance with a private IP address.

Once it is created, follow these instructions to connect the instance from your preferred environment using the CloudSQL Proxy:

https://cloud.google.com/sql/docs/postgres/connect-auth-proxy

To set up the connection, just launch this command from one terminal:

./cloud-sql-proxy - private-ip $PROJECT:europe-west4:$INSTANCE_NAME

Then from other terminal, you can connect to Postgres with an admin user to setup the oracle_fdw extension and create the foreign data wrapper for Oracle. In https://github.com/laurenz/oracle_fdw you have the detailed configuration steps:

psql "host=127.0.0.1 port=5432 sslmode=disable dbname=postgres user=adminuser"
CREATE EXTENSION oracle_fdw;
CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
OPTIONS (dbserver '//HOST:1521/SERVICE);

It is recommended that the rest of the process is performed by a normal user. Permissions can be granted as follow.

GRANT USAGE ON FOREIGN SERVER oradb TO pguser;

Reconnect to the instance as the normal user “pguser”.

psql "host=127.0.0.1 port=5432 sslmode=disable dbname=postgres user=pguser"

The next step is setting up authentication to the Oracle instance. Note that the recommended approach is to use an external password store.

CREATE USER MAPPING FOR postgres SERVER oradb
OPTIONS (user 'ORA_USER', password 'ORA_PWD');

Then we have everything set to create the Oracle foreign table and query it:

CREATE FOREIGN TABLE oratab (
id integer OPTIONS (key 'true') NOT NULL,
text character varying(30),
floating double precision NOT NULL
) SERVER oradb OPTIONS (schema 'HDFSUSER', table 'ORATAB');

You can also import the complete schema from Oracle instead of going table by table by using this command:

CREATE SCHEMA ora_schema;
IMPORT FOREIGN SCHEMA "ORACLE_SCHEMA" FROM SERVER oradb INTO ora_schema;

Setting up the BigQuery connection to CloudSQL

Follow this guide to setup the BigQuery connection to the CloudSQL instance: https://cloud.google.com/bigquery/docs/connect-to-sql#create-sql-connection

Please note that you will need to provide the roles/cloudsql.client role to the bigquery connection service account.

Then, you can grant the following roles to let users query data and manage connections:

  • roles/bigquery.connectionUser: enables users to use connections to connect with external data sources and run queries on them.
  • roles/bigquery.connectionAdmin: enables users to manage connections.

Then you can use the EXTERNAL_QUERY syntax to executes a query on the external database.

SELECT * FROM EXTERNAL_QUERY("PROJECT.europe-west4.postgres", "SELECT * FROM oratab;")

But that’s not all. BigQuery allows us to JOIN the data of the federated source with a native BigQuery table.

Let’s suppose we have a native table “names” with the following rows:

You can can then join both tables using the typical SQL syntax:

SELECT
t2.name,
t1.text,
t1.floating
FROM
EXTERNAL_QUERY("PROJECT.europe-west4.postgres", "SELECT * FROM oratab;") AS t1
INNER JOIN `oracle_export.names` AS t2 ON t1.id = t2.id

PRO tip: You can create views to make easier working with federated sources:

CREATE VIEW oracle_export.oraview as
SELECT * FROM EXTERNAL_QUERY("PROJECT.europe-west4.postgres", "SELECT * FROM oratab;");
select * from oracle_export.oraview ;

Conclusion

BigQuery federated queries offer a valuable solution for integrating data from external sources like Oracle databases. While performance and limitations should be considered, the ability to query data in real time without data movement makes it a compelling choice for many analytical use cases.

This setup allows you to leverage BigQuery for analyzing data stored in your on-premises Oracle database. By using Cloud SQL with the oracle_fdw extension, you create a seamless bridge that enables you to access and query Oracle data from within Google Cloud’s ecosystem.

PD: Additional considerations and Limitations

https://cloud.google.com/bigquery/docs/federated-queries-intro#limitations

  • Federated queries are read-only. Therefore, DML or DDL statements are not supported.
  • You must create the connection resource in the same project as the Cloud SQL or AlloyDB instance.
  • Connection Management: Each data product (a set of tables accessible to the user) requires a separate connection.
  • Third-Party Extension: The oracle_fdw extension is open source and not directly supported by Google.
  • User Access: Users need access to both the SQL connection and the BigQuery view.
  • Materialized Views: Cannot directly query external tables, although they are available in Postgres.
  • Data Type Conversions: Careful handling of data type conversions between Oracle, Postgres, and BigQuery is essential.
  • Cross-Region Queries: Additional considerations apply when BigQuery and the external data source are in different regions.

--

--