Connect AlloyDB to Oracle using Oracle FDW

Gleb Otochkin
Google Cloud - Community
9 min read · Feb 26, 2024

Wouldn’t it be nice to have all your data in a single database and run reports across all kinds of databases — like “one ring to rule them all”? AlloyDB is one of the best choices for the central piece of such a reporting architecture. It is designed for hybrid workloads, combining the intensity of transaction processing with the ability to run complex analytical queries, thanks to the combination of PostgreSQL and AlloyDB-specific features such as the Columnar Engine. Sometimes the data cannot be moved to AlloyDB right away and has to stay in the original database. But that shouldn’t stop you from using AlloyDB for consolidated analysis across different databases. The PostgreSQL architecture embraces extensions as one of the main ways to expand engine functionality. The group of Foreign Data Wrapper (FDW) extensions allows querying data in remote databases even when the remote database is not PostgreSQL compatible. The Oracle FDW is one such extension that helps query data from a remote Oracle database. Let’s discuss what it is and how to start working with it.

Benefits of using Oracle FDW

A few words about what the Oracle FDW is. Oracle FDW allows you to query remote Oracle data from AlloyDB using native PostgreSQL syntax. At the same time it can offload some parts of the data processing to the source Oracle database.
Here is a simple example showing the execution plan for a query that uses data from an Oracle database. We can see that the filtering of the data happens on the Oracle side, which helps to reduce network load and speed up the query:

quickstart_db=> explain analyze verbose select * from ora_xe_hr.departments where location_id=1700;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan on ora_xe_hr.departments (cost=10000.00..20000.00 rows=1000 width=86) (actual time=3.267..3.314 rows=21 loops=1)
Output: department_id, department_name, manager_id, location_id
Oracle query: SELECT /*49616df4943bc603*/ r1."DEPARTMENT_ID", r1."DEPARTMENT_NAME", r1."MANAGER_ID", r1."LOCATION_ID" FROM "HR"."DEPARTMENTS" r1 WHERE (r1."LOCATION_ID" = 1700)
Oracle plan: SELECT STATEMENT
Oracle plan: TABLE ACCESS BY INDEX ROWID BATCHED DEPARTMENTS
Oracle plan: INDEX RANGE SCAN DEPT_LOCATION_IX (condition "R1"."LOCATION_ID"=1700)
Query Identifier: 412604040332804521
Planning Time: 3.134 ms
Execution Time: 3.386 ms
(9 rows)

In the example you can see that the query to the foreign table departments is offloaded to Oracle, where index access is used to filter the data before returning it to AlloyDB, instead of returning the entire dataset to AlloyDB for filtering.

Here is another example where a join of two tables and the filtering are offloaded to the Oracle side.

quickstart_db=> explain analyze verbose select count(*) from ora_xe_hr.employees e join ora_xe_hr.departments d on e.department_id=d.department_id  where d.department_name='Executive';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=10250.06..10250.07 rows=1 width=8) (actual time=4.471..4.472 rows=1 loops=1)
Output: count(*)
-> Foreign Scan (cost=10000.00..10250.00 rows=25 width=0) (actual time=4.448..4.463 rows=3 loops=1)
Oracle query: SELECT /*ab1a140f94e4f41c*/ '1' FROM ("HR"."EMPLOYEES" r1 INNER JOIN "HR"."DEPARTMENTS" r2 ON (r1."DEPARTMENT_ID" = r2."DEPARTMENT_ID") AND (r2."DEPARTMENT_NAME" = 'Executive'))
Oracle plan: SELECT STATEMENT
Oracle plan: NESTED LOOPS
Oracle plan: TABLE ACCESS FULL DEPARTMENTS (filter "R2"."DEPARTMENT_NAME"='Executive')
Oracle plan: INDEX RANGE SCAN EMP_DEPARTMENT_IX (condition "R1"."DEPARTMENT_ID"="R2"."DEPARTMENT_ID")
Query Identifier: 6760621540191738674
Planning Time: 3.792 ms
Execution Time: 4.545 ms
(11 rows)

In the outline of the execution plan you can see that the join and filtering operations have been delegated to the remote side, delivering only the resulting dataset for the final stage of aggregation.
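The same mechanism lets you mix local and remote data in a single query. Below is a minimal sketch assuming a hypothetical local AlloyDB table local_orders with a department_id column: the filter on the foreign table can still be pushed down to Oracle, while the join with the local table and the aggregation run in AlloyDB.

-- local_orders is a hypothetical local AlloyDB table; ora_xe_hr.departments is the Oracle foreign table
SELECT d.department_name, count(*) AS order_count
FROM local_orders o
JOIN ora_xe_hr.departments d ON o.department_id = d.department_id
WHERE d.location_id = 1700
GROUP BY d.department_name
ORDER BY order_count DESC;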

Network Challenges

AlloyDB is in the cloud (unless you are using AlloyDB Omni) and needs to connect to the remote database. As of now, even though an instance can have a public IP (in preview), it cannot use that interface for outbound connections. As a result we have to use the private instance IP, and that can introduce some challenges. Let me dive a bit deeper here and talk about some possible cases.

The outbound connection from AlloyDB works just fine when the remote database is placed in the same VPC (Virtual Private Cloud). In that case the private IP for AlloyDB is fully discoverable by the Oracle instance and they can easily communicate with each other.

On the diagram you can see the Oracle instance deployed on GCE in the same VPC as the AlloyDB private IP range.

But what happens if the remote database is deployed in another VPC or in an on-premises data center and, for example, we try to use VPC peering to reach that other network?

You might notice on the diagram that AlloyDB is not really in the same VPC as your other apps or a client VM. That is because AlloyDB, as a managed service, is deployed and managed in a Google internal project with its own VPC, which is connected to your VPC using network peering. That doesn’t prevent connections to resources deployed in the client’s VPC. But if you deploy something in another VPC and try to use VPC peering, the connection from AlloyDB to that third VPC (or back) will not work.

The reason is that VPC peering is not transitive. AlloyDB, with its own internal peering to the client’s VPC, cannot establish connections to a remote VPC that is reachable only through another peering: the VPC in the middle does not forward traffic from one peering to another.

What are our options then? We still have at least a couple of choices. For example, we can deploy a gateway VM with interfaces in both networks.

The VM is created with two network interfaces and IP forwarding enabled. One interface is in the default VPC and the second is in the remote VPC where the Oracle database is deployed. Then we create custom routes in both networks and advertise them accordingly so the TCP packets know how to travel from one side to the other. And the last but not least step is to create firewall rules to allow connections from AlloyDB to Oracle.

We can make it even more sophisticated by replacing the single VM with a GCE instance group and adding internal load balancers on both sides, which makes it highly available. But would I recommend such an approach? Not really: it involves too many moving parts to manage.

There are some other ways to establish connections, but in such a case I would recommend using Cloud VPN between the VPCs.

On the diagram you can see the default VPC using a highly available Cloud VPN to connect to the second VPC. The VPN uses BGP (Border Gateway Protocol) to exchange routing information between both ends. That is enough for all existing subnets in the VPC, but we also need to export custom routes for the private services access IP ranges used by AlloyDB, so that AlloyDB’s IPs are advertised correctly to the second VPC.

The same routers and BGP configuration can be used for a VPN or a direct connection to another cloud or an on-premises network where the Oracle databases are deployed. You can try all the steps using the public codelab.

Accessing Oracle Data

With the network connectivity solved, we can configure the FDW and create a representation of the Oracle data inside our AlloyDB database.

We need to perform several steps to configure the Oracle FDW in our AlloyDB before we can use remote data. Here are the steps:

  • Enable the Oracle FDW extension
CREATE EXTENSION IF NOT EXISTS oracle_fdw CASCADE;
  • Configure the foreign Oracle server (the IP here is the Oracle server machine)
CREATE SERVER ora_xe FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '10.132.0.15:1521/xepdb1');
  • Grant usage on the created server to the PostgreSQL user (user “appuser” here)
GRANT USAGE ON FOREIGN SERVER ora_xe TO appuser;
  • Create a mapping between our PostgreSQL user and an Oracle user (the password is for the Oracle user); you can verify the setup as shown below
CREATE USER MAPPING FOR appuser SERVER ora_xe OPTIONS (user 'HR', password 'VeryStrongPassword');
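With these objects in place it is worth a quick sanity check before creating any foreign tables. The catalog queries below are standard PostgreSQL; the connectivity test against Oracle’s DUAL table is just a minimal sketch, and the foreign table name ora_dual is an arbitrary example.

-- Check the foreign server and the user mapping we just created
SELECT srvname, srvoptions FROM pg_foreign_server;
SELECT srvname, usename FROM pg_user_mappings WHERE srvname = 'ora_xe';
-- Minimal connectivity test: expose Oracle's DUAL table and query it
CREATE FOREIGN TABLE ora_dual (dummy varchar(1)) SERVER ora_xe OPTIONS (table 'DUAL');
SELECT * FROM ora_dual;   -- returns a single row with the value 'X' if the connection works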

There are two main approaches to start working with the remote tables. The first, more granular approach is to create a foreign table for each object in the remote database you want to access, using a SQL statement like the following.

CREATE FOREIGN TABLE employees (
  employee_id    int OPTIONS (key 'true'),
  first_name     varchar(20),
  last_name      varchar(25),
  email          varchar(25),
  phone_number   varchar(20),
  hire_date      timestamp,
  job_id         varchar(10),
  salary         numeric,
  commission_pct numeric,
  manager_id     numeric,
  department_id  numeric
) SERVER ora_xe OPTIONS (schema 'HR', table 'EMPLOYEES');
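A note on the key option used for employee_id above: it tells oracle_fdw which column identifies a row, which it needs if you want to modify the remote data. A short usage sketch (the values are hypothetical):

-- Reads work right away through the foreign table
SELECT first_name, last_name, salary FROM employees WHERE department_id = 90;
-- UPDATE and DELETE need at least one column marked with OPTIONS (key 'true')
UPDATE employees SET salary = salary * 1.05 WHERE employee_id = 100;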

Another approach is to create a dedicated schema in the AlloyDB database and import all table definitions from the source Oracle schema with a couple of statements.

CREATE SCHEMA ora_xe_hr;
IMPORT FOREIGN SCHEMA "HR" FROM SERVER ora_xe INTO ora_xe_hr;

All definitions for the tables from the remote HR schema will be automatically imported to the “ora_xe_hr” schema and you will be able to list the objects and query the data.

quickstart_db=> \det+ ora_xe_hr.*
List of foreign tables
Schema | Table | Server | FDW options | Description
-----------+------------------+--------+-------------------------------------------+-------------
ora_xe_hr | countries | ora_xe | (schema 'HR', "table" 'COUNTRIES') |
ora_xe_hr | departments | ora_xe | (schema 'HR', "table" 'DEPARTMENTS') |
ora_xe_hr | emp_details_view | ora_xe | (schema 'HR', "table" 'EMP_DETAILS_VIEW') |
ora_xe_hr | employees | ora_xe | (schema 'HR', "table" 'EMPLOYEES') |
ora_xe_hr | job_history | ora_xe | (schema 'HR', "table" 'JOB_HISTORY') |
ora_xe_hr | jobs | ora_xe | (schema 'HR', "table" 'JOBS') |
ora_xe_hr | locations | ora_xe | (schema 'HR', "table" 'LOCATIONS') |
ora_xe_hr | regions | ora_xe | (schema 'HR', "table" 'REGIONS') |
(8 rows)

quickstart_db=> select count(*) from ora_xe_hr.countries;
count
-------
25
(1 row)

quickstart_db=>

Having the tables created or imported, you can now query the data as if it were in your AlloyDB database and, as mentioned above, a big part of the workload may be automatically offloaded to the remote side.
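If you only need a few objects from the remote schema, IMPORT FOREIGN SCHEMA also supports LIMIT TO (and EXCEPT) clauses, so you can import just the tables you plan to query. A small sketch, assuming the default lowercase mapping of table names:

-- Import only selected tables from the remote HR schema
IMPORT FOREIGN SCHEMA "HR" LIMIT TO (employees, departments) FROM SERVER ora_xe INTO ora_xe_hr;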

Security

We need to be very cautious about security and apply proper protection layers to every part of the communication channel. The network connection to the Oracle database should be protected by a VPN in a highly available configuration or by a highly available direct connection. I don’t recommend connections over the open internet.

I recommend using firewall rules on both sides to make sure access to Oracle is allowed only from authorized IP addresses or network ranges.

Access to the Oracle FDW configuration information should be monitored and restricted. The user mapping contains the password for the Oracle user, and I don’t think you want that password to be visible to everyone.

The Oracle user used for the connection should have only the required access to the tables on the Oracle side, and I would probably restrict it to read-only access. Better to be on the safe side: it is bad enough if somebody gets access and can read your data, but it would be even worse if they could change or delete information there.
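As a sketch of that idea, the account referenced in the user mapping can be a dedicated read-only Oracle user instead of the schema owner (the user name and password below are examples only):

-- On the Oracle side: a dedicated read-only account for the FDW connection
CREATE USER hr_readonly IDENTIFIED BY "AnotherStrongPassword";
GRANT CREATE SESSION TO hr_readonly;
GRANT SELECT ON hr.employees TO hr_readonly;
GRANT SELECT ON hr.departments TO hr_readonly;
-- The user mapping in AlloyDB then uses hr_readonly, while the foreign tables keep OPTIONS (schema 'HR', ...)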

Summary

To conclude, let me recap what we’ve covered about Oracle FDW in AlloyDB:

  • You can query data in a remote Oracle database without migrating data.
  • It uses familiar PostgreSQL syntax, and the remote data can be joined with AlloyDB data.
  • It offloads filtering and joins to the source Oracle database, reducing network and CPU overhead.
  • Network configuration between AlloyDB and Oracle can be done in different ways, but using native Google Cloud services is probably the way to go.
  • The configuration is fairly simple and requires only a few steps.
  • There are two main approaches to create the representation layer of the remote data in AlloyDB.
  • Security considerations should be taken into account to prevent unauthorized data access.

Try it right now by enabling the extension and configuring remote Oracle database servers in AlloyDB using the publicly available codelab.
