Oracle Developers
Published in

Oracle Developers

Analyze Bank Transaction Data using Graph (Part 2/3)

Photo by Eduardo Soares on Unsplash (edited)

In Part 1, I showed how to use the graph database API (PGQL as an alternative to SQL) to represent bank transaction data as a graph, and how to add nodes and edges and search against them using pattern matching. We’ll now look at a different approach: creating a graph as a view on existing database tables. We’ll also perform some searches and visualize the results.

Should we manage data as a graph?

We have seen that pattern matching syntax can simplify queries, but perhaps you haven’t seen much benefit in adding data as nodes and edges into a database. One of the reasons is that, in this use case, the types of nodes and edges are implicitly determined from the beginning. We don’t need the flexibility to add any types of nodes and edges.

This is a common case when we discuss the application of graph databases. There are logical-layer reasons that a fixed database schema is desirable for application development, and there are physical-layer reasons that table indexing and scanning are more efficient in aggregation.

Create a graph as a view (PG view)

In such cases, the best way to utilize graphs (mainly for search and analysis) is to use tables and graphs together, and actually, graph databases have been used with RDBMS till now. Because of this, Oracle Graph provides a declarative mapping from tables to graphs using the CREATE PROPERTY GRAPH statement, plus the ability to create graphs as views on top of the table data. With this idea, we can avoid additional ETL processing and duplication of data.

So let’s try it out. First, we will create standard tables.

DROP TABLE account;
DROP TABLE customer;
DROP TABLE transaction;
CREATE TABLE account (
acc_id NUMBER NOT NULL
, cst_id NUMBER NOT NULL
, CONSTRAINT account_pk PRIMARY KEY (acc_id)
);
CREATE TABLE customer (
cst_id NUMBER NOT NULL
, first_name VARCHAR2(255)
, last_name VARCHAR2(255)
, CONSTRAINT customer_pk PRIMARY KEY (cst_id)
);
CREATE TABLE transaction (
acc_id_src NUMBER
, acc_id_dst NUMBER
, txn_id NUMBER
, datetime TIMESTAMP
, amount NUMBER
, CONSTRAINT transaction_pk PRIMARY KEY (txn_id)
);
EXIT

Save the data files using the INSERT statement, account.sql, customer.sql, transaction.sql (right-click and select Save Link As..) and load them into the database. If you prefer loading CSV data, please use account.csv, customer.csv, transaction.csv.

SQL> @account.sql
SQL> @customer.sql
SQL> @transaction.sql

Log in to the database with SQLcl and switch to PGQL mode.

SQL> PGQL AUTO ON

Now for the most important point in this article: how to write a mapping from a table to a graph. For the graph, we need to identify and extract information about vertices, edges, and labels from our tables. The paragraphs below for VERTEX TABLES and EDGE TABLES specify which tables serve as the source of information for vertices and edges, respectively, and their labels in the newly created graph.

Here, the vertex description is straightforward, as it simply creates “customer” vertices from the customer table and “account” vertices from the account table. For edges, we label the relationships held in the transaction table as “transferred_to”. And from the account table, we extract and label the relationships of which customer “owns” the account.

CREATE PROPERTY GRAPH graph2
VERTEX TABLES (
graphuser.customer
KEY (cst_id)
LABEL customer
PROPERTIES (cst_id, first_name, last_name)
, graphuser.account
KEY (acc_id)
LABEL account
PROPERTIES (acc_id)
)
EDGE TABLES (
graphuser.transaction
KEY (txn_id)
SOURCE KEY(acc_id_src) REFERENCES account
DESTINATION KEY(acc_id_dst) REFERENCES account
LABEL transferred_to
PROPERTIES (txn_id, datetime, amount)
, graphuser.account
KEY (acc_id)
SOURCE KEY(cst_id) REFERENCES customer
DESTINATION KEY(acc_id) REFERENCES account
LABEL owns
)
OPTIONS (PG_VIEW)
;

Finally, by adding OPTIONS (PG_VIEW) at the end, we can create this graph as a view. Without this option, a new graph will be created in a set of tables as a schema-less graph (in the structure we call the PG schema), just like the graph created in Part 1. However, I recommend using views unless you need to handle the graph in a schema-less manner, as it allows you to maintain the graphs without duplicating data.

Let’s check that the Graph Visualization tool can access this graph. Since this view object is stored in a database, select Database as the connection destination from the Advanced Options.

Let’s try a PGQL query by selecting GRAPH2 as the graph to be queried.

Save settings.json (right-click and select Save Link As..) and upload it from the Load Settings button, so you can change the icon as follows.

Search and Visualize using PGQL

Now, let’s try some PGQL queries to search and visualize the results. Please note that we can skip ON graph2 in the queries using the Graph Visualization tool since the target graph is already specified.

First, we will check the accounts held by the customer with cst_id = 10.

SELECT *
FROM MATCH (c:customer)-[e:owns]->(a:account) ON graph2
WHERE c.cst_id = 10

We can see that this customer’s name is Laura, and she owns two accounts.

Next, let’s show all of Laura’s transfers. We will add one more hop to the matching pattern. Since there are two accounts in the pattern, we assign different variables for them, like a and a1.

SELECT *
FROM MATCH (c:customer)-[e:owns]->(a:account)-[t:transferred_to]->(a1:account) ON graph2
WHERE c.cst_id = 10

You can see that Laura has transferred money to some accounts using both of the two accounts.

Now, focusing only on the transfers between accounts, let’s look for a pattern starting from the account with acc_id = 10 and returning to the same account in 3 hops. Furthermore, we can add a condition that these transfers happened in this order in time, and the amount of money transfer must always be more than 500.

SELECT *
FROM MATCH (
(a1)-[t1:transferred_to]->(a2)-[t2:transferred_to]->(a3)
, (a3)-[t3:transferred_to]->(a1)
) ON graph2
WHERE a1.acc_id = 10
AND t1.amount > 500 AND t2.amount > 500 AND t3.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime

In the whole graph, this pattern is detected only in the paths that include the accounts with acc_id = 10, 75, and 77.

Similarly, we will try it for 4 hops. The query is getting longer, but it is still much more compact than it would be in SQL.

SELECT *
FROM MATCH (
(a1)-[t1:transferred_to]->(a2)-[t2:transferred_to]->(a3)
, (a3)-[t3:transferred_to]->(a4)-[t4:transferred_to]->(a1)
) ON graph2
WHERE a1.acc_id = 10 AND ALL_DIFFERENT(a1, a2, a3, a4)
AND t1.amount > 500 AND t2.amount > 500
AND t3.amount > 500 AND t4.amount > 500
AND t1.datetime < t2.datetime AND t2.datetime < t3.datetime
AND t3.datetime < t4.datetime

Again, the pattern is detected only in one combination in the case of 4 hops.

It is also possible to aggregate using GROUP BY. For example, let’s look for accounts that received a large number of small remittances (less than 500) during the period between Oct 1 and Dec 1.

SELECT a2.acc_id AS acc_id, COUNT(a2) AS num_of_txn
FROM MATCH (a1)-[t:transferred_to]->(a2) ON graph2
WHERE t.datetime >= TIMESTAMP '2020-10-01 00:00:00'
AND t.datetime < TIMESTAMP '2020-12-01 00:00:00'
AND t.amount < 500.00
GROUP BY a2 ORDER BY num_of_txn DESC LIMIT 10

However, the aggregation results cannot be represented as a graph, so you will have to show them in a table.

The account with acc_id = 23 has received 29 transfers, so let’s show them.

SELECT *
FROM MATCH (a1)-[t:transferred_to]->(a2)<-[o:owns]-(c) ON graph2
WHERE t.datetime >= TIMESTAMP '2020-10-01 00:00:00'
AND t.datetime < TIMESTAMP '2020-12-01 00:00:00'
AND t.amount < 500.00
AND a2.acc_id = 23

You can confirm the transfers.

What do you think? You can see that transactional data such as bank transfers stored in a table can be managed as a graph by simply creating a view using the CREATE PROPERTY GRAPH statement. With this approach, the graph definition makes the table joins more restricted so that we can keep PGQL queries intuitive and compact.

On the other hand, in the PG view mechanism, PGQL is converted to SQL and executed, so we cannot expect the use of PGQL would improve query performance. Running the graph algorithm on top of this mechanism is not efficient either. In Part 3, we will introduce a 3-tier architecture option to overcome these challenges.

If you would like to share your feedback or ask questions to the community, please post to Stack Overflow with oracle-graph tag, or click here to join the Slack workspace of AnDOUC (Analytics and Data Oracle User Community) and post your comments at the #graph channel.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store