Detect Conflicts of Interest using Graphs
There are many patterns in conflict-of-interest transactions. Let us consider a case in which an employee of an organization approves a transaction with a company to which they are related. For example, if an employee of a company purchases office equipment from a store owned by their family for a higher price, this would be a loss for the company. If this happens in government or public services, it needs to be investigated as potential corruption.
Such “connections” are not always simple but they can involve multiple relationships. They may be intentionally hidden or may happen by accident. Therefore, if a system can detect transactions that can be a conflict of interest, it can protect both an organization and its employees from fraud and disadvantage.
In this article, we will show you how to detect such “connections” using graphs. We will show an example using Graph Studio for Oracle Autonomous Database (ADB), while the same scenario can be tried using all Oracle Databases. For more information on how to set up, see:
Source Dataset
Let us assume that the following data is available in tabular form. I added the prefix COI (= conflict of interest) to all files for convenience. Please right-click the links to save the files.
- COI_EMPLOYEE (Name, phone number, address, etc.)
- COI_SUPPLIER (Name, phone number, address, etc.)
- COI_PURCHASE (Invoice number, supplier, approver, etc.)
- COI_FORM_MASTER (Form ID and employee ID)
- COI_FORM_INTEREST (Form ID, interest type, supplier, etc.)
If you are using Autonomous Database (ADB), you can load the files above via Database Actions.
For replicating the following demo, please see here for more details.
Create a Graph
Oracle Graph provides the CREATE PROPERTY GRAPH syntax to create graphs from table datasets.
CREATE PROPERTY GRAPH graph1
VERTEX TABLES (
coi_employee AS employee
KEY (employee_id)
PROPERTIES ARE ALL COLUMNS EXCEPT (postal)
, coi_form_master AS form_master
KEY (form_id)
, coi_supplier AS supplier
KEY (company_id)
, coi_purchase AS purchase
KEY (invoice_no)
)
EDGE TABLES (
coi_form_interest_with_key
KEY (id)
SOURCE KEY(form_id) REFERENCES form_master
DESTINATION KEY(company_id) REFERENCES supplier
LABEL has_interest
, coi_purchase AS approved_by
KEY (invoice_no)
SOURCE KEY (invoice_no) REFERENCES purchase
DESTINATION KEY (approver_employee_id) REFERENCES employee
, coi_purchase AS supplied_by
KEY (invoice_no)
SOURCE KEY (invoice_no) REFERENCES purchase
DESTINATION KEY (company_id) REFERENCES supplier
, coi_form_master AS declared_by
KEY (form_id)
SOURCE KEY (form_id) REFERENCES form_master
DESTINATION KEY (employee_id) REFERENCES employee
)
Running this statement on a Graph Studio notebook will pull the data from database tables and create a graph in memory.
%python-pgx
statement = """
CREATE PROPERTY GRAPH coi_graph1
VERTEX TABLES (
...
)
EDGE TABLES (
...
)
"""
session.prepare_pgql(statement).execute()
A graph consisting of 3800 vertices and 4759 edges was created.
%python-pgx
graph1 = session.get_graph("COI_GRAPH1")
graph1
PgxGraph(name: COI_GRAPH, v: 3800, e: 4759, directed: True, memory(Mb): 3)
Pattern Matching
Once you have a graph, you can try PGQL queries for pattern matching.
SELECT *
FROM MATCH (s1)<-[r1:has_interest]-(d)-[r2:declared_by]->(e)<-[r3:approved_by]-(p)-[r4:supplied_by]->(s2) ON coi_graph1
LIMIT 1
If a pattern is too long in a single line, you can break it up into two lines. Here, the variables s1 and s2 may or may not be the same supplier, but variable e, which appears twice in the pattern, represents the same employee.
SELECT *
FROM MATCH (
(s1)<-[r1:has_interest]-(d)-[r2:declared_by]->(e),
(e)<-[r3:approved_by]-(p)-[r4:supplied_by]->(s2)
) ON coi_graph1
LIMIT 1s
In the Graph Studio notebook, we can accept user input for specifying values in the queries. Here, the given employee “Cory Sims” and surrounding information are retrieved.
SELECT *
FROM MATCH (
(s1)<-[r1:has_interest]-(d)-[r2:declared_by]->(e),
(e)<-[r3:approved_by]-(p)-[r4:supplied_by]->(s2)
) ON coi_graph1
WHERE e.first_name = '${First name}' AND e.last_name = '${Last name}'
In this case, Cory Sims has declared a personal interest against Zachary Digital, so the system may need to raise an alert for the purchases from Zachary Digital.
It is also possible to detect all cases which contain a specific pattern. To represent this 4-hop cyclic, we can use the vertex (s) for both the start and end of the pattern.
SELECT *
FROM MATCH (
(s1)<-[r1:has_interest]-(d)-[r2:declared_by]->(e),
(e)<-[r3:approved_by]-(p)-[r4:supplied_by]->(s2)
) ON coi_graph1
LIMIT 3
This query returns 3 cases.
Add New Relationships
Next, let us consider how to add new edges to the graph when we find out hidden relationships between employees or suppliers. We can often detect relationships based on the property information of employees and suppliers. For example, if an employee and a supplier share the same phone number (or emergency contact phone number), we can assume that there is any connection between them.
Now let us think of an efficient way to create such relationships as new edges. Since the phone number information is held as a property of the employee vertex (or supplier vertex), we can run a PGQL query (INSERT EDGE syntax) to add such edges based on the properties.
At the same time, the most efficient way to find all matching values (in this case, phone numbers) is, use the table scan and join operation. If the source data of the graph is stored in tables, then the relationship can be defined as a new view over the tables in RDBMS. Remember to create indexes on the columns for table join.
CREATE VIEW coi_same_number_1 AS
SELECT ROWNUM AS id, e1.employee_id AS e1_id, e2.employee_id AS e2_id
FROM coi_employee e1, coi_employee e2
WHERE e1.EMERGENCY_CONTACT_PHONE = e2.phone;
CREATE VIEW coi_same_number_2 AS
SELECT ROWNUM AS id, e.employee_id, s.company_no
FROM coi_employee e, coi_supplier s
WHERE e.EMERGENCY_CONTACT_PHONE = s.phone;
Once the views are prepared, modify the CREATE PROPERTY GRAPH statement, so that graph2 will contain the new relationships.
CREATE PROPERTY GRAPH graph2
VERTEX TABLES (
...
)
EDGE TABLES (
...
, coi_emergency_contact_1
KEY (id)
SOURCE KEY(employee_id_1) REFERENCES employee
DESTINATION KEY(employee_id_2) REFERENCES employee
LABEL emergency_contact
NO PROPERTIES
, coi_emergency_contact_2
KEY (id)
SOURCE KEY(employee_id) REFERENCES employee
DESTINATION KEY(company_id) REFERENCES supplier
LABEL emergency_contact
NO PROPERTIES
)
Now you can see the new relationship emergency_contact is added.
Relationships based on geospatial distance
Using the method described in the previous section, we can create various types of edges based on the existing information.
Let us now consider the geospatial distance between the employee and supplier addresses. If their addresses are very close to each other, they may be in personal relationships such as relatives or neighborhood friends.
To calculate distances efficiently, geometry columns (= table columns containing special objects optimized for geometry information) are added based on the latitude and longitude information of employees and suppliers, and spatial indexes are also created for those columns.
CREATE TABLE coi_employee_geom AS
SELECT
t.*,
SDO_GEOMETRY(
2001, -- Geometry type (two-dementional point)
4326, -- Coordinate system ID (SRID)
SDO_POINT_TYPE(t.address_lon, t.address_lat, NULL), NULL, NULL
) AS address_geom
FROM coi_employee t;
INSERT INTO user_sdo_geom_metadata VALUES (
'coi_employee_geom',
'address_geom',
SDO_DIM_ARRAY(
SDO_DIM_ELEMENT('LONGITUDE', -180.0, 180.0, 0.05),
SDO_DIM_ELEMENT('LATITUDE', -90.0, 90.0, 0.05)
),
4326
);
COMMIT;
CREATE INDEX idx_coi_employee_geom
ON coi_employee_geom (address_geom)
INDEXTYPE IS mdsys.spatial_index_v2;
Using those indexes, we can define a view listing employee-supplier pairs “within 500 meters”.
CREATE VIEW coi_within_500m AS
SELECT
ROWNUM AS id,
e.employee_id,
s.company_id,
SDO_GEOM.SDO_DISTANCE(e.address_geom, s.address_geom, 0.005, 'unit=M') AS dist
FROM
coi_employee_geom e,
coi_supplier_geom s
WHERE
SDO_GEOM.SDO_DISTANCE(e.address_geom, s.address_geom, 0.005, 'unit=M') < 500;
CREATE PROPERTY GRAPH graph2
VERTEX TABLES (
...
)
EDGE TABLES (
...
, coi_within_500m
KEY (id)
SOURCE KEY(employee_id) REFERENCES employee
DESTINATION KEY(company_id) REFERENCES supplier
LABEL within_500m
)
You can see that a new edge type within_500m has been added. In this case, Derek has approved purchases twice from Kimberly Apparel, whose address is close in distance.
It is also possible to display them on a map.
Cyclic Paths with Unknown Length
PGQL has a syntax for path-finding that allows Oracle Graph to efficiently find the shortest paths by applying algorithms (such as Dijkstra’s algorithm) to the graphs on memory.
For example, this query returns the shortest path between two employees.
SELECT a1, b, a2
FROM MATCH TOP 1 SHORTEST (e1) (-[r]-(v))+ (e2) ONE ROW PER STEP (a1, b, a2) ON graph2
WHERE e1.employee_id = '${Employee 1}' AND e2.employee_id = '${Employee 2}'
AND COUNT(v) = COUNT(DISTINCT v)
This result shows the supplier of the two purchases approved by Kelly and Jack is listed on (someone’s) declaration form together. Path-finding is successful, but the result is not very useful in this case.
A few notes here:
- The pattern
(-[r]-(v))+
traces one or more edges in both directions. ONE ROW PER STEP (a1, b, a2)
decomposes the detected path into steps and maps them to the variablesa1, b, a2
.- The condition
COUNT(v) = COUNT(DISTINCT v)
is added to eliminate paths that walk through the same vertices multiple times.
Here, we can think of the case in the following figure. The pattern is a 5-hop loop that contains the newly added edge “same_number”. In the previous sections, we searched for cycles with a fixed number of hops, but in actual cases, the lengths of the cycles might be unknown.
So, let us use the path-finding syntax to detect cycles with unknown lengths. We should specify the target employee for both the start and end vertices, and search for a path that passes through the same edges only once.
SELECT a1, b, a2
FROM MATCH TOP 50 SHORTEST (e1) (-[r]-(v))+ (e2) ONE ROW PER STEP (a1, b, a2) ON graph2
WHERE e1.first_name = '${First name}' AND e1.last_name = '${Last name}'
AND e1 = e2
AND COUNT(v) = COUNT(DISTINCT v)
AND COUNT(r) = COUNT(DISTINCT r)
This query returns the 5-hop cycle, as expected. At the same time, another cycle containing a “within_500m” edge is also found. By using the path-finding technique, we can even detect cycles of unknown lengths.
Summary
We found that graphs can be used to represent the entities and activities for conflict-of-interest analysis and to detect cycles suggesting suspicious transactions. Also, by defining new relationships in the database, we could consider not only matching values (such as phone numbers), but also geospatial distances. Many SQL functions provide more techniques, for example, fuzzy string matching can deal with notational distortions. Finally, we saw that path-finding is a powerful capability to detect paths with unknown lengths.
This scenario can be reproduced in any Oracle Database (Oracle Graph is included in Oracle Database 12.2 and later). Also, no setup is required for Autonomous Database users as it includes the Graph Studio notebook out-of-the-box. The sample data and notebook are available here. Please try Autonomous Database for free with Oracle Cloud Always Free services.
Please learn more about Oracle Graph from:
- Medium (= all articles with Oracle Graph tag)
- Stack Overflow (= questions with oracle-graph tag)
- Slack AnDOUC (= invitation link, please visit #graph)
- Slack OracleDevRel (= invitation link, please visit #oracle-db-graph)
Want to talk about it? Join our public Slack and talk to real, live Oracle developers!