Fun with Data Redaction and Operational Property Graphs
Let’s explore the usage of Data Redaction (part of Advanced Security Option) and Operational Property Graphs in Oracle Database 23ai.
Be warned that using Data Redaction requires the licensing of Advanced Security Option, which is available in Oracle Database Enterprise Edition, but requires a separate license.
The good thing about Oracle Database 23ai Free is that all this functionality is included and can be tested with no risk for free in development environments.
Let’s start by taking some definitions for these two features from the documentation, just in case the reader doesn’t know them.
What is Data Redaction or Dynamic Data Masking?
Data Redaction or Dynamic Data Masking is the process of obfuscating or hiding sensitive data elements such as Credit Card Numbers in the SQL query results prior to display by applications. For example, a credit card
number 5105–1051–0510–5100 can be redacted to xxxx-xxxx-xxxx-5100. Sensitive data is redacted on-the-fly without changing actual data stored in the database.
The new SQL Standard for Property Graphs SQL:2023 includes SQL syntax for property graphs (defined by the ISO standard ISO/IEC 9075–16). Oracle Database 23ai implements this new syntax so that graph operations can be executed using SQL.
Operational Property Graphs with SQL in Oracle Database 23ai.
The new SQL:2023 GRAPH_TABLE function and MATCH clause syntax, implemented in Oracle Database 23ai, enable you to write simple SQL queries to follow connections in data. You can create a graph from data in relational tables and run graph queries to easily traverse connections in data. You can use graphs in transactional and analytical workloads, just like any other data in the database.
With Oracle Database you can combine different functionalities in the way that best fit your requirements. For example, there could be a situation where some transactional data must be modeled and queried as a property graph, but some of the transactional data is sensitive and must not be visible by the data analyst. This can be solved by using Data Redaction policies, that will allow the analysis to be done without exposing too much information.
Ideally the redaction policies should be configured in first place if you don’t want to allow the data to be read, but they can be also created after modeling the property graph, and their restrictions will apply as soon as they are created.
Creating the user and the tables
Let’s start from scratch, creating a new user with some privileges. This has to be done as DBA user, for this simple exercise in the default freepdb1
pluggable database that comes with the 23ai installation.
create user graphuser identified by "ComplexPa$$w0rd";
grant connect to graphuser;
grant resource to graphuser;
grant create property graph to graphuser;
alter user graphuser default role connect, resource;
alter user graphuser quota unlimited on users;
Now let’s create a couple of simple tables and populate some rows. For simplicity sake I’m using an employees and departments table, to create a very simple property graph showing which department each employee belongs to.
create table dept (
deptno number(2) constraint pk_dept primary key,
dname varchar2(14),
loc varchar2(13)
) ;
create table emp (
empno number(4) constraint pk_emp primary key,
ename varchar2(10),
job varchar2(9),
mgr number(4),
hiredate date,
sal number(7,2),
comm number(7,2),
deptno number(2) constraint fk_deptno references dept
);
create index emp_dept_fk_i on emp(deptno);
insert into dept values (10,'ACCOUNTING','NEW YORK');
insert into dept values (20,'RESEARCH','DALLAS');
insert into dept values (30,'SALES','CHICAGO');
insert into dept values (40,'OPERATIONS','BOSTON');
insert into dept values (50,'STORAGE','LOS ANGELES');
insert into dept values (60,'HEADQUARTER','PHOENIX');
insert into dept values (70,'LOGISTICS','SAN DIEGO');
insert into dept values (80,'RETAIL','SEATTLE');
insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,null,20);
insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,null,20);
insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,null,30);
insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,null,10);
insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,null,20);
insert into emp values (7839,'KING','PRESIDENT',null,to_date('17-11-1981','dd-mm-yyyy'),5000,null,10);
insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,null,20);
insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,null,30);
insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,null,20);
insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,null,10);
insert into emp values (7544,'MICHAEL','ANALYST',7698,to_date('8-9-1982','dd-mm-yyyy'),1500,0,50);
insert into emp values (7996,'HENRY','CLERK',7788,to_date('13-JUL-83', 'dd-mm-rr')-51,2100,null,60);
insert into emp values (7977,'MARK','CLERK',7698,to_date('3-12-1985','dd-mm-yyyy'),1950,null,60);
insert into emp values (7190,'ANDY','ANALYST',7566,to_date('3-12-1982','dd-mm-yyyy'),1000,null,70);
insert into emp values (7230,'DOUG','CLERK',7782,to_date('23-2-1984','dd-mm-yyyy'),2000,null,70);
insert into emp values (7276,'SANJAY','CLERK',7782,to_date('23-2-1984','dd-mm-yyyy'),1000,null,80);
commit;
At this point, when the user graphuser
queries these tables, it can see all the data, as no redaction policy has been yet created. For instance, let’s get the information encapsulated in a JSON document, one per department.
SQL> set pagesize 50
SQL> with tt as (
select json {
'_id' : d.deptno,
'departmentName' : d.dname,
'location' : d.loc,
'employees' :
[ select json {'employeeNumber' : e.empno,
'employeeName' : e.ename,
'job' : e.job,
'salary' : e.sal}
from emp e
where d.deptno = e.deptno ] } as j
from dept d fetch first row only)
select json_serialize(j PRETTY ORDERED)
FROM tt po; 2 3 4 5 6 7 8 9 10 11 12 13 14 15
JSON_SERIALIZE(JPRETTYORDERED)
--------------------------------------------------------------------------------
{
"_id" : 10,
"departmentName" : "ACCOUNTING",
"location" : "NEW YORK",
"employees" :
[
{
"employeeName" : "CLARK",
"employeeNumber" : 7782,
"job" : "MANAGER",
"salary" : 2450
},
{
"employeeName" : "KING",
"employeeNumber" : 7839,
"job" : "PRESIDENT",
"salary" : 5000
},
{
"employeeName" : "MILLER",
"employeeNumber" : 7934,
"job" : "CLERK",
"salary" : 1300
}
]
}
Creating the redaction policies
Let’s create a couple of simple policies, one per table. These policies must be created as DBA. These policies are applied depending on the conditions in the expression
parameter, but to keep it very simple let’s make them active always (1=1
). Check the redaction documentation in the Advanced Security Guide, for more information on the different redaction functions and ways to apply them depending on different conditions.
First on the EMP
table, let’s redact the sal
and ename
columns containing salary and employee name by creating a policy named redact_emp_sal_ename_pol
. It will replace the salary with 0 and replace the first 4 characters of the name with asterisks.
begin
dbms_redact.add_policy(
object_schema => 'graphuser',
object_name => 'EMP',
column_name => 'SAL',
policy_name => 'redact_emp_sal_ename_pol',
function_type => DBMS_REDACT.RE_REDACT_WITH_SINGLE_1,
expression => '1=1');
end;
/
begin
dbms_redact.alter_policy (
object_schema => 'graphuser',
object_name => 'EMP',
policy_name => 'redact_emp_sal_ename_pol',
action => dbms_redact.add_column,
column_name => 'ENAME',
function_type => dbms_redact.partial,
function_parameters => 'VVVVVVVVVV,VVVVVVVVVV,*,1,4'
);
end;
/
Now on the DEPT
table, let’s redact the name of the departments with a policy named redact_dept_dname_pol
.
begin
dbms_redact.add_policy(
object_schema => 'graphuser',
object_name => 'DEPT',
column_name => 'DNAME',
policy_name => 'redact_dept_dname_pol',
function_type => DBMS_REDACT.PARTIAL,
function_parameters => 'VVVVVVVVVV,VVVVVVVVVV,*,1,4',
expression => '1=1');
end;
/
Let’s see the redaction in action, connecting with graphuser
and running the same query used before to build a JSON per department:
SQL> set pagesize 50
SQL> with tt as (
select json {
'_id' : d.deptno,
'departmentName' : d.dname,
'location' : d.loc,
'employees' :
[ select json {'employeeNumber' : e.empno,
'employeeName' : e.ename,
'job' : e.job,
'salary' : e.sal}
from emp e
where d.deptno = e.deptno ] } as j
from dept d fetch first row only)
select json_serialize(j PRETTY ORDERED)
FROM tt po; 2 3 4 5 6 7 8 9 10 11 12 13 14 15
JSON_SERIALIZE(JPRETTYORDERED)
--------------------------------------------------------------------------------
{
"_id" : 10,
"departmentName" : "****UNTING",
"location" : "NEW YORK",
"employees" :
[
{
"employeeName" : "****K",
"employeeNumber" : 7782,
"job" : "MANAGER",
"salary" : 0
},
{
"employeeName" : "****",
"employeeNumber" : 7839,
"job" : "PRESIDENT",
"salary" : 0
},
{
"employeeName" : "****ER",
"employeeNumber" : 7934,
"job" : "CLERK",
"salary" : 0
}
]
}
Now department and employee name are redacted: the first 4 characters have been replaced with ‘*’, and the salary has been replaced with ‘0’. The data in the underlying tables hasn’t been modified, but for the graphuser
user, these redaction policies apply at query time modifying the data on the fly.
Creating the property graph
Let’s create the property graph DDL. This kind of creation sentence can be now executed directly in SQLplus in 23ai.
SQL> create property graph connections_pg
vertex tables (
dept as department
key (deptno)
label department
properties all columns,
emp as employee
key(empno)
label employee
properties all columns
)
edge tables (
emp
key (empno)
source key (empno) references employee (empno)
destination key (deptno) references department (deptno)
label connection
no properties
); 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Property graph created.
This DDL created the property graph metadata for a graph named connections_pg
having departments and employees as nodes, with the edges connecting each employee to the department it belongs to. For the nodes tables (DEPT
and EMP
) all the columns are mapped to key-value pairs, and here is where the redaction will come into play when querying the graph.
Using the result
First, let’s run a simple query to get a list of employees, with their salary (this is a property of the employee node) and the department they are connected to. Employee and department names are also properties of each node. These three properties come from three redacted columns, so the results of the graph query, will show them redacted.
Below you can see the new GRAPH_TABLE
and MATCH
clauses in action, with the graph pattern specifying the node and edge type.
SQL> select employee, salary, department
from graph_table (connections_pg
match
(e is employee) -[c is connection]-> (d is department)
columns (e.ename as employee,
e.sal as salary,
d.dname as department)
)
order 2 by 1; 3 4 5 6 7 8 9
EMPLOYEE SALARY DEPARTMENT
---------- ---------- --------------
****S 0 ****ARCH
****N 0 ****S
**** 0 ****STICS
****E 0 ****S
****K 0 ****UNTING
**** 0 ****STICS
**** 0 ****ARCH
****Y 0 ****QUARTE
****S 0 ****S
****S 0 ****ARCH
**** 0 ****UNTING
**** 0 ****QUARTE
****IN 0 ****S
****AEL 0 ****AGE
****ER 0 ****UNTING
****AY 0 ****IL
****T 0 ****ARCH
****H 0 ****ARCH
****ER 0 ****S
**** 0 ****S
20 rows selected.
Finally, let’s run this query using GraphViz to get a visual interactive representation of the property graph. This tool is part of Oracle Graph Server; it allows to run queries on property graphs that exist in the database. As our graph is under some redaction policies affecting some nodes properties, the visualization will show redacted data instead of the real values.
So, let’s log in, run a query and make some configurations to show the redacted properties.
Let’s inspect an employee node to get a list of its properties, where the two redacted ones can be seen (name and salary).
And the same for a department, where its name is redacted.
In this article we had some fun mixing data redaction policies and operational property graphs in Oracle Database 23ai. This is just an example of how relational data can be modeled as a graph and how some sensitive data can be redacted so that the graph analyst can have the information required without compromising sensitive data.
Remember that when using redaction, the data is altered during query time, but it remains unaltered in the storage.