Masking PII Entities within Snowflake

Photo by Ilya Pavlov on Unsplash

There may be requirements to mask entities within string values as part of a Masking Policy — in scenarios where you do not have explicit columns for PII data but rather have a comment value that some users may need to read, but contains PII data that should be masked. For example;

My name is Joe Bloggs, and my social is 112–22–3443. I need someone to call me as I’m really not very happy with the service. If someone needs to contact me then please call my cell phone (555) 221–8373. Also in case you were wondering I was born on 03/25/1988.

For restricted users, we may want something like this:

My name is XXXXXXXXXX, and my social is XXXXXXXXXXX. I need someone to call me as I’m really not very happy with the service. If someone needs to contact me then please call my cell phone XXXXXXXXXXXXXX. Also in case you were wondering I was born on XXXXXXXXXX.

This isn’t possible right now with Masking Policies out of the box, since we have no way of detecting the entities within the String.

As of the date of this article in the medium term, this will be a great use for Java UDFs using a Named-Entity Recognition (NER) model, however, since we’re a few months away from GA we can lean on an alternative — External Functions.

Using an NLP (Natural Language Processing) tool such as AWS Comprehend, we can build an External Function that can be used within a Masking Policy.

(This is a similar architecture pattern to External Tokenization, but instead of leaning on the external service to detokentize, we are using it to detect entities and obfuscate them).

AWS Lambda Code

Snowflake DDL

The following is the DDL to create the API Integration and External Function (AWS/Azure config will be needed also):

USE ROLE ACCOUNTADMIN;

CREATE OR REPLACE API INTEGRATION PII_API
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:role/snowflake-pii-role'
ENABLED = TRUE
API_ALLOWED_PREFIXES = ('https://aaaaaaaaaa.execute-api.us-west-2.amazonaws.com');

(Note the ARN and API strings have been cleansed).

Once all the AWS components have been reswizzled, you will need to create one (or both) of the following external functions;

CREATE OR REPLACE EXTERNAL FUNCTION MASK_PII_ENTITIES(CHECK_STRING VARCHAR,LOCALES VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = PII_API
AS 'https://aaaaaaaaaa.execute-api.us-west-2.amazonaws.com/dev/';

Note — we are returning a VARCHAR this time, instead of a VARIANT.

Now, if we were to look at the response payload directly from AWS Comprehend it would look like the following:

{
'Entities': [
{
'Score': 0.9999415874481201,
'Type': 'NAME',
'BeginOffset': 11,
'EndOffset': 21
},
{
'Score': 0.9999970197677612,
'Type': 'SSN',
'BeginOffset': 40,
'EndOffset': 51
},
{
'Score': 0.9968423843383789,
'Type': 'PHONE',
'BeginOffset': 188,
'EndOffset': 202
},
{
'Score': 0.9999982714653015,
'Type': 'DATE_TIME',
'BeginOffset': 250,
'EndOffset': 260
}
],
'ResponseMetadata':
{
'RequestId': 'b9743c11-a8fe-424b-a76a-dbe4d7216a63',
'HTTPStatusCode': 200,
'HTTPHeaders':
{
'x-amzn-requestid': 'b9743c11-a8fe-424b-a76a-dbe4d7216a63',
'content-type': 'application/x-amz-json-1.1',
'content-length': '323',
'date':
'Fri, 30 Apr 2021 13:48:57 GMT'
},
'RetryAttempts': 0
}
}

We can see there is a dictionary of Entities in which there are 4 rows, one for each entity that is detected/ Additionally you’ll notice there is a score as to the certainty that this is the entity it has detected.

This demo is relatively simple, in that it is applying the obfuscation regardless of score — but this is something you could factor into the function i.e a score threshold as to whether the entity should be masked.

So, at this stage, we now have a function that will mask our data when executed with a string. If we use the above example it will behave like so:

SELECT MASK_PII_ENTITIES('My name is Joe Bloggs, and my social is 112-22-3443. I need someone to call me as I\'m really not very happy with the service. If someone needs to contact me then please call my cell phone (555) 221-8373. Also in case you were wondering I was born on 03/25/1988.','en');

This returns:

Masking Policy

Now we want to incorporate into a masking policy, so first we create the masking policy:

CREATE MASKING POLICY MASK_PII AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE
WHEN CURRENT_ROLE() IN ('PII_VIEWER') THEN VAL
ELSE MASK_PII_ENTITIES(VAL,'EN')
END;

You’ll notice we call out External Function in the ELSE clause. Now we need to assign to a table, lets assume we have a simple table like so:

CREATE TABLE SERVICE_TICKET 
(
TICKET_ID NUMBER,
CUSTOMER_ID VARCHAR,
CUST_COMMENTS VARCHAR
);

In which we have a free-text field the customer may have entered in PII Data that we want to mask, now we add the masking policy to our column:

ALTER TABLE SERVICE_TICKET ALTER COLUMN CUST_COMMENTS SET MASKING POLICY MASK_PII;

Now, if we were to select data from the table with the PII_VIEWER role you can see we can see the data:

However, if we use any other role you will see that the PII Entities are now masked:

--

--