How to use Snowflake external functions for custom data encryption

In one of my older posts about increasing Snowflake security I briefly mentioned Snowflake external functions, which have been available in public preview since June 2020. We use this feature for integration with AWS KMS. The Key Management Service generates the encryption keys for our custom data encryption, which is a key part of the improved architecture in terms of security. We now have a unique encryption key per customer and use KMS for key maintenance. Let’s have a look at how to integrate Snowflake and AWS via external functions. Thanks to such integration, it is possible to leverage the features of many AWS services directly in Snowflake!

External functions allow Snowflake to run executable code which is developed and runs outside Snowflake, typically in a serverless compute service like AWS Lambda. This extends Snowflake’s capabilities dramatically: you can have extensive logic built in your favorite programming language running “behind the scenes” while its result becomes part of your SQL queries in Snowflake. This opens up totally new use cases and suddenly makes it quite easy to do things directly in Snowflake which would otherwise require another tool or platform. It can also help you make your target architecture lighter and easier to maintain.

First of all, let’s look at how to integrate Snowflake and AWS and what kind of setup needs to be done on both sides.

Integration setup

The needed setup is described in the Snowflake documentation in a series of articles related to external functions. There are certain steps which need to be done on both sides: Snowflake and the cloud provider platform (AWS in our case).

The setup is pretty straightforward on the Snowflake side. We need to create two things:

API INTEGRATION

An API INTEGRATION stores information about an HTTP proxy service (Amazon API Gateway) and acts as a kind of “link” between Snowflake and AWS. It also references the IAM role which has the needed privileges defined on the AWS side. An API integration can be created either by ACCOUNTADMIN or by a role with the global CREATE INTEGRATION privilege.

The following example shows how to create an API INTEGRATION:

create or replace api integration demo_external_api_integration
api_provider=aws_api_gateway
api_aws_role_arn='arn:aws:iam::123456789:role/my_cloud_account_role'
api_allowed_prefixes=('https://xyz.execute-api.us-west-2.amazonaws.com/production')
enabled=true;
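Once the integration object exists, Snowflake generates the IAM identifiers needed to finish the trust relationship on the AWS side. As described in the Snowflake docs, they can be read with a DESCRIBE command:

-- API_AWS_IAM_USER_ARN and API_AWS_EXTERNAL_ID from the output
-- go into the trust policy of the IAM role on the AWS side
describe integration demo_external_api_integration;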

External function

Below is demo code for creating an external function which uses our API integration object from the previous step:

create or replace external function local_echo(string_col varchar)
returns variant
api_integration = demo_external_api_integration
as 'https://xyz.execute-api.us-west-2.amazonaws.com/production/remote_echo';

All the parameters can be found in the docs.
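With both objects in place, the function can be called like any other scalar function. Assuming the remote service behind the endpoint simply echoes its input back, a quick smoke test could look like this:

select local_echo('Hello from Snowflake!');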

Using an external function for custom data encryption

Let’s have a look at our use case for an external function. We want to integrate AWS KMS as secure storage for our encryption keys and also use the AWS KMS APIs to either create new keys for us or decode existing ones. The encryption keys are used for custom data encryption on the Snowflake side. To increase security we want the data in our tables to be encrypted. Yes, Snowflake stores all ingested data encrypted, but this is an additional level: our solution encrypts data directly in the tables, meaning the data can’t be read without decryption.

Each customer (business account) has its own key and data can be decrypted only with the given customer’s key after user authorization, when the proper encryption key is retrieved based on the customer account identifier. I described the whole security concept in a blog post called Snowflake data apps security on steroids, which should give you the context and a better understanding of the whole story. Now let’s focus only on the part related to the external function.

Below is an illustration showing how the key creation and decryption process works:

External function functionality — high level schema

When a new user signs up for data application access, their identity information is passed through an internal API from the internal CRM into the USER_LIST table in Snowflake. A data pipeline running on a daily basis verifies whether all users have encryption keys or whether there is a new user who needs a key to be created. In that case the external function KMS_CLIENT(<<user_id>>, ‘CREATE’) is called with the user_id passed as an argument, as sketched below.
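A minimal sketch of what such a daily check could look like in SQL; USER_LIST and BINARY_TABLE come from our setup, but the column names here are illustrative rather than the exact production schema:

-- illustrative only: create a key for every user that does not have one yet
insert into binary_table (user_id, key_response)
select u.user_id, kms_client(u.user_id, 'CREATE')
from user_list u
left join binary_table b
    on b.user_id = u.user_id
where b.user_id is null;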

Here is KMS_CLIENT function code:

/*
 * ACCEPTS TWO ARGUMENTS:
 * - ARG_0: THIS IS THE SUBJECT
 *     THE SUBJECT IS <USER_ID> WHEN CREATING A DATA KEY
 *     THE SUBJECT IS <KEY_BLOB> WHEN DECODING A DATA KEY
 * - OP_CODE: THIS IS THE VERB TO ACT ON THE SUBJECT
 *     SUPPORTED VERBS ARE: CREATE, DECODE
 */
CREATE OR REPLACE EXTERNAL FUNCTION KMS_CLIENT(ARG_0 VARCHAR, OP_CODE VARCHAR)
RETURNS VARIANT
API_INTEGRATION = MY_API_INTEGRATION
AS 'MY_API_ALLOWED_PREFIX';

How does the processing on the AWS side look? There is a Lambda function which processes this call and does either key creation or key decoding. It is triggered by the API Gateway which is also defined in the API integration on the Snowflake side.

Lambda part of external function — trigger

As already mentioned, the Lambda uses the KMS boto3 API to create a key:

import json
import os
import boto3

def lambda_handler(event, context):
    client = boto3.client('kms')
    array_of_rows_to_return = []
    try:
        # Snowflake sends the rows to process as JSON in the request body
        event_body = event["body"]
        payload = json.loads(event_body)
        rows = payload["data"]
        # each row is [row_number, ARG_0, OP_CODE]
        row = rows[0]
        row_number = row[0]
        user_id = row[1]
        opcode = row[2]
        # id of the customer master key, configured as a Lambda environment variable
        keyId = os.environ['MY_APP_CMK']
        output_value = []

        if opcode == 'CREATE':
            # generate a new data key under the customer master key
            response = client.generate_data_key(
                KeyId=keyId,
                KeySpec='AES_256'
            )
            print("> Create OK")
            output_value = [user_id, response["KeyId"],
                            response["Plaintext"].hex(),
                            response["CiphertextBlob"].hex()]

        row_to_return = [row_number, output_value]
        array_of_rows_to_return.append(row_to_return)
        json_compatible_string_to_return = json.dumps({"data": array_of_rows_to_return})
        # Return the return value and HTTP status code.
        return {
            'statusCode': 200,
            'body': json_compatible_string_to_return
        }
    except Exception as err:
        # report the failure back to Snowflake
        return {
            'statusCode': 400,
            'body': json.dumps(str(err))
        }

You can see that for user_id = 201 the Lambda, and thus the external function in Snowflake, returns JSON (a VARIANT object on the Snowflake side) containing user_id, kms_key_id, data_key and CiphertextBlob. This information is then stored in our BINARY_TABLE and used for customer data encryption and decryption.
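Because the returned VARIANT holds the values as an array (matching the output_value list built in the Lambda), the individual elements can be picked out by position on the Snowflake side. A small illustrative query:

-- illustrative only: unpack the VARIANT returned by the external function
select
    response[0]::varchar as user_id,
    response[1]::varchar as kms_key_id,
    response[2]::varchar as data_key,  -- plaintext data key, hex encoded
    response[3]::varchar as keyblob    -- CiphertextBlob, hex encoded
from (select kms_client('201', 'CREATE') as response);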

Not all of the key details are stored permanently in BINARY_TABLE. We keep only the keyblob value for each user_id there. The data_key itself is retrieved from KMS dynamically before the daily ETL starts, in order to encrypt the current data. Once the data is loaded, the data_key value is removed from the Snowflake BINARY_TABLE.
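Below is a hedged sketch of that daily key handling, again with illustrative column names; element [2] of the DECODE response is the plaintext key, as returned by the Lambda:

-- before the daily ETL: re-derive each plaintext data_key from the stored keyblob
select user_id,
       kms_client(keyblob, 'DECODE')[2]::varchar as data_key
from binary_table;

-- once the load has finished, wipe the plaintext keys again
update binary_table
set data_key = null;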

Data decryption

We haven’t covered the data decryption part yet. Users access the data through the Apache Superset portal. During the authorization process the data_key is retrieved from KMS and stored in the user session. Here is the part of the same Lambda handler which decodes the key from the blob value:

if opcode == 'DECODE':
    # for DECODE, the first argument carries the hex-encoded CiphertextBlob (keyblob)
    blob = bytes.fromhex(user_id)
    response = client.decrypt(
        CiphertextBlob=blob,
        KeyId=keyId
    )
    print("> Decode OK")
    output_value = ['', response["KeyId"], response["Plaintext"].hex(), '']

Data is decrypted on the fly when a user opens a particular dashboard. The key value is passed through a Snowflake secure view to a UDF which decrypts the needed data. The whole process, with secure views and data sharing as an additional layer of data app security, is described in detail in my follow-up blog post. Here is the high-level diagram of the decryption process:

Wrap up

We’ve described a use case for Snowflake external functions and gone through the setup and integration with AWS KMS. We showed how we can use KMS as secure encryption key storage and increase our data security with additional data encryption directly in our tables, with data segregation per user account. I hope this has shown you how external functions can be used in Snowflake and what kind of new opportunities they bring. There are plenty more use cases for external functions, including ML scoring, translation services, and various alerts and notifications...
