GCP — BigQuery — Data Security at rest (Part 3)

Murli Krishnan
Google Cloud - Community
6 min readNov 15, 2022

So far in the part 2 of the blog, we discussed in great length on the column level access control and dynamic masking of columns. In this blog, the focus will be on the encryption of column data.

This blog is a part of 5 part series on “Bigquery — Data Security at rest”.
Visit here for part 2 of the series
Visit here for the series menu

Bigquery Security Controls.

Column Level Encryption

The basic principle of column level encryption is the data is encrypted with Data Encryption Key (DEK) and the key is further encrypted by Key Encryption Key (KEK) and maintained in cloud KMS.

Now for the user to access the plain text, the user requires access on the table column plus the cloud KMS key (KEK) which is used to encrypt the Data Encryption Key (DEK).

Terminology
DEK — Data Encryption Key — Key which is used to encrypt the data, this key is generated by google
KEK — Key Encryption Key — Key which is used to wrap or encrypt the DEK and is maintained in cloud KMS

https://cloud.google.com/kms/docs/envelope-encryption#how_to_encrypt_data_using_envelope_encryption

Unlike the CMEK where the Bigquery service account only needs access on CMEK keys for encryption and decryption of datasets/tables.
For the column level encryption to work, the user needs access on the KMS key (KEK)

AEADAuthenticated Encryption with Associated Data.
The idea behind this encryption is to ensure that the encrypted data is not tampered with by verifying the signature with a provided associated data.

Associated Data can be simple bytes which is provided during the encryption process sort of like a password which needs to be provided during decryption process.

AEAD Encryption Process.

AEAD supports different kind of modes (key types) which are used to encrypt the data with AES (Advanced Encryption Standard) algorithm

Under the hood, Bigquery column level encryption uses Tink cryptographic library for encryption and decryption

Tink cryptographic library makes use of concepts called primitives, key types, keysets and keyset handles.

Primitives — Primitives in simple words can be considered as an interface for performing encryption and decryption. Primitive defines the key type and cryptographic algorithm to be used.
Eg: AEAD, AEAD Deterministic, MAC ..

Key Types — Key types defines the implementation. Every primitive has a supported set of key types to choose from
Eg: AEAD Primitive — AES GCM key type.

Key Sets — Keysets can be considered as group of keys combined together in which one key is the primary used for encryption and remaining keys are used for decryption

Key Set handles — Key Set handles helps in providing an interface with the keysets in secure way.

Lets see how does a plain text keyset generated by Tink looks like
The value is the actual key type material that is used to encrypt the data.

{
"primaryKeyId": 1920493436,
"key": [
{
"keyData": {
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "<redacted>",
"keyMaterialType": "SYMMETRIC"
},
"status": "ENABLED",
"keyId": 1920493436,
"outputPrefixType": "TINK"
}
]
}

The keyset should not be generated in clear plain text but should be encrypted with KMS system like cloud KMS

Now, lets try to use Bigquery to implement the above concepts with simple creation of tink keyset with a keytype

KEYSET Functions

Let’s try to create a keyset with AEAD_AES_GCM_256 keytype.


SELECT KEYS.NEW_KEYSET('AEAD_AES_GCM_256');
--COWSm/UNEmQKWAowdHlwZS5nb29nbGVhcGlzLm<redacted>

The NEW_KEYSET returns serialized byte representation of the tink key set.

Lets try to decode the serialized bytes to more recognizable format as below

SELECT KEYS.KEYSET_TO_JSON(
FROM_BASE64("COWSm/UNEmQKWAowdHlwZS5nb29nbGVhcGlzLm<readacted>"));
/*
{
"key":[
{
"keyData":{
"keyMaterialType":"SYMMETRIC",
"typeUrl":"type.googleapis.com/google.crypto.tink.AesGcmKey",
"value":"<redacted>"
},
"keyId":3735472485,
"outputPrefixType":"TINK",
"status":"ENABLED"
}
],
"primaryKeyId":3735472485
}
*/

Similarily, KEYS.KEYSET_FROM_JSON can be used to provide JSON representation of KEYSET and get serialized bytes.
For all intents and purposes of encryption and decryption with Bigquery encryption/decryption functions, we would be needing the serialized bytes.

Lets see one example of encryption and decryption with above keyset.

--Encryption Process
SELECT AEAD.ENCRYPT(
FROM_BASE64("COWSm/UNEmQKWAowdHlwZS5nb29nbGVhcGlzLm<redacted>"),
"Albert Einstein",
"Scientist") AS encrypted_bytes

--Ad6myWWr9tJJhEG9DKRXuR15FVQO3h4tFi3O/4odapCqY3JyVJSmUCBo7LJ4hPwM

--Decryption Process
SELECT AEAD.DECRYPT_STRING(
FROM_BASE64("COWSm/UNEmQKWAowdHlwZS5nb29nbGVhcGlzLm<redacted>"),
FROM_BASE64("Ad6myWWr9tJJhEG9DKRXuR15FVQO3h4tFi3O/4odapCqY3JyVJSmUCBo7LJ4hPwM"),
"Scientist") as decrypted_string;
--Albert Einstein

Scientist” is the associated key word which is used to authenticate the encrypted data and to prove that the encrypted data is not tampered with.

The generated keyset can be rotated with a new key using KEYS.KEYSET_ROTATE

SELECT KEYS.ROTATE_KEYSET(FROM_BASE64("COWSm/UNEmQKWAowdHlwZS5nb29nbGVhcGlzL<redacted>), "AEAD_AES_GCM_256");
--CMKtlcAKEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIDXINLdEdJU/rsQp5ovIBv/k/5Al7/ <redacted>

SELECT KEYS.KEYSET_TO_JSON(FROM_BASE64("CMKtlcAKEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIDXINLdEdJU/rsQp5ovIBv/k/5Al7/<redacted"));
{
"key": [
{
"keyData": {
"keyMaterialType": "SYMMETRIC",
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "<redacted>"
},
"keyId": 3735472485,
"outputPrefixType": "TINK",
"status": "ENABLED"
},
{
"keyData": {
"keyMaterialType": "SYMMETRIC",
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "<redacted>"
},
"keyId": 2818922178,
"outputPrefixType": "TINK",
"status": "ENABLED"
}
],
"primaryKeyId": 2818922178
}

The best part about using this encryption/decryption is its native tink library usage. The above JSON keyset information can be used to decrypt the encrypted string in native tink library.

import tink
from tink import aead, cleartext_keyset_handle
import base64

# The TINK KEY SET file is the above plain text key set ouput
TINK_KEYSET_FILE = "bq_keyset.json"

aead.register()

text = open(TINK_KEYSET_FILE, "rt").read()
keyset_handle = cleartext_keyset_handle.read(tink.JsonKeysetReader(text))

cipher = keyset_handle.primitive(aead.Aead)

# Encrypted output from the aead.encrypt function.
output = cipher.decrypt(base64.b64decode("Ad6myWWr9tJJhEG9DKRXuR15FVQO3h4tFi3O/4odapCqY3JyVJSmUCBo7LJ4hPwM".encode("utf-8")), b"Scientist")

print(output.decode("utf-8"))
# Albert Einstein.

The data which is encoded using bq encryption functions can be decrypted using the tink library in supported languages. This gives the approach a greater flexibility.

Now NEW_KEYSET function provides the keyset in serialized format which can be de-serialized to get the plain text key material , so this should only be used for testing purposes.

We will go through the WRAPPED KEYSET and KEYSET_CHAIN concept to understand how cloud KMS strengthens the process

The below command creates keyset encrypted by the cloud KMS key specified

SELECT KEYS.NEW_WRAPPED_KEYSET("gcp-kms://projects/<redacted>/locations/us/keyRings/tink_us_keyring/cryptoKeys/tink_key", "AEAD_AES_GCM_256");
--CiQA3GVtLInRIEJQ5oUHo4v0hICcEeL+vdH0WFXSWWIL+<redacted>

Now we will create a keyset chain which helps to decrypt the encrypted key on the fly. We will encrypt a sample plain text in the similar way.
The value of encrypted key needs to be passed as query parameter or part of select as below

DECLARE KEY BYTES;
SET KEY = (SELECT FROM_BASE64("CiQA3GVtLEAKCJE3Kmny5Cvg163nrtU0xLa2rvLCqA4DO0wrhqsS<redacted>"));
SELECT AEAD.ENCRYPT
(KEYS.KEYSET_CHAIN("gcp-kms://projects/<redacted>/locations/us/keyRings/tink_us_keyring/cryptoKeys/tink_key",
KEY),
"Albert Einstein",
"Scientist"
)

--ARmnidmkrodK2qVQZ0POOCU65Ci9R7jMAHyXTNUpzsum2NjoyGtC25sfLezclc7p

Similarly the decrypt function works as

DECLARE KEY BYTES;
SET KEY = (SELECT FROM_BASE64("CiQA3GVtLEAKCJE3Kmny5Cvg163nrtU0xLa2rvLCqA4DO0wrhqsS<redacted>"));
SELECT AEAD.DECRYPT_STRING
(KEYS.KEYSET_CHAIN("gcp-kms://projects/<redacted>/locations/us/keyRings/tink_us_keyring/cryptoKeys/tink_key",
KEY),
FROM_BASE64("ARmnidmkrodK2qVQZ0POOCU65Ci9R7jMAHyXTNUpzsum2NjoyGtC25sfLezclc7p"),
"Scientist"
)

--Albert Einstein

The wrapped keyset can be distributed and can only be decrypted by the user who has permission on the cloud KMS key used as KEK

Lets try to decrypt the encrypted key and see the contents of the key

echo "CiQA3GVtLEAKCJE3Kmny5Cvg163nrtU0xLa2rvLCqA4DO0wrhqs<redacted>" | base64 -d > wrapped_key

gcloud kms decrypt --key tink_key \
--keyring tink_us_keyring \
--location us \
--ciphertext-file wrapped_key \
--plaintext-file unwrapped_key

cat unwrapped_key | base64
--CILf5s8IEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHR<redacted>

Now the key is decrypted and lets see the JSON format of keyset

select KEYS.KEYSET_TO_JSON(FROM_BASE64("CILf5s8IEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHR<redacted>"));
{
"key": [
{
"keyData": {
"keyMaterialType": "SYMMETRIC",
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "<redacted>"
},
"keyId": 430410201,
"outputPrefixType": "TINK",
"status": "ENABLED"
},
{
"keyData": {
"keyMaterialType": "SYMMETRIC",
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "<redacted>"
},
"keyId": 2314841986,
"outputPrefixType": "TINK",
"status": "ENABLED"
}
],
"primaryKeyId": 2314841986
}

The access to the cloud KMS key needs to be provided diligently only to the process or group who requires access.

Similar to KEYSET_ROTATE, ROTATE_WRAPPED_KEYSET exists for rotation of the wrapped keys.

The generated wrapped key set can also be used as part of python program using tink library.

The below program is just a sample program showing the interchangeability between tink client program and BQ encryption functions. Please do not use the below as this contains cleartext handle which should only be used from testing purposes.

Prefer to use the encrypted keyset handle using gcp kms

import tink
from tink import aead, cleartext_keyset_handle
from tink.integration import gcpkms
import base64
import io
from google.cloud import kms_v1

aead.register()

PROJECT = "<>"
LOCATION = "us"
KEY_RING = "tink_us_keyring"

# Decrypting the Encrypted KEK from BQ Wrapped Keyset
text = "CiQA3GVtLEAKCJE3Kmny5Cvg163nrtU0xLa2rvLCqA4DO0wrhqs<redacted>"
client = kms_v1.KeyManagementServiceClient()
key_name = client.crypto_key_path(project=PROJECT, location=LOCATION, key_ring=KEY_RING, crypto_key="tink_key")
decrypt_response = client.decrypt(request={'name': key_name, 'ciphertext': text })
cipher = decrypt_response.plaintext

# Creating handle for reading the binary decrypted keyset
###### Only for demo purposes not for actual use
keyset_handle = cleartext_keyset_handle.read(tink.BinaryKeysetReader(cipher))
cipher = keyset_handle.primitive(aead.Aead)
data = "AYn5r4LTlCK/dUgUGljXK29zl9cAATdCF4R1qrkFPKwxXYt5cBvO69EOftwK7ODp"
output = cipher.decrypt(base64.b64decode(data.encode("utf-8")), b"Scientist")
print(output.decode("utf-8"))
# Albert Einstein

We will be continuing with the encryption portion (DLP, KMS) of dynamic security controls in the part 4 of the series.

Please connect with me on https://www.linkedin.com/in/murli-krishnan-a1319842/ for any queries.

Happy Learning.

--

--