Deterministic encryption/decryption with Cloud KMS using DLP Plugin in Cloud Data Fusion
Goal: The DLP plugin can be used to encrypt and decrypt data in a Cloud Data Fusion pipeline. This post is a step-by-step guide to deterministic encryption and decryption in Data Fusion with Cloud KMS.
Steps:
1. KMS Keys
Cloud KMS is used to manage encryption keys and to perform cryptographic operations with those keys.
1.1 Go to Security → Key Management. (Enable the API if it is not already enabled.)
You can follow the Cloud KMS documentation or continue with the steps below.
1.2 Create a key ring for the symmetric encryption key:
Required permission: cloudkms.keyRings.create
Create a key ring from Key Management. Specify the name of the key ring and its location.
1.3 After creating the key ring, create a key in it and fill in the other details as needed.
1.4 Copy the resource name of the key; it is used in the next steps. For example:
projects/xxxxxxx/locations/us-west1/keyRings/mytestkeyring/cryptoKeys/testkey
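The console steps above can also be sketched with the gcloud CLI. This is a minimal sketch rather than the method used in this post: the project ID my-project is a hypothetical placeholder, and the run helper and DRY_RUN variable are illustrative names. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Hypothetical values; replace with your own project, location and names.
PROJECT_ID="my-project"
LOCATION="us-west1"
KEYRING="mytestkeyring"
KEY="testkey"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_kms_key() {
  # Create the key ring, then a symmetric encryption key inside it.
  run gcloud kms keyrings create "$KEYRING" \
    --location "$LOCATION" --project "$PROJECT_ID"
  run gcloud kms keys create "$KEY" \
    --keyring "$KEYRING" --location "$LOCATION" \
    --purpose "encryption" --project "$PROJECT_ID"
}

# Resource name in the same format as the one copied in step 1.4.
KEY_RESOURCE="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/${KEYRING}/cryptoKeys/${KEY}"

create_kms_key
echo "$KEY_RESOURCE"
```

Keeping the resource name in a variable makes it easy to reuse in the wrapping request and in the plugin configuration later.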
1.5 Create a wrapped key (follow the documentation or continue here):
In Cloud Shell, as a user with the required permissions, run the commands below to get the wrapped key:
a. Generate a random 256-bit (32-byte) AES key:
openssl rand -out "./aes_key.bin" 32
b. Base64-encode the key:
base64 -i ./aes_key.bin
The output is your BASE64_ENCODED_AES_KEY. It looks similar to the following: uEDo6/yKx+zCg2cZ1DBwpwvzMVNk/c+jWs7OwpkMc/s=
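As a quick sanity check on steps a and b, the sketch below generates a key in a temporary directory and reports its size. A 32-byte key base64-encodes to 44 characters. The variable names and the temporary directory are illustrative, and the script assumes openssl and base64 are available, as they are in Cloud Shell.

```shell
# Work in a throwaway directory so the key file is not left behind.
WORKDIR="$(mktemp -d)"
KEY_FILE="${WORKDIR}/aes_key.bin"

# Step a: 32 random bytes = a 256-bit AES key.
openssl rand -out "$KEY_FILE" 32

# Step b: base64-encode the key. Reading from stdin works with both
# GNU and BSD base64; tr strips any trailing newline.
BASE64_ENCODED_AES_KEY="$(base64 < "$KEY_FILE" | tr -d '\n')"

# wc may pad its output with spaces on some systems, so strip them.
KEY_BYTES="$(wc -c < "$KEY_FILE" | tr -d ' ')"
echo "key size: ${KEY_BYTES} bytes"
echo "base64 length: ${#BASE64_ENCODED_AES_KEY} characters"

# Remove the plaintext key file once the encoded value is captured.
rm -rf "$WORKDIR"
```

If either number differs from 32 and 44, regenerate the key before wrapping it.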
c. To wrap the AES key, use curl to send the following request to the Cloud KMS API.
In the request, replace the key resource name with the one from step 1.4 and the plaintext value with the BASE64_ENCODED_AES_KEY from step b.
curl "https://cloudkms.googleapis.com/v1/projects/xxxxxx/locations/us-west1/keyRings/mytestkeyring/cryptoKeys/testkey:encrypt" \
--request "POST" \
--header "Authorization:Bearer $(gcloud auth application-default print-access-token)" \
--header "content-type: application/json" \
--data "{\"plaintext\": \"uEDo6/yKx+zCg2cZ1DBwpwvzMVNk/c+jWs7OwpkMc/s=\"}"
Note the value of ciphertext in the response. That is your wrapped key, and it is used in the next steps.
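To avoid copying the wrapped key by hand, the ciphertext field can be pulled out of the response with sed. A minimal sketch: RESPONSE below is a made-up sample shaped like the JSON that the encrypt call returns (other fields are omitted and the ciphertext value is not a real wrapped key). In practice you would capture the curl output into that variable instead.

```shell
# Hypothetical sample response; replace with the real output, for example:
#   RESPONSE="$(curl ... )"
RESPONSE='{
  "name": "projects/xxxxxx/locations/us-west1/keyRings/mytestkeyring/cryptoKeys/testkey/cryptoKeyVersions/1",
  "ciphertext": "CiQAexampleWrappedKeyValue=="
}'

# Print only the value of the "ciphertext" field.
WRAPPED_KEY="$(printf '%s\n' "$RESPONSE" \
  | sed -n 's/.*"ciphertext": *"\([^"]*\)".*/\1/p')"

echo "$WRAPPED_KEY"
```

A JSON-aware tool such as jq would be more robust if it is installed; sed is used here only because it is available everywhere.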
2. DLP Template
2.1 To create a DLP template, go to Security → Data Loss Prevention. Enable the API if it is not already enabled.
2.2 Create the template from the Configuration tab. You can follow the steps in the DLP documentation to create the template.
2.3 Once the template is created, copy its path, for example:
projects/xxxxxxx/locations/us-west1/inspectTemplates/template1
3. Data Fusion pipeline
Enable the Cloud Data Fusion API. Once done, create a Data Fusion instance using Data Fusion → Instance.
Grant Cloud Data Fusion permission to use the selected Dataproc service account, then click Create.
The instance might take around 20 minutes to spin up.
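The three APIs this guide relies on can also be enabled from Cloud Shell instead of the console. A sketch assuming the gcloud CLI is installed: my-project is a hypothetical project ID, and the illustrative run helper only prints the command unless DRY_RUN=0 is set.

```shell
PROJECT_ID="my-project"   # hypothetical; use your own project ID

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

enable_apis() {
  # Cloud KMS, DLP and Cloud Data Fusion service endpoints.
  run gcloud services enable \
    cloudkms.googleapis.com \
    dlp.googleapis.com \
    datafusion.googleapis.com \
    --project "$PROJECT_ID"
}

enable_apis
```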
Once the instance is up, go to Hub from the top menu, search for "data loss", and deploy the "Data Loss Prevention" plugin.
From the left menu, select Studio under Pipeline. We will build the pipeline described below. For simplicity, we encrypt and decrypt in the same pipeline.
1. Source: BigQuery
Select BigQuery as the source and, in the properties, browse to the dataset and table. Click Validate after filling in the required information. You can also check the output schema.
The DLP-related transformations are listed under "Transform".
2. For encryption, select Google DLP Redact from Transform.
Connect the BigQuery source to DLP Redact. The columns of the BigQuery source become available in DLP Redact.
- In the properties section, select "Yes" for Use custom template.
- Paste the template path from step 2.3 and specify the same resource location.
- Under Matching, choose deterministic encryption and select the input field you want to encrypt.
- Set Crypto Key Type to "KMS Wrapped Key".
- Enter the KMS Resource ID and Wrapped Key from steps 1.4 and 1.5.c respectively.
- Use a surrogate key of your choice.
3. For decryption, use DLP Decrypt from Transform and enter the same details as for encryption.
4. Sink: Select BigQuery as the sink and browse to the sink table. Make sure its schema matches the output of DLP Decrypt.
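For orientation, the values entered in the DLP Redact node correspond roughly to a cryptoDeterministicConfig in the DLP REST API. The sketch below builds such a request body for the content:deidentify method. It illustrates the API shape only and is not necessarily what the plugin sends. The project ID, the wrapped key value, the EMAIL_ADDRESS infoType, the sample text, and the surrogate name MY_TOKEN are all hypothetical, and the deidentify function is defined but not called.

```shell
PROJECT_ID="my-project"                      # hypothetical
LOCATION="us-west1"
KMS_KEY="projects/${PROJECT_ID}/locations/${LOCATION}/keyRings/mytestkeyring/cryptoKeys/testkey"
WRAPPED_KEY="CiQAexampleWrappedKeyValue=="   # placeholder, not a real wrapped key
SURROGATE="MY_TOKEN"                         # hypothetical surrogate name

# Request body: find email addresses and replace them with deterministic tokens.
REQUEST_BODY="$(cat <<EOF
{
  "item": {"value": "Contact: user@example.com"},
  "inspectConfig": {"infoTypes": [{"name": "EMAIL_ADDRESS"}]},
  "deidentifyConfig": {
    "infoTypeTransformations": {
      "transformations": [{
        "infoTypes": [{"name": "EMAIL_ADDRESS"}],
        "primitiveTransformation": {
          "cryptoDeterministicConfig": {
            "cryptoKey": {
              "kmsWrapped": {
                "cryptoKeyName": "${KMS_KEY}",
                "wrappedKey": "${WRAPPED_KEY}"
              }
            },
            "surrogateInfoType": {"name": "${SURROGATE}"}
          }
        }
      }]
    }
  }
}
EOF
)"

# Call this manually once real values are filled in; it needs valid credentials.
deidentify() {
  curl "https://dlp.googleapis.com/v2/projects/${PROJECT_ID}/locations/${LOCATION}/content:deidentify" \
    --request "POST" \
    --header "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    --header "content-type: application/json" \
    --data "$REQUEST_BODY"
}

echo "$REQUEST_BODY"
```

Because the transformation is deterministic, the same input value with the same key produces the same token, which is what lets the DLP Decrypt node reverse it later.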
Make sure to validate each node. Additionally, grant the required DLP and KMS roles.
[When using deterministic encryption with a KMS wrapped key, the Cloud KMS CryptoKey Encrypter/Decrypter role must be granted to the Cloud Data Loss Prevention Service Agent.]
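The role grant mentioned above can be sketched with gcloud as follows. The project ID and project number are hypothetical, and the service agent address format (service-PROJECT_NUMBER@dlp-api.iam.gserviceaccount.com) is an assumption to verify on your project's IAM page. The illustrative run helper only prints the command unless DRY_RUN=0 is set.

```shell
PROJECT_ID="my-project"          # hypothetical
PROJECT_NUMBER="123456789012"    # hypothetical; see your project settings
LOCATION="us-west1"
KEYRING="mytestkeyring"
KEY="testkey"

# Assumed address format of the DLP service agent; verify it in IAM.
DLP_SERVICE_AGENT="service-${PROJECT_NUMBER}@dlp-api.iam.gserviceaccount.com"

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

grant_kms_role() {
  # Allow the DLP service agent to use the key that wraps the AES key.
  run gcloud kms keys add-iam-policy-binding "$KEY" \
    --keyring "$KEYRING" --location "$LOCATION" --project "$PROJECT_ID" \
    --member "serviceAccount:${DLP_SERVICE_AGENT}" \
    --role "roles/cloudkms.cryptoKeyEncrypterDecrypter"
}

grant_kms_role
```

Granting the role on the single key, rather than on the whole project, keeps the service agent's access limited to what this pipeline needs.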
Go to the Preview section and click Run.
Once the run succeeds, use Preview Data to check the sample results. In this example, column a1 is encrypted; it is the output of Google DLP Redact.
The preview data of the DLP Decrypt node shows the values after decryption.
You can add more transformations between the source and sink. Once done, deploy the pipeline using Deploy.