Protect PII by encrypting with Fernet in Azure Synapse Spark
Oct 9, 2021
Using a Fernet key for symmetric encryption
Prerequisites
- Azure account
- Azure Storage account (ADLS Gen2)
- Azure Synapse Analytics workspace with an Apache Spark pool
Use Case
- Encrypt PII or other sensitive data
- Store the data encrypted at rest
- Only users with access to the key can decrypt it
- Encrypt at the column level so only the necessary columns are encrypted and the others remain available for reporting
- The open-source encryption library used here: https://cryptography.io/en/latest/fernet/
Code
- First, create an Apache Spark pool
- Install the cryptography library
- Create an environment.yaml file
name: stats
dependencies:
- numpy
- pandas
- cryptography
- Wait for the packages to install
- Create a new notebook
- Add the package YAML file
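- Optional: confirm the cryptography library is importable in the notebook before going further (a quick sanity check, nothing more)
# Verify the cryptography library installed on the Spark pool
from cryptography.fernet import Fernet
print("cryptography package is available")
- Generate a Fernet key in the notebook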
from cryptography.fernet import Fernet
# >>> Put this somewhere safe!
key = Fernet.generate_key()
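- In a real workload, don't keep the key in the notebook; one option in Synapse is to store it as a secret in Azure Key Vault and read it back with mssparkutils (a rough sketch; the vault name "keyvaultname" and secret name "fernetkey" below are placeholders)
# Sketch: fetch the Fernet key from Azure Key Vault instead of hard-coding it
from notebookutils import mssparkutils
key = mssparkutils.credentials.getSecret("keyvaultname", "fernetkey").encode()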
- Test the key by encrypting and decrypting a sample message
f = Fernet(key)
token = f.encrypt(b"A really secret message. Not for prying eyes.")
print(token)
print(f.decrypt(token))
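- Because Fernet is symmetric, only the key that produced a token can decrypt it; any other key raises InvalidToken, which is what keeps the data safe from anyone without key access (small illustration below)
from cryptography.fernet import Fernet, InvalidToken

wrong_key = Fernet.generate_key()  # a different, unrelated key
try:
    Fernet(wrong_key).decrypt(token)
except InvalidToken:
    print("Token cannot be decrypted without the original key")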
- Create the UDFs for encryption and decryption
# Define encrypt user defined function
def encrypt_val(clear_text, MASTER_KEY):
    from cryptography.fernet import Fernet
    f = Fernet(MASTER_KEY)
    clear_text_b = bytes(clear_text, 'utf-8')
    cipher_text = f.encrypt(clear_text_b)
    cipher_text = str(cipher_text.decode('ascii'))
    return cipher_text

# Define decrypt user defined function
def decrypt_val(cipher_text, MASTER_KEY):
    from cryptography.fernet import Fernet
    f = Fernet(MASTER_KEY)
    clear_val = f.decrypt(cipher_text.encode()).decode()
    return clear_val
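- Before wiring these into Spark, you can sanity-check the helpers on the driver with the key generated earlier (a quick sketch)
# Round-trip a single value through the helper functions
sample = encrypt_val("John Doe", key)
print(sample)                    # Fernet token stored as an ASCII string
print(decrypt_val(sample, key))  # prints: John Doe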
- Read the data
df = spark.read.load(
    'abfss://containername@storagename.dfs.core.windows.net/titanic/Titanic.csv',
    format='csv',
    header=True  # set header=False if the file has no header row
)
print(df)
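- Optional: inspect the schema and a few rows to decide which columns actually hold PII and need encryption
# Only the PII columns (e.g. Name) will be encrypted; the rest stay readable for reporting
df.printSchema()
df.show(5, truncate=False)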
- Encrypt the data
from pyspark.sql.functions import udf, lit
from pyspark.sql.types import StringType

# Register UDFs
encrypt = udf(encrypt_val, StringType())
decrypt = udf(decrypt_val, StringType())

# Fetch the key from a secret store in production
# (the line below is Databricks syntax; in Synapse use Key Vault as sketched earlier)
# encryptionKey = dbutils.preview.secret.get(scope = "encrypt", key = "fernetkey")
encryptionKey = key

# Encrypt the data
# df = spark.table("Test_Encryption")
encrypted = df.withColumn("Name", encrypt("Name", lit(encryptionKey)))
encrypted.head()
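- Since the data should be stored encrypted, write the encrypted DataFrame back to the lake (a sketch; the output path below is a placeholder in the same container)
# Persist the column-level encrypted data; only key holders can recover "Name"
encrypted.write.mode("overwrite").parquet('abfss://containername@storagename.dfs.core.windows.net/titanic/encrypted/')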
- Test decryption
decrypted = encrypted.withColumn("Name", decrypt("Name", lit(encryptionKey)))
decrypted.head()
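- As a final check, the decrypted column should match the original (a sketch using Spark's exceptAll)
# Count rows whose decrypted Name differs from the original; expect 0
diff = df.select("Name").exceptAll(decrypted.select("Name")).count()
print("Mismatched rows after round trip:", diff)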
Originally published at https://github.com.