Theodore Siu
Sep 16 · 4 min read

Authors: Theodore Siu, Sameer Abhyankar

Dataflow accepts and displays runtime parameters without distinguishing whether they are sensitive. We rely on the GCP Key Management Service (KMS) to encrypt and decrypt such fields, which adds an extra layer of security.

We noticed that for Google Provided Dataflow Templates, namely the JDBCToBigQuery Template, some of the runtime parameters passed in were sensitive fields. As a result, these fields were available on the Dataflow Jobs UI page and also through Dataflow.jobs.get API calls. See Figure 1 for an example of exposed fields in the UI. Unfortunately at this time, Dataflow does not have a built-in way to hide secrets (passwords, etc.) which get passed via pipeline options.

Figure 1: Passing in sensitive parameters and secrets as Dataflow options exposes them on the UI.

Quick possible workarounds to this issue included the following:

  1. Embedding the sensitive fields within the template itself. Unfortunately, after compiling the template, the fields were still readable in the template file even though they no longer showed up as parameters. Hardcoding values also makes the template less general.
  2. Using a configuration blob file located in GCS to store and read from for sensitive fields. This required some tweaking of the code and the sensitive fields still lived elsewhere in GCP.

The workaround that we ultimately agreed on was to use the Google Cloud Key Management Service to encrypt our fields and pass in the encrypted fields, which are then decrypted during the Dataflow job. In this way, we were not passing around secrets, and the secrets were not stored in another GCP resource. We included backwards compatibility in the template for users who wish to continue to use the template in its original form (although we strongly advise against it!).

Getting started

Prerequisites: Before starting work, please ensure that the Google Cloud Key Management Service (KMS) API has been enabled in your project. Also ensure that you as the user have the cloudkms.admin IAM role or at least the cloudkms.editor role in order to create your own keys and key rings. Note that the encrypter/decrypter role (roles/cloudkms.cryptoKeyEncrypterDecrypter) is separate from key creation and also needs to be added. We will do that below.

To use GCP KMS with the JDBCToBigQuery Template, you must first create a KMS key ring and a symmetric key. This can be done on the GCP console under the Security >> Cryptographic Keys tab. Be sure to create both a key ring and a symmetric key belonging to that ring! Once your key has been created:

Figure 2: Add the cryptoKeyEncrypterDecrypter role to your Dataflow service account and to yourself. Note that Cloud KMS Admin does not have encrypter/decrypter privileges.
  1. Ensure that you have the correct IAM roles for the key to encrypt and decrypt. The two accounts to update include yourself and also the service account which is running the Dataflow Job. Add the role roles/cloudkms.cryptoKeyEncrypterDecrypter to both of those accounts. This again can be done in the UI. See Figure 2 for how to do this on the GCP KMS UI.
  2. For KMS to encrypt your sensitive parameters, they must first be converted into base64-encoded strings. To convert your string, open Python 3 and run the following commands:
import base64
# b64encode returns bytes; decode to get a plain string you can paste in step 3
base64.b64encode(b'mystringthatIwanttoencode').decode('ascii')

If you are using Python 2, you can run the following command instead:

'mystringthatIwanttoencode'.encode('base64').rstrip('\n')

Note the right strip of \n, the newline character, in the Python 2 code. The base64 codec appends a newline to the encoded output. It is not needed, so we strip it away.

3. Use the KMS API to encrypt your sensitive parameters. Enter your key name, which should be in the format projects/*/locations/*/keyRings/*/cryptoKeys/*, and in the plaintext field enter the parameter that you wish to encrypt, already converted into a base64-encoded string (see step 2). For the JDBCToBigQuery Template you will need to encrypt username, password and connectionURL. The encrypted values are returned as base64-encoded strings. See Figure 3 for more details on the API. If you do not wish to use the API UI, you can also run a curl command of the following form:

curl -s -X POST "https://cloudkms.googleapis.com/v1/projects/myproj/locations/mylocation/keyRings/keyring1/cryptoKeys/mykey:encrypt"  -d "{\"plaintext\":\"PasteBase64EncodedString\"}"  -H "Authorization:Bearer $(gcloud auth application-default print-access-token)"  -H "Content-Type:application/json"

4. When running the JDBCToBigQuery Template, set KMSEncryptionKey to the same symmetric key that you used for encryption, in the form projects/*/locations/*/keyRings/*/cryptoKeys/*. For username, password and connectionURL, enter your encrypted values instead of the plaintext ones. Run your Dataflow job!
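
As a rough illustration of step 4, the sensitive parameters for a template launch could be assembled as in the sketch below. The parameter names KMSEncryptionKey, username, password and connectionURL come from the steps above. The function name, the key name check and the placeholder values are assumptions made for this example, and the template's other required parameters are left out.

```python
import re

# Key names must look like projects/*/locations/*/keyRings/*/cryptoKeys/*
KEY_NAME_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def build_template_parameters(kms_key, username, password, connection_url):
    """Return the sensitive runtime parameters for the template launch.

    Hypothetical helper: username, password and connection_url are expected
    to be the base64 ciphertexts returned by KMS in step 3, not plaintext.
    """
    if not KEY_NAME_PATTERN.match(kms_key):
        raise ValueError("unexpected KMS key name format: %r" % kms_key)
    return {
        "KMSEncryptionKey": kms_key,
        "username": username,
        "password": password,
        "connectionURL": connection_url,
    }


# Placeholder values for illustration only
example_parameters = build_template_parameters(
    "projects/myproj/locations/mylocation/keyRings/keyring1/cryptoKeys/mykey",
    "PasteEncryptedUsername",
    "PasteEncryptedPassword",
    "PasteEncryptedConnectionURL",
)
```

Checking the key name format up front catches a mistyped key before the job starts, rather than at decryption time inside the pipeline.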

Figure 3: Use the GCP KMS API to encrypt your sensitive parameters. Pass in your key name under the name parameter, and pass the base64-encoded string to be encrypted under the plaintext parameter in the request body.
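
The same encrypt request can also be put together in Python. The sketch below uses only the standard library and mirrors the curl command from step 3, including the base64 encoding from step 2. The function names are hypothetical, and it assumes you already hold an access token (for example, the one printed by the gcloud command used in the curl example). Nothing is sent until encrypt is called.

```python
import base64
import json
import urllib.request

KMS_ENDPOINT = "https://cloudkms.googleapis.com/v1"


def build_encrypt_request(key_name, secret, access_token):
    """Build (but do not send) a KMS encrypt request for a plaintext secret."""
    # Step 2: KMS expects the plaintext as a base64-encoded string
    plaintext_b64 = base64.b64encode(secret.encode("utf-8")).decode("ascii")
    body = json.dumps({"plaintext": plaintext_b64}).encode("utf-8")
    return urllib.request.Request(
        url="%s/%s:encrypt" % (KMS_ENDPOINT, key_name),
        data=body,
        headers={
            "Authorization": "Bearer %s" % access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def encrypt(key_name, secret, access_token):
    """Send the request and return the base64-encoded ciphertext."""
    request = build_encrypt_request(key_name, secret, access_token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["ciphertext"]
```

Calling encrypt once each for username, password and connectionURL yields the three values to pass to the template in step 4.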

How it works and future work

We created a KMSEncryptedNestedValueProvider which takes in two parameters: the ValueProvider for the normal field, such as username, and the ValueProvider for the KMS crypto key. If the KMS crypto key is null, the normal field is assumed to be unencrypted and is used as the actual value. If the KMS crypto key is present, we use the key to decrypt the normal field and use the decrypted value instead. This logic is applied to the username, password and connectionURL fields in the JDBCToBigQuery Template.
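
The fallback logic described above can be sketched in a few lines of Python. This is not the template's own code. The function names are hypothetical, and the decrypt step is passed in as a callable so the sketch stays self-contained. A real job would call KMS at that point.

```python
import base64
from typing import Callable, Optional


def resolve_parameter(
    value: str,
    kms_key: Optional[str],
    decrypt: Callable[[str, str], str],
) -> str:
    """Return the usable value of a possibly encrypted parameter.

    If no KMS key is given, the value is assumed to be plaintext and is
    returned unchanged (the backwards-compatible path). Otherwise the value
    is treated as ciphertext and decrypted with the given key.
    """
    if kms_key is None:
        return value
    return decrypt(kms_key, value)


def fake_decrypt(kms_key: str, ciphertext: str) -> str:
    """Stand-in for a KMS decrypt call: it only base64-decodes the input."""
    return base64.b64decode(ciphertext).decode("utf-8")


# Without a key the value passes through; with a key it is decrypted.
plain = resolve_parameter("myuser", None, fake_decrypt)
decoded = resolve_parameter(
    base64.b64encode(b"myuser").decode("ascii"),
    "projects/myproj/locations/mylocation/keyRings/keyring1/cryptoKeys/mykey",
    fake_decrypt,
)
```

Keeping the key optional is what lets existing users keep passing plaintext values, at the cost of exposing them in the UI again.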

When using sensitive parameters in Dataflow, it is important to keep security in mind. Using GCP KMS adds an extra layer of protection against inadvertent glances that could accidentally expose credentials. Feel free to add the same logic to your other Dataflow Templates and Beam jobs!

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Thanks to Sameer Abhyankar

Written by Theodore Siu, Google Cloud Data Engineer
