Add Customer-managed Key to Git-managed Data Factory via Terraform

Gerrit Stapper
NEW IT Engineering

Feature image: a ring with three keys hanging on a nail (photo by Silas Köhler on Unsplash)

I was recently asked to configure our Data Factory instance to use a customer-managed key for its encryption. This is part of the compliance and security policies we have in place in our Azure tenant to enforce better infrastructure through static code analysis.

The idea of customer-managed keys (CMK), as opposed to Microsoft-managed keys, is that you own their lifecycle and that their values are never available to anyone else. Microsoft claims to have no insight into the values of the keys, i.e. the private part of the key, and thus cannot use them to look into your data. Note that CMKs are used to encrypt your data, not your infrastructure; that is a separate concern.

My initial concern, and thus the trigger to write this blog post, was this line from the Microsoft documentation: “A customer-managed key can only be configured on an empty data Factory.” We are using a Git-managed factory, so I was wondering what empty means in this context: Does it mean I cannot have any resources associated with the factory instance at all, whether in “git mode” or “live mode”, or does it only look at the resources in live mode? Answer: I am 90% certain it is about the live mode, and even if it is not, the Git configuration does it no harm.

Regardless, the CMK configuration and rotation for Data Factory and other resources turned out to be a bit more complicated than expected, so here is a first blog post to help you set it up.

The basic resources required

From the Microsoft documentation we can take the following key information to consider what resources we need to manage via Terraform:

  1. “Azure Key Vault is required to store customer-managed keys” → we need a Key Vault instance and a Key inside it
  2. “Enable Soft Delete and Do Not Purge on Azure Key Vault” → we need to configure purge protection and soft deletion retention on that Key Vault instance
  3. “[…] ensure the user-assigned managed identity (UA-MI) has […] permission to Key Vault” → we need a user-assigned identity that is “in the same Azure Active Directory (Azure AD) tenant and in the same region” as our Key Vault instance
  4. “grant data factory following permissions: Get, Unwrap Key, and Wrap Key” → we need to grant the user-assigned identity those permissions, either via a role assignment or via an access policy inside Key Vault
  5. “Only RSA keys are supported with Data Factory encryption” → the key must be of type RSA
  6. “[Add] the URI for customer-managed key […] [to your data factory instance]” → we need to read the ID from the Key resource and pass it to our Data Factory instance

One side note here: We did not need to work with a VNet (yet), but there seems to be a restriction on combining Data Factory + CMK + VNet: “This approach does not work with managed virtual network enabled factories. Please consider the alternative route, if you want encrypt such factories.”

Here is an overview of what we are aiming for: First we need Data Factory to use the user-assigned identity, which has access via the access policy (1), and then let Data Factory act upon that identity to read the CMK from Key Vault (2).

Architecture overview of a CMK configuration for Data Factory: Data Factory is associated with a user-assigned identity, which is granted access to keys inside Azure Key Vault via an access policy. Data Factory then reads the CMK in Key Vault by acting through that identity. All resources live in the same resource group.

Setting up Terraform

The entire code is available on GitHub: https://github.com/gersta/az-data-factory-cmk-tf-demo. Therefore, I will only highlight the aspects that I find most important and tricky to handle and reference the numbered points from above as a kind of checklist to fulfill.

The following code snippets are excerpts, i.e. they are not complete, but merely highlight the aspects just discussed.

As with all Terraform code you ever write, please use a remote backend to collaborate with others, not Git! See my previous post Deploy your Azure Data Factory through Terraform and this great guide by Yevgeniy Brikman.
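As a minimal sketch, such a remote backend on Azure Storage could look like the block below. The resource group, storage account and container names are placeholders for your own state storage, not part of the demo repository:

# Remote state in an Azure Storage account (all names are placeholders)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"
    storage_account_name = "sttfstatedemo"
    container_name       = "tfstate"
    key                  = "data-factory-cmk-demo.tfstate"
  }
}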

Setting the location configuration

The first basic step in setting up the Terraform code is to ensure all resources are in the same region (number 3). I always set the location on the resource group and then read that location information whenever I need it in other resources:

locals {
  default_name = "data-factory-cmk-demo"
}

resource "azurerm_resource_group" "this" {
  location = "westeurope"
  name     = "rg-${local.default_name}"
}
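Other resources then simply read that location from the resource group. The user-assigned identity, for example, is not shown in full in the excerpts of this post but is referenced as azurerm_user_assigned_identity.this throughout; as a rough sketch (the identity's name is illustrative), it could look like this:

resource "azurerm_user_assigned_identity" "this" {
  name                = "uai-${local.default_name}" # illustrative name
  location            = azurerm_resource_group.this.location # read from the resource group
  resource_group_name = azurerm_resource_group.this.name
}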

Setting Purge Protection and Soft Deletion

Purge protection prevents the Key Vault instance from being permanently deleted on the spot; instead, it is put into a soft-deleted state and only removed for good once the retention period has passed. Regardless of the respective permissions, purge protection is stronger!

With the azurerm provider in version 3.47.0, purge protection is disabled by default for all Key Vault instances, although the soft deletion retention period is set anyway. When using purge protection, soft deletion is required (number 2).

resource "azurerm_key_vault" "this" {
location = azurerm_resource_group.this.location
name = "kv-${local.default_name}"
resource_group_name = azurerm_resource_group.this.name
sku_name = "standard"
tenant_id = data.azuread_client_config.this.tenant_id
purge_protection_enabled = true
soft_delete_retention_days = 90
}

Retrieving tenant information

Our current approach to granting the required permissions to the identity (number 4) is to use access policies. This requires the Azure tenant information for both the Key Vault instance and the access policies.

In this example only, I am using the azuread_client_config data source to obtain this tenant information. In our actual setup we use azurerm_client_config instead. Somehow, only azuread returns a proper object_id when running as a personal user, while for service principals that only works with azurerm for some reason. Further information here: https://github.com/hashicorp/terraform-provider-azurerm/issues/7787

data "azuread_client_config" "this" {}

resource "azurerm_key_vault" "this" {
location = azurerm_resource_group.this.location
name = "kv-${local.default_name}"
resource_group_name = azurerm_resource_group.this.name
sku_name = "standard"
tenant_id = data.azuread_client_config.this.tenant_id <-- here
purge_protection_enabled = true
soft_delete_retention_days = 90
}

resource "azurerm_key_vault_access_policy" "client" {
key_vault_id = azurerm_key_vault.this.id
object_id = data.azuread_client_config.this.object_id
tenant_id = data.azuread_client_config.this.tenant_id <-- here

[...]
}
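For completeness, here is a rough sketch of the azurerm variant we use in our actual setup, where Terraform runs as a service principal. The resources stay the same; only the data source that resolves the tenant and object IDs changes:

# Alternative: resolve tenant and object IDs via the azurerm provider
# (in our experience this is what works for service principals)
data "azurerm_client_config" "this" {}

resource "azurerm_key_vault_access_policy" "client" {
  key_vault_id = azurerm_key_vault.this.id
  object_id    = data.azurerm_client_config.this.object_id
  tenant_id    = data.azurerm_client_config.this.tenant_id

  [...]
}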

Managing Deployment (and destroy) ordering and race conditions

The order of resource creation (and deletion) is crucial to set up CMK. The key must be created before any service can use it (obviously). That means Terraform can only start creating the Data Factory instance once the key is created. But it must also make sure that the respective access policies for the identity are present, too.

We get the “Factory after Key” sequence for free, as the two resources have an explicit dependency:

resource "azurerm_data_factory" "this" {
name = "df-${local.default_name}"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name

[...]

customer_managed_key_id = azurerm_key_vault_key.this.id <-- here
customer_managed_key_identity_id = azurerm_user_assigned_identity.this.id
}

However, there is no explicit dependency between the Key and its associated access policies, and thus eventually also no explicit dependency of Data Factory on those access policies. To enforce that, we can make the Key depend on all access policies that we create for the Key Vault instance (we have multiple, as the client that runs Terraform must also have access to Key Vault. More on that below):

resource "azurerm_key_vault_key" "this" {
key_opts = ["wrapKey", "unwrapKey"]
key_type = "RSA"
key_size = 4096
key_vault_id = azurerm_key_vault.this.id
name = "customer-managed-key"

depends_on = [ <-- here
azurerm_key_vault_access_policy.client,
azurerm_key_vault_access_policy.uai
]
}
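The access policy for the user-assigned identity (azurerm_key_vault_access_policy.uai, referenced in the depends_on above) is not shown in the excerpts of this post. As a rough sketch, based on the Get, Unwrap Key and Wrap Key permissions from the Microsoft documentation (number 4), it could look like this:

resource "azurerm_key_vault_access_policy" "uai" {
  key_vault_id = azurerm_key_vault.this.id
  tenant_id    = data.azuread_client_config.this.tenant_id
  object_id    = azurerm_user_assigned_identity.this.principal_id # the identity's AD object ID

  # The permissions Data Factory needs on the CMK
  key_permissions = [
    "Get",
    "UnwrapKey",
    "WrapKey"
  ]
}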

Granting the client access on Key Vault

As mentioned above, we need a dedicated access policy for the client that runs Terraform (if that client does not already have a role assignment granting those permissions). I am only highlighting this because, with azurerm version 3.45.0, the key permission GetRotationPolicy is mandatory whether you actively use it or not. This is what the Terraform documentation tells us: “To use this resource, your client should have RBAC roles with permissions like Key Vault Crypto Officer or Key Vault Administrator or an assigned Key Vault Access Policy with permissions Create,Delete,Get,Purge,Recover,Update and GetRotationPolicy for keys without Rotation Policy. Include SetRotationPolicy for keys with Rotation Policy.”

resource "azurerm_key_vault_access_policy" "client" {
key_vault_id = azurerm_key_vault.this.id
object_id = data.azuread_client_config.this.object_id
tenant_id = data.azuread_client_config.this.tenant_id

key_permissions = [
"Get",
"List",
"Create",
"Update",
"Delete",
"Recover",
"GetRotationPolicy"
]
}

Configuring CMK on Data Factory

Finally, we come to the actual thing we wanted to discuss: setting up the CMK configuration with Data Factory. The important aspect here is to associate the Data Factory instance with the user-assigned identity so that it obtains the required permissions (numbers 3 and 4) via the access policy for that identity. This configuration requires an identity block inside Data Factory which assigns the resource ID, i.e. the Azure Resource Manager (ARM) URI, not the AD client or object ID.

resource "azurerm_data_factory" "this" {
name = "df-${local.default_name}"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name

[...]

identity { <-- here
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.this.id]
}

[...]
}

As a second step, we then need to set the CMK for Data Factory to use for its encryption and explicitly tell it which identity to use to wrap and unwrap that key. This again requires the ARM URI of the identity, not the AD client or object ID (number 6).

resource "azurerm_data_factory" "this" {
name = "df-${local.default_name}"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name

[...]

identity {
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.this.id]
}

customer_managed_key_id = azurerm_key_vault_key.this.id
customer_managed_key_identity_id = azurerm_user_assigned_identity.this.id
}

Conclusion

Our approach to initially configuring CMK with Data Factory was to save all our work in the associated Git repository and then delete the Data Factory instance. Afterwards, we re-created it with the modified configuration discussed above.

Regarding Data Factory: Make sure to publish your current code once the Terraform deployment is done, as the Git configuration via Terraform only loads the code into the authoring UI but does not publish it to the live mode. Thus, with the Terraform deployment alone, you will not be able to run any code in Data Factory.
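For reference, a sketch of what that Git configuration can look like inside the azurerm_data_factory resource; the account, repository and branch values below are placeholders for your own setup:

resource "azurerm_data_factory" "this" {
  [...]

  # Git integration: loads the code into the authoring UI, but does not publish it
  github_configuration {
    account_name    = "gersta"           # placeholder GitHub account/organization
    repository_name = "my-adf-pipelines" # placeholder repository
    branch_name     = "main"
    git_url         = "https://github.com"
    root_folder     = "/"
  }
}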

Finally, this setup is not complete when looking at key rotations as another layer of security. We will discuss that in another blog post, so stay tuned!

As mentioned above, you can find the code here: https://github.com/gersta/az-data-factory-cmk-tf-demo, under the tag blog-post-1.
