How to Access Key Vaults from Azure Batch Jobs

Gergely Soti
Published in datamindedbe, Apr 2, 2021

In our experience, Azure Batch is one of the cheapest and simplest ways to run computational jobs on Azure. It launches managed virtual machines on demand, and only for as long as the jobs run. We use it to run Spark jobs, but it can run anything!

The only problem with Azure Batch is that the processes running in it are isolated from the rest of Azure. For example, if you need to read from Blob storage, you are not authenticated by default. Usually we use a Key Vault to store access credentials to different systems, so ultimately the only thing a process needs is access to the relevant Key Vault! The standard way of doing that is fairly simple: register an App in Azure AD, and assign it a role to access the Key Vault. Your batch process then authenticates to Azure AD, assumes the identity of the App, and with that it can access the Key Vault. Sounds easy, right?

In summary, we need to do the following steps:

  • create a certificate
  • create an App Registration (in Azure AD) and upload the certificate you just made
  • upload the same certificate to Azure Batch, and attach it to the pool you are using
  • in the application running on Batch, read the certificate and use it to authenticate to Azure AD, which then gives access to the Key Vault

Certificates

The computational jobs running on Batch will need to use a certificate to prove their identity to Azure AD, so they can assume the identity of the App you registered.

So we go ahead and create an App Registration in Azure AD. Go to the “Certificates and Secrets” section, and try to upload a certificate. Note that this accepts the following formats: cer, pem and crt.

So we create a certificate. For that we will need a private key first:

openssl genrsa -out server.pem 2048

From this key, we create a certificate signing request (openssl will prompt for the certificate's subject fields):

openssl req -new -key server.pem -out server.csr

Now we can create the certificate itself, self-signed with the same key and valid for 365 days:

openssl x509 -req -days 365 -in server.csr -signkey server.pem -out server.crt

Upload this server.crt to the App Registration in the Azure portal, and note the thumbprint that Azure displays for it.
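If you want to double-check that thumbprint (you need it again when uploading to Batch), you can recompute it from server.crt. This is a minimal sketch, assuming the file is PEM-encoded (the openssl default used above) and that the value Azure shows is the usual SHA-1 thumbprint, i.e. the SHA-1 digest of the DER-encoded certificate. `sha1_thumbprint` is a hypothetical helper name. The command `openssl x509 -in server.crt -noout -fingerprint -sha1` prints the same hex digits, separated by colons.

```python
import hashlib
import re
import ssl

# Matches the first PEM-encoded certificate block, e.g. in server.crt.
_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def sha1_thumbprint(pem_text: str) -> str:
    """Return the SHA-1 thumbprint (upper-case hex) of the first certificate in pem_text."""
    match = _PEM_CERT.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # Strips the BEGIN/END lines and base64-decodes the body into DER bytes.
    der_bytes = ssl.PEM_cert_to_DER_cert(match.group(0))
    # The thumbprint is the SHA-1 digest of the DER-encoded certificate.
    return hashlib.sha1(der_bytes).hexdigest().upper()
```

For example, `sha1_thumbprint(open("server.crt").read())` should match the value in the portal. Compare case-insensitively, since tools differ in the letter case they print.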

We now have an App Registration with its certificate. Azure AD only needs the public certificate, but the program running on Azure Batch must prove it owns that certificate, so it needs the certificate together with its private key. The remaining step is therefore to make both available to the Azure Batch pool.

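So we navigate to the Batch account and click on Certificates. When we try to add a certificate, it turns out that Batch only accepts the pfx and cer formats. A pfx file bundles the certificate with its private key, which is exactly what the job needs, so we convert our certificate to pfx (openssl asks for an export password; keep it, because you need it again when uploading to Batch):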

openssl pkcs12 -export -out certificate.pfx -inkey server.pem -in server.crt

We upload certificate.pfx to Batch, providing the thumbprint we noted when uploading server.crt to Azure AD (both files contain the same certificate, so the thumbprint is the same). Next, we assign this certificate to the Batch pool that runs our jobs: click on Pools, select the pool you want, click Certificates, and add the certificate you just uploaded to Batch.

Running a Python script on Batch

We run our computational jobs in Docker containers on Azure Batch. In that setup, the certificates attached to the pool are available in the folder given by the environment variable AZ_BATCH_CERTIFICATES_DIR. Each certificate is stored in a file called sha1-<CERTIFICATE_THUMBPRINT>.pfx, and its password in a file called sha1-<CERTIFICATE_THUMBPRINT>.pfx.pw.
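The file lookup just described can be sketched in Python. This is a minimal sketch that relies only on the naming scheme above; `batch_certificate_files` is a hypothetical helper, and it assumes the password file is UTF-8 text and that the thumbprint is passed in the same letter case as it appears in the file name.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def batch_certificate_files(
    thumbprint: str, cert_dir: Optional[str] = None
) -> Tuple[Path, str]:
    """Return (path to the pfx file, its password) for a certificate attached to the pool."""
    # Default to the directory Batch exposes through AZ_BATCH_CERTIFICATES_DIR.
    base = Path(cert_dir or os.environ["AZ_BATCH_CERTIFICATES_DIR"])
    pfx_path = base / f"sha1-{thumbprint}.pfx"
    pw_path = base / f"sha1-{thumbprint}.pfx.pw"
    if not pfx_path.is_file():
        raise FileNotFoundError(f"certificate not found: {pfx_path}")
    # Drop a trailing newline, if any; keep any other characters of the password.
    password = pw_path.read_text(encoding="utf-8").rstrip("\r\n")
    return pfx_path, password
```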

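If you are using the Azure SDK for Python: at the time of writing, its CertificateCredential could not use the pfx file directly, so we convert it into a single PEM file containing the certificate followed by its unencrypted private key: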

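cert_thumbprint=<YOUR_CERT_THUMBPRINT>
in_cert=${AZ_BATCH_CERTIFICATES_DIR}/sha1-${cert_thumbprint}.pfx
in_cert_pw=${in_cert}.pw
out_cert=${AZ_BATCH_CERTIFICATES_DIR}/cert.pem
# 1. extract the certificate only (no private key) into cert.pem
openssl pkcs12 -in "$in_cert" -out "$out_cert" -nokeys -nodes -password "file:$in_cert_pw"
# 2. extract the private key, unencrypted, into cert.key
openssl pkcs12 -in "$in_cert" -nocerts -nodes -password "file:$in_cert_pw" | openssl rsa -out "${AZ_BATCH_CERTIFICATES_DIR}/cert.key"
# 3. append the key, so cert.pem holds the certificate followed by its private key
cat "${AZ_BATCH_CERTIFICATES_DIR}/cert.key" >> "$out_cert"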
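All three commands must succeed for cert.pem to be usable, so a quick check before creating the credential can save a confusing authentication error later. A minimal sketch of such a check; `problems_in_combined_pem` is a hypothetical helper, not part of any SDK, and it only inspects the PEM block headers. It expects a certificate block and an unencrypted private key block, which is what the script above should produce thanks to `-nodes`.

```python
import re
from typing import List

_CERT_BLOCK = re.compile(r"-----BEGIN CERTIFICATE-----")
# Depending on the OpenSSL version, `openssl rsa` writes "BEGIN RSA PRIVATE KEY" or "BEGIN PRIVATE KEY".
_KEY_BLOCK = re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----")
# PKCS#8 encrypted keys, or traditional keys carrying an encryption header.
_ENCRYPTED_KEY = re.compile(r"-----BEGIN ENCRYPTED PRIVATE KEY-----|Proc-Type: 4,ENCRYPTED")


def problems_in_combined_pem(text: str) -> List[str]:
    """Return a list of problems found in a combined certificate + key PEM; empty means OK."""
    problems = []
    if not _CERT_BLOCK.search(text):
        problems.append("no certificate block found")
    if _ENCRYPTED_KEY.search(text):
        problems.append("private key is encrypted")
    elif not _KEY_BLOCK.search(text):
        problems.append("no private key block found")
    return problems
```

For example, read `os.path.join(os.environ["AZ_BATCH_CERTIFICATES_DIR"], "cert.pem")` and fail the job early if the returned list is not empty.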

With these steps, cert.pem contains the certificate and its private key in PEM format, which the Azure SDK for Python can use:

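import os
from azure.identity import CertificateCredential

# cert.pem holds the certificate and its private key, as produced by the script above
certificate_credential = CertificateCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["CLIENT_ID"],
    certificate_path=os.path.join(os.environ["AZ_BATCH_CERTIFICATES_DIR"], "cert.pem"),
)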

You can use this credential object with a Key Vault client, provided that you allowed the AD App to access the Key Vault:

from azure.keyvault.secrets import SecretClient

secret_client = SecretClient(vault_url=<KEY_VAULT_URL>, credential=certificate_credential)
print(secret_client.get_secret("top-secret").value)

Now, it is just a matter of storing the relevant secrets in the Key Vault.
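A job usually needs several of those secrets at start-up, for example a Blob Storage key and a database password. A minimal sketch that fetches a list of secret names once and returns them as a dict; `load_secrets` and the secret names used in its usage are hypothetical. It relies only on the `get_secret(name).value` call used above, so a `SecretClient` can be passed in directly, and any object with the same method works in tests.

```python
from typing import Dict, Iterable, Optional, Protocol


class _Secret(Protocol):
    value: Optional[str]


class SecretGetter(Protocol):
    """Anything with a get_secret(name) method, such as a Key Vault SecretClient."""

    def get_secret(self, name: str) -> _Secret: ...


def load_secrets(client: SecretGetter, names: Iterable[str]) -> Dict[str, str]:
    """Fetch each named secret once and return {name: value}."""
    secrets: Dict[str, str] = {}
    for name in names:
        value = client.get_secret(name).value
        if value is None:
            raise KeyError(f"secret {name!r} has no value")
        secrets[name] = value
    return secrets
```

For example, `creds = load_secrets(secret_client, ["storage-account-key"])` fetches everything the job needs before the actual computation starts.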

I work at Data Minded, an independent data engineering and data analytics consultancy based in Leuven, Belgium. We built and ran Data Platforms on top of Azure Batch and processed massive amounts of data. If you need help with your Data Platform, contact us!
