0–100 km/h using Terraform and the Azure Modern Data Platform: Accelerating your cloud deployments
It’s all about automation nowadays.
How can you save time and dollars while setting up your data platform using a tool that can be used across the different clouds?
Welcome to Terraform.
If you are a bit late to the IaC (Infrastructure as Code) game, the Wikipedia definition is a good place to get started.
Of course, I am not going to provision an entire data center; rather, a modern data platform architecture that I implemented for a customer. As the definition states, IaC involves using code to create infrastructure.
You can also read more about Terraform on their website here. I got a last-minute request from a customer to help with Terraform, and thankfully I had internet on my flight, so I could grasp the concepts quickly. Okay, let's get started.
a) Log in to the Azure portal.
b) Once successfully logged in, open Azure Cloud Shell by going to https://shell.azure.com in a new tab (you can also launch it directly from the portal, but this URL opens a nice full-screen view, which I like).
c) You should see the below screen:
d) Select the directory you want to work in.
This is how the screen will look once you are connected:
Let’s run this command:
az account show --output jsonc
It will show the account details for the subscription I selected initially, and because we have specified the jsonc output format, it displays the result as colorized JSON.
Let's get started with Terraform!
Type ‘terraform’ in the Cloud Shell and hit Enter; it will show the various options available.
Now let us do the following:
- Create a directory to store our Terraform file. I am going to call it azurebigdataplatform.
- Change our working directory to this newly created directory.
- Create a blank Terraform file (*.tf). I am calling it bigdata.
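The three steps above can be sketched as the following Cloud Shell commands (the directory and file names are the ones chosen above):

```shell
# Create a directory to hold the Terraform configuration
mkdir azurebigdataplatform

# Change the working directory to the new directory
cd azurebigdataplatform

# Create a blank Terraform file
touch bigdata.tf
```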
Modern Data Platform: below is the Modern Data Platform architecture I want to deploy.
Therefore, we will be deploying the following through our Terraform code file. (I am presuming you already know what each of these services does.)
- Data Factory
- Data Lake Gen 2
- Azure Databricks
Copy the below code onto your clipboard
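The original code listing is not reproduced here, so the following is a minimal sketch of what bigdata.tf might contain, assembled from the resource names that appear later in this post (resource group azurebigdatadev, workspace Databricks-dev, storage account dalanalyticslandingzone, data factory dvdatafactoryingestion). The region, SKU, and replication settings are assumptions; check the azurerm provider documentation for the argument names in your provider version.

```hcl
# Sketch only: resource names match those mentioned later in this post;
# region, SKU, and other argument values are assumptions.
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "bigdata" {
  name     = "azurebigdatadev"
  location = "australiaeast" # assumed region; change as needed
}

resource "azurerm_storage_account" "landing" {
  name                     = "dalanalyticslandingzone"
  resource_group_name      = azurerm_resource_group.bigdata.name
  location                 = azurerm_resource_group.bigdata.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true # hierarchical namespace makes this ADLS Gen 2
}

resource "azurerm_data_factory" "ingestion" {
  name                = "dvdatafactoryingestion"
  resource_group_name = azurerm_resource_group.bigdata.name
  location            = azurerm_resource_group.bigdata.location
}

resource "azurerm_databricks_workspace" "dev" {
  name                = "Databricks-dev"
  resource_group_name = azurerm_resource_group.bigdata.name
  location            = azurerm_resource_group.bigdata.location
  sku                 = "standard"
}
```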
1. Start VS Code in the Cloud Shell.
2. Click on the bigdata.tf file in the explorer pane.
3. Click in the right-hand pane to give it focus.
4. Paste in the contents of the clipboard.
5. Save the file.
6. Close the file.
Now, let us run through the terraform workflow.
At the terminal, run the following commands in order:
- terraform init
- terraform plan
- terraform apply
You will see the same output as from the terraform plan command, but you will also be prompted to confirm that you want to apply those changes. Type ‘yes’ to confirm.
Once the changes have been applied:
We will see that the azurebigdatadev resource group has been created and the three services are deployed and running:
- Databricks-dev: development workspace for Databricks
- dalanalyticslandingzone: ADLS Gen 2 storage account
- dvdatafactoryingestion: Data Factory v2
Now, the next time I need to deploy these artefacts, it is as simple as running the same Terraform script, and voila! The infrastructure is up in a jiffy, with no need to remember the setup specifics of each service, and the modern data platform is available for use quickly.
Note: The opinions expressed herein are mine alone and do not represent the opinions of my employer.