Building End-to-End MLOps Pipelines for Sentiment Analysis on Azure with Terraform, Kubeflow v2, MLflow, and Seldon: Part 2

Rachit Ahuja
11 min read · Jun 5, 2023


Part 1: Introduction and Architecture planning

Part 2: Developing workflows for infrastructure deployment via CI-CD

Part 3: Setup MLflow on AKS

Part 4: Setup Kubeflow and Seldon on AKS

Part 5: End-to-End training pipeline and Inference Deployment using Seldon

This section of the blog post focuses on the process of creating infrastructure from scratch in a reusable manner. The objective is to establish a centralised code repository where code can be pushed, triggering an Azure pipeline that provisions the infrastructure based on the code changes. It is assumed that the user has already set up an account in the Azure portal and Azure DevOps, although assistance for setting up these accounts can be found at this link: https://learn.microsoft.com/en-us/azure/devops/get-started/?view=azure-devops.
The complete code for this implementation is available on GitHub: https://github.com/rahuja23/Blog-Post-Infrastructure/tree/main.

1. Infrastructure as Code (IaC)

Infrastructure as Code (IaC) is a rapidly growing practice in the industry: managing and provisioning infrastructure through code rather than manual processes. IaC is a core DevOps practice and is commonly employed in conjunction with continuous delivery. Adopting IaC offers numerous advantages, including increased speed, reliability, and scalability. One of the primary reasons for implementing IaC in an industrial setting, however, is to mitigate the errors and discrepancies that arise when infrastructure is provisioned or re-provisioned by hand. By leveraging IaC, organisations can minimise manual intervention, reduce the likelihood of human error, and ensure consistent, standardised infrastructure deployments.

1.1 Azure DevOps Setup

In this section we will setup our Azure DevOps organisation and create a repository for our infrastructure code:

  • First step is to set up an organisation in Azure DevOps (here). In our case we will call it “Blog-Post”.
  • Second step is to create a project inside this organisation. In our case we will call the project “Blog-Post-Infra”.
  • Next we have to connect our Azure DevOps account with our account in the Azure Portal. This is done by linking the Active Directory of our Azure Portal to the Azure DevOps organisation we just created, via Organisation settings → Azure Active Directory → Connect directory.
  • Once everything is connected, we create a repository where we will push our infrastructure code. We will name this repository “Infrastructure”; it will be the central repository for all the infrastructure we provision.

Now we have an Azure DevOps organisation with a project and a repository to contain our infrastructure code. What we still need is an authorised way for Azure DevOps to provision resources in the Azure Portal. This is needed because, while the code that provisions the infrastructure lives in an Azure DevOps repository, the actual infrastructure is not provisioned on Azure DevOps itself. Since we don’t want to provision the infrastructure locally but want Azure Pipelines to create it for us, the best practice is to use Azure Service Connections. To set up a service connection we go to: Project Settings → Service connections → New service connection. For this implementation we named our service connection “terraform-basic-testing-azure-connection”. This is the service connection our Azure pipeline will use when running the Terraform code in our repository to provision infrastructure on our behalf.
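Under the hood, a service connection of type “Azure Resource Manager” authenticates through a service principal in Azure Active Directory. As a minimal sketch (the service principal name and subscription ID below are placeholders, not values from this project), the same credentials that our pipeline later consumes as AppId, password and Tenant_id could be created with the Azure CLI:

____________________________________________________________________
# Select the subscription the pipeline should provision into
az login
az account set --subscription "<SUBSCRIPTION_ID>"

# Create a service principal with Contributor rights on the subscription.
# The JSON output contains appId, password and tenant, which map to the
# AppId, password and Tenant_id variables referenced by the pipeline later.
az ad sp create-for-rbac \
  --name "blog-post-infra-sp" \
  --role Contributor \
  --scopes "/subscriptions/<SUBSCRIPTION_ID>"
____________________________________________________________________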

  • The last step to complete our initial setup is to make Terraform available to our Azure CI-CD pipelines. For this purpose we install Terraform as an Azure DevOps extension, via: Organization Settings → Extensions → Browse marketplace. For this blog post we installed the extension by Microsoft DevLabs.

1.2 Infrastructure

In this section we will start implementing the actual code for our infrastructure. To keep the implementation as clean as possible, we first create Terraform modules for each infrastructure component that we want to provision; inside our environment directory we then have environment-specific Terraform files that call these modules and specify the configuration with which the infrastructure should be provisioned. The final structure of our directory looks as follows:

Infrastructure/
├─ .pipelines/
│  ├─ azure-pipelines.yml
├─ Base_Infra/
│  ├─ az-remote-backend-main.tf
│  ├─ az-remote-backend-output.tf
│  ├─ az-remote-backend-variables.tf
├─ Dev/
│  ├─ backend.tf
│  ├─ main.tf
│  ├─ outputs.tf
│  ├─ providers.tf
│  ├─ terraform.tfvars
│  ├─ variables.tf
├─ Infrastructure_modules/
│  ├─ kubernetes-cluster/
│  │  ├─ main.tf
│  │  ├─ variables.tf
│  ├─ container-registry/
│  │  ├─ main.tf
│  │  ├─ variables.tf
│  ├─ resource-group/
│  │  ├─ main.tf
│  │  ├─ variables.tf

The sub-directories inside our main repository “Infrastructure” are as follows:

  • .pipelines: This directory contains the Azure pipeline we will use to provision our infrastructure.
  • Base_Infra: This directory contains the base infrastructure on top of which all additional infrastructure is provisioned. The Terraform code inside provisions an Azure resource group, which ties our whole infrastructure together, and a storage account inside that resource group where we save all our Terraform state files. Provisioning this base infrastructure could also be automated, but in our implementation we provision it manually.
  • Dev: This directory contains the Terraform files that are provisioned via our Azure pipeline, with configuration that applies only to our dev environment. If we later want to extend to a multi-environment setup (e.g. staging), we would add an additional “Staging” directory with Terraform files for that environment.
  • Infrastructure_modules: This directory contains the Terraform files for the individual infrastructure components, packaged as modules.

We will start with the “Base_Infra” directory, which contains the code for the base infrastructure, i.e. a resource group and a storage account for our Terraform state files. Again, for the purpose of this blog post we built the base infrastructure manually, running terraform init, terraform plan and terraform apply locally rather than through a CI-CD pipeline; the corresponding commands are sketched below.
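A minimal sketch of that manual run, executed from inside the Base_Infra directory (assuming an authenticated Azure CLI session, which the azurerm provider picks up):

____________________________________________________________________
# Authenticate so the azurerm provider can find credentials
az login

# Initialise the working directory and download the required providers
terraform init

# Preview the changes and save the execution plan
terraform plan -out=base.tfplan

# Create the resource group, storage account and state container
terraform apply base.tfplan
____________________________________________________________________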

Base_Infra/az-remote-backend-main.tf
____________________________________________________________________
provider "azurerm" {
  features {}
}

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.78.0"
    }
  }
}

# Generate a random suffix for the storage account name
resource "random_string" "tf-name" {
  length  = 8
  upper   = false
  number  = true
  lower   = true
  special = false
}

# Create a Resource Group for the Terraform State File
resource "azurerm_resource_group" "state-rg" {
  name     = "${lower(var.company)}-tfstate-rg"
  location = var.location

  lifecycle {
    prevent_destroy = false
  }

  tags = {
    environment = var.environment
  }
}

# Create a Storage Account for the Terraform State File
resource "azurerm_storage_account" "state-sta" {
  depends_on                = [azurerm_resource_group.state-rg]
  name                      = "${lower(var.company)}tf${random_string.tf-name.result}"
  resource_group_name       = azurerm_resource_group.state-rg.name
  location                  = azurerm_resource_group.state-rg.location
  account_kind              = "StorageV2"
  account_tier              = "Standard"
  access_tier               = "Hot"
  account_replication_type  = "ZRS"
  enable_https_traffic_only = true

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    environment = var.environment
  }
}

# Create a Storage Container for the Core State File
resource "azurerm_storage_container" "core-container" {
  depends_on           = [azurerm_storage_account.state-sta]
  name                 = "core-tfstate"
  storage_account_name = azurerm_storage_account.state-sta.name
}

The variables used in this configuration are stored in the “az-remote-backend-variables.tf” file.

Base_Infra/az-remote-backend-variables.tf
____________________________________________________________________
# company
variable "company" {
  type        = string
  default     = "blogpost"
  description = "This variable defines the name of the company"
}

# environment
variable "environment" {
  type        = string
  default     = "dev"
  description = "This variable defines the environment to be built"
}

# azure region
variable "location" {
  type        = string
  description = "Azure region where resources will be created"
  default     = "West Europe"
}

Base_Infra/az-remote-backend-output.tf
____________________________________________________________________
output "terraform_state_resource_group_name" {
value = azurerm_resource_group.state-rg.name
}
output "terraform_state_storage_account" {
value = azurerm_storage_account.state-sta.name
}
output "terraform_state_storage_container_core" {
value = azurerm_storage_container.core-container.name
}
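After the apply completes, these outputs report the exact resource names (including the randomised storage account name) that we will need when configuring the remote backend and the pipeline variables later on. They can be read back at any time:

____________________________________________________________________
# Print all outputs of the base infrastructure state
terraform output

# Or read a single value, e.g. the generated storage account name
terraform output terraform_state_storage_account
____________________________________________________________________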

Now we will go through the infrastructure modules, i.e. the sub-directories inside the “Infrastructure_modules” directory, starting with the “kubernetes-cluster” sub-directory.

Infrastructure_modules/kubernetes-cluster/main.tf
____________________________________________________________________
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.78.0"
    }
  }
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  kubernetes_version  = var.kubernetes_version
  location            = var.location
  resource_group_name = var.resource_group_name
  dns_prefix          = var.cluster_name
  node_resource_group = var.node_resource_group

  default_node_pool {
    name                = "system"
    node_count          = var.system_node_count
    vm_size             = "Standard_DS2_v2"
    type                = "VirtualMachineScaleSets"
    availability_zones  = [1, 2, 3]
    enable_auto_scaling = false
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    load_balancer_sku = "Standard"
    network_plugin    = "kubenet" # alternatively: "azure" (CNI)
  }
}
____________________________________________________________________
Infrastructure_modules/kubernetes-cluster/variables.tf
____________________________________________________________________
variable "resource_group_name" {
type = string
description = "RG name in Azure"
}

variable "location" {
type = string
description = "Resources location in Azure"
}

variable "cluster_name" {
type = string
description = "AKS name in Azure"
}

variable "kubernetes_version" {
type = string
description = "Kubernetes version"
}

variable "system_node_count" {
type = number
description = "Number of AKS worker nodes"
}

variable "node_resource_group" {
type = string
description = "RG name for cluster resources in Azure"
}
____________________________________________________________________
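Once a cluster has been provisioned through this module, a quick way to verify it (and to interact with it in the later parts of this series) is to pull its kubeconfig with the Azure CLI. The sketch below uses the resource group and cluster names we will set in terraform.tfvars:

____________________________________________________________________
# Merge the AKS credentials into the local kubeconfig
az aks get-credentials \
  --resource-group blogpost-terraform \
  --name terraform-aks

# Sanity check: the three system node pool nodes should be Ready
kubectl get nodes
____________________________________________________________________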
Infrastructure_modules/container-registry/main.tf
____________________________________________________________________
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.78.0"
    }
  }
}

resource "azurerm_container_registry" "acr" {
  name                = var.name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "Premium"
  admin_enabled       = false

  # The geo-replication region must differ from the registry's home
  # region (here "West Europe"), otherwise Azure rejects the request.
  georeplications {
    location                = "northeurope"
    zone_redundancy_enabled = true
    tags                    = {}
  }
}
____________________________________________________________________
Infrastructure_modules/container-registry/variables.tf
____________________________________________________________________
variable "resource_group_name" {
type = string
description = "RG name in Azure"
}

variable "location" {
type = string
description = "Resources location in Azure"
}
variable "name" {
type = string
description = "Name for container registry"
}
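Note that the module keeps admin_enabled = false. For AKS to pull images from this registry without admin credentials, one common approach (not part of the Terraform code above) is to attach the registry to the cluster after both are provisioned, which grants the cluster’s kubelet identity AcrPull rights:

____________________________________________________________________
# Grant the AKS kubelet identity pull access to the registry
az aks update \
  --resource-group blogpost-terraform \
  --name terraform-aks \
  --attach-acr blogposttfregistry
____________________________________________________________________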

Currently, we only require these two specific infrastructure modules. Moving forward, we will explore the environment-specific Terraform files, where we will invoke these modules to provision infrastructure tailored to our respective environments.

Dev/backend.tf
____________________________________________________________________
# The backend configuration is intentionally left empty here; it is
# injected at "terraform init" time (see providers.tf and the pipeline).
terraform {
  // backend "azurerm" {
  //   resource_group_name  = "tf_state"
  //   storage_account_name = "tfstate019"
  //   container_name       = "tfstate"
  //   key                  = "terraform.tfstate"
  // }
}
____________________________________________________________________
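The empty, commented-out backend block is what Terraform calls a partial configuration: the backend values are supplied at “terraform init” time instead of being hard-coded. Our Azure pipeline injects them through the Terraform task (see Section 2), but the same initialisation could be done locally, using the names produced by the Base_Infra outputs:

____________________________________________________________________
terraform init \
  -backend-config="resource_group_name=blogpost-tfstate-rg" \
  -backend-config="storage_account_name=blogposttfhzf8zk75" \
  -backend-config="container_name=core-tfstate" \
  -backend-config="key=terraform.tfstate"
____________________________________________________________________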
Dev/main.tf
____________________________________________________________________
module "azure_container_registry" {
source = "../Infrastructure/container-registry"
resource_group_name = var.resource_group_name
location = var.location
name = var.registry_name
}


module "azurerm_kubernetes_cluster" {
resource_group_name = var.resource_group_name
location = var.location
cluster_name = var.cluster_name
kubernetes_version = var.kubernetes_version
system_node_count = var.system_node_count
node_resource_group = var.node_resource_group
source = "../Infrastructure/kubernetes-cluster"
}
____________________________________________________________________
Dev/providers.tf
____________________________________________________________________
provider "azurerm" {
features {}
}

terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "2.78.0"
}
}
backend "azurerm" {
}
}
____________________________________________________________________
Dev/variables.tf
____________________________________________________________________
variable "resource_group_name" {
type = string
description = "RG name in Azure"
}

variable "location" {
type = string
description = "Resources location in Azure"
}

variable "cluster_name" {
type = string
description = "AKS name in Azure"
}

variable "kubernetes_version" {
type = string
description = "Kubernetes version"
}
variable "registry_name" {
type = string
description = "Name for our azure container registry"
}

variable "system_node_count" {
type = number
description = "Number of AKS worker nodes"
}

variable "node_resource_group" {
type = string
description = "RG name for cluster resources in Azure"
}

One notable aspect of the Terraform files above is the extensive use of variables for configuration values such as resource group and container registry names. Terraform offers two primary ways to set these values: store them in a “.tfvars” file that is supplied when the scripts are executed, or pass them directly on the command line when invoking “terraform plan” or “terraform apply”.

In our specific case, we will generate a “.tfvars” file that will be accessed by our Azure pipelines when provisioning the infrastructure. This allows for seamless integration and automation of the provisioning process, as the necessary variable values can be conveniently supplied from the designated “.tfvars” file.
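Both styles look as follows; note that a file named exactly terraform.tfvars is auto-loaded, so the -var-file flag is only needed for differently named files:

____________________________________________________________________
# Option 1: values from a .tfvars file
terraform apply -var-file="terraform.tfvars"

# Option 2: values passed directly on the command line
terraform apply -var "location=West Europe" -var "system_node_count=3"
____________________________________________________________________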

Dev/terraform.tfvars
____________________________________________________________________
resource_group_name = "blogpost-terraform"
location = "West Europe"
cluster_name = "terraform-aks"
kubernetes_version = "1.24.3"
system_node_count = 3
node_resource_group = "blogpost-tfstate-resources-rg"
terraform_backend_storage= "blogposttfhzf8zk75"
terraform_backend_container = "core-tfstate"
registry_name = "blogpost-tfstate-registry"

With the completion of our Terraform scripts for infrastructure provisioning, the subsequent step involves building an Azure pipeline. This pipeline will be responsible for retrieving and executing these scripts to effectively provision the desired infrastructure.

2. Azure Pipelines to provision infrastructure

In this section, our focus shifts towards creating an Azure pipeline that will actively retrieve the infrastructure scripts from the previous section. This pipeline will play a pivotal role in executing these scripts, resulting in the actual provisioning of the desired resources.

.pipelines/azure-pipelines.yml
____________________________________________________________________
trigger:
  branches:
    include:
      - '*'

variables:
  - group: BlogPost-Infra-vg
  - name: terraformWorkingDirectory
    value: '$(System.DefaultWorkingDirectory)/Dev'
  - name: backendAzureRmResourceGroupName
    value: 'blogpost-tfstate-rg'
  - name: backendAzureRmStorageAccountName
    value: 'blogposttfhzf8zk75'
  - name: backendAzureRmContainerName
    value: 'core-tfstate'
  - name: backendAzureRmKey
    value: 'terraform.tfstate'

pool:
  vmImage: ubuntu-latest

stages:
  - stage: TerraformContinuousIntegration
    displayName: Terraform Module - CI
    jobs:
      - job: TerraformContinuousIntegrationJob
        displayName: TerraformContinuousIntegration - CI Job
        pool:
          vmImage: ubuntu-20.04
        steps:
          # Step 1: Install the Azure CLI and log in via the service principal
          - task: AzureCLI@2
            inputs:
              azureSubscription: '$(serviceConnection)'
              scriptType: bash
              scriptLocation: inlineScript
              workingDirectory: $(terraformWorkingDirectory)
              inlineScript: |
                sudo curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
                az --version
                az login --service-principal -u $(AppId) -p $(password) --tenant $(Tenant_id)
          # Step 2: Install Terraform on the Azure Pipelines agent
          - task: TerraformInstaller@0
            displayName: 'install'
            inputs:
              terraformVersion: $(terraformVersion)
          # Step 3: Run terraform init to initialize the workspace
          - task: TerraformTaskV2@2
            displayName: 'Run Terraform init'
            inputs:
              command: init
              workingDirectory: $(terraformWorkingDirectory)
              backendServiceArm: $(serviceConnection)
              backendAzureRmResourceGroupName: '${{ variables.backendAzureRmResourceGroupName }}'
              backendAzureRmStorageAccountName: '${{ variables.backendAzureRmStorageAccountName }}'
              backendAzureRmContainerName: '${{ variables.backendAzureRmContainerName }}'
              backendAzureRmKey: '${{ variables.backendAzureRmKey }}'
          # Step 4: Run terraform validate to validate the HCL syntax
          - task: TerraformTaskV2@2
            displayName: 'Run Terraform validate'
            inputs:
              command: validate
              workingDirectory: $(terraformWorkingDirectory)
          # Step 5: Run terraform plan to preview the changes
          - task: TerraformTaskV2@2
            displayName: 'Run Terraform plan'
            inputs:
              command: plan
              workingDirectory: $(terraformWorkingDirectory)
              environmentServiceNameAzureRM: '$(serviceConnection)'
              commandOptions: -var location=$(azureLocation)
          # Step 6: Run terraform apply to provision the infrastructure
          - task: TerraformTaskV2@2
            displayName: 'Run Terraform apply'
            inputs:
              command: apply
              workingDirectory: $(terraformWorkingDirectory)
              environmentServiceNameAzureRM: '$(serviceConnection)'
              commandOptions: -var location=$(azureLocation)

The pipeline above installs Terraform, picks up the Terraform files from the directory given by the “terraformWorkingDirectory” variable, and provisions our infrastructure while storing the state files in blob storage. Note that sensitive values, such as the service connection name and the service principal credentials, are supplied as environment variables from a variable group. In our case, this variable group is named “BlogPost-Infra-vg”.

Azure DevOps offers a reliable way to create a secure variable group in which all the required environment variables can be safely stored. This approach protects sensitive data: secret variables are masked in pipeline logs, and Azure provides granular control over visibility, letting you determine which users, pipelines, or identities can access the values. This additional level of security safeguards the confidentiality and integrity of the environment variables within the Azure ecosystem.
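As a sketch, such a variable group could also be created from the command line with the Azure DevOps CLI extension. The variable names below match what our pipeline references; the Terraform version value is a placeholder, and real secrets (AppId, password, Tenant_id) should be added as secret variables rather than in plain text:

____________________________________________________________________
# Requires the Azure DevOps CLI extension: az extension add --name azure-devops
az pipelines variable-group create \
  --organization https://dev.azure.com/Blog-Post \
  --project Blog-Post-Infra \
  --name BlogPost-Infra-vg \
  --variables serviceConnection=terraform-basic-testing-azure-connection \
              terraformVersion=1.0.8 \
              azureLocation="West Europe"

# Secrets are better added individually, e.g.:
# az pipelines variable-group variable create --group-id <GROUP_ID> \
#   --name password --value <SP_PASSWORD> --secret true
____________________________________________________________________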

Once this setup is complete, the pipeline is triggered as soon as a developer pushes the Terraform code to the Azure repository.
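For example, since the trigger section includes all branches ('*'), any push kicks off a run:

____________________________________________________________________
git add .
git commit -m "Provision dev AKS cluster and container registry"
git push origin main
____________________________________________________________________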


Rachit Ahuja

Machine learning and Data Engineer at Data Reply GmbH