Deploying Distributed Stateful Applications with Azure VM Scale Sets, Packer, Ansible and Terraform

In this post I am going to cover how distributed stateful applications can be deployed in an Azure VM Scale Set with a custom image built via Packer¹ and Ansible², using Terraform³. I will not cover the basics, such as installation steps, and will dive straight into the implementation. Some sections only describe a brief concept and methodology. This is aimed at intermediate to advanced level engineers, but that's not to say beginners cannot have a read. Before we begin, let's break down all the tooling and moving parts.
Azure Virtual Machine Scale Set
Azure virtual machine scale sets let you create and manage a group of identical, load balanced VMs. The number of VM instances can automatically increase or decrease in response to demand or a defined schedule. Scale sets provide high availability to your applications, and allow you to centrally manage, configure, and update a large number of VMs. With virtual machine scale sets, you can build large-scale services for areas such as compute, big data, and container workloads.⁴
Custom Image
Packer supports creation of custom images using the azure-arm builder and Ansible provisioner. We will be using both to create a Linux based Azure Managed VM Image⁵ that we will deploy using Terraform. Azure Managed VM Images abstract away the complexity of managing custom images through Azure Storage Accounts and behave more like AMIs in AWS.
Distributed Stateful Application
A distributed stateful application stores critical data that we cannot afford to lose across a number of nodes, for example a data store like Apache Cassandra⁶ or Apache ZooKeeper⁷. For such data stores we have three requirements:
- High Availability: If we lose up to X nodes, clients that depend on this data store can still function as normal
- Redundancy: Any nodes we lose are replaced with new ones without any manual intervention and join the cluster without compromising the integrity of the data
- Scalability: Horizontally scaling up is as easy as changing a variable in a Terraform var file
Let's dive straight in! The versions I am using are:
- Packer: 1.2.5
- Ansible: 2.5.3
- Terraform: 0.11.7
- terraform-azurerm-provider: 1.13.0
Before we begin Terraforming, we need to build our custom image using Packer.
Building Packer image
There are a few considerations we need to take into account when we build our image. Since we are deploying a mission-critical distributed data store, we need to do a bit of extra legwork, broken down as follows:
- Node Discovery: When a node comes up, it needs to join the cluster by itself. The node needs to know the IP addresses of all the other nodes in the VM Scale Set.
- Resilience: Nodes in our cluster will go down. Our configuration files will need to be updated with the latest state of our VM Scale Set for nodes already existing in the cluster.
- Rolling upgrades: New versions of our custom image will need to be rolled out in a way that causes minimal downtime. There are multiple approaches to this, but I will cover rolling upgrades only. Rolling upgrades allow us to go to each node one by one, take it down and bring it back up with the new custom image.
- Redundancy: We cannot afford to lose our data even if a node goes down. The best place to write our data is a separate data disk backed by Azure Managed Disk⁸. This way we can take regular snapshots of the disk for backups and attach it to another VM if the original VM goes down.
- Performance: Since our data partition is backed by Azure Managed Disk, choosing the Premium SKU can give us a performance boost.
Our directory structure can be as follows:
neerantest/
- packer_template.json
- data_store.yml
- discover_nodes.sh
- inventory
Packer template (packer_template.json)
{
  "variables": {
    "client_id": "<azure_client_id>",
    "client_secret": "<azure_client_secret>",
    "subscription_id": "<azure_subscription_id>",
    "location": "westus"
  },
  "builders": [{
    "type": "azure-arm",
    "client_id": "{{user `client_id`}}",
    "client_secret": "{{user `client_secret`}}",
    "subscription_id": "{{user `subscription_id`}}",
    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "16.04-LTS",
    "managed_image_name": "neeran-test",
    "managed_image_resource_group_name": "neeran-packer-images",
    "location": "{{user `location`}}",
    "vm_size": "Standard_A1_v2",
    "disk_additional_size": [10]
  }],
  "provisioners": [{
    "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
    "inline": [
      "apt-get update",
      "apt-get install ansible -y"
    ],
    "inline_shebang": "/bin/sh -x",
    "type": "shell"
  },
  {
    "type": "file",
    "source": "discover_nodes.sh",
    "destination": "/tmp/discover_nodes.sh"
  },
  {
    "type": "ansible-local",
    "playbook_file": "data_store.yml",
    "inventory_groups": "inventory",
    "command": "ANSIBLE_FORCE_COLOR=1 PYTHONUNBUFFERED=1 ansible-playbook"
  },
  {
    "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
    "inline": [
      "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
    ],
    "inline_shebang": "/bin/sh -x",
    "type": "shell"
  }]
}
The above template is doing the following:
- Bring up a node in the westus region based on the marketplace Ubuntu 16.04-LTS image
- Install Ansible
- Move discover_nodes.sh into /tmp directory on the VM
- Run our Ansible playbook
- Generalise our image so it can be deployed to any other VM
Ansible Playbook (data_store.yml)
---
- name: Setup data store
  hosts: localhost
  become: true
  tasks:
    - name: add the apt repos
      apt_repository:
        ...
    - name: add apt key
      apt_key:
        ...
    - name: install data store and azure-cli dependencies
      apt:
        name: "{{ item }}"
        state: present
        update_cache: yes
      with_items:
        - <our datastore>
        - azure-cli
        - jq
    - name: move discovery script to somewhere in our path
      copy:
        src: /tmp/discover_nodes.sh
        dest: /usr/local/bin/discover_nodes.sh
        mode: u+x
The above Ansible playbook is doing the following:
- Installs our data store of choice and azure-cli. I have skipped the apt repo setup steps as they depend on the data store.
- Copies the discover_nodes.sh script into /usr/local/bin.
Why do we need azure-cli? Azure CLI is a great tool for fetching information about our VM Scale Set, such as IP addresses. It is also possible to talk to the Azure REST API directly for more complex cases.
Note: Ansible is not strictly necessary here; the steps it performs can easily be done via a Bash script. It's down to personal preference, and I'd say it depends on how complicated building the image is.
Inventory (inventory)
[localhost]
127.0.0.1
Discovery Script (discover_nodes.sh)
#!/bin/bash

az login --identity

RESOURCE_GROUP=$(curl -H 'Metadata: true' "http://169.254.169.254/metadata/instance/compute/resourceGroupName?api-version=2017-08-01&format=text")
VMSS=$(curl -H 'Metadata: true' "http://169.254.169.254/metadata/instance/compute/vmScaleSetName?api-version=2017-12-01&format=text")
VMSS_INSTANCE_IPS=$(az vmss nic list -g $RESOURCE_GROUP --vmss-name $VMSS --query [*].ipConfigurations[].privateIpAddress)

...

# Use the IP addresses to generate/update the configuration for your data store
# Start/Restart data store
The above Bash script makes use of the Azure Metadata Service⁹ to fetch the resource group name and the VM Scale Set name. Azure-cli supports authentication via Azure Managed Service Identity¹¹, which allows us to talk to the Azure REST API and fetch the IP addresses of the VMs in our VM Scale Set. At this stage much more robust logic can go in; for example, we can check the status of each VM within the VM Scale Set, or hit a health check endpoint, and populate our configuration files with only the healthy IP addresses.
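As a rough illustration, a health check filter slotted in after the az vmss nic list call could look like the sketch below. The /health endpoint, the port 8080 and the two second timeout are assumptions and will differ per data store.
# Sketch only: narrow VMSS_INSTANCE_IPS down to the nodes answering a health check.
# The /health endpoint and port 8080 are hypothetical; adjust for your data store.
HEALTHY_IPS=""
for ip in $(echo "$VMSS_INSTANCE_IPS" | jq -r '.[]'); do
  if curl -sf --max-time 2 "http://${ip}:8080/health" > /dev/null; then
    HEALTHY_IPS="${HEALTHY_IPS} ${ip}"
  fi
done
# Render the data store configuration from $HEALTHY_IPS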
There is another consideration that needs to be taken into account. When do we run this?
- Cloud-init: run the script the first time a VM boots up. Here we are either hoping nothing ever goes wrong with our cluster, or we intervene manually. This is the safest approach, where we take everything into our own hands as soon as we see an issue.
- Cron: run every X minutes and always populate our configuration files with the healthy VM IP addresses (a minimal sketch is shown after this list). This depends on the nature of the data store; ideally we do not want to cause downtime by restarting all the nodes as soon as any one node goes down. If the data store supports hot reloading of its configuration, this is a perfect choice.
- Cloud-init: run the script the first time a VM boots up WITHOUT health checking. We always want to keep X healthy nodes; unhealthy nodes we leave to the data store to handle, hence our configuration will contain all VM IP addresses in the VM Scale Set. This again depends on the data store, as some are fussier than others or simply do not fit this scenario.
- Do not run at all: automation around databases simply cannot be trusted and everything is set up manually. This approach is not outdated and should not be looked down upon; anything is better than losing data. Sure, this approach might involve ongoing maintenance effort, but if it keeps the business running then that is what matters. With adequate testing and given enough time, trust can be built in the automation, and it can save time in the future (even if you may not be the one managing the stack).
The answer to the above question depends on your data store, so it is up to you. There are more ways of running this script and more scenarios to consider; maybe these can be covered in a different post.
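As a rough illustration of the cron approach, the entry below could be dropped into /etc/cron.d during the Packer build. The five minute interval and the log path are placeholder values, not a recommendation.
# /etc/cron.d/discover-nodes (sketch only; interval and log path are placeholders)
*/5 * * * * root /usr/local/bin/discover_nodes.sh >> /var/log/discover_nodes.log 2>&1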
Now that we have our Packer template and Ansible playbook in place, let's run our Packer build.
cd neerantest/ && packer build packer_template.json
The above will spin up a temporary VM in Azure, run our provisioner steps, stop the VM, capture the image and destroy the temporary VM. On the Azure Portal, our custom image neeran-test will reside inside the neeran-packer-images resource group.
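As a side note, rather than hard-coding credentials in the variables block, they can be passed in at build time with Packer's -var flag. A sketch, assuming the credentials live in environment variables of your choosing (the ARM_* names below are just placeholders):
packer build \
  -var "client_id=${ARM_CLIENT_ID}" \
  -var "client_secret=${ARM_CLIENT_SECRET}" \
  -var "subscription_id=${ARM_SUBSCRIPTION_ID}" \
  packer_template.json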
Deploying VM Scale Set with Custom Image using Terraform
Our custom image is ready to deploy. The directory structure is as follows:
neerantest-terraform/
- main.tf
- cloud-init.cfg
- assign-role.tf
- rbac_arm_template.json
main.tf
# Azure Managed Image
data "azurerm_image" "neeran_test" {
  name                = "neeran-test"
  resource_group_name = "neeran-packer-images"
}

# Render the cloud config
data "template_cloudinit_config" "cloud_init_config" {
  gzip          = true
  base64_encode = true

  part {
    filename = "init.cfg"
    content  = "${file("cloud-init.cfg")}"
  }
}

# VM Scale Set
resource "azurerm_virtual_machine_scale_set" "data_store" {
  name                = "data-store-test"
  location            = "westus"
  resource_group_name = "neeran-data-store-test"
  upgrade_policy_mode = "Manual"

  os_profile {
    ...
    custom_data = "${data.template_cloudinit_config.cloud_init_config.rendered}"
  }

  storage_profile_os_disk {
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
  }

  storage_profile_data_disk {
    lun               = 0
    caching           = "ReadWrite"
    create_option     = "FromImage"
    disk_size_gb      = "10"
    managed_disk_type = "Premium_LRS"
  }

  storage_profile_image_reference {
    id = "${data.azurerm_image.neeran_test.id}"
  }

  identity {
    type = "SystemAssigned"
  }

  extension {
    name                 = "MSILinuxExtension"
    publisher            = "Microsoft.ManagedIdentity"
    type                 = "ManagedIdentityExtensionForLinux"
    type_handler_version = "1.0"
    settings             = "{\"port\": 50342}"
  }

  ...
}
The above Terraform file highlights the main parts we are interested in. They are broken down as follows:
- Cloud-init custom data: the contents of this file are covered below.
- Upgrade Policy: at present the terraform-provider-azurerm¹⁰ does not support the "Rolling" upgrade policy. Upgrades will therefore be carried out manually via azure-cli.
- OS Disk: the OS disk needs its create_option set to FromImage so that it is created from our custom image.
- Data Disk: the same applies to the data disk, whose size was predetermined when we built the Packer image. We will be using the Premium SKU to maximise performance.
- Image Profile: storage_profile_image_reference needs to be set to the id of our custom image which we retrieve as a data source.
- Managed Identity: the MSILinuxExtension needs to be enabled to allow us to use Azure Managed Service Identity. This creates a Service Principal¹² in the background and assigns it to the VMs in our VM Scale Set. The name of the Service Principal will be the name of the VM Scale Set, data-store-test (a quick sanity check sketch follows this list).
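Once the role assignment covered further below is in place, the identity can be sanity checked from one of the scale set VMs. A sketch, using the resource names from the Terraform above:
# On a VMSS instance: log in with the managed identity and list the scale set instances
az login --identity
az vmss list-instances -g neeran-data-store-test -n data-store-test -o table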
cloud-init.cfg
#cloud-config
repo_update: true
apt_upgrade: true

runcmd:
  ...
  - <mount data disk>
  - bash /usr/local/bin/discover_nodes.sh
The above Cloud Init config mounts the data disk for our data partition and runs the discover_nodes.sh script that already lives on our custom image. Dependencies for our data store come pre-baked on the image, which should reduce the time it takes for a new node to come up.
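The <mount data disk> step above is deliberately left abstract. As a rough sketch, assuming the data disk shows up at LUN 0 and that /datadrive is an acceptable mount point (verify the device path on your image, e.g. with lsblk, before relying on it), the runcmd entries could be:
# Sketch only: format (first boot only) and mount the attached data disk.
# /dev/disk/azure/scsi1/lun0 is the usual symlink for the data disk at LUN 0 on Azure Linux images.
  - mkfs.ext4 -F /dev/disk/azure/scsi1/lun0
  - mkdir -p /datadrive
  - mount /dev/disk/azure/scsi1/lun0 /datadrive
  - echo '/dev/disk/azure/scsi1/lun0 /datadrive ext4 defaults,nofail 0 2' >> /etc/fstab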
At this stage our discover_nodes.sh script will fail. This is because we have not assigned any scopes to the Managed Identity Service Principal, so "az login --identity" will fail. The solution is to assign a role to the service principal, ideally during the Terraform run.
assign-role.tf
# No native support for RBAC assignment in Terraform
resource "azurerm_template_deployment" "assign_role" {
  name                = "test-assign-role"
  resource_group_name = "neeran-data-store-test"
  deployment_mode     = "Incremental"
  template_body       = "${file("rbac_arm_template.json")}"

  parameters {
    principalId     = "${lookup(azurerm_virtual_machine_scale_set.data_store.identity[0], "principal_id")}"
    builtInRoleType = "Reader"
    roleNameGuid    = "00000000-0000-0000-0000-000000000000"
    scopeId         = "${azurerm_resource_group.neeran_data_store_test.id}"
  }
}
In this case we fetch the principal_id of the Managed Identity Service Principal and assign the built-in Reader role on the whole resource group. This will ensure we can access the VM Scale Set via azure-cli. An example ARM template can be found here:
Note: In this example we have assigned Reader on the whole resource group; ideally we would scope access to the VM Scale Set only.
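For reference, the equivalent assignment done by hand with azure-cli, scoped to just the scale set, would look roughly like the following (the principal and subscription IDs are placeholders):
# Sketch only: assign Reader on the scale set itself rather than the whole resource group
az role assignment create \
  --assignee <principal_id> \
  --role Reader \
  --scope /subscriptions/<subscription_id>/resourceGroups/neeran-data-store-test/providers/Microsoft.Compute/virtualMachineScaleSets/data-store-test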
All the pieces are in place, we can go ahead and apply Terraform.
cd neerantest-terraform && terraform init && terraform plan && terraform apply
During the terraform apply phase there is a race condition: the VM Scale Set is created first, which means cloud-init will trigger and the discover_nodes.sh script will fail because we are still in the middle of assigning the role. A simple solution is to add a sleep to our cloud-init script.
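A slightly more forgiving alternative to a fixed sleep is to retry the managed identity login inside discover_nodes.sh until the role assignment has propagated. A sketch (the attempt count and delay are arbitrary):
# Sketch only: retry az login --identity until the role assignment has propagated
for attempt in $(seq 1 30); do
  az login --identity && break
  echo "az login --identity failed (attempt ${attempt}), retrying in 10 seconds"
  sleep 10
done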
Adding new nodes
Simply increase the capacity of the VM Scale Set sku to the desired number. The idea here is to keep the maximum allowed capacity higher than the desired capacity, so that if a node does go down there is extra head room and the failed node can be kept around for debugging.
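If you ever need to scale outside of Terraform, for example during an incident, azure-cli can do it as well; note that the Terraform state will then drift until the capacity variable is updated to match:
# Scale the set to 5 instances directly via azure-cli (Terraform state will drift)
az vmss scale -g neeran-data-store-test -n data-store-test --new-capacity 5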
Replacing nodes
Destroy the node from the Azure Portal or via azure-cli and the VM Scale Set should do its magic. There may be manual steps involved depending on the data store in use.
Deploying a new custom image
Terraform azurerm provider does not support the “Rolling” upgrade policy but there is an open PR:
https://github.com/terraform-providers/terraform-provider-azurerm/pull/922
New images can be deployed using the “Rolling” upgrade policy via the azure-cli. The steps are as follows:
- Change the image id of the Managed Image.
- Terraform will change the image reference in place, but the running instances will not be upgraded since our upgrade policy is Manual
- az vmss rolling-upgrade start -n <vmss_name> -g <resource_group>
This approach needs to be applied with caution, as not all distributed data stores will happily comply and data may be lost.
Update:
az vmss rolling-upgrade only works on platform images. In this case we can issue an “update” instead and go through each of our VMSS instance ids. For example:
az vmss update-instances -n <vmss_name> -g <resource_group> --instance-ids 0
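To roll through the whole set one instance at a time, something like the loop below can be used. Checking that the node has rejoined the cluster and is healthy before moving on is deliberately left out here, as it is data store specific:
# Sketch only: update each VMSS instance to the latest model, one at a time
for id in $(az vmss list-instances -g <resource_group> -n <vmss_name> --query "[].instanceId" -o tsv); do
  az vmss update-instances -g <resource_group> -n <vmss_name> --instance-ids "$id"
  # Wait here for the node to rejoin the cluster and report healthy before continuing
done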
Conclusion
Azure VM Scale Sets have come a long way and can be used with Packer, Ansible and Terraform to build robust infrastructure that is self-healing, easy to manage and customisable. Whilst not fully at the level of AWS Auto Scaling groups, they have made deploying distributed applications in Azure using open source tools a whole lot easier.
Notes
1. https://www.packer.io/
2. https://www.ansible.com/
3. https://www.terraform.io/
4. https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
5. https://blogs.technet.microsoft.com/keithmayer/2017/08/17/how-to-azure-managed-vm-images-using-premium-data-disks/
6. http://cassandra.apache.org/
7. https://zookeeper.apache.org/
8. https://azure.microsoft.com/en-gb/services/managed-disks/
9. https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service
10. https://github.com/terraform-providers/terraform-provider-azurerm
11. https://docs.microsoft.com/en-us/azure/active-directory/managed-service-identity/
12. https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals
Title image from: https://www.freeimages.com/photo/stairs-1637560
