How To Fix Broken Extensions on an Azure Virtual Machine Scaleset

AzureJackson
Jan 27, 2020

You may have found yourself troubleshooting a Virtual Machine Scaleset in a “Failed” state. This can happen for a number of reasons. In this article, we explore how to troubleshoot Azure Virtual Machine Scalesets after a VM Extension failure.

tl;dr — Just show me a video!

In-depth YouTube video Link

The Scenario

We will create a deliberately broken scenario: we deploy a Virtual Machine Scaleset with two VM instances and then enable Azure Insights on it. Before enabling Insights, however, we rename the powershell.exe executable on the first instance of the Scaleset. Azure Insights uses PowerShell to install its VM Extensions, so the deployment fails on that instance.

The Setup

We have a resource group named “VMSS-RG” that contains the resources below:

VMSS Resource Group

The main resources here are the Virtual Machine Scaleset named “vmss”; a public Load Balancer, which we use to connect to the instances over RDP from the internet; and the Log Analytics workspace that will store the data collected once we enable Azure Insights, a pretty kick-ass monitoring tool.

Let’s take a look at the VM Scaleset and its VM instances.

VM Scaleset Instances

Both instances run Windows Server 2016. Both also reflect the “Latest model” of the Scaleset, meaning they are up to date with its current configuration.
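The “Latest model” state can also be checked from a script. Below is a minimal Python sketch, separate from the article's own PowerShell script, that assumes the Azure CLI is installed and logged in, and that `az vmss list-instances` returns JSON entries carrying `instanceId` and `latestModelApplied` fields. The helper names `list_instances` and `outdated_instance_ids` are hypothetical.

```python
import json
import subprocess


def list_instances(resource_group, vmss_name):
    """Return the Scaleset's instances as parsed JSON (requires `az login`)."""
    result = subprocess.run(
        ["az", "vmss", "list-instances",
         "--resource-group", resource_group,
         "--name", vmss_name,
         "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def outdated_instance_ids(instances):
    """Return the instanceId of every instance not running the latest model.

    Assumes each entry is a dict with "instanceId" and "latestModelApplied",
    as in the flattened `az vmss list-instances` output.
    """
    return [vm["instanceId"] for vm in instances
            if not vm.get("latestModelApplied", False)]
```

For the setup above, `outdated_instance_ids(list_instances("VMSS-RG", "vmss"))` should return an empty list; once the Scaleset model changes, it should list the instances that still need an upgrade.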

Let’s Break It!

We are going to break the first VM instance and leave the second one intact so that we can compare the two. After logging on to the first VM, we navigate to:

C:\Windows\System32\WindowsPowerShell\v1.0

Renaming to _powershell.exe

Here we renamed powershell.exe by prefixing it with an underscore (‘_’). When we deploy the Insights monitoring tool, one of its Azure VM Extensions depends on PowerShell to install, so that installation will fail on this instance.

To deploy Insights, we select the “Insights” tab, enable the feature, and specify our Log Analytics workspace.

Enable Insights monitoring

We can see the two new Extensions that have been deployed to the VMSS configuration model.

Two new VM Extensions

Now that the VMSS model has changed, the VMSS instances are no longer running the latest model. Once we upgrade the VM instances, they attempt to install the new VM Extensions. Selecting the first VM instance and clicking on its Status, we see the failure:

VM Scaleset instance DependencyAgentWindows Extension failure
Azure Portal notification of the failure

You can see that it failed to install the extension despite the instance showing in green.
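Because the instance status can look healthy while an extension has failed, it can help to query the instance view directly. A minimal Python sketch, assuming the Azure CLI is available and that `az vmss get-instance-view --instance-id <id>` returns an `extensions` list whose entries carry `name` and `statuses` (each status with `code`, `level` and `message`); `get_instance_view` and `failed_extensions` are hypothetical helper names:

```python
import json
import subprocess


def get_instance_view(resource_group, vmss_name, instance_id):
    """Return one instance's view as parsed JSON (requires `az login`)."""
    result = subprocess.run(
        ["az", "vmss", "get-instance-view",
         "--resource-group", resource_group,
         "--name", vmss_name,
         "--instance-id", str(instance_id),
         "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def failed_extensions(instance_view):
    """Return (name, message) for each extension that reports an error.

    A status counts as failed if its level is "Error" or its code
    contains "/failed" (e.g. "ProvisioningState/failed/...").
    """
    failures = []
    for ext in instance_view.get("extensions") or []:
        for status in ext.get("statuses") or []:
            code = status.get("code", "").lower()
            if status.get("level") == "Error" or "/failed" in code:
                failures.append((ext.get("name"), status.get("message", "")))
                break  # one entry per extension is enough
    return failures
```

In this scenario, running `failed_extensions(get_instance_view("VMSS-RG", "vmss", 0))` against the broken instance (assuming its ID is 0) should surface DependencyAgentWindows together with its error message.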

The VMSS Overview screen also shows a very disturbing red banner across the top, indicating that something is wrong with the deployment.

VMSS Failure Red Banner
Red Flag despite “All succeeded”

Inspection and Troubleshooting

Let’s inspect the issue and take a peek under the hood. The YouTube video walks through each of the PowerShell commands in depth; in this article we’ll just post the code for reference.

GitHub Gist PowerShell Script

Solution

Once we’ve identified and fixed the issue (i.e. renamed _powershell.exe back to powershell.exe), we need to clean up both the VM Scaleset object and the affected VMSS instance. We need to perform the following actions:

  1. Remove the DependencyAgentWindows extension from the VM Scaleset model using the Remove-AzVmssExtension and Update-AzVmss cmdlets.
  2. Upgrade the VMSS instance to the latest model, which no longer contains the extension.
  3. Clean up the VMSS instance by removing the failed extension from its Resources list with $instance1.Resources.RemoveAt(1), then apply the change with Update-AzVmssVM.

Each of these steps is shown in the PowerShell code above. Step 2 can also be performed in the Portal by simply selecting the instance and clicking “Upgrade.”
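Step 3 removes the entry at index 1, which works here only because of where DependencyAgentWindows happens to sit in this instance's Resources list. A minimal Python sketch of finding that position by name instead, assuming each entry exposes a `name` field as in the `resources` array of `az vmss show --instance-id <id>` JSON output; `extension_index` is a hypothetical helper:

```python
def extension_index(resources, extension_name):
    """Return the position of `extension_name` in an instance's extension list.

    Assumes `resources` is a list of dicts with a "name" key, mirroring the
    instance's Resources list. Raises ValueError if the extension is absent.
    """
    for index, ext in enumerate(resources or []):
        if ext.get("name") == extension_name:
            return index
    raise ValueError(f"extension {extension_name!r} not found on this instance")
```

The returned position is the value you would pass to RemoveAt; looking it up by name avoids removing the wrong extension if the list order differs on another instance.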

Conclusion

With these steps completed, the issue is resolved and the red flags have all disappeared. We’ve reset the state of our VMSS model and of the instances running against it. Since we’ve also restored the powershell.exe name, we can now safely redeploy the extension to the VM Scaleset instances.

This article has shown you how to troubleshoot failed VM Scaleset extensions in general, and how to perform the cleanup that removes the red flags and failure messages. Thanks for tuning in!

Follow me:

Twitter: https://twitter.com/azurejackson

AzureJackson

Azure Architect and Developer enthusiast / Azure MCSE: Cloud; MCSD: App Builder / Certified Ethical Hacker