How To Fix Broken Extensions on an Azure Virtual Machine Scale Set
You may have found yourself troubleshooting a Virtual Machine Scale Set in a “Failed” state. This can happen for a number of reasons. In this article, we explore how to troubleshoot Azure Virtual Machine Scale Sets after a VM extension failure.
tl;dr — Just show me a video!
The Scenario
We will create an example broken scenario. We will deploy a Virtual Machine Scale Set with two VM instances and attempt to deploy Azure Insights to it. But first, we will rename the powershell.exe executable on the first instance of the Scale Set; the Azure Insights VM extensions depend on PowerShell to install, so the deployment will fail.
The Setup
We have a Resource group named “VMSS-RG” which houses the resources below:
The main resources here are the Virtual Machine Scale Set named “vmss”, a public Load Balancer which we use to connect via RDP over the internet, and the Log Analytics workspace we will use to store data when we enable Azure Insights, a pretty kick-ass monitoring tool.
Let’s take a look at the VM Scale Set and its VM instances.
These are two instances running Windows Server 2016. They also reflect the “Latest model” of the Scale Set, meaning they are up to date with the Scale Set’s configuration changes.
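You can also check each instance’s model status from PowerShell. A minimal sketch, assuming the resource group “VMSS-RG” and Scale Set “vmss” from this article’s setup, the Az module installed, and an authenticated session (`Connect-AzAccount`):

```powershell
# Requires the Az.Compute module and an authenticated session (Connect-AzAccount).
# Lists each instance and whether it is running the latest scale set model.
Get-AzVmssVM -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss" |
    Select-Object InstanceId, LatestModelApplied
```

Instances reporting `LatestModelApplied = False` are the ones that have not yet picked up changes to the Scale Set model.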
Let’s Break It!
We are going to break the first VM instance and leave the second one intact so that we can compare the two. Logging onto the first VM, we navigate to:
C:\Windows\System32\WindowsPowerShell\v1.0
Here we rename powershell.exe
by prepending an underscore (‘_’). When we deploy the Insights monitoring tool, one of the Azure VM extensions depends on PowerShell to install, so it will fail.
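For reference, the rename can be scripted from an elevated session on the instance. This is a sketch purely for simulating the failure; note that System32 files are owned by TrustedInstaller, so depending on your ACLs you may need to take ownership first:

```powershell
# Run in an elevated session on the first instance. Only do this on a lab VM!
$ps = "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"
# System32 binaries are owned by TrustedInstaller; take ownership and grant
# Administrators full control before attempting the rename.
takeown /f $ps
icacls $ps /grant Administrators:F
Rename-Item -Path $ps -NewName "_powershell.exe"
```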
We deploy Insights by selecting the “Insights” tab, enabling the feature, and specifying our Log Analytics workspace.
We can see the two new Extensions that have been deployed to the VMSS configuration model.
Now that the VMSS model has changed, the VMSS instances are no longer running the latest model. Once we upgrade the VM instances, they attempt to deploy the VM extensions. Choosing the first VM instance and clicking on its status, we see the failure:
You can see that it failed to install the extension despite the instance showing as green.
The VMSS Overview screen also shows a very disturbing red banner across the top, indicating there is something wrong with the deployment.
Inspection and Troubleshooting
Let’s inspect the issue and take a peek under the hood. The YouTube video walks through each of the PowerShell commands in depth; for this article, we’ll just post the code for reference.
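A sketch of the inspection, again assuming the resource names from the setup above. The per-extension provisioning status is surfaced through the instance view:

```powershell
# Pull the instance view of the first VMSS instance to see per-extension status.
$instance1 = Get-AzVmssVM -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss" `
    -InstanceId "0" -InstanceView

# Each extension reports a name plus one or more status codes/messages;
# the failed DependencyAgentWindows extension shows up here with its error.
$instance1.Extensions | ForEach-Object {
    $_.Name
    $_.Statuses | Select-Object Code, Message
}
```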
Solution
After we’ve identified the issue and fixed it (i.e., renamed _powershell.exe
back to powershell.exe
), we need to clean up both our VM Scale Set object and the VMSS instance. We need to perform the following actions:
- Remove the DependencyAgentWindows extension from the VM Scale Set using the Remove-AzVmssExtension and Update-AzVmss cmdlets
- Upgrade the VMSS instance to the latest model, which has the extension removed
- Clean up the VMSS instance by removing the failed extension from its resource list using $instance1.Resources.RemoveAt(1) and Update-AzVmssVM
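The three steps above might look like the following sketch, using the resource names from this article’s setup. The instance ID and the resource-list index are assumptions — verify them against your own output before removing anything:

```powershell
# Step 1: remove the failed extension from the scale set model.
$vmss = Get-AzVmss -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss"
Remove-AzVmssExtension -VirtualMachineScaleSet $vmss -Name "DependencyAgentWindows"
Update-AzVmss -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss" `
    -VirtualMachineScaleSet $vmss

# Step 2: upgrade the broken instance to the latest model
# (this can also be done in the Portal via the "Upgrade" button).
Update-AzVmssInstance -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss" -InstanceId "0"

# Step 3: drop the failed extension entry from the instance's resource list.
$instance1 = Get-AzVmssVM -ResourceGroupName "VMSS-RG" -VMScaleSetName "vmss" -InstanceId "0"
# Index 1 is assumed to be the failed DependencyAgentWindows entry;
# inspect $instance1.Resources first to confirm which index to remove.
$instance1.Resources.RemoveAt(1)
Update-AzVmssVM -VirtualMachineScaleSetVM $instance1
```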
Each of these steps is shown in the PowerShell code above. Step 2 may also be performed in the Portal by simply selecting an instance and clicking “Upgrade.”
Conclusion
After the above steps have been completed, we have effectively resolved the issue and the red flags have all disappeared. We’ve reset the state of our VMSS object model and of the instances running against it. Since we’ve fixed the powershell.exe naming, we may now safely re-deploy the extension to the VM Scale Set instances.
This article has shown you how to troubleshoot failed VM Scale Set extensions in general and how to perform the cleanup that removes the red flags and failure messages. Thanks for tuning in!
Follow me:
Twitter: https://twitter.com/azurejackson