Adventures in Infrastructure as Code: Lessons learnt using Azure ARM templates
I have recently been working on a project where I’ve concentrated on using ARM Templates to automate the creation and management of the Azure Infrastructure.
There are a few things that I wish I had known before I started, so this is my attempt to document these lessons and, hopefully, help people who find themselves in the same position.
Lesson #1: You need an IaC Test Environment
It took me a while to get a completely separate environment in which I could test changes to the templates without any fear of affecting anybody or breaking the CI build.
The second part of this lesson is that you want to deploy the code and run the integration/acceptance tests against this IaC Test environment. After all, just because you’ve configured the environment to spec, it does not mean that it will work.
I configured an App Service to use user-assigned managed identities but this did not work. It turns out that the AzureServiceTokenProvider did not support this type of managed identity and, because I wasn’t doing a full end-to-end test, this broke the build.
If you get pushback due to cost, spend time on a PowerShell script that deletes the environment. This can be configured to run nightly so that you avoid racking up unnecessary costs. Remember that this environment can be a lower spec than other environments.
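The nightly clean-up could be sketched roughly as follows. Python is used here purely for illustration instead of PowerShell, and the sketch only builds the Azure CLI commands rather than running them. The `rg-iactest-` prefix is a hypothetical naming convention, not something taken from the project.

```python
from typing import List

# Hypothetical prefix shared by every resource group in the IaC Test environment.
IAC_TEST_PREFIX = "rg-iactest-"


def groups_to_delete(resource_groups: List[str], prefix: str = IAC_TEST_PREFIX) -> List[str]:
    """Keep only the resource groups that belong to the IaC Test environment."""
    return [name for name in resource_groups if name.lower().startswith(prefix)]


def delete_commands(resource_groups: List[str]) -> List[List[str]]:
    """Build one `az group delete` invocation per IaC Test resource group.

    --yes skips the confirmation prompt and --no-wait returns without
    waiting for the deletion to finish, which suits a scheduled job.
    """
    return [
        ["az", "group", "delete", "--name", name, "--yes", "--no-wait"]
        for name in groups_to_delete(resource_groups)
    ]
```

Each command list could then be handed to `subprocess.run` from a scheduled pipeline. Filtering on a prefix first means that a typo in the input is less likely to delete anything outside the test environment.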
Lesson #2: Split your templates early
It’s likely that when you start to create your templates you will only have sight of a small part of the architecture, as it’s probably not been fully defined yet. So you start by adding a single App Service on the API layer, then you add an Azure Function and another, and before you know it your template has 60 parameters, 10 resources, and it has become an unwieldy mess. Worst of all, you have something similar across the other templates of your application.
The solution is to split your templates into a master template with linked templates for each resource. It’s worth bearing in mind that the linked templates can contain more than a single resource — for example all the databases in an elastic pool.
The downside of this approach is that the linked templates must be made available to the pipeline somehow. In practice this means uploading them to Blob storage, for which there is an existing built-in task in Azure Pipelines.
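To make the shape of a master template concrete, here is a minimal sketch that generates one in Python and could be dumped to JSON. The `_artifactsLocation` and `_artifactsLocationSasToken` parameter names are assumed (a common convention for the Blob storage location and its SAS token), as are the linked template file names and the API version; none of these come from the project itself.

```python
import json
from typing import Dict, List


def linked_deployment(name: str) -> Dict:
    """Build a nested deployment resource that points at <name>.json in Blob storage."""
    uri = (
        "[concat(parameters('_artifactsLocation'), "
        f"'/{name}.json', "
        "parameters('_artifactsLocationSasToken'))]"
    )
    return {
        "type": "Microsoft.Resources/deployments",
        "apiVersion": "2019-10-01",  # assumed; use whichever version your tooling targets
        "name": name,
        "properties": {
            "mode": "Incremental",
            "templateLink": {"uri": uri, "contentVersion": "1.0.0.0"},
        },
    }


def master_template(linked_names: List[str]) -> Dict:
    """Build a master template with one linked deployment per name."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "_artifactsLocation": {"type": "string"},
            "_artifactsLocationSasToken": {"type": "securestring"},
        },
        "resources": [linked_deployment(name) for name in linked_names],
    }


if __name__ == "__main__":
    # Hypothetical linked templates: one per resource (or group of resources).
    print(json.dumps(master_template(["appService", "sqlElasticPool"]), indent=2))
```

In a real project the master template would normally be written by hand as JSON. Generating it here simply shows that each linked template becomes one small deployment resource in the master.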
The earlier you do this, the less painful it will become — but this approach requires a significant architecture to be worth it. In other words, if you have an instance of AppInsights, a couple of functions, a storage account and a database, you could easily keep this in a single template.
Lesson #3: Agree a naming convention and stick to it
If I’m not happy with the name of a variable in a C# Visual Studio project, I can easily rename it by pressing CTRL + R, CTRL + R and Visual Studio will rename it everywhere. Granted, if it is a public variable consumed elsewhere, the process is not so simple.
In order to change the name of a parameter in an ARM template, I have to:
1. Do a Find and Replace.
2. Change the name in all Variable Groups.
3. Change the name in the Build/Deployment task(s) in the Azure DevOps pipeline(s).
Step 2 is optional but highly recommended as otherwise the names won’t match and it can make things confusing.
Furthermore, if you are using Task Groups (see Lesson #4), then step 3 is trickier as there is no actual validation of the template parameters.
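One way to compensate for that missing validation is a small check that compares the parameter names a pipeline supplies with those the template declares. The following is only a sketch: the function name and the example parameter names are hypothetical, and it compares names case-insensitively on the assumption that ARM treats parameter names that way.

```python
from typing import Dict, Iterable, List


def check_parameters(template: Dict, supplied: Iterable[str]) -> Dict[str, List[str]]:
    """Report supplied names the template does not declare, and required names not supplied."""
    declared = template.get("parameters", {})
    supplied = list(supplied)
    declared_lower = {name.lower() for name in declared}
    supplied_lower = {name.lower() for name in supplied}

    # Names the pipeline passes that the template does not know about.
    unknown = sorted(name for name in supplied if name.lower() not in declared_lower)
    # Parameters without a default value that the pipeline does not pass.
    missing = sorted(
        name
        for name, spec in declared.items()
        if "defaultValue" not in spec and name.lower() not in supplied_lower
    )
    return {"unknown": unknown, "missing": missing}


# Hypothetical template in which a parameter was renamed to appServiceName,
# while the pipeline still passes the old name.
example_template = {
    "parameters": {
        "appServiceName": {"type": "string"},
        "skuName": {"type": "string", "defaultValue": "S1"},
    }
}
result = check_parameters(example_template, ["webAppName", "SKUNAME"])
```

Run as an early pipeline step, a check like this would surface a renamed parameter before the deployment itself fails.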
I won’t comment on our naming convention as it really doesn’t matter. The important thing is to agree one before you start, preferably one that nobody is terribly displeased with, and stick to it.
If you really must change it, then remember: test, test, test across all your environments.
Lesson #4: Use Task Groups
Imagine that you are lucky enough to have a full set of environments: CI, Test, UAT, Integration Testing, Pre-Prod and Prod. There is a problem: every time you need to make a change to a build task you have to make the same change across all or most environments (I say most as you would usually set the defaults so that they are valid for Dev or Test). This is where task groups come in. In short, these are groups of Build/Deployment tasks that can be reused. This enables you to make a change to the task group, which will then apply to all your environments.
If you are thinking this will not work for you because you have different tasks for different environments, e.g. you load different test data for Test and UAT, you can use conditions so that tasks inside the task group only run if a condition is met. The default condition is “Only when all previous tasks have succeeded”, but this can be changed.
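The idea of one shared task group with per-task conditions can be modelled in a few lines of Python. This is a toy model rather than how Azure Pipelines evaluates conditions, and the task names are made up. In the pipeline itself the equivalent would be a custom condition on the task, something along the lines of `and(succeeded(), eq(variables['Release.EnvironmentName'], 'Test'))`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    # Decides whether the task runs in a given environment; by default it always runs.
    condition: Callable[[str], bool] = lambda environment: True


def tasks_to_run(task_group: List[Task], environment: str) -> List[str]:
    """Return the names of the tasks whose condition holds for this environment."""
    return [task.name for task in task_group if task.condition(environment)]


# One task group shared by every environment (hypothetical task names).
TASK_GROUP = [
    Task("Deploy ARM template"),
    Task("Load Test data", lambda environment: environment == "Test"),
    Task("Load UAT data", lambda environment: environment == "UAT"),
]
```

Here `tasks_to_run(TASK_GROUP, "Test")` selects the deployment plus the Test data load, while Prod only gets the deployment, yet all environments share a single definition.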
If you couple Task Groups with conditional tasks, then your pipelines will be significantly more robust and maintainable.
Lesson #5: Don’t be afraid to use PowerShell or Azure CLI
There are things that cannot be done with ARM templates, either because they will potentially never be supported or because they are not supported now. To try to bend ARM to our will is likely to result in frustration and a lot of wasted time, due to the relatively slow workflow.
As an example, the Service Principal Name (SPN) that is created when a Managed Identity is added to an App Service is not an output parameter of the templates, or at least it wasn’t when I tried it. I spent quite a bit of time trying to get this (OK, maybe not that much time but it certainly felt that way).
In the end I wrote a couple of scripts that obtained the SPN and then set the database administrator up. Could this have been done with ARM templates? Maybe, maybe not; but the client doesn’t care about this, they care about a functioning product now (sometimes yesterday but that’s another story) and to achieve this, scripts were the way to go.
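For illustration only, the two steps might look something like the sketch below. These are not the scripts from the project: it assumes the Azure CLI is used, that the identity’s principal ID is the value needed, and that “database administrator” means the Azure AD admin of an Azure SQL server. The function names are hypothetical, and the functions only build the commands so that they can be inspected before being run.

```python
from typing import List


def show_principal_id_command(app_name: str, resource_group: str) -> List[str]:
    """Build the CLI call that prints the principal ID of an App Service's managed identity."""
    return [
        "az", "webapp", "identity", "show",
        "--name", app_name,
        "--resource-group", resource_group,
        "--query", "principalId",
        "--output", "tsv",
    ]


def set_sql_admin_command(
    server_name: str, resource_group: str, display_name: str, object_id: str
) -> List[str]:
    """Build the CLI call that makes the given identity the Azure AD admin of a SQL server."""
    return [
        "az", "sql", "server", "ad-admin", "create",
        "--resource-group", resource_group,
        "--server-name", server_name,
        "--display-name", display_name,
        "--object-id", object_id,
    ]
```

The output of the first command would be captured, for example with `subprocess.run(..., capture_output=True, text=True)`, and passed to the second as `object_id`.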
This is most emphatically not a steer to give up on ARM templates at the first sign of trouble. It’s a reminder that we are delivering value to a client or business rather than competing for a prize for the purest implementation of X technology, and that sometimes means compromises must be made.