Azure Cost Management — Cost Analysis

This is the second post on Azure Cost Management; the first post is Azure Cost Management — Report Creation.

Mark Hartshorn
Capgemini Microsoft Blog
9 min read · Sep 14, 2022


Photo by micheile dot com

Before I start on my approach, I should point out that Azure does an excellent job of providing recommendations on cost saving. This can be found under Cost Management / Advisor Recommendations. You may have seen this appearing as a pop-up window when you log into the Azure Portal.

Some of the analysis will require you to have experience in Azure and Azure products. It is also helpful if the development/support team are available to discuss why parts of a system were developed in a particular fashion. Some of the savings were achieved over a period of time as the development team had to make relevant changes and then deploy them through the various environments.

So, we now have several reports; the question is, what can they tell us? Using both the Environment and Service reports, I was able to look at the Azure resource spend from two different sides. Using Power BI’s ability to sort a report by clicking on a column, I started with the Service report, sorted the monthly totals column in descending order and looked at any entry above £1000 per month.

Service Costs over £1000 per month

I should highlight that I started looking at services first as this allowed me to investigate possible savings that will be reflected across multiple environments.
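The same first pass can be reproduced outside Power BI. The snippet below is a minimal sketch, assuming the Service report has been exported as rows of service name and monthly total; the `ServiceCost` type, the function name and the sample figures are all hypothetical and are not taken from the project's report.

```python
from typing import NamedTuple


class ServiceCost(NamedTuple):
    service: str
    monthly_total_gbp: float


def services_over_threshold(rows, threshold_gbp=1000.0):
    """Return rows strictly above the threshold, highest monthly total first."""
    flagged = [r for r in rows if r.monthly_total_gbp > threshold_gbp]
    return sorted(flagged, key=lambda r: r.monthly_total_gbp, reverse=True)


# Illustrative figures only, not the project's real costs.
sample = [
    ServiceCost("Service A", 250.0),
    ServiceCost("Service B", 4200.0),
    ServiceCost("Service C", 1000.0),
    ServiceCost("Service D", 1850.0),
]

for row in services_over_threshold(sample):
    print(f"{row.service}: GBP {row.monthly_total_gbp:,.2f}")
```

Only Service B and Service D are listed here, because the filter is "above £1000" rather than "at least £1000".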

API Management

On investigating the individual API Management service, I found the following:

API Management Costs by Environment

Now this is where your Azure knowledge will have to come into play. I know from previous projects that APIM has various tiers (see Microsoft Docs: Feature-based comparison of the Azure API Management tiers), which have different monthly prices. In the above list, the dev, dmo, ext, sit and tst environments use the Developer tier, whereas uat1, prd and preprod use the Premium tier. The Premium tier comes with a minimum cost of £2000 per month and is fully backed by SLAs from Microsoft. The Developer tier has no SLA, and the documentation indicates that Microsoft can update the service at any time.

Now, the production environment (prd) is live and requires the Premium tier for performance, SLAs and so on, but do uat1 and preprod require the Premium tier? In this instance the answer isn’t as simple as yes or no. If uat1 is being used for user acceptance testing, then it should be on the Premium tier; but outside of these periods, could it be downgraded to the Developer tier? The same logic applies to Pre-Production (preprod). At the project’s current stage in the development life cycle, it was acceptable to downgrade both uat1 and preprod to the Developer tier and save £4000 per month. These would then be upgraded back to the Premium tier when required.
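Because the two environments go back to Premium for test periods, the real saving depends on how much of each month they spend downgraded. The function below is a rough sketch of that arithmetic, not a pricing calculator: it assumes cost scales linearly with time spent on a tier, and the Developer tier price is a parameter defaulting to zero because this post does not state it (the £4000 figure above appears to ignore it too). All names are hypothetical.

```python
def apim_monthly_saving(premium_gbp, environments, premium_fraction=0.0, developer_gbp=0.0):
    """Estimate the monthly saving from running APIM environments on a cheaper tier.

    premium_fraction is the share of the month (0 to 1) that each environment
    still spends on the Premium tier, for example during acceptance testing.
    """
    if not 0.0 <= premium_fraction <= 1.0:
        raise ValueError("premium_fraction must be between 0 and 1")
    downgraded_share = 1.0 - premium_fraction
    return environments * downgraded_share * (premium_gbp - developer_gbp)


# uat1 and preprod downgraded for the whole month, using the £2000 minimum.
print(apim_monthly_saving(2000, 2))                         # 4000.0
# Same two environments, but on Premium for a quarter of the month.
print(apim_monthly_saving(2000, 2, premium_fraction=0.25))  # 3000.0
```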

Virtual Machines — Sizing

Virtual Machine Costs by Environment

It is fair to say that the project has a significant number of virtual machines (VMs) all being used for different reasons. There were also a number of developers still working on the project using different VMs as development machines.

My approach to VMs was:

1) All VMs: are they overpowered? An analysis was conducted to see if any VMs could be downsized. An example of this was the Azure DevOps self-hosted build machine. When the development team was at its largest, the VM was sized at 8 vCPU and 32GB memory. This has now been reduced to 4 vCPU and 16GB memory. When the system goes fully into support it could be reduced to a smaller size or even powered down (deallocated) if no deployments are expected.

2) Development VMs: Do they have automated start and stop times? If they are part of DevTest Labs, then they can be set up to run for 12 hours a day, Monday through Friday, which is normally more than enough for developers even with different working patterns. This means you pay for only 2½ days of compute per week and save the cost of the other 4½ days. If a developer requires their VM for additional work over the weekend or later into an evening, then it is simple to change the times to make it available. Note that if you have developers across multiple geographic locations, then you should create a separate DevTest Lab for each time zone.

3) All VMs: Are they still required? Some VMs were powered down/deallocated but had not been deleted. Admittedly the bulk of the cost for a VM comes from compute, but disk and IP address costs can mount up over time.

4) Production VMs: Assuming they have been sized correctly, can they be changed to reserved instances?

Using the above approach over 12 months, I reduced the virtual machine costs from £5756 per month down to £2422.
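The start/stop schedule in point 2 is worth checking with a quick calculation. This is only a sketch of the arithmetic for compute time; the function name is my own, and it says nothing about disks or IP addresses, which are still charged while a VM is deallocated.

```python
def weekly_running_days(hours_per_day, days_per_week):
    """Convert a start/stop schedule into equivalent full days of compute per week."""
    return hours_per_day * days_per_week / 24


running = weekly_running_days(12, 5)  # 12 hours a day, Monday to Friday
saved = 7 - running                   # days of compute no longer paid for
saving_share = saved / 7              # share of an always-on week

print(running, saved, round(saving_share * 100))  # 2.5 4.5 64
```

So a 12-hour weekday schedule removes roughly 64% of the compute hours of an always-on VM.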

Functions

Function App Costs by Environment

On investigating Function App costs, I found that each environment had two App Service plans, with different functions assigned to each. Originally some of the functions were going to be for internal use only, but this changed as the system was developed. I put forward a change that all functions should be deployed into a single App Service plan so that the second plan could be removed. This would save £240 per month per environment. As we had eight environments (excluding SFTP), we would save £1920 per month.

SQL Databases

Database Costs by Environment

With the managed SQL Databases, I started looking at the performance details over the past few months. It was obvious that the production database was under-utilised except for a period overnight Monday to Friday. After discussing this with the development team, I was informed that a Function app was running at that time and extracting a very large file from the system. The database was set to its current level to ensure the file was exported within the default Function app timeout period (one hour). The timeout period was extended, and the database performance was reduced. If the Function app had issues and timed out, then it could be manually rerun the following day. This change was applied across all environments. This reduced the SQL expenditure from £4519 per month to a cost of £756 per month.

Storage

No changes were made from a direct storage point of view.

Azure DDoS Protection

Azure DDoS Protection Standard had originally been specified for the programme. A review was conducted with the senior architects and the security architect, and it was decided that the default DDoS protection was sufficient. Therefore, the upgraded service was removed. This saved approximately £2000 per month.

Virtual Machine Licenses

These licenses are part of a third-party service (firewall etc.) and could not be reduced.

Licenses — Microsoft 365 Services

License Costs

The licenses report, through its Licenses timeline and Difference Last Month views, is useful for identifying which licenses have increased or reduced, but identifying potential cost reductions needs more granular detail. So, the question is: how can licensing costs be reduced?

As licenses are allocated to people, I looked at what Azure Active Directory could provide. I downloaded a list of users with the following information:

· User Name
· Email
· Account Created Date
· Last Login Date
· Number of Logins in the last 30 days
· Allocated Licenses

I sorted the report on ‘Number of Logins in the last 30 days’ ascending and then on ‘Last Login Date’.

Looking through the report, I realised that I had several distinct groups of people:

1) People who had never logged in
2) People who had logged in infrequently (fewer than five times in the last 30 days)
3) People who are logging in on a regular basis

Group One: People who had never logged in

I decided that for people who had never logged in (and whose accounts were older than ninety days), we would disable their accounts and remove all licenses. If their account was required in future, we could re-enable it and re-allocate licenses. For people whose accounts had been created within the last ninety days, we enquired whether they would be using the allocated licenses. It was amazing to find out the number of teams who would take on a new person and instantly give them all licenses, even if they didn’t know what part of the system they were going to use.

Group Two: People who had logged in infrequently

Again, I enquired what licenses people would require. This also prevented us removing people’s licenses if they had been on holiday recently. When challenged, most people accepted that we could save a significant amount of money per month (potentially up to £350 per person per month) and would happily give up their licenses knowing we could reallocate them very quickly. The project team looking after user/license administration for us has an SLA of 99% of requests completed within an hour. If you are in a situation where you need to raise a request and have (for example) a 5-day SLA with another team, then this approach may not work as well and people may be reluctant to give up their licenses.

Group Three: People who are logging in on a regular basis

I audited each person’s licenses and, if any looked odd (i.e. I wasn’t sure why they had them), I contacted the person to discuss. This did remove a few licenses, but the saving was small in comparison to groups one and two.
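The three groups and their follow-up actions can be expressed as a small triage routine over the exported user list. This is a minimal sketch, assuming the export has been loaded into simple records; the `User` class, the field names and the sample users are hypothetical, and the thresholds (ninety days, five logins) are the ones described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class User:
    name: str
    created: date
    last_login: Optional[date]  # None means the user has never logged in
    logins_last_30_days: int


def triage(user, today, new_account_days=90, infrequent_below=5):
    """Map a user to the follow-up action for their group."""
    if user.last_login is None:
        if (today - user.created).days > new_account_days:
            return "disable account and remove licenses"
        return "ask whether the licenses will be used"
    if user.logins_last_30_days < infrequent_below:
        return "ask which licenses are still required"
    return "audit allocated licenses"


def review_order(users):
    """Sort by login count ascending, then last login date (never logged in first)."""
    return sorted(users, key=lambda u: (u.logins_last_30_days, u.last_login or date.min))


# Hypothetical users for illustration only.
today = date(2022, 9, 14)
users = [
    User("user-d", date(2021, 5, 1), date(2022, 9, 13), 21),
    User("user-c", date(2021, 5, 1), date(2022, 9, 2), 2),
    User("user-b", date(2022, 8, 1), None, 0),
    User("user-a", date(2022, 1, 10), None, 0),
]

for u in review_order(users):
    print(u.name, "->", triage(u, today))
```

The routine only suggests who to contact; as described above, the decision to remove a license was still made after talking to the person or their team.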

This approach has now been adopted and licenses are reviewed on a bi-monthly basis. The first review removed licenses worth just over £2000 per month.

Conclusion

Although cloud computing is marketed as cheaper, it is very easy to quickly spend significant amounts of money. Azure provides recommendations on how to save money, but I found there is very little tooling available that allows you to drill down into costs and make comparisons over time. The report I developed gave me the ability to see costs from both an environment and an Azure service point of view. This is what was needed to get to the crux of consumption and resource spend, and it allowed me to focus on the relevant areas of the system.

The report has proved useful in allowing me to see differences from the previous month, and hopefully this information will help the ongoing programme team in keeping their Azure costs down. Any significant increases can be investigated if they were not expected. You also have to be aware that some costs will be required, for example security and third-party license costs, and you may have limited, if any, chance of reducing them.
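The month-on-month comparison can also be automated so that unexpected increases are surfaced without opening the report. The sketch below assumes two dictionaries of cost per service (or per environment); the function name, the £100 threshold and the figures are illustrative assumptions rather than anything used on the project.

```python
def flag_increases(previous, current, min_increase_gbp=100.0):
    """Return (name, increase) pairs that rose by at least the threshold, largest first."""
    flagged = []
    for name, cost in current.items():
        # A service that did not exist last month counts as a rise from zero.
        increase = cost - previous.get(name, 0.0)
        if increase >= min_increase_gbp:
            flagged.append((name, increase))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative figures only.
last_month = {"Service A": 900.0, "Service B": 400.0}
this_month = {"Service A": 1250.0, "Service B": 420.0, "Service C": 300.0}

print(flag_increases(last_month, this_month))
# [('Service A', 350.0), ('Service C', 300.0)]
```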

Certain aspects of cost saving can go against architectural design principles. For example, two App Service plans for the Azure Functions might make logical sense from an architectural point of view but come with increased cost. Reviewing the design principle and redeploying the amended configuration saved a significant amount of money over the expected life of the system.

Although I have designed a way of replacing/allocating tags to amend tag details, it doesn’t remove the necessity of having a well-designed tagging policy.

On this project I was lucky that the development team was still present and some of the changes could be completed and deployed through to production quickly. Changes to a live system will take time before you see a reduction in costs: a code change can take months before it is published to production, and you will only see the full cost benefit the following month.

Overall, I managed to reduce the monthly Azure spend from £39,700 per month to £18,750 per month in fourteen months, something I am extremely proud of. An additional benefit is that a decrease in consumption will also reduce your CO2 emissions, so it is beneficial from a sustainability point of view as well.
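As a sanity check, the individual savings quoted in this post can be added up and compared with the overall reduction. The figures below are the ones stated in the sections above; the DDoS and license figures are approximate, and the savings were achieved over different periods, so this is a rough reconciliation rather than an exact account.

```python
# Monthly savings in GBP as stated in the sections above.
stated_savings_gbp = {
    "API Management (uat1 and preprod downgraded)": 4000,
    "Virtual machines (5756 down to 2422)": 5756 - 2422,
    "Function App plans (240 x 8 environments)": 240 * 8,
    "SQL databases (4519 down to 756)": 4519 - 756,
    "DDoS Protection (approximate)": 2000,
    "Licenses (first review, approximate)": 2000,
}

itemised = sum(stated_savings_gbp.values())
overall = 39_700 - 18_750

print(itemised)                          # 17017
print(overall)                           # 20950
print(round(overall / 39_700 * 100, 1))  # 52.8
```

The itemised changes account for about £17,000 of the roughly £21,000 monthly reduction, which is a fall of just over half; the remainder is not broken down in this post.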

Costs Summary Report


If you want to learn more about the Microsoft Team here at Capgemini, take a look at our open roles and consider joining the team!
