Microsoft Azure and Office 365 resourcing issues
As must be clear to everyone by now, there has been a massive spike in demand for public cloud services since the coronavirus outbreak first hit us. Microsoft report that whole countries have gone from zero use of cloud to deliver teaching to 100% coverage of cloud-based remote learning in a matter of weeks. Microsoft Teams has probably borne the brunt of that demand.
It is therefore not surprising that we are beginning to see the first signs of resourcing problems. Yesterday, The Register reported that ‘Azure appears to be full’: UK punters complain of capacity issues on Microsoft’s cloud, and I’ve seen similar reports elsewhere.
If you get errors when trying to provision VMs on Azure, the advice from Microsoft appears to be:
- Wait and try again from time to time to see if the resources get released
- Attempt to recreate the VM in the same region but with a different sizing
- Attempt to recreate the VM in another region.
The last of these needs to be treated with some caution. Although all Azure regions are built to the same level of compliance, there are factors to consider when relocating data: for example, although the EU Model Clauses and Privacy Shield are recognised as GDPR-compliant, they are likely to require ongoing monitoring by the data controller.
“We are a reasonable and pragmatic regulator, one that does not operate in isolation from matters of serious public concern. Regarding compliance with information rights work when assessing a complaint brought to us during this period, we will take into account the compelling public interest in the current health emergency.”
Remember that the cloud providers are prioritising government and emergency services use of the cloud.
Freeing up capacity for emergency health service use by temporarily moving less critical stuff to less-stressed regions seems to chime very well with that “compelling public interest”. Though, in the current emergency, are there any regions that are “less-stressed”?
Note that I do not believe that Microsoft will move data out of the region into which customers have put it. However, what customers choose to do is their decision, and we may well be tempted to create resources outside of our normally preferred regions in response to the current crisis.
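As a rough illustration, the three pieces of Microsoft advice above can be sketched as a simple fallback loop. This is only a minimal Python sketch under stated assumptions: `create_vm` and `CapacityError` are hypothetical stand-ins for whatever provisioning call and capacity error your own tooling uses, not part of any Azure SDK. It exhausts the retries and alternative sizes in one region before moving to the next, so the data-residency caution above is respected by listing your preferred region first.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


class CapacityError(Exception):
    """Hypothetical error: the requested region/size has no capacity."""


def provision_with_fallback(
    create_vm: Callable[[str, str], str],  # hypothetical: (region, size) -> VM id
    regions: Iterable[str],                # most preferred region first
    sizes: Iterable[str],                  # most preferred size first
    attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[str, str, str]]:
    """Return (vm_id, region, size) on success, or None if nothing worked."""
    sizes = list(sizes)
    for region in regions:
        for attempt in range(attempts):
            # Try each acceptable size in this region before waiting.
            for size in sizes:
                try:
                    return create_vm(region, size), region, size
                except CapacityError:
                    continue
            # Wait for resources to be released, except after the last attempt.
            if attempt < attempts - 1:
                sleep(wait_seconds)
    # Every region, size and retry has been exhausted.
    return None
```

Keeping the list of regions short, and agreed in advance with whoever is responsible for data protection, seems more sensible than letting a script roam freely across the globe.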
To the above three pieces of advice, I would add two more:
- Turn off any VMs that are not required for production workloads. That will save capacity for others.
- For production workloads, i.e. for things that you cannot afford to lose, turn off any tooling you have that auto-powers VMs off overnight — otherwise you may find that those VMs cannot be powered back up the next morning.
Sorry, I appreciate that these are both very obvious. But the main point is that we all need to provision resources carefully and be mindful of the wider impact on others. Otherwise we just become part of the problem.
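For what it is worth, my two additional suggestions above could be combined into a single overnight plan along the following lines. Again, this is a hedged sketch rather than a real tool: the `VM` class and the `environment` tag are assumptions made for illustration, and your own estate may label production workloads quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VM:
    """Hypothetical, minimal description of a VM and its tags."""
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


def plan_overnight_shutdown(vms: List[VM]) -> Dict[str, List[str]]:
    """Split VMs into those to power off and those to keep running."""
    plan: Dict[str, List[str]] = {"power_off": [], "keep_running": []}
    for vm in vms:
        # Assumption: production VMs carry the tag environment=production.
        if vm.tags.get("environment", "").lower() == "production":
            # Keep running: it may not be possible to power it back up.
            plan["keep_running"].append(vm.name)
        else:
            # Not needed for production: release the capacity for others.
            plan["power_off"].append(vm.name)
    return plan
```

Note that this sketch treats untagged VMs as non-production, so it would power them off. If your tagging is patchy, it would be safer to review those by hand first.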
Clearly, capacity issues in Azure will affect, and be affected by, capacity issues in Office 365. Microsoft have scaled their Office 365 capacity vastly, particularly in response to increased demand for Teams. Therefore, many of the same considerations will apply. That said, I’m unclear about best practice here, specifically in terms of how to provision Office 365 resources in order to minimise the impact on underlying services. As Office 365 is delivered as SaaS, we as customers have relatively little control over the underlying resource allocation. But Microsoft are making adjustments to various features to maximise utilisation.
In all cases I would advise strongly against panic buying. We all know where everybody buying too much toilet roll gets us.
(Thanks to my colleague Andrew Cormack for advising on the GDPR-related aspects of this post).
Originally published at https://cloud.jiscinvolve.org on March 25, 2020.