Since we, at Mimo, write our backend with .NET Core and host it on Azure App Service, we get to enjoy a lot of the awesomeness that App Service provides, but we also run into a lot of the pain points that come with hosting a zero-downtime, highly-available service on it.
URL Based Health Checks
When you build a highly-available service you want your hosting solution to constantly monitor the individual instances and make sure they are replaced when one of them goes down.
While you can set up health monitoring via Application Insights, this still won’t recover your instances in case of a failure. It would be great if I could configure a specific URL that is regularly pinged on every instance, and specify what should happen when the response returns a specific status code. This is pretty much how Google App Engine does it.
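Conceptually, the probe we’re wishing for is simple. Here’s a rough shell sketch of what such a per-instance monitor could do — the URL, threshold, and recycle rule are all hypothetical placeholders, not App Service features:

```shell
#!/bin/sh
# Hypothetical per-instance health probe: poll a /health endpoint and
# decide when an instance should be recycled. The URL and threshold
# below are made-up placeholders.

HEALTH_URL="https://my-instance.azurewebsites.net/health"  # hypothetical
FAILURE_THRESHOLD=3

# Decide, from the most recent status codes, whether the instance
# should be recycled (exit status 0 = recycle, 1 = keep running).
should_recycle() {
  failures=0
  for code in "$@"; do
    if [ "$code" -ge 500 ]; then
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -ge "$FAILURE_THRESHOLD" ]
}

# A real monitor would run this on a timer, e.g.:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$HEALTH_URL")
# and feed the collected codes into should_recycle.
```

If the platform offered exactly this — a configurable path plus a status-code rule — recovering a dead instance would no longer require an external monitoring setup.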
Better Zero-Downtime-Deployment Experience
The base mechanism for enabling zero-downtime deployments on Azure App Service is Blue-Green Deployment via Deployment Slots. This works by setting up a staging slot, deploying the new version of the service onto the staging slot, letting it warm up so it’s ready, and then swapping it with the production slot.
There are now two possibilities: Deploying the new version to the staging slot and manually pressing the swap button or configuring “Auto Swap”, which automatically swaps the staging slot into production after a new version has been deployed and the service has warmed up.
Since we try to automate as much as possible, we’re using the Auto Swap solution, which lets us deploy a new version of a service by simply merging the latest version into our production branch and letting our CI server do the rest. Unfortunately, App Service doesn’t provide a lot of insight into what Auto Swap actually does behind the scenes, how it behaves, and whether it actually succeeded or failed.
A lot of the surprises we ran into happened because there is no logging and no documentation of how exactly Auto Swap behaves. If you take a look at the following issue on GitHub, you’ll see that a lot of people feel the same way: https://github.com/projectkudu/kudu/issues/2583
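One workaround for teams that want the automation without the black box is to have the CI server trigger the swap explicitly instead of relying on Auto Swap, so the step is visible in the build log and its success or failure is explicit. A sketch using the Azure CLI (resource group, app, and slot names are placeholders):

```shell
# Deploy to the staging slot first (however your CI does it), then
# swap explicitly so the pipeline sees the result. Names below are
# placeholders for your own resources.

az webapp deployment slot swap \
  --resource-group my-resource-group \
  --name my-app \
  --slot staging \
  --target-slot production
```

Because the CLI exits with a non-zero status when the operation fails, the build can be failed on a bad swap — exactly the feedback Auto Swap currently keeps to itself.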
Native Let’s Encrypt
Let’s Encrypt provides free SSL certificates for everyone, which is a great thing, as the web should have HTTPS everywhere. Unfortunately, Azure App Service doesn’t have Let’s Encrypt support out-of-the-box.
While it’s pretty easy to use something like https://github.com/ohadschn/letsencrypt-webapp-renewer, it still requires setting up a completely new web app just for renewing the certificates (Let’s Encrypt certificates expire after 3 months), and it doesn’t work when you have Local Cache enabled, which is a must for highly-available services.
Better App Settings
The environment variables (aka “Application Settings”) in App Service are pretty horrible to work with, especially in conjunction with deployment slots.
There are two main issues I have with the current implementation of the App Settings:
- They are not ordered (Update: Looks like they finally fixed this!)
- They need to be manually copied between every deployment slot
The first point is pretty easy to explain with a picture:
I’ve created these App Settings one after another, but as you can see, they are persisted in the order they’ve been created, not in alphabetical order. As you can imagine, this gets out of hand pretty quickly.
The second issue concerns App Settings in combination with Deployment Slots. Right now, if you want to have the same settings in the production and the staging slot (as should be the case, since the staging slot should behave exactly like the production slot), you have to manually keep the App Settings in sync between the slots. A majority of the issues in the Mimo backend have been caused by us setting a value in the production slot but forgetting to set that exact same value in the staging slot. I’d love to see a feature that makes it possible to automatically keep certain values in sync between different slots.
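Until such a sync feature exists, the manual copying can at least be scripted with the Azure CLI. A hedged sketch (resource names are placeholders, and note that sticky, slot-specific settings would need to be filtered out first, which this sketch does not do):

```shell
# Copy all App Settings from the production slot to the staging slot.
# Caveat: slot-specific ("sticky") settings should be excluded before
# applying; this sketch copies everything as-is.

settings_file=$(mktemp)

# Export the production slot's settings as JSON.
az webapp config appsettings list \
  --resource-group my-resource-group \
  --name my-app \
  --output json > "$settings_file"

# Re-apply the same settings to the staging slot.
az webapp config appsettings set \
  --resource-group my-resource-group \
  --name my-app \
  --slot staging \
  --settings "@$settings_file"
```

Running this as a step in the deployment pipeline would at least turn “forgot to set it in staging” from a production incident into a non-issue.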
Another surprise that recently got me is not strictly related to the App Settings themselves, but similar to the syncing problem above: Apparently, extensions in App Service are per-slot and not per-service, which is what I had assumed. This means you have to manually keep the extensions in sync between the slots, same as with the App Settings.
Fewer Surprises
A bunch of things in App Service are a bit unintuitive and don’t make it feel like a “fully-managed” platform.
We had to figure them out the hard way; here are some of the surprises we’ve found so far:
- The warm-up logic for Auto Swap does not follow redirects from HTTP to HTTPS, which basically every service today has in place.
- Azure will hit your service with random requests to perform some kind of security scans. These requests will fail, though, because Kestrel hosted on App Service only accepts requests coming directly from the IIS proxy.
- If you don’t enable the hidden Local Cache feature, your whole site will go down if there is a problem with the underlying storage account that hosts your site content. And this isn’t an edge case: Azure regularly upgrades the underlying storage infrastructure, and during that time your instances will be down.
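Part of why Local Cache is easy to miss is that it’s enabled through App Settings rather than a visible toggle. The documented settings look like this (resource names are placeholders; the cache size shown is the common default):

```shell
# Enable Local Cache so the app serves from a local copy of the site
# content instead of depending on the shared storage account at runtime.

az webapp config appsettings set \
  --resource-group my-resource-group \
  --name my-app \
  --settings WEBSITE_LOCAL_CACHE_OPTION=Always \
             WEBSITE_LOCAL_CACHE_SIZEINMB=300
```

Remember that, per the syncing complaint above, these settings also have to be applied to the staging slot separately.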
All in all, Azure App Service is still an excellent service, and without it we couldn’t have grown as fast and iterated as rapidly as we have; a non-managed platform would have slowed us down considerably. That being said, we’re also seeing the limits of building a highly-available service on it, and I hope the Azure App Service team focuses more on these kinds of scenarios in the future.