Running Windows on Google Cloud

Jorrit Salverda
Travix Engineering
Jan 15, 2018

At Travix, an online travel agency, we have quite a lot of ‘legacy’ applications besides the newer ones we run in Google Kubernetes Engine. Most of these older applications are written in C# and still require Microsoft Windows to run. We used to run them in a colocation data center on a fixed number of servers, but recently we have migrated the majority of them to Google Cloud.

Immutable images & Managed Instance Groups

In order to make it easy to run the applications in autoscaling managed instance groups, we bake full VM images for every application revision using Packer and Boxstarter. This works reasonably well, although the bake and deploy process is a tad slow: baking an image takes about 7 minutes, and deploying the application takes about 5 minutes per VM.
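
To give an idea of what the bake-time provisioning looks like, here is a minimal sketch of the kind of PowerShell a Packer provisioner could run, using Chocolatey (which Boxstarter builds on). The package names, paths and service name are illustrative examples, not our actual setup:

# Illustrative bake-time provisioning: install dependencies and the application onto the
# image, and leave all environment-specific configuration to the startup script.
# Package names, paths and the service name are made-up examples.

# Install prerequisites with Chocolatey (assumed to be present on the base image).
choco install -y dotnet4.7

# Copy the application build output onto the image.
New-Item -ItemType Directory -Force -Path 'C:\apps\myapp' | Out-Null
Copy-Item -Path 'C:\packer-temp\myapp\*' -Destination 'C:\apps\myapp' -Recurse

# Register the application as a Windows service, but don't start it yet; the startup
# script configures and starts it once the VM boots in a specific environment.
New-Service -Name 'myapp' -BinaryPathName 'C:\apps\myapp\MyApp.exe' -StartupType Manual

Keeping the bake-time steps environment-agnostic is what allows a single image to run in every environment.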

However, the benefits are considerable. There's no need for configuration management that has to work on both fresh machines and existing ones. With the ‘immutable’ approach of baking a golden image, we only need to be able to build a VM image from scratch.

In order to bake once and run in all environments, we've split the bake-time steps and the runtime steps into two separate PowerShell files: packerfile.ps1 executes the bake-time steps, while entrypoint.ps1 uses Google Cloud metadata to initialize the application when its VMs start up.
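
As an illustration, reading instance metadata from an entrypoint script boils down to something like the sketch below; the custom metadata key and service name shown here are made-up examples, not the ones we actually use:

# entrypoint.ps1 (sketch): read instance metadata at boot to initialize the application.
# The custom metadata key 'app-environment' and service name 'myapp' are hypothetical.

$metadataBase = 'http://metadata.google.internal/computeMetadata/v1'
$headers = @{ 'Metadata-Flavor' = 'Google' }

# Custom metadata set on the instance template, e.g. which environment this VM serves.
$environment = Invoke-RestMethod -Uri "$metadataBase/instance/attributes/app-environment" -Headers $headers

# Built-in metadata such as the instance name is available the same way.
$instanceName = Invoke-RestMethod -Uri "$metadataBase/instance/name" -Headers $headers

Write-Host "Initializing application on $instanceName for environment $environment"

# From here the script would render environment-specific configuration and start the service.
Start-Service -Name 'myapp'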

We use this process for all Windows applications we run in the cloud. The biggest drawbacks are the speed, the Windows license costs, and the inability to bin-pack multiple applications onto fewer hosts, which would save us even more money. The benefits easily outweigh these drawbacks, though.

Autoscaling

To make running applications in the cloud economically feasible, you want to take advantage of one of the cloud's main benefits: autoscaling. If we ran our applications at the capacity needed to serve our traffic peaks at all times, it would cost us an arm and a leg; with autoscaling, the economics become much more interesting.

If you take a look at a typical 24-hour request rate for one of our applications, you can see there's a big opportunity to save money on the number of VMs required to serve the traffic during the night.

Request rate during a 24-hour period for one of our Windows applications

After enabling autoscaling based on CPU utilization, we managed to get the number of VMs to follow the request rate reasonably well, resulting in significant cost savings.

The number of Windows VMs running one of our Windows applications
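
For reference, enabling CPU-based autoscaling on a managed instance group comes down to something like the command below; the group name, region and thresholds are illustrative values, not our actual configuration:

# Illustrative: enable CPU-based autoscaling on a managed instance group.
# The long cool-down period hints to the autoscaler how long a new Windows VM needs
# before its CPU usage is representative.
gcloud compute instance-groups managed set-autoscaling my-windows-app-mig `
    --region europe-west1 `
    --min-num-replicas 3 `
    --max-num-replicas 30 `
    --target-cpu-utilization 0.6 `
    --cool-down-period 600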

Migrating

We didn't get there by just moving the applications over to the cloud as-is. It meant that applications that used to run for a week without restarts suddenly got fresh starts throughout the day whenever a new VM was added. It turned out that internal caches in several of our applications didn't get warmed up properly at startup, which led to high response-time peaks whenever a new VM started.

After tackling the warmup of those applications we got pretty stable response-time graphs, barring the occasional peak here and there.

Median, 95th and 99th percentile response times of one of our Windows applications.

It also means we can do deployments at any time of day without impacting performance. All in all, the hard requirement that fresh machines must not disrupt performance has led to a much more stable platform.
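
As a rough sketch, a warmup step at the end of a startup script can be as simple as the loop below; the endpoint, port and timings are hypothetical:

# Hypothetical warmup loop in entrypoint.ps1: keep hitting the application locally until
# its caches are primed, so the VM only starts serving real traffic once it's fast.
$warmupUrl = 'http://localhost:8080/warmup'
$deadline = (Get-Date).AddMinutes(10)

while ((Get-Date) -lt $deadline) {
    try {
        $response = Invoke-WebRequest -Uri $warmupUrl -UseBasicParsing -TimeoutSec 30
        if ($response.StatusCode -eq 200) {
            Write-Host 'Warmup finished, instance is ready to serve traffic'
            break
        }
    }
    catch {
        # Application not ready yet; retry after a short pause.
    }
    Start-Sleep -Seconds 10
}

Combined with a load balancer health check that only passes after warmup, a fresh VM stays out of rotation until it is actually fast.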

Recovery after disruption

One of the things that has caused us a lot of headaches is the slow startup time of the Windows VMs combined with our application warmup. The latter in particular pushes total startup times to 10 minutes or more, which makes it hard for any autoscaling algorithm to respond quickly enough to a sudden surge in traffic.

This was particularly troublesome whenever one of our applications started failing and, as a result, used far less CPU. The autoscaler would then scale the application down, and it took far too long to recover once the errors went away.

To make our platform more stable and prevent this negative side effect of CPU-based scaling, we wrote and run a custom scaler that takes a Prometheus query as the source for the number of requests coming into the platform. Using a set of constants, different for each application in the call stack, we calculate and set the minimum number of instances required for each application, preventing the Google autoscaler from shrinking an application whenever there's any form of disruption. This essentially takes over the lower bound of the autoscaler algorithm while still letting Google's autoscaler respond to sudden increases in CPU usage.
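
In outline, the scaler does something like the loop below; the Prometheus query, the requests-per-instance constant and the instance group names are made-up examples, and our actual implementation runs as its own service:

# Hypothetical sketch of the custom scaler loop: derive a minimum instance count from a
# Prometheus request-rate query and push it as the lower bound of the managed instance
# group's autoscaler. All names and constants are illustrative.
$prometheusUrl = 'http://prometheus.example.com/api/v1/query'
$query = 'sum(rate(http_requests_total{app="my-windows-app"}[5m]))'
$requestsPerInstance = 50   # per-application constant: requests/s a single VM handles comfortably

while ($true) {
    # Query Prometheus for the current request rate coming into the platform.
    $result = Invoke-RestMethod -Uri $prometheusUrl -Method Get -Body @{ query = $query }
    $requestRate = [double]$result.data.result[0].value[1]

    # Calculate the minimum number of instances, with a floor so we never scale to near zero.
    $minInstances = [int][Math]::Max(3, [Math]::Ceiling($requestRate / $requestsPerInstance))

    # Only the lower bound is driven by request rate; the CPU-based autoscaler still handles scaling up.
    gcloud compute instance-groups managed set-autoscaling my-windows-app-mig `
        --region europe-west1 `
        --min-num-replicas $minInstances `
        --max-num-replicas 30 `
        --target-cpu-utilization 0.6

    Start-Sleep -Seconds 60
}

The point of the loop is to measure demand directly and raise the floor before CPU usage drops for the wrong reason.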

What’s next?

To improve the speed of baking and starting an application, we're working on containerizing our Windows applications so we can roll them out without restarting the host VMs themselves. Tackling containerization now also prepares us for running them in Kubernetes in the future.
