Saving money with Preemptible VMs

Dec 1, 2017

SkyEYE is a research organization focused on assessing the health of the planet globally by processing digital imagery from satellites.

They came to me with a massive problem: as a seed-funded startup, they were having trouble keeping their core-hours down. Processing a petabyte of image data on a regular basis was a huge undertaking, and it cost a lot.

Their Design

Their system takes in satellite imagery and pushes the image data to Cloud Storage, while pushing the metadata to Datastore for retrieval later.

They also have a notification system set up for when images are uploaded to the GCS bucket, which creates a Pub/Sub message containing details about where to find the image.

This Pub/Sub message is then picked up by an autoscaling instance group, which grabs the image and its Datastore information and starts processing the image data. All their results are pushed to BigQuery, where they can search, sort, and analyze results.

When looking at costs, I tend to start by looking at boot-time for the VMs in the instance groups. Any time this is high, you’ll find scaling issues somewhere. Turns out one of their engineers had already seen our previous articles on the topic, so startup time wasn’t a problem for them.

Following that, we took a look at their operations. Quite frankly, their code was highly tuned and properly optimized. There wasn’t any cost or performance leakage here; they were simply doing an epic ton of work.

Preemptible VMs to the rescue

Preemptible VMs are the same as regular instances except for one key difference — Compute Engine might terminate (preempt) these instances if it requires access to those resources for other tasks, which can happen at any time.

I know what you’re thinking: that sounds mega-disruptive. But there’s one upside: by not guaranteeing indefinite uptime, Google is able to offer Preemptible VMs at a substantial discount to normal instances. Right now, prices are as low as $0.01 per core hour.

With this in mind, the better thing to ask is, “What part of our cloud applications could be fault-tolerant, and withstand possible instance preemptions, such that we can get the discount?”

For example, batch processing jobs can run on preemptible instances. If some of those instances terminate during processing, the job slows but does not completely stop. Preemptible instances complete your batch processing tasks without placing additional workload on your existing instances, and without requiring you to pay full price for additional normal instances.

The simple switch

Making your VM run in preemptible mode is pretty simple.

When creating your Instance Template, or the VM itself, simply expand the “Preemptibility” drop-down and set it to ON. (Likewise, if you’re using the command line, there’s a flag for that too.)

AAaaaaannnddd. That’s it.
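If you create instances through the API rather than the console, the same switch lives in the instance's scheduling options. Below is a minimal sketch of that part of a request body, written as a Python dict. The `scheduling` fields (`preemptible`, `automaticRestart`, `onHostMaintenance`) come from the Compute Engine API; the helper name, instance name, and machine type are made up for illustration, and a real request would also need disks and network interfaces.

```python
def preemptible_instance_body(name, machine_type_url):
    """Build a partial instance body with preemptibility switched on.

    Hypothetical helper for illustration only: disks and network
    interfaces are omitted, so this is not a complete request.
    """
    return {
        "name": name,
        "machineType": machine_type_url,
        "scheduling": {
            # The key setting: mark the instance as preemptible.
            "preemptible": True,
            # Preemptible instances cannot restart automatically
            # or live-migrate during host maintenance.
            "automaticRestart": False,
            "onHostMaintenance": "TERMINATE",
        },
    }


# Example values are assumptions, not SkyEYE's actual configuration.
body = preemptible_instance_body(
    "image-worker-1",
    "zones/us-central1-b/machineTypes/n1-standard-8",
)
```

On the command line, the equivalent is the `--preemptible` flag on `gcloud compute instances create`.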

Dealing with shutdowns

The trick with PVMs is that they can be shut down at any time for any reason. Thankfully, your VM will get a 30-second notification before this occurs, giving your app time to respond and clean up local state before things get lost.
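One way for an app to notice that the clock is ticking is to ask the metadata server, which exposes a `preempted` value that flips to `TRUE` once the instance has been preempted. The sketch below is only an illustration: the function names are hypothetical, and the `fetch` parameter exists so the check can be exercised off a VM.

```python
import urllib.request

# Metadata server path that reports TRUE once this instance is preempted.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_preempted_flag(timeout=2.0):
    """Read the raw flag from the metadata server (only works on a GCE VM)."""
    request = urllib.request.Request(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def is_preempted(fetch=fetch_preempted_flag):
    """Return True if the instance reports that it has been preempted.

    `fetch` is an injectable callable (an assumption of this sketch) so
    the logic can be tested without a metadata server.
    """
    try:
        return fetch().strip().upper() == "TRUE"
    except OSError:
        # Metadata server unreachable: treat as "not preempted".
        return False
```

A worker could call `is_preempted()` between units of work and stop picking up new images once it returns `True`. A shutdown script is another common place to put cleanup logic.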

For SkyEYE’s design, this was really simple: since the workers were pulling from a Pub/Sub stream, a preempted VM just didn’t acknowledge the message, which let another VM in the group pick up that work on its behalf.
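The pattern boils down to "acknowledge only after the work is done." Here is a minimal sketch of that idea, not SkyEYE's actual code: `message` is any object with an `ack()` method (Pub/Sub client messages have one), while `process` and `shutting_down` are hypothetical callables supplied by the worker.

```python
def handle_message(message, process, shutting_down):
    """Process one message and ack it only if the work completed.

    Returns True if the message was acknowledged, False if it was
    left unacked for another worker to pick up.
    """
    if shutting_down():
        # Preemption is under way: don't start, don't ack.
        # Pub/Sub redelivers the message once the ack deadline expires.
        return False

    # If the VM disappears mid-processing, ack() is never reached,
    # so the message is redelivered to another VM.
    process(message)

    message.ack()
    return True
```

Because an unacknowledged message can be delivered again, the processing step should be safe to run twice on the same image.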

Saving coin.

SkyEYE shifted over, and recently completed a massive experiment using almost 30,000 CPUs to process 1 petabyte of NASA imagery in just 16 hours, all using Preemptible VMs and saving some serious coin.
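To put a rough number on that, here is a back-of-the-envelope calculation. It assumes every CPU ran for the full 16 hours and was billed at the "as low as" $0.01 per core hour quoted earlier; the real bill depends on machine types and actual usage, so treat it as an upper-bound sketch rather than SkyEYE's invoice.

```python
cpus = 30_000               # "almost 30,000", so this is an upper bound
hours = 16                  # wall-clock duration of the experiment
price_per_core_hour = 0.01  # assumed: the lowest preemptible price quoted

core_hours = cpus * hours
estimated_cost = core_hours * price_per_core_hour

print(f"{core_hours:,} core-hours, roughly ${estimated_cost:,.0f}")
# prints "480,000 core-hours, roughly $4,800"
```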