Reducing the costs of machine learning experiments with Spot Instances and Auto Backups

Spell
Jul 15, 2019


The increasing availability of powerful compute resources is one of the main drivers behind the rise of machine learning applications. But the cost of computation is also one of the main reasons people hesitate to run more experiments. Building machine learning models requires a lot of trial and error, which takes both time and money. This is where Spot Instances can help.

What are Spot Instances?

Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud, at discounts of up to 90% compared to On-Demand prices. Because they run on spare EC2 capacity, Spot Instances don't have the guaranteed availability of On-Demand instances: AWS can reclaim one at any time, with only a two-minute warning before it is terminated. During the experimentation phase, however, the cost savings usually outweigh this disadvantage.
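To make the two-minute warning concrete, here is a minimal Python sketch of how a process running on a Spot Instance could check for a pending interruption. It reads the `spot/instance-action` path of the EC2 instance metadata service, which returns HTTP 404 until an interruption is scheduled. The function names are hypothetical, and the sketch assumes the instance allows unauthenticated (IMDSv1) metadata requests; instances that require IMDSv2 need a session token header as well. This is not how Spell implements its own detection.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional

# EC2 instance metadata path that reports a pending Spot interruption.
# It returns HTTP 404 while no interruption is scheduled.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_interruption(body: str, now: datetime) -> float:
    """Parse an instance-action document and return the seconds left before the action.

    `body` looks like: {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
    `now` must be a timezone-aware datetime.
    """
    notice = json.loads(body)
    # The time field is given in UTC.
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


def fetch_interruption_notice(timeout: float = 1.0) -> Optional[str]:
    """Return the raw notice if an interruption is scheduled, else None.

    Assumes IMDSv1 is enabled; only meaningful when run on an EC2 instance.
    """
    try:
        with urllib.request.urlopen(INSTANCE_ACTION_URL, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise
```

A training script could call `fetch_interruption_notice()` every few seconds and, once it returns a notice, use the remaining time to write a checkpoint before the instance goes away.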

The Fast.ai team trained an ImageNet model to 93% accuracy in 18 minutes, using 16 public AWS Spot Instances, each with 8 NVIDIA V100 GPUs.

This set a new speed record for training on ImageNet to this accuracy on publicly available infrastructure, and the total cost of training the model on Spot Instances was $40.

Using Spot Instances on Spell with Auto Backups

Setting up and configuring AWS Spot Instances on your own takes time, and an unexpected termination can cost you project data. Spell removes the complexity of setting up Spot Instances: users on the Teams plan can enable them for their projects with the click of a button.

Screenshot of the Spell Web Console

As part of the service, Spot Instances on Spell come with Auto Backup. If your Spot Instance is terminated, Auto Backup automatically detects that the machine has gone down, takes the disk from that machine, creates a new CPU instance to attach the disk to, and saves all your progress. All your files are available in Spell just as they would be after a normal run, so you don't have to worry about losing your work.

Screenshot of the Spell CLI Auto Backup example

The Spell platform significantly reduces the time it takes to set up infrastructure for machine learning projects, and with Spot Instances it can reduce your experimentation costs by up to 90%. More experimentation means more innovation. Get started with Spot Instances on Spell today.

Overview of Spell Spot Instances
