AWS vs Paperspace vs FloydHub : Choosing your cloud GPU partner

With deep learning making its mark on almost every industry today, demand for and interest in roles like “Data Scientist”, “ML/DL engineer” and “AI Scientist” have seen an unprecedented increase. More and more students, fresh graduates and industry professionals are realizing the need to stay abreast of these emerging technologies and are taking up courses, certifications and jobs in these fields. Once you decide to jump into the domain, the first thing you need to get your hands on is high computing power. That’s where GPUs come in.

However, building your own deep learning rig is a pricey affair. Factor in the costs of a fast and powerful GPU, CPU, SSD, compatible motherboard and power supply, air-conditioning bills, maintenance and damage to components. On top of that, you run the risk of falling behind on the latest hardware in this rapidly advancing industry.

Moreover, just assembling the components is not enough. You need to set up all the required libraries and compatible drivers before you can start training your first model. People still go down this route, and if you plan to use deep learning extensively (>150 hrs/mo), building your own deep learning workstation might be the right move.
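To get a feel for where a threshold like 150 hrs/mo comes from, here is a rough break-even sketch. The $2,000 rig cost is purely a hypothetical number for illustration, and the $0.9/hr cloud rate is the AWS on-demand figure quoted in the Pricing section; power, cooling and maintenance are lumped into one optional monthly figure.

```python
def breakeven_months(rig_cost, cloud_rate_per_hr, hours_per_month,
                     rig_running_cost_per_month=0.0):
    """Months of usage after which owning a rig becomes cheaper than renting.

    rig_cost and rig_running_cost_per_month are assumptions supplied by the
    caller; nothing here comes from a vendor price list.
    """
    monthly_saving = cloud_rate_per_hr * hours_per_month - rig_running_cost_per_month
    if monthly_saving <= 0:
        # Renting never costs more than running the rig, so it never pays off.
        return float("inf")
    return rig_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical $2,000 rig vs. a $0.9/hr cloud GPU.
    for hours in (20, 150):
        months = breakeven_months(2000, 0.9, hours)
        print(f"{hours} hrs/mo -> break-even after {months:.1f} months")
```

With these assumed numbers, heavy use (150 hrs/mo) pays the rig off in roughly 15 months, while light use (20 hrs/mo) would take more than nine years, which is why the cloud tends to suit learners and hobbyists.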

A better and cheaper alternative is to use cloud-based GPU servers provided by the likes of Amazon, Google, Microsoft and others, especially if you are just breaking into this domain and plan to use the computing power for learning and experimenting. I have been using AWS, Paperspace and FloydHub for the past 4–5 months. Google Cloud Platform and Microsoft Azure were similar to AWS in their pricing and offerings, so I stuck to these three.

AWS : Most popular cloud service provider. Offers secure and scalable GPU instances along with additional AI integrations such as Polly, Rekognition, Lex and AWS Machine Learning (available in some regions).

Paperspace : Cloud VMs with GPU support for gaming, designing and programming (ML/DL) needs. Offers latest NVIDIA GPUs along with pre-installed packages and a few DL frameworks at competitive prices.

FloydHub : Marketed as “Heroku for DL”, Floyd promotes open-source collaboration by introducing public projects and datasets. Has its own CLI for training models using Caffe, PyTorch, Chainer, MXNet, TF, Keras and others.

Choose a p2.xlarge instance with elastic IP and 30GB EBS volume (part of Free Tier) on AWS, Ubuntu ML-in-a-box GPU+ VM with 50GB SSD on Paperspace and Base Data Scientist Plan without any Powerups on FloydHub.

The comparison between the three can be extensive with each offering unique benefits. However, I’ll keep it limited to six key aspects which would be most relevant to a beginner in this domain or someone who plans to use these platforms for small-scale hobby projects.

[UPDATE | May 2018] : This post is now more than 6 months old. In this era of ever-changing technology with hardware/software upgrades, any comparison between different technological platforms quickly becomes outdated. Thus, I’ve added snippets of UPDATE sections in relevant places of this post and one to summarize it all, at the end. However, the updates should by no means be considered exhaustive.

Ease of setup :

Setting up a fully-configured instance on AWS is difficult, in spite of the extensive setup tutorials available on the web. Appropriate shell scripts need to be run to configure the EBS volume, set up dedicated IPs and install the required packages, software tools and DL libraries. Of course, you can use some of the freely available Deep Learning AMIs. Nonetheless, they still require a little bit of effort.

On the other hand, Paperspace and FloydHub pride themselves on allowing their users to set up instances within minutes. With FloydHub, you have to install a separate CLI. However, the instructions provided are pretty clear and once you log in, you find yourself being welcomed to a host of different DL environments. Installing additional packages is not difficult either. On Paperspace as well, you can run your instance within a few clicks, though some additional packages and frameworks might need manual installation for a complete experience.
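Whichever platform you pick, a quick sanity check after setup is to confirm that the NVIDIA driver actually sees a GPU. The sketch below is my own helper, not something any of the three providers ship; it assumes `nvidia-smi` is on the PATH (it is installed with the NVIDIA driver) and simply returns an empty list when it is not.

```python
import shutil
import subprocess


def parse_gpu_lines(output):
    """Turn 'name, memory' CSV lines from nvidia-smi into (name, memory) tuples."""
    gpus = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed (or not on PATH)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # tool present but the driver could not be reached
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU visible; check the driver installation.")
    for name, memory in gpus:
        print(f"{name} ({memory})")
```

If this prints nothing useful on a freshly configured instance, the problem is usually at the driver level rather than in the DL framework.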

User experience :

Uploading/downloading datasets is the biggest pain point while using cloud GPU services. With AWS, FileZilla Client can be used to transfer files. Using commands like curl and wget from the terminal does not always work and other open-source hacks have to be relied upon. AWS, however, does allow easy data download/upload for Kaggle competitions through kaggle-cli. Paperspace provides 1Gbps fiber internet and a web browser. Currently, it also provides a drag-and-drop feature for Windows machines (coming soon for Linux) to transfer files from your local machine to the VM directly. When using FloydHub, one has to download the dataset locally and then upload it to their account. The code and data have to be kept separately on your local system, as every time the script runs, the entire folder contents are uploaded.
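Since the whole project folder gets uploaded on every run, it helps to check what is sitting in it before launching a job. Here is a small sketch that does this; `folder_report` is a hypothetical helper of mine, not part of the Floyd CLI.

```python
import os


def folder_report(root, top_n=5):
    """Return (total_bytes, largest_files) for everything under root.

    largest_files is a list of (size_in_bytes, relative_path) tuples,
    biggest first, so stray datasets or checkpoints stand out.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
            except OSError:
                continue  # broken symlink or file removed mid-walk
    sizes.sort(reverse=True)
    total = sum(size for size, _ in sizes)
    return total, sizes[:top_n]


if __name__ == "__main__":
    total, largest = folder_report(".")
    print(f"Folder size: {total / 1e6:.1f} MB")
    for size, path in largest:
        print(f"  {size / 1e6:8.1f} MB  {path}")
```

If a dataset or a saved model shows up near the top of the list, move it out of the code folder first.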

Paperspace and FloydHub, being new entrants on the block, fall behind AWS in terms of open-source community support, availability of tutorials and video experiments. However, their official documentation and examples are quite comprehensive.

Things to note : Floyd CLI takes some time getting used to. Lots of processes are different from standard terminal or desktop-based usage. Hence, it is a good idea to religiously go through the FloydHub documentation and FAQs. If you are a Paperspace user far from the US (Eastern Europe/Asia), expect some latency while using the desktop environment.

Hardware/software offered :

AWS and FloydHub use Tesla K80 GPUs (12GB vRAM) and 61GB RAM, whereas Paperspace has options for Quadro M4000 (8GB vRAM), a couple from Pascal series (16–24GB vRAM) and even the latest Volta series, Tesla V100 (16GB vRAM), each with 30GB RAM. To give a rough estimate, the Pascal series GPUs are 3x faster than K80s, while the V100 is 6x faster than K80s. AWS and Paperspace also use SSD and dedicated GPU instances, whereas FloydHub offers a choice between pre-emptible and dedicated GPUs.

The usual way of running scripts on these services is through Jupyter notebooks or by directly executing them on the terminal. Paperspace, by virtue of providing a desktop environment, also allows IDEs like Spyder and other utility software. The presence of a Linux desktop is highly convenient.

[UPDATE | May 2018] : All three of them (AWS/Paperspace/FloydHub) have now upgraded to NVIDIA Volta GPUs, making superfast training and inference possible. In terms of software and frameworks, AWS has updated its Deep Learning AMI, which includes pre-installed frameworks like Chainer, TensorFlow, Keras and PyTorch. FloydHub already has the latest versions of all these frameworks.

Performance :

As a benchmarking exercise, I compared training of multiple models on all the three platforms under the same environment (Keras+Theano on Jupyter).
AWS - p2.xlarge (Tesla K80, 12GB vRAM, 61GB RAM)
Paperspace - GPU+ VM (Quadro M4000, 8GB vRAM, 30GB RAM)
FloydHub - Tesla K80, 12GB vRAM, 61GB RAM (equivalent to Base plan)
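For anyone who wants to repeat this kind of comparison, here is a minimal, framework-agnostic sketch of how seconds-per-epoch can be collected. It is not the exact script I used; `train_one_epoch` is a placeholder for whatever callable runs a single epoch of your model.

```python
import time


def time_epochs(train_one_epoch, n_epochs):
    """Call train_one_epoch(epoch) n_epochs times; return seconds taken by each."""
    durations = []
    for epoch in range(n_epochs):
        start = time.perf_counter()
        train_one_epoch(epoch)
        durations.append(time.perf_counter() - start)
    return durations


def mean_epoch_time(durations, skip_first=True):
    """Average seconds/epoch, optionally ignoring the first (warm-up) epoch."""
    usable = durations[1:] if skip_first and len(durations) > 1 else durations
    return sum(usable) / len(usable)


if __name__ == "__main__":
    # Stand-in workload; replace with one epoch of real training.
    durations = time_epochs(lambda epoch: time.sleep(0.1), n_epochs=3)
    print(f"{mean_epoch_time(durations):.2f} s/epoch")
```

The first epoch is skipped by default because it often includes one-off costs such as graph compilation or filling the disk cache, which would otherwise skew a short benchmark.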

Two models were trained: a deep CNN model with Dropout on the Fashion MNIST dataset and a fine-tuned pre-trained VGG16 network on a grocery product image classification task. Their performance is depicted below.

[Figure: Model performance on different platforms (smaller is better)]

AWS p2.xlarge and Paperspace GPU+ have almost equivalent performance, with AWS just inching ahead. If we use the Pascal versions on Paperspace, which are still cheaper than AWS, training is expected to be about 3x as fast as on AWS. Despite using the same hardware, FloydHub runs at ~0.75x the speed of AWS, most probably due to slower disk reading speed.
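One way to put speed and price on the same scale is to ask what it costs to get one "AWS K80 hour" worth of training done on each platform. The sketch below uses the rough speed factors from this section and the hourly rates quoted in the Pricing section; treating the FloydHub add-on rate of $0.59/hr as its effective hourly price is my simplifying assumption.

```python
def cost_per_baseline_hour(rate_per_hr, relative_speed):
    """Price of the work a baseline GPU does in one hour.

    relative_speed is throughput relative to the baseline (1.0 = same speed,
    3.0 = three times faster), so faster hardware needs fewer paid hours.
    """
    if relative_speed <= 0:
        raise ValueError("relative_speed must be positive")
    return rate_per_hr / relative_speed


# (hourly rate in USD, speed relative to AWS p2.xlarge); rough estimates only.
PLATFORMS = {
    "AWS p2.xlarge": (0.90, 1.0),
    "Paperspace Pascal": (0.65, 3.0),
    "FloydHub (pre-emptible)": (0.59, 0.75),
}

if __name__ == "__main__":
    for name, (rate, speed) in PLATFORMS.items():
        cost = cost_per_baseline_hour(rate, speed)
        print(f"{name}: ${cost:.2f} per AWS-equivalent hour")
```

This ignores storage charges and assumes the speed factors hold for your particular model, so treat the output as a rough ordering rather than a quote.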

[UPDATE | May 2018] : This is probably the most interesting update. On recently re-running the same experiments/scripts mentioned above, I found a huge improvement in training time on FloydHub. The latest numbers show that it is on par with AWS and Paperspace GPU+, or even better. FloydHub seems to have fixed the I/O issues, and the upgrade to the latest TensorFlow, Keras and PyTorch versions seems to have done wonders for the platform. The Fashion MNIST script now takes 8s/epoch while training, whereas the pre-trained VGG16 script now takes much less (~100s/epoch). While I haven’t checked whether Paperspace has made similar improvements, AWS definitely hasn’t. So, for now, FloydHub emerges as the fastest among the three.

Additional features :

Both Paperspace and FloydHub offer custom plans for teams. However, Floyd’s associated features, like centrally sharing datasets/projects, versioning of job runs for easy reproducibility and support for MOOCs (including Udacity’s), aid collaboration and foster an open-source atmosphere. Floyd also allows concurrent job runs. AWS offers multi-GPU instances, whereas FloydHub and Paperspace only support single-GPU systems.

[UPDATE | May 2018] : While AWS has focused more on lateral applications favouring enterprise and production systems, Paperspace and FloydHub have both introduced a lot of new features that make GPUs easier to use and more accessible to the general public. Some of these have been :
[FloydHub] : Slack integration, favouring usage across teams
[FloydHub] : Job management UI, metrics dashboard
[FloydHub] : A beta version of a new interactive environment (similar to a VM on cloud), called Workspace
[Paperspace] : Collaboration as the official partner for Jeremy Howard’s course
[Paperspace] : Paperspace Gradient and API, along with their own CLI, which are respectively tools to run your jobs efficiently on the cloud and a devkit to automate your VM/jobs (suitable for DevOps!)
With Workspace, Gradient and collaboration, FloydHub and Paperspace have moved closer to offering similar features.

Pricing :

Pricing is probably the most important selection criterion. Currently, billing is prorated on a per-second basis for AWS and FloydHub and at millisecond granularity for Paperspace.

AWS GPU instances start at $0.9/hr with a 30GB free EBS volume under the Free Tier program. A 100GB SSD volume plus an elastic IP would cost an additional $13/month. AWS also provides spot instances, which are much cheaper but highly susceptible to price fluctuations and hence not a reliable option.

Paperspace offers Maxwell series GPUs at $0.4/hr and Pascal GPUs from $0.65/hr. A 100GB SSD with public IP will cost $7/month. Additional utility services are also provided.

FloydHub recently moved away from a pay-as-you-go model to well-defined monthly plans. The Base Data Scientist plan costs $14/month for 10 GPU hours and 100GB storage. Additional pre-emptible GPU hours can be bought starting from $0.59/hr. A premium is charged for dedicated GPU instances.
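To see how these numbers play out for a given workload, here is a small sketch that estimates the monthly bill on each platform from the rates quoted above. It assumes the entry-level GPU on each service, a 100GB disk with a public/elastic IP on AWS and Paperspace, and pre-emptible add-on hours on FloydHub; it ignores the AWS Free Tier volume, spot pricing and any other charges.

```python
def aws_monthly(gpu_hours, rate=0.90, storage_and_ip=13.0):
    """On-demand GPU hours plus a 100GB SSD volume and elastic IP."""
    return gpu_hours * rate + storage_and_ip


def paperspace_monthly(gpu_hours, rate=0.40, storage_and_ip=7.0):
    """Maxwell-series GPU hours plus a 100GB SSD and public IP."""
    return gpu_hours * rate + storage_and_ip


def floydhub_monthly(gpu_hours, plan=14.0, included_hours=10, extra_rate=0.59):
    """Base Data Scientist plan; hours beyond the included 10 billed as add-ons."""
    extra_hours = max(0, gpu_hours - included_hours)
    return plan + extra_hours * extra_rate


if __name__ == "__main__":
    hours = 40  # example workload: about 10 GPU hours a week
    for name, estimate in (
        ("AWS", aws_monthly(hours)),
        ("Paperspace", paperspace_monthly(hours)),
        ("FloydHub", floydhub_monthly(hours)),
    ):
        print(f"{name}: ${estimate:.2f}/month")
```

At 40 GPU hours a month this works out to about $49 on AWS, $23 on Paperspace and $31.70 on FloydHub. Below 10 hours a month, FloydHub stays flat at $14 because those hours are already included in the plan.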

[UPDATE | May 2018] : FloydHub’s pricing structure has changed significantly, while that of AWS and Paperspace remains almost the same. Paperspace still remains the most affordable option.

Deployment :

I didn’t try deploying a model on any of them. Floyd provides a one-line command to deploy your model as a REST API. AWS has a host of associated services to further improve your app deployment experience. Will update once I explore them.

The table below summarizes the key aspects.

[Table: Comparing key aspects of GPU-on-cloud service providers]
[UPDATE | May 2018] : There are not many changes in the chart above, except in the Performance section, where FloydHub is the fastest now. On the Hardware/Software front, all three are level pegging.

If you have used any of these services before, please share your experience. If you haven’t, go for it now. It would be nice to have your suggestions below.