Benchmarking TensorFlow Performance and Cost Across Different GPU Options
Machine learning practitioners, from students to professionals, understand the value of moving their work to GPUs. Without one, certain tasks simply become infeasible for lack of computing power. However, the available options are confusing. Do I build my own? Rent one from a vendor? How much real-world performance should I expect, and at what cost?
With that in mind, I benchmarked several GPU options against a realistic GPU-heavy workload implemented in TensorFlow. I chose Amazon’s base GPU instance and two GPU instances from Paperspace, an Initialized Capital portfolio company. I also added my own personal Nvidia GPU and my laptop CPU:
- Nvidia 1080 Ti (from my personal desktop)
- Amazon p2.xlarge instance ($0.90/hour — Nvidia Tesla K80)
- Paperspace P5000 ($0.60/hour — Nvidia Quadro P5000)
- Paperspace GPU+ ($0.40/hour — Nvidia Quadro M4000)
- MacBook Pro CPU (i5 2GHz)
The benchmark itself was relatively simple. I installed TensorFlow v1.0 on each machine and fine-tuned the Inception v4 model from an existing checkpoint. Once each run was under way, I monitored performance, expressed in minibatches per second (using this script).
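The linked script is not reproduced here. As a rough illustration of what measuring minibatches per second involves, here is a minimal Python sketch. The names `measure_throughput` and `step_fn` are hypothetical and not taken from that script; in a real run, `step_fn` would execute one training step on one minibatch.

```python
import time


def measure_throughput(step_fn, num_steps=50, warmup_steps=5):
    """Return minibatches per second for a callable that runs one training step."""
    # Warm-up steps are run but not timed, so one-time setup costs
    # (graph construction, memory allocation) do not skew the rate.
    for _ in range(warmup_steps):
        step_fn()

    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start

    return num_steps / elapsed


def fake_step():
    # Stand-in for a real training step (an assumption for illustration only).
    time.sleep(0.01)


rate = measure_throughput(fake_step, num_steps=20, warmup_steps=2)
print(f"{rate:.1f} minibatches/sec")
```

Dividing one machine's rate by another's gives speedup figures of the kind quoted in the results.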
Unsurprisingly, the results demonstrate how poorly suited CPUs are to compute-heavy machine learning tasks, even the CPU in a relatively new MacBook Pro. Any of the GPUs tested, even the slowest, improves throughput by over an order of magnitude (43x for Amazon’s p2.xlarge, 167x for Nvidia’s 1080 Ti).
Interestingly, Amazon’s Tesla K80-based p2 instances are clearly showing their age in both performance and cost. Paperspace’s base GPU+ is 15% faster at less than half the cost, while its P5000 delivers 3x the performance at two-thirds the cost.
Of course, the clear performance winner is Nvidia’s recently released 1080 Ti GPU. However, you’d have to build your own machine and bear the fixed cost (roughly $2,500) of buying the components. It’s not for everybody, especially those who anticipate only occasional GPU usage.
Compared with non-GPU instances, GPU compute time carries a steep premium; it’s important, therefore, to get a good sense of what your costs will be for a typical task.
Building your own machine around a GPU is clearly the most economical choice for heavy users; costs are essentially fixed and can be amortized over the life of the machine. Adjusting for performance, a DIY machine with a build cost of ~$2,500 pays for itself in just under 30 days of continuous usage relative to renting on Amazon.
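The arithmetic behind that break-even figure is not spelled out above. One plausible reconstruction, using only numbers quoted in this article and assuming the comparison is against Amazon's p2.xlarge, is sketched below. The function name is hypothetical, and the estimate ignores electricity, maintenance, and resale value.

```python
def break_even_days(build_cost, rental_rate_per_hour, speedup_vs_rental):
    """Days of continuous use after which a DIY machine has paid for itself."""
    # One hour on the faster DIY machine does the work of
    # `speedup_vs_rental` hours on the rental instance.
    rental_cost_per_diy_hour = rental_rate_per_hour * speedup_vs_rental
    hours = build_cost / rental_cost_per_diy_hour
    return hours / 24


# Speedups over the laptop CPU, as quoted in the results above
SPEEDUP_1080TI = 167
SPEEDUP_P2 = 43

days = break_even_days(
    build_cost=2500,            # approximate build cost, USD
    rental_rate_per_hour=0.90,  # p2.xlarge on-demand price, USD/hour
    speedup_vs_rental=SPEEDUP_1080TI / SPEEDUP_P2,
)
print(f"Break-even after about {days:.1f} days of continuous use")
```

Under these assumptions the result comes out at a little under 30 days, which matches the figure quoted above.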
For those who are reluctant to commit to building their own machine, Amazon is clearly no longer an economical option for single-GPU workloads. Not only are its GPUs old and underpowered, but its prices are also relatively high. Adjusting for performance, Paperspace’s GPU+ instance type is about 2.7x cheaper than Amazon, while its P5000 is a clear winner in both performance and cost.
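To make the performance-adjusted comparison concrete, the sketch below divides each hourly price by performance relative to the p2.xlarge. The inputs are the rounded figures quoted in this article (15% faster, 3x faster), so the GPU+ ratio comes out near 2.6x rather than the 2.7x stated above, which presumably reflects the unrounded measurements. The names are illustrative only.

```python
# (price in USD/hour, performance relative to Amazon p2.xlarge),
# using the rounded figures quoted in this article
OPTIONS = {
    "Amazon p2.xlarge": (0.90, 1.00),
    "Paperspace GPU+": (0.40, 1.15),
    "Paperspace P5000": (0.60, 3.00),
}


def cost_per_unit_work(price_per_hour, relative_perf):
    """Cost of doing the work a p2.xlarge completes in one hour."""
    return price_per_hour / relative_perf


baseline = cost_per_unit_work(*OPTIONS["Amazon p2.xlarge"])

for name, (price, perf) in OPTIONS.items():
    cost = cost_per_unit_work(price, perf)
    print(f"{name}: ${cost:.2f} per p2-hour of work, "
          f"{baseline / cost:.1f}x cheaper than Amazon")
```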
What Should I Get?
Here is my take on what people ought to get, depending on their use case:
- Heavy users: Go DIY. Not only will you reap the performance benefits of a newer GPU, but you will also quickly recoup your costs, especially if you were running on Amazon to begin with. Moreover, having a GPU available all the time means you can iterate much more quickly and not worry about shutting down a rental GPU, whether it’s on Paperspace, Amazon, or Google Cloud.
- Moderate or sporadic users: Paperspace is best. You’ll get your choice of two performance tiers, and the costs at both tiers are significantly lower than Amazon’s. As a nice bonus, you’ll get a desktop Linux environment you can access through your browser.
- Parallel applications: While Amazon misses on both cost and performance, it does offer multi-GPU instance types (8 and 16 GPUs). If your application truly calls for multiple GPUs, then Amazon could be a reasonable choice, though you’ll pay dearly ($14.40/hour for 16 GPUs). In most cases, adding more 1080 Tis should be more economical, given the card’s modest price ($699) and much higher performance: a dual 1080 Ti machine should be able to roughly match the performance of an 8-GPU Amazon p2.8xlarge.
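The multi-GPU comparison in the last bullet can be checked with a quick estimate. The sketch below assumes throughput scales linearly with the number of GPUs on both sides, which is an idealization, since real multi-GPU training usually scales less than perfectly. The `scaling_efficiency` parameter and the function name are hypothetical additions for illustration.

```python
# Speedups over the laptop CPU, as quoted earlier in this article
SPEEDUP_1080TI = 167
SPEEDUP_P2_GPU = 43      # one K80-based GPU, as in a p2.xlarge
PRICE_1080TI_USD = 699


def p2_gpu_equivalents(num_cards, scaling_efficiency=1.0):
    """Estimate how many p2 GPUs a set of 1080 Ti cards can stand in for.

    scaling_efficiency is an assumed fraction of ideal linear scaling
    (1.0 means perfect scaling).
    """
    per_card = SPEEDUP_1080TI / SPEEDUP_P2_GPU
    return num_cards * per_card * scaling_efficiency


for cards in (1, 2, 4):
    print(f"{cards} x 1080 Ti (${cards * PRICE_1080TI_USD} in cards): "
          f"~{p2_gpu_equivalents(cards):.1f} p2 GPUs")
```

With perfect scaling, two cards work out to just under eight p2 GPUs, in line with the rough match described above.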
Update (4/21/2017): Folks on r/MachineLearning have pointed out that Amazon spot instances have been running around $0.20/hour, which changes the economics of running on Amazon. While this price seems to be a relatively recent development, it may be worth exploring for your application.
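To see how much a spot price changes the picture, the sketch below computes the hourly p2.xlarge price at which Amazon would match each Paperspace tier on cost per unit of work. It reuses the rounded performance figures from this article; the names are illustrative. Keep in mind that spot prices fluctuate and spot instances can be interrupted, neither of which is modeled here.

```python
SPOT_PRICE = 0.20  # USD/hour, the approximate spot price reported above

# (price in USD/hour, performance relative to Amazon p2.xlarge)
ALTERNATIVES = {
    "Paperspace GPU+": (0.40, 1.15),
    "Paperspace P5000": (0.60, 3.00),
}


def parity_price(alt_price, alt_relative_perf):
    """Hourly p2.xlarge price at which its cost per unit of work
    equals that of the alternative."""
    return alt_price / alt_relative_perf


for name, (price, perf) in ALTERNATIVES.items():
    threshold = parity_price(price, perf)
    print(f"p2.xlarge matches {name} on cost at ${threshold:.2f}/hour "
          f"(reported spot price: ${SPOT_PRICE:.2f}/hour)")
```

By this estimate, a spot price around $0.20/hour would put the p2.xlarge roughly level with the P5000 on cost per unit of work, though each job would still take about three times as long to finish.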