What to use for Deep Learning: Cloud Services vs GPU

Akhil Vasvani
6 min read · Aug 3, 2019


So you’re interested in Deep Learning? You’ve learned the theory and played around with Python libraries like Pandas, NumPy, Scikit-learn, and SciPy, but now you want to get into the nitty-gritty and train your own neural network or work with others. The question you are probably asking yourself is: where do I begin?

There are three (technically two, but I like to think of it as three) main avenues to running Deep Learning networks: Google Colab, build your own Deep Learning computer, or use other Cloud Computing services. I’ll go through the pros and cons of each option and summarize everything in a table at the end.

Google Colab

Google Colab, or “the Colaboratory,” is a free cloud service hosted by Google to encourage Machine Learning and Artificial Intelligence research, a field in which the bridge to learning and success is often access to tremendous computational power.

Benefits

Besides being easy to use, Colab is fairly flexible in its configuration and does much of the heavy lifting for you.

  • Python 2.7 and Python 3.6 support
  • Free GPU acceleration (NVIDIA Tesla K80) as well as Google’s Tensor Processing Unit (TPU)
  • Pre-installed libraries: All major Python libraries like TensorFlow, PyTorch, Scikit-learn, Matplotlib among many others are pre-installed and ready to be imported.
  • Built on top of Jupyter Notebook
  • Collaboration feature (works with a team just like Google Docs): Google Colab allows developers to use and share Jupyter notebook among each other without having to download, install, or run anything other than a browser.
  • Supports bash commands
  • Google Colab notebooks are stored in your Google Drive
  • Runs both .py (Python) files as well as existing .ipynb (Jupyter notebook) files
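A quick sanity check at the start of a session can confirm that a GPU runtime is actually attached. Here's a minimal sketch using only the standard library (the Drive-mount lines are Colab-specific, so they are shown commented out):

```python
import shutil

def gpu_visible():
    """Return True if the NVIDIA driver's nvidia-smi tool is on PATH,
    which indicates a GPU runtime is attached."""
    return shutil.which("nvidia-smi") is not None

# Colab-specific setup (only works inside a Colab notebook):
# from google.colab import drive
# drive.mount('/content/drive')   # notebooks and data live in your Drive

print("GPU runtime attached:", gpu_visible())
```

If this prints `False` after you've selected a GPU runtime, reconnecting the runtime usually fixes it.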

If you prefer to read more before getting started, I recommend the Google Colab FAQ, Google Colab Documentation and Code Snippets, and advice from the helpful community of users on Stack Overflow.

Cons

While Colab is very useful and extremely easy to use, there are some downsides.

  • You can run your code for at most 24 hours without interruption (12 hours if you connect to a GPU-backed VM runtime). Colab is designed to be interactive. Google states: “Long-running background computations, particularly on GPUs, may be stopped. We encourage users who wish to run continuous or long-running computations through Colaboratory’s UI to use a local runtime.”
  • Note: if you close the browser tab where Colab is running, the GPU runtime will stop.
  • Any libraries that do not ship with standard Python must be installed manually, and this has to be repeated in every session.
  • Google Drive is the default source and target for storage; alternatives like local storage eat your bandwidth if the dataset is big.
  • Google provides code to connect to and use Google Drive, but it does not work with many other data formats.
  • Storage is tied to the current session, so if you have downloaded files and want to use them later, save them before closing the session.
  • It is difficult to work with bigger datasets, as you have to download and store them in Google Drive (15 GB of free space comes with a Gmail ID; anything more requires paying Google).
  • As I mentioned before, one main benefit is the free GPU access, but the GPU on offer is dated: in TensorFlow benchmark comparisons, the Tesla K80 only outperforms two other GPUs (the Tesla GRID K520 and the Radeon RX 560).

Custom Deep Learning Computer with GPU

There are many ways to build your own deep learning computer. Before starting though, set a budget. I opted to build an expandable Deep Learning computer with a top-end GPU following Jeff Chen’s post. While it took almost 7 hours to assemble and a few more hours to install the NVIDIA drivers, CUDA, and cuDNN correctly, I love my computer and I’d highly recommend building your own computer.

Benefits

Cost comparison for building your own computer versus renting from AWS: 1-GPU builds are 4–10x cheaper and 4-GPU builds are 9–21x cheaper, depending on utilization. AWS pricing includes discounts for one-year and three-year leases (35% and 60%). Power is priced at $0.20/kWh, with a 1-GPU machine assumed to draw 1 kW and a 4-GPU machine 2 kW. Depreciation is conservatively estimated as linear, with full depletion in 3 years. Additional GPUs cost $700 each, before tax.
  • Your $700 NVIDIA 1080 Ti performs at roughly 90% of the speed of a cloud NVIDIA V100 GPU (which uses next-gen Volta tech). Compared with renting an NVIDIA K80 online (cheaper, at $1/hour), your 1080 Ti blows it out of the water, training about 4x faster; Jeff Chen validated the 4x figure in his own benchmark. The K80 has 12 GB of memory per GPU, a tiny advantage over your 11 GB 1080 Ti.
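Plugging the assumptions above into a back-of-the-envelope script makes the break-even point concrete. This is a sketch, not an exact quote: the $3,000 build cost and the $1/hour cloud K80 rate are illustrative, while the $0.20/kWh and 1 kW figures come from the comparison above.

```python
# Back-of-the-envelope: owning a 1-GPU build vs renting by the hour.
BUILD_COST = 3000.0   # assumed up-front cost of a 1-GPU build (USD)
POWER_KW = 1.0        # machine draws ~1 kW under load
ELECTRICITY = 0.20    # USD per kWh
CLOUD_RATE = 1.0      # assumed USD per GPU-hour (K80-class rental)

def own_cost(hours):
    """Total cost of owning after `hours` of use (ignoring resale value)."""
    return BUILD_COST + hours * POWER_KW * ELECTRICITY

def cloud_cost(hours):
    """Total cost of renting for `hours`."""
    return hours * CLOUD_RATE

# Hours of training at which the build pays for itself:
break_even = BUILD_COST / (CLOUD_RATE - POWER_KW * ELECTRICITY)
print(f"Break-even after ~{break_even:.0f} GPU-hours")  # 3000 / 0.80 = 3750
```

Under these assumptions the build pays for itself after roughly 3,750 GPU-hours, i.e. about five months of continuous training, which is why utilization drives the 4–10x range quoted above.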

Cons

  • High upfront cost. In the short term, if you buy high-end parts for the deep learning computer, it’s going to hit your wallet.
  • Slower download speeds, because your machine is not on an internet backbone, and a static IP is required to access it when you are away from home.

Other Cloud Computing Services

While Google Colab is a free cloud service hosted by Google, there are multiple other popular services used for Deep Learning, such as AWS Deep Learning AMIs, GCP Deep Learning VM Images, Azure, Paperspace Gradient, FloydHub Workspace, Lambda GPU Cloud, and Spell.

Here’s a full list of Deep Learning Cloud Service Providers. Each service comes with its own pros and cons and Dipanjan Sarkar’s blog post “Build your own Robust Deep Learning Environment in Minutes” outlines each service extremely well.

Benefits

  • Setting up new hardware for deep learning models can be extremely tedious and time-consuming with custom deep learning computers. You have to install the NVIDIA drivers, CUDA, cuDNN, and, if you are feeling bold, TensorRT, each time you build a computer. Then you have to install TensorFlow (from pip or built from source), PyTorch, Caffe2, Chainer, MXNet, CNTK, or any other Deep Learning framework. Many cloud computing services come with these libraries and frameworks pre-installed, so you just register with the service and can get to work immediately.
  • If you are running a company which needs a lot of compute power, Amazon discounts pricing if you have a multi-year contract, so the advantage is 4–6x for multi-year contracts.
  • If you are training models occasionally, there is no large start-up cost and instead you can pay just for usage.
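Before starting work on a pre-configured image (an AWS Deep Learning AMI, for example), it's worth verifying which frameworks are actually present. A minimal sketch using only the standard library — the module names in the list are illustrative and should be adjusted to the image you chose:

```python
import importlib.util

def installed(module_name):
    """Return True if `module_name` can be imported in this environment,
    without actually importing it."""
    return importlib.util.find_spec(module_name) is not None

# Illustrative framework module names; adjust for your image.
for name in ("tensorflow", "torch", "mxnet", "sklearn"):
    status = "available" if installed(name) else "missing"
    print(f"{name}: {status}")
```

Using `find_spec` avoids the cost of importing a heavy framework just to check that it exists.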

Cons

  • If you forget to shut down your instance when it’s not in use, you will keep getting charged. This happened to me once: during graduate school, I left my instance running while I went to get groceries. I got to the counter and tried to pay, but my card was declined. I called my bank and found out that AWS had maxed out my card. Luckily, I spoke with an Amazon representative who understood my ignorance and refunded me, but I was always careful after that incident.
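The damage from a forgotten instance adds up fast. A quick sketch with an assumed on-demand rate shows why (the $3/hour figure is illustrative, roughly the price of a GPU instance class):

```python
# Rough cost of an idle-but-running GPU instance. The rate is an
# assumption; check your provider's current on-demand pricing.
HOURLY_RATE = 3.0   # USD/hour, illustrative on-demand GPU instance price

def idle_cost(hours):
    """Cost of leaving the instance running for `hours`, doing nothing."""
    return hours * HOURLY_RATE

print(f"One forgotten weekend (48 h): ${idle_cost(48):.2f}")  # $144.00
print(f"One forgotten month (720 h): ${idle_cost(720):.2f}")  # $2160.00
```

Billing alarms and auto-shutdown scripts are cheap insurance against exactly this failure mode.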

Conclusion

Machine learning practitioners — from students to professionals — understand the value of moving their work to GPUs. Without one, certain tasks simply become infeasible for lack of computing power. While these three avenues all differ in pricing, the question you need to ask yourself is how often you are going to be running Deep Learning networks.

  • Student: If you are a student who wants to get experience, I would suggest using Google Colab as a starting point. Then see if you can get onto a project with a professor or advisor using a GPU or AWS service or find a class where they are using AWS (for deep learning) to expand your horizons.
  • Moderate or Sporadic Users: Paperspace is best. You’ll get your choice of two performance tiers, and the costs at all levels of performance are significantly lower than Amazon’s. That said, Amazon spot instances have been running around $0.20/hour, which changes the economics of running on Amazon. While this price seems to be a relatively recent development, it may be worth exploring for your application. As a nice bonus, you’ll get a desktop Linux environment you can access through your browser.
  • Heavy users: Build your own computer. Not only will you reap the performance benefits of a newer GPU, but heavy users will quickly recoup their costs, especially if they were running on Amazon to begin with. Moreover, having a GPU available all the time means you can iterate much more quickly and not worry about shutting down your rental GPU, whether it’s on Paperspace, Amazon, or Google Cloud.
  • Professionals: If you are a professional using Deep Learning almost all day, I’d recommend AWS or another cloud-computing service. Amazon discounts pricing if you have a multi-year contract, so the advantage is 4–6x for multi-year contracts. It’s easy to scale up as your datasets expand, you get access to multiple fast GPUs (if you pay the high price), and bandwidth is not a bottleneck.

References

How to use Colab

How Google has crushed it with Colaboratory

Which GPU is better for Deep Learning, GTX 1080 or Tesla K80?

Why your personal Deep Learning Computer can be faster than AWS and GCP
