Why building your own Deep Learning Computer is 10x cheaper than AWS

Jeff Chen
Sep 24, 2018 · 8 min read

Updated 12/11/2019

Gorgeous interiors of your Deep Learning Computer

If you’ve used, or are considering, AWS/Azure/GCloud for Machine Learning, you know how crazy expensive GPU time is. And turning machines on and off is a major disruption to your workflow. There’s a better way. Just build your own Deep Learning Computer. It’s 10x cheaper and also easier to use. Let’s take a closer look below.

This is part 1 of 3 in the Deep Learning Computer Series. Part 2 is ‘How to build the perfect one’ and Part 3 is ‘Performance and benchmarks’. See new photos and updates: Follow me on Medium, Twitter, and Instagram! Leave thoughts and questions in comments below.

Building an expandable Deep Learning Computer w/ 1 top-end GPU only costs $3k

$3K of computer parts before tax. You’ll be able to drop the price to about $2k by using cheaper components, which is covered in the next post.

Building is 10x cheaper than renting on AWS / EC2 and is just as performant

Cost comparisons for building your own computer versus renting from AWS. 1 GPU builds are 4–10x cheaper and 4 GPU builds are 9–21x cheaper, depending on utilization. AWS pricing includes discounts for full-year and 3-year leases (35% and 60%, respectively). Electricity is assumed to cost $0.20 / kWh, with the 1 GPU machine drawing 1 kW and the 4 GPU machine drawing 2 kW. Depreciation is conservatively estimated as linear, with the hardware fully written off in 3 years. Additional GPUs cost $700 each, before tax.
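To make the arithmetic concrete, here is a minimal Python sketch of the 1 GPU comparison using the numbers quoted in this post ($3k build, $3 / hour cloud GPU, $0.20 / kWh, 1 kW draw, 3-year linear depreciation, 35% and 60% lease discounts). The function names are made up for illustration, and running both machines 24/7 is an assumption, so treat the output as a ballpark rather than a reproduction of the chart. The 4 GPU case is left out because the hourly price of a 4 GPU cloud machine is not stated here.

```python
HOURS_PER_YEAR = 24 * 365

def build_cost_per_year(hardware_usd, power_kw, usd_per_kwh=0.20, lifetime_years=3):
    """Yearly cost of owning: linear depreciation plus electricity, running 24/7 (assumption)."""
    depreciation = hardware_usd / lifetime_years
    electricity = power_kw * usd_per_kwh * HOURS_PER_YEAR
    return depreciation + electricity

def aws_cost_per_year(usd_per_hour=3.0, discount=0.0):
    """Yearly cost of renting one cloud GPU machine 24/7 at a given lease discount."""
    return usd_per_hour * (1.0 - discount) * HOURS_PER_YEAR

own = build_cost_per_year(hardware_usd=3000, power_kw=1.0)
for label, discount in [("on-demand", 0.0), ("1-year lease", 0.35), ("3-year lease", 0.60)]:
    ratio = aws_cost_per_year(discount=discount) / own
    print(f"{label}: AWS costs {ratio:.1f}x as much as building")
```

Under these assumptions the ratio runs from roughly 4x (3-year lease) to roughly 10x (on-demand), in line with the 1 GPU range quoted above.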

There are some drawbacks: download speeds to your machine are slower because it’s not on the backbone, you need a static IP to access it away from your house, and you may want to refresh the GPUs in a couple of years. But the cost savings are so ridiculous that it’s still worth it.

If you’re thinking of using the 2080 Ti for your Deep Learning Computer, it’s $600 more and still 4–9x cheaper for a 1 GPU machine. The Titan RTX is $1,800 more, but it’s up to 2.3x faster and has more than double the memory of the 1080 Ti. You can only fit one Titan RTX, though, because they do not come with blower fans. My current setup has one Titan RTX in the bottom slot and 3 other cards on top.

Cloud GPU machines are expensive at $3 / hour and you have to pay even when you’re not using the machine.

Even when you shut your machine down, you still have to pay storage for the machine at $0.10 per GB per month, so I got charged a hundred dollars / month just to keep my data around.

You’ll break even in just a few months
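A quick way to sanity-check the break-even point is to divide the up-front hardware cost by what you save each month. The sketch below assumes the $3k build, $3 / hour cloud pricing, roughly $100 / month of cloud storage, and $0.20 / hour of electricity (1 kW at $0.20 / kWh) from this post. The usage levels and the 30-day month are assumptions picked for illustration, and the function name is hypothetical.

```python
def months_to_break_even(hours_per_day, hardware_usd=3000.0, cloud_usd_per_hour=3.0,
                         cloud_storage_usd_per_month=100.0, power_usd_per_hour=0.20,
                         days_per_month=30):
    """Months until cumulative cloud spend exceeds the cost of building and running your own."""
    hours = hours_per_day * days_per_month
    cloud_monthly = hours * cloud_usd_per_hour + cloud_storage_usd_per_month
    own_monthly = hours * power_usd_per_hour  # electricity only; hardware is paid up front
    return hardware_usd / (cloud_monthly - own_monthly)

for hours_per_day in (8, 24):
    months = months_to_break_even(hours_per_day)
    print(f"{hours_per_day} h/day: break even after {months:.1f} months")
```

With these inputs the build pays for itself in about 4 months at 8 hours a day, and in under 2 months if it trains around the clock.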

Your GPU Performance is on par with AWS

You get more memory with the V100, 16GB vs. 11GB, but if you just make your batch sizes a little smaller and your models more efficient, you’ll do fine with 11GB.
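As a rough rule of thumb (an assumption, not a measurement), activation memory grows roughly linearly with batch size, so you can estimate a starting batch size for the smaller card by scaling with the memory ratio. The helper below is hypothetical and ignores the fixed memory taken by weights and optimizer state, so the real limit may be somewhat lower.

```python
def scaled_batch_size(batch_size, from_gb=16, to_gb=11):
    """Scale a batch size by the ratio of GPU memory sizes, rounding down."""
    return int(batch_size * to_gb / from_gb)

print(scaled_batch_size(64))   # a batch of 64 tuned for a 16GB card -> 44
print(scaled_batch_size(128))  # -> 88
```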

Compared with renting a last-generation Nvidia K80 online (cheaper at $1 / hour), your 1080 Ti blows it out of the water, training 4x faster. I validated that it’s 4x faster in my own benchmark here. The K80 has 12GB per GPU, which is a tiny advantage over your 11GB 1080 Ti.

Nvidia’s new RTX cards are even faster: the 2080 Ti is 1.4x faster and the Titan RTX is 1.6x faster with 2x more memory than the 1080 Ti. If you’re doing training in half-precision, the RTX cards are 1.6x and 2.2x faster, respectively. These RTX cards easily outperform the cloud.
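To see what these multipliers mean for a real job, the snippet below converts them into wall-clock time. The speedups are the ones quoted in this post relative to a 1080 Ti; the 10-hour baseline job and the names in the code are made up for illustration.

```python
# Speed relative to a 1080 Ti, as quoted in this post.
RELATIVE_SPEED = {
    "K80": 0.25,  # the 1080 Ti is 4x faster
    "1080 Ti": 1.0,
    "2080 Ti": 1.4,
    "Titan RTX": 1.6,
    "2080 Ti (half-precision)": 1.6,
    "Titan RTX (half-precision)": 2.2,
}

def training_hours(baseline_hours, gpu):
    """Estimated training time on `gpu` for a job that takes `baseline_hours` on a 1080 Ti."""
    return baseline_hours / RELATIVE_SPEED[gpu]

for gpu in RELATIVE_SPEED:
    print(f"{gpu}: {training_hours(10, gpu):.1f} hours")
```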

AWS is expensive because Amazon is forced to use a much more expensive GPU

Building is better than buying

It’s not necessary to buy a pre-built one. You see, the hard part about building is finding the right parts for machine learning and making sure they all work together, which I’ve done for you! Physically building the computer is not hard: a first-timer can do it in less than 6 hours, a pro in less than 1 hour.

Building lets you take advantage of crazy price drops

Building lets you pick parts so your computer can expand to 4 GPUs and optimize it in other ways.

You can also make sure the design aesthetic is awesome (I personally find some of the common computer cases hideously ugly), the noise profile is low (some gold-rated power supplies are very loud), and the parts make sense for Machine Learning (a SATA3 SSD tops out at 600MB/sec, while an M.2 PCIe SSD is a whopping 5x+ faster at 3.4GB/sec).
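To put the SSD numbers in perspective, the sketch below estimates how long one sequential pass over a dataset takes at each quoted throughput. The 100GB dataset size is a made-up example, and these are peak sequential rates, so real data loaders will usually see less.

```python
def read_seconds(dataset_gb, mb_per_sec):
    """Seconds to read a dataset sequentially at a sustained throughput (1 GB = 1000 MB)."""
    return dataset_gb * 1000 / mb_per_sec

sata3 = read_seconds(100, 600)   # SATA3 SSD at 600MB/sec
nvme = read_seconds(100, 3400)   # M.2 PCIe SSD at 3.4GB/sec
print(f"SATA3: {sata3:.0f} s, M.2 PCIe: {nvme:.0f} s, speedup: {sata3 / nvme:.1f}x")
```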

How to start your build

See new photos and updates: Follow me on Medium and Twitter!

FAQ

Will you help me build one?
Happy to help with questions via comments / email. I also run www.HomebrewAIClub.com, and some of our members may be interested in helping.

What models can I train?
You can train any model provided you have data. GPUs are most useful for Deep Neural Nets such as CNNs, RNNs, LSTMs, and GANs. Some examples w/ code & datasets are listed on my website thisisjeffchen.com.

Vision and photo enhancement is really good now, which makes the new iPhone 11 amazing.

How does my computer compare to Nvidia’s $49,000 Personal AI Supercomputer?
Nvidia’s Personal AI Supercomputer uses 4 GPUs (Tesla V100), a 20-core CPU, and 128GB of RAM. I don’t have one so I don’t know for sure, but the latest benchmarks show a 25–80% speed improvement. Nvidia’s own benchmark quotes 4x faster, but you can bet their benchmark uses all of the V100’s unique advantages, such as half-precision, so that speedup won’t materialize in practice. Remember your machine only costs $4.5k with 4 GPUs, so laugh your way to the bank.
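For a rough sense of price-performance, the sketch below divides the price ratio by the speedup, using only the figures in this answer ($49,000 vs. $4.5k, 25–80% faster). The function name is hypothetical, and the comparison ignores power, support, and resale value.

```python
def throughput_per_dollar_advantage(cheap_usd, expensive_usd, expensive_speedup):
    """How many times more training throughput per dollar the cheaper machine delivers."""
    return (expensive_usd / cheap_usd) / expensive_speedup

for speedup in (1.25, 1.80):
    adv = throughput_per_dollar_advantage(4500, 49000, speedup)
    print(f"Nvidia {speedup:.2f}x faster: home build gives {adv:.1f}x more speed per dollar")
```

Even at the optimistic end of the benchmark range, the home build comes out around 6x ahead per dollar spent.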

How can I learn Artificial Intelligence?
Stanford is giving away a lot of their CS curriculum. So look there.

I got a lot of help from other articles while researching the build. If you’re interested in reading further, I’ve listed them here: Michael Reibel Boesen’s post, Gokkulnath T S’s post, Yusaku Sako’s post, Tim Dettmers’ blog, Vincent Chu’s post, Puget Systems’ PCIe 16x vs. 8x post, QuantStart’s rent vs. buy analysis, and Tom’s Hardware’s article.

Thank you to my friends Evan Darke, Nick Guo, James Zhang, Khayla Sill, and Imogen Grönninger for reading drafts of this.


Written by

Jeff Chen

AI engineer and company builder. Founded Joyride (acquired by Google). Current projects: thisisjeffchen.com
