GPU servers for machine learning startups: Cloud vs On-premise?

At Corti it was never a question whether we should get on-premise GPU servers, given our application's inherent data privacy concerns. But for other machine learning startups the question still remains: should you go cloud or on-premise for your training hardware? Finally, this post will help you get started with your first GPU server if you decide to build one.

Disclaimer: This post is to be seen as advice only. If you follow my guide and end up burning a lot of money, components, or both, I cannot be held at fault.


Do you even need GPU servers?

Yes. GPUs are awesome calculation machines. They have green neon lights and sound like jet engines!

Ok, maybe let's dive a bit deeper into that question before we get our geek on. This one is simple to answer: if you're training neural networks deeper than a couple of layers, then yes, you need them. Even a 5–10x speedup on a one-hour CPU training job is going to save you tons of time in the end, even more so if we're talking weeks of CPU training.

Now that we've got that out of the way, let's look at cloud vs. on-premise servers. We're going to compare the two based on the primary factors that we at Corti feel matter:

  1. Performance
  2. Cost
  3. Operations

Finally, if you decide to go on-premise, I will finish off the post with a few recommendations based on our setup.

Cloud vs on-premise: Performance

Tim Dettmers has done a couple of very detailed posts on comparing the different consumer-grade GPU types you can get (Post 1, Post 2). When you’re done reading this post take a look at those — they’re awesome.

Nvidia Tesla K80 card

When we initially benchmarked cloud GPUs I was actually surprised to discover that, in general, cloud-based GPUs are very slow. When we started 1.5 years ago the only option for GPU servers was to rent Amazon G2 instances featuring Nvidia Grid K1 or K2 GPUs. Back then they still provided a nice 2–3x speedup going from CPU to GPU on our early Kaldi-based speech recognition models. Now it seems that all commercial GPU datacenters offer Nvidia Tesla GPUs, typically the K40 or K80. If you were to buy one of these cards it would set you back about 5K$, so if you live by the "if it's expensive it must be good" rule you would figure they would be awesome for deep learning, right?

Well... they're not. A Titan X Pascal, which you can get for 1K$, will beat a Tesla K40/K80 any day! What's the deal with that?

Titans beat Teslas any day of the week.

Teslas are built for double precision and Titans for single precision. This means that if you need those extra decimals to calculate the precise orbit of an asteroid you would go with a Tesla, but if you need to calculate one out of 10M weights in a neural network, a Titan's single precision will do the job just fine.

To give a concrete example, one of our larger speech recognition models achieves a training time of 15 seconds per batch at batch size 64 on our Titans, versus 45 seconds per batch at batch size 32 (half the size because of the smaller memory) on a Tesla K80. That amounts to a 6x speedup (3x on time and another 2x from the larger batch size) on the Titans compared to the Teslas.

Titan: 15 seconds per batch at batch size 64
Tesla: 45 seconds per batch at batch size 32
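If you want to sanity-check that arithmetic, here is a minimal sketch in plain Python that converts the two batch timings quoted above into samples per second and computes the resulting speedup (the numbers are our own measurements; substitute your own):

```python
# Convert the two batch timings above into throughput and compute the speedup.
titan_batch_size, titan_seconds_per_batch = 64, 15
tesla_batch_size, tesla_seconds_per_batch = 32, 45

titan_throughput = titan_batch_size / titan_seconds_per_batch  # samples/second
tesla_throughput = tesla_batch_size / tesla_seconds_per_batch  # samples/second

print(f"Titan X Pascal: {titan_throughput:.2f} samples/s")
print(f"Tesla K80:      {tesla_throughput:.2f} samples/s")
print(f"Speedup:        {titan_throughput / tesla_throughput:.1f}x")  # -> 6.0x
```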

Based on our experience there's no doubt: on-premise Titan-based GPU servers are way faster than cloud Tesla-based GPU servers.

On-premise: 1. Cloud: 0.

Cloud vs on-premise: Cost

Now for the cost. One of our GPU servers costs approximately 8.5K$, which includes 4x Titan X Pascals. An Amazon p2.xlarge instance with 1x Tesla K80 costs 0.9$/hr. To compare on a per-GPU basis, let's bump that up to 3.6$/hr for 4x Tesla K80s. So now we have two setups: one GPU server with 4x Titan X Pascals, and four Amazon p2.xlarge instances totalling 4x Tesla K80s.

This means that I could get 2361 computing hours (or approximately 100 days of training) on the p2 instances before I reach the price of my on-premise GPU server. After that I'm just paying more money for the cloud servers. Note that this calculation does not include the power cost of running the server in-house. In Denmark 1 kWh costs about 0.3$, and that is roughly what the server draws in an hour, so the power bill of about 0.3$/hr is far below the 3.6$/hr price of the Amazon p2 instances.

After <100 days of training your on-premise GPU server will be cheaper.

But let's not forget that the Tesla K80s are 6x slower than the Titans. This means that I need to wait 6x longer for the K80s to finish the same job as the Titans. Let's take an example to show what this means in terms of cost. Say we need to train a job that would take one week (168 hours) on the on-premise server. Because the K80s have 6x lower computing efficiency, this would take them 1008 hours (168*6), resulting in a cost of 3.6K$, or almost half the price of my on-premise server. So not only will I have almost paid for half an on-premise GPU server, I would also have to wait 1.5 months for my job to finish instead of just one week!
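If you want to redo these back-of-the-envelope numbers with your own prices, here is a small sketch in plain Python. The figures below are the ones quoted in this post (hardware price, hourly rate, and the 6x slowdown we measured); treat them as assumptions and plug in your own:

```python
# Back-of-the-envelope cloud vs. on-premise cost comparison.
# All figures are the example numbers from this post, not current prices.
on_premise_price = 8500.0   # USD for one server with 4x Titan X Pascal
cloud_rate = 4 * 0.9        # USD/hour for 4x p2.xlarge (1x Tesla K80 each)
slowdown = 6                # the K80 setup is ~6x slower on our models

# Hours of cloud compute you can buy for the price of the server.
break_even_hours = on_premise_price / cloud_rate
print(f"Break-even: {break_even_hours:.0f} hours (~{break_even_hours / 24:.0f} days)")

# Cost of a job that takes one week (168 hours) on the on-premise server.
on_premise_hours = 168
cloud_hours = on_premise_hours * slowdown
print(f"Same job in the cloud: {cloud_hours} hours, {cloud_hours * cloud_rate:.0f}$")
```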

On-premise: 2. Cloud: 0

Cloud vs on-premise: Operations

Ok, spoiler alert: operations is the only area where cloud, not surprisingly, wins. It is difficult to compete with one-click launches when your machine learning team complains on a Sunday morning that the GPU servers are unreachable on the VPN. Having your own servers at your office inherently comes with more operational issues (a small sanity-check script for a couple of these follows the list):

  • Do we have the right drivers installed?
  • Coordinating updates
  • Network breakdowns
  • Wifi connectivity problems
  • Unscheduled reboots
  • Power outages
  • Equipment failure
  • Why doesn’t it respond?
  • No more disk space
  • Cable nightmares
  • etc.
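Some of these you can at least catch early with a small script run from cron or by hand. Here is a minimal sketch, assuming the Nvidia driver and nvidia-smi are installed; the path, threshold, and expected GPU count are assumptions for illustration:

```python
# Tiny server sanity check: free disk space and GPU visibility.
import shutil
import subprocess

EXPECTED_GPUS = 4          # assumed: a 4x Titan X build
MIN_FREE_GB = 50           # arbitrary threshold for the data/checkpoint disk
DATA_PATH = "/"            # adjust to wherever your datasets live

def check_disk():
    usage = shutil.disk_usage(DATA_PATH)
    free_gb = usage.free / 1e9
    print(f"Free space on {DATA_PATH}: {free_gb:.0f} GB")
    if free_gb < MIN_FREE_GB:
        print("  WARNING: running low on disk space")

def check_gpus():
    # `nvidia-smi -L` lists one line per GPU the driver can see.
    out = subprocess.check_output(["nvidia-smi", "-L"], text=True)
    gpus = [line for line in out.splitlines() if line.strip()]
    print(f"GPUs visible to the driver: {len(gpus)}")
    if len(gpus) != EXPECTED_GPUS:
        print("  WARNING: unexpected GPU count, check drivers and seating")

if __name__ == "__main__":
    check_disk()
    check_gpus()
```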

If you've read my recent blog posts on tech risk, you'll know that running your own GPU servers does indeed introduce more variability in your development process. If you're using Amazon's GPU servers you don't need to worry about unscheduled reboots because Ubuntu decided to perform automatic updates, thereby killing all jobs on the machine. The only operational aspect that favors the on-premise servers is customization. The on-premise servers are yours, which means you can do with them as you want. Need a new version of the Nvidia drivers on a cloud server? Well, boohoo.

Still, Cloud wins this one.

On-premise: 2. Cloud: 1

Conclusion

Despite the operational challenges of running your own servers, the performance and cost alone are more than enough to convince me that on-premise GPU servers are the right way to go. If it's your job to make sure the machine learning team stays effective, you want to keep training times as low as possible, and in the 1.5 years we've been running on-premise GPU servers we might have had a total of 1–2 days of downtime. Compared to the 6x speedup and the thousands of dollars we have saved, I would pick on-premise GPU servers any day.

So how do you get started?

A server. Sounds complicated? Really, it's just a big desktop computer: if you've assembled your own desktop PC you can build your own GPU server. It's no more complicated than that. Here are the specs for our simplest GPU server build (based on the Nvidia DIGITS DevBox design), which we're running a couple of variants of today:

Motherboard: ASUS X99-E WS SSI CEB
RAM: 128GB DDR4 Corsair Dominator
CPU: Intel Core i7–5930K (LGA2011-v3)
CPU Cooling: Corsair Cooling Hydro Series H60
GPUs: 4x Nvidia Titan X (Pascal)
HD: Samsung 850 EVO Basic MZ-75E2T0 2TB
Power supply: Enermax Platimax EPM1500EGT 1500Watt
Case: Corsair Air 540 black

This is what the finished server looks like:

Finished Twenty-style (our 1st gen.) GPU server.

You can modify this spec almost any way you want. However, there are a few things you should keep in mind if you do:

1. Pick a motherboard that can fit 4x Titan X cards

Titan Xs are big cards. They occupy two regular PCI slots each and they require PCI-E. This means that you need to pick a motherboard that (1) has 4 PCI-E slots with at least one PCI slot's width of space between each of them, (2) supports 4-way PCI-E 16x for maximum speed, and (3) leaves enough clearance below the lowest PCI-E slot and underneath the full length of the GPUs. (1) and (2) are spec things; check that the board you have in mind says something like that. (3) is almost impossible to know from the spec sheet, but there are a few things you can do.

Titan X mounted on Asus X99E-WS. It occupies almost the entire length of the motherboard as well as two PCI slots in height.

Look at the motherboard layout and see if any components look like they protrude upwards too much. Here are two examples of motherboards and how they would fit the Titan Xs:

The Asrock Z170 Extreme7+. This one has 4x PCI-E 16x slots, but they are too close to each other. Look at the two middle ports: you can't fit two Titans there. This won't work.
Our motherboard of choice: the Asus X99-E WS. This one has 8x PCI-E slots. Although you're only going to use 4 of them, there's plenty of space to fit 4x Titan X in every other PCI-E port.

2. SSI-CEB vs ATX and cases

Related to size, you might also have noticed that the motherboard has an SSI-CEB form factor. I have not yet been able to find any ATX motherboards that fit 4x Titan Xs. The SSI-CEB is slightly larger in one direction than an ATX board, but the screw holes are in the exact same locations as on an ATX motherboard.

This means that you may be able to fit an SSI-CEB motherboard in an ATX case as long as there is enough room on one side inside the case. In fact, the Corsair cases in our builds are ATX cases but fit the SSI-CEB board fine.

Mounting an SSI-CEB motherboard in an ATX case. Here is an example from our flagship "Ringhorn" server. On this server the motherboard was mounted such that the SSI-CEB board protruded outwards to the right, but as there was enough space to the right of where a regular ATX motherboard would sit, we could fit it with no problem.

3. Power supply

The 4x Titans running at full speed are a bit power hungry, so make sure you buy a large enough power supply. 1500 W should do it.
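If you change the components, it is worth adding up the power draw yourself. Here is a rough sketch with assumed TDP figures (roughly 250 W per Titan X Pascal, 140 W for the i7-5930K, and a generous allowance for everything else); check the numbers for your exact parts:

```python
# Rough peak power budget for the build above (assumed TDP figures).
gpu_tdp_w = 250     # Titan X (Pascal) is rated at roughly 250 W
num_gpus = 4
cpu_tdp_w = 140     # Intel Core i7-5930K TDP
rest_w = 150        # rough allowance for motherboard, RAM, SSD, fans, pump
psu_w = 1500

peak_w = num_gpus * gpu_tdp_w + cpu_tdp_w + rest_w
print(f"Estimated peak draw: {peak_w} W")       # ~1290 W
print(f"PSU headroom: {psu_w / peak_w:.2f}x")   # ~1.16x on a 1500 W unit
```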

4. Get plenty of RAM

If you’re anything like us you’re going to need a lot of RAM. The X99E-WS supports up to 128 GB RAM so use it.

5. Hard drive

An SSD goes without saying. We opted for a 2 TB Samsung drive that supports hardware-based AES-256 encryption.

6. CPU cooling

We opted for a pre-filled water cooling solution for our CPU, mostly in order not to take up any more space, because let's face it: it's not the CPU fan that's going to make the most noise here.

7. GPUs: Titan X vs Titan X

Make sure to get the newest Titan X cards if you're going with Titans (or check Tim Dettmers' posts mentioned earlier). The newest Titan Xs are the ones I link to above and are based on the Pascal architecture. For our models they're at least 4–5x faster than the older Titan X.

Putting it together

Assembling it is pretty easy. Just put the components together as you would if you were building a standard desktop computer. There's really not a lot to it. The RAM in our spec comes with its own cooling fans as well; however, we have never been able to fit them in the chassis, so we have been running without them since we built the first server. The first server you build is always the most difficult: we have three now, and we built the last one in less than half a day. Once you know the case, the motherboard, and the different components, everything becomes much easier.

Final view of our latest addition to our GPU servers: “Idunn” (Norse goddess of youth)

A few quirks to be aware of:

  • The lowest-mounted GPU will sit on top of a bunch of connectors that go to the front panel for turning on the server, etc. If you use the included adapter block to connect them instead of just the individual small GPIO wires, I'm not sure you would be able to fit the card on top. Just stick the wires in their right places following the manual and mount the GPU above them.
  • Be sure to check the airflow, i.e. the direction in which each fan spins. You want the main fans to draw air in at one end of the case and blow it out at the other. Move fans around if you have to. In our case we mounted the CPU cooling fan at the top of the chassis.
  • We haven't been able to fit more than one RAM cooling fan due to other wires and tubes, but our server has run fine without them for more than a year now.
  • Be sure to check that all fans are spinning once you boot the machine for the first time (a small monitoring sketch follows this list).
  • The GPUs can be very hard to get out once they're mounted. If you need to remove the middle one, you probably can't without removing all of them from the top down. The problem is that it's difficult to reach the release mechanism on the PCI-E port because there's so little space.
  • If you need the servers to access wifi, be sure to buy a good USB wifi adapter, as the motherboard doesn't have wifi built in.
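Once the machine is up, a quick script saves you from eyeballing fans and temperatures. Below is a minimal sketch that polls nvidia-smi for each GPU's temperature, fan speed, and utilization and warns when a card runs hot; it assumes the Nvidia driver is installed, and the 85 °C threshold is an arbitrary choice, not a vendor recommendation:

```python
# Minimal GPU health check via nvidia-smi (requires the Nvidia driver).
import subprocess

def gpu_status(temp_limit_c=85):
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,temperature.gpu,fan.speed,utilization.gpu",
        "--format=csv,noheader,nounits",
    ], text=True)
    for line in out.strip().splitlines():
        index, temp, fan, util = [field.strip() for field in line.split(",")]
        print(f"GPU {index}: {temp} C, fan {fan}%, utilization {util}%")
        if int(temp) > temp_limit_c:
            print(f"  WARNING: GPU {index} is running hot, check airflow and fans")

if __name__ == "__main__":
    gpu_status()
```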

Tips & tricks

A brief list of tips and tricks that can help you get started a bit more painlessly:

  • Turn off automatic updates if you're using a standard Ubuntu installation, so that your training doesn't get interrupted
  • Buy a switch and let your machine learning team access the servers via the switch. If you don't, you'll hear them complain about moving large datasets over wifi :)… And don't forget to connect the switch to the Internet as well.
  • Make sure to save your trained models during training as often as possible (see the sketch after this list). With on-premise GPU servers you're bound to lose computation time to random mishaps such as power outages, the cleaning staff pulling out wires, or the cleaning staff plugging their vacuum cleaner into the same power outlet and killing the power in your office… Just save yourself a lot of trouble and tell the cleaning staff to stay away from the servers.
  • Routinely power down the servers and check for problems. In one case we noticed that the CPU fan on one of our servers was not turned on!
  • Do coordinated updates: software updates will always cause you some degree of trouble. Do yourself the favor of updating one server first to check the impact, and once any problems have been fixed, immediately update all the other servers so they run the same versions of everything.
  • Remember to set up the firewall using iptables or similar.
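Regarding the tip about saving your models often: here is a minimal, framework-agnostic checkpointing sketch. The helper names (save_checkpoint, train_step, get_weights) are hypothetical placeholders for whatever your training framework provides; the point is simply to persist state regularly and to write atomically so a power cut mid-save never leaves a corrupt checkpoint:

```python
# Framework-agnostic checkpoint sketch: persist training state every N batches
# so a power cut or an unscheduled reboot only costs you the last few minutes.
import os
import pickle
import tempfile

def save_checkpoint(state, path="checkpoints/latest.pkl"):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Write to a temporary file first, then atomically rename it into place,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)

# Hypothetical training loop using it:
# for step, batch in enumerate(batches):
#     train_step(batch)
#     if step % 500 == 0:
#         save_checkpoint({"step": step, "weights": get_weights()})
```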

Getting cocky: Introducing Ringhorn

For our second GPU server we decided to up the ante a bit and went for a more custom and less “boring” build than the first one, based around the Thermaltake Core P5 case. The finished server can be seen below.

Boot-up (without GPUs)
Ringhorn in the dark
Finished Ringhorn GPU build.

Building Ringhorn (named after the Viking ship Hringhorni) was definitely not for the faint of heart. In fact, on May 3, 2018 I decided to remove the description of how to build one from this blog post, as I can no longer recommend this spec. It's simply too high-maintenance, and we have even cooked a GPU because of a leak once.

From left to right: Twenty (1st gen.), Ringhorn (2nd gen.) and Idunn (3rd gen.).

I hope this post helped you get started with building your own GPU server and make the right decision regarding whether cloud or on-premise is for you.

To finish off let’s do a short unstructured FAQ.

FAQ

We're a young AI startup that doesn't have the money to get started with an on-premise GPU server. What should we do?

The best alternative to on-premise GPU servers, performance- and cost-wise, is to skip Google and Amazon as your GPU-server provider and go with Nimbix. This relatively unknown cloud provider has a wide array of GPUs available, including Titan Xs, and they're also quite cheap.

I want a GPU server. Will you help me build one?

I would be happy to help with simple questions via e-mail or Twitter but I have a startup to run.

I’m using linear regression and happy with it. Do I need a GPU server?

No.

I’ve just installed the latest version of the Nvidia drivers and now my monitor won’t come on. What can I do?

Check that lightdm is running. If it isn't, start it (Google how if you're unsure).

I’ve assembled the server but it won’t boot for some reason. How can I debug it?

If you're using the same motherboard as us, look below the lowest Titan card in your server and you should see a simple 8-section LCD display. Check which code it shows once it has stopped cycling through codes, then look that code up in the motherboard manual. If it says the board has booted into the OS, check the lightdm problem above. If there are small blue lights that stay lit constantly on your motherboard, check their meaning; they usually signal RAM, GPU, CPU, or power problems.


Do you run your own GPU servers? I would be very interested in learning about any updates you’ve made!

Thanks to our expert computer mod’er and machine learning engineer Alexander Wahl-Rasmussen for helping us build and support our servers.

This post was updated on May 3, 2018, when I removed the details of how to build Ringhorn, as I can no longer recommend that spec.

Thanks for reading my post. If you want a more up-to-date version with newer specs and way more detail, you should check out Jeff Chen's excellent series on the same topic.