Titan V vs 1080 Ti — Head-to-head battle of the best desktop GPUs on CNNs. Is Titan V worth it? 110 TFLOPS! no brainer, right?
NVIDIA’s Titan V is the latest “desktop” GPU built on the Volta architecture, boasting 110 “deep learning” TFLOPS on the spec sheet. That’s an incredible number. Compare that with the 1080 Ti, the current king of “desktop” GPUs, which puts out 11 “normal” TFLOPS with 11GB of GDDR5X memory at a very reasonable $699 sticker price.
While the promise of Titan V sounds very exciting, the $2999 price point is a bit hard to swallow. But with 10x TFLOPS, should you be purchasing Titan V instead? Is that like getting 10 1080 Ti’s? Or are you better off buying 4 1080 Ti’s for the same money?
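To make the budget question concrete, here is a quick back-of-the-envelope sketch in Python that uses only the spec-sheet numbers quoted above. The function names and the dictionary layout are made up for illustration. Keep in mind that the Titan V figure is “deep learning” TFLOPS while the 1080 Ti figure is “normal” TFLOPS, so this compares paper numbers, not measured performance.

```python
# Spec-sheet numbers quoted in this post (paper figures, not measurements).
CARDS = {
    "Titan V": {"price_usd": 2999, "tflops": 110},  # "deep learning" TFLOPS
    "1080 Ti": {"price_usd": 699, "tflops": 11},    # "normal" TFLOPS
}


def cards_for_budget(budget_usd, price_usd):
    """Number of whole cards that fit in the given budget."""
    return budget_usd // price_usd


def tflops_per_kilodollar(card):
    """Paper TFLOPS obtained per $1000 spent on the card."""
    return card["tflops"] / card["price_usd"] * 1000


n_1080ti = cards_for_budget(CARDS["Titan V"]["price_usd"],
                            CARDS["1080 Ti"]["price_usd"])
print("1080 Ti cards for the price of one Titan V:", n_1080ti)

for name, card in CARDS.items():
    print(name, round(tflops_per_kilodollar(card), 1), "paper TFLOPS per $1000")
```

On paper the Titan V wins on TFLOPS per dollar, which is exactly why measured numbers matter more than the spec sheet here.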
On a side note, the DGX Station is currently on sale for $49900 (normally priced at $69900). It comes with 4 watercooled server-grade Tesla V100’s, which are slightly better spec’d than the desktop-grade Titan V (the tower also looks BOSS, and I’m guessing it comes with support and an SLA.) But most of us (heck, I bet a lot of university and AI research labs) don’t have that kind of money to throw around. Even if you did, I’m not sure it makes much economic sense. BTW, if you are operating a data center, NVIDIA has recently updated their EULA to prohibit “desktop” grade GeForce and Titan GPUs from being used in data centers.
So the question is, is Titan V worth it if you are looking to build your own GPU rig?
Let me start by saying that Titan V and its server-grade big brother Tesla V100 are pretty new. V100 came out in May 2017, and Titan V just came out this month (Dec 2017.) Most deep learning frameworks have been rushing to add Volta support so that all of the potential performance can be exploited.
For example, PyTorch has only recently come out with the 0.3.0 release, which adds support for CUDA 9 and Volta GPUs. I have been playing around with this setup, and the PyTorch community has been awesome at helping me out (especially Soumith Chintala, thanks dude!) Now that I can run PyTorch on Titan V, I’ve written some benchmark code to demonstrate the performance differences. So far, the results only include those of PyTorch 0.3.0, but I’m planning on adding results for other frameworks as well (EDIT: the benchmark results now contain the numbers from TensorFlow 1.4.0 and Caffe2 0.8.1 as well; see the link for more details), all on CUDA 9.0.176 and CuDNN 220.127.116.11, to even out the playing field.
deep-learning-benchmark - Deep Learning Benchmark for comparing the performance of DL frameworks, GPUs, and single vs…
So let’s check out the results above.
Titan V and 1080 Ti were compared head-to-head under the same settings (in addition, both cards sit in x16 PCIe slots on the same computer.)
The timing numbers shown above are for the forward pass through the CNN (“eval”) and forward and backward pass (“train”), measured in milliseconds. Those are average numbers computed over 20 passes after some warmup runs. I’ve run this many times, and these numbers are pretty stable.
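For readers who want to reproduce this kind of measurement, here is a minimal timing harness in plain Python. It is a sketch of the "warm up, then average over 20 passes" procedure described above, not the actual benchmark code from the repository. The names `benchmark`, `step`, and `sync`, as well as the default of 5 warmup runs, are assumptions made for illustration. When timing GPU work in PyTorch you would likely pass `torch.cuda.synchronize` as `sync`, because CUDA kernels launch asynchronously and the clock would otherwise stop before the GPU has finished.

```python
import time


def benchmark(step, warmup=5, passes=20, sync=None):
    """Return the mean wall-clock time of step() in milliseconds.

    step: zero-argument callable running one forward (or forward+backward) pass.
    sync: optional zero-argument callable that blocks until queued work is done.
    """
    def wait():
        if sync is not None:
            sync()

    # Warmup runs are executed but not timed.
    for _ in range(warmup):
        step()
    wait()

    start = time.perf_counter()
    for _ in range(passes):
        step()
    wait()  # make sure all timed work has actually finished
    elapsed_s = time.perf_counter() - start

    return elapsed_s / passes * 1000.0


# Stand-in CPU workload so the sketch runs anywhere.
mean_ms = benchmark(lambda: sum(range(10000)))
print(f"mean time per pass: {mean_ms:.3f} ms")
```

Synchronizing once before starting the clock and once before stopping it keeps the synchronization cost out of the per-pass loop.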
Interesting points to note:
- Obviously, Titan V is faster than 1080 Ti. However, if you simply compare the 32-bit (“single” precision) runs, Titan V is only ~20% faster than 1080 Ti.
- Titan V’s 16-bit (“half” precision) runs are non-trivially faster than those of the 32-bit counterpart. 1080 Ti benefited from going half-precision, but the performance gains are pretty modest compared with Titan V.
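A note on what "~20% faster" means when all you have are per-pass timings in milliseconds: the sketch below treats it as a throughput ratio. The function name is hypothetical, and the two timings are placeholders picked only to show the arithmetic. They are NOT numbers from the benchmark results.

```python
def percent_faster(baseline_ms, candidate_ms):
    """Throughput gain of the candidate over the baseline, in percent.

    Both arguments are times per pass in milliseconds (lower is better).
    """
    return (baseline_ms / candidate_ms - 1.0) * 100.0


# Placeholder timings for illustration only, not benchmark results.
example_baseline_ms = 120.0   # e.g. the slower card
example_candidate_ms = 100.0  # e.g. the faster card

gain = percent_faster(example_baseline_ms, example_candidate_ms)
print(f"candidate is {gain:.1f}% faster")
```

With this definition, "20% faster" means the faster card takes roughly 83% of the time per pass, not 80%.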
These numbers do not really scream “OH MY GOODNESS TITAN V IS A NO BRAINER.”
Is NVIDIA lying about the 10x TFLOPS? I’m sure their marketing folks are good honest people, but there are multiple factors at play here. One, I’m sure there is still room for improvement on the software side to extract every bit of performance from Volta’s super fast Tensor Cores. But even then, if most code paths do not conform to the conditions that allow for maximum theoretical performance (see A FEW RULES in “Programming Tensor Cores in CUDA 9” by NVIDIA), there’s only so much you can do. Time will tell how much improvement we will see at the framework/CUDA/CuDNN level in exploiting the Volta GPU capabilities, but the initial numbers that I have observed on popular CNN’s (I mostly do computer vision stuff, hence my focus on CNN’s) don’t seem to justify getting a Titan V, especially if you were to get one now while software plays catch-up.
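To give a feel for what "conforming to the conditions" looks like, here is a simplified sketch of one such rule as I understand it from NVIDIA’s write-up: for convolutions, the data needs to be half precision and the input and output channel counts need to be multiples of 8. This is my paraphrase and an assumption, not the full rule set (there are also requirements on the algorithm and math mode that are not modeled here). The function name and the example layers are hypothetical.

```python
def conv_may_use_tensor_cores(in_channels, out_channels, dtype):
    """Simplified eligibility check for a convolution layer.

    Assumption: half precision plus channel counts that are multiples of 8.
    The real conditions include more than this (algorithm, math mode, ...).
    """
    return (dtype == "float16"
            and in_channels % 8 == 0
            and out_channels % 8 == 0)


# Hypothetical layers: (description, in_channels, out_channels)
example_layers = [
    ("first conv on an RGB image", 3, 64),
    ("inner conv layer", 64, 128),
]

for description, c_in, c_out in example_layers:
    ok = conv_may_use_tensor_cores(c_in, c_out, "float16")
    print(description, "->", "eligible" if ok else "not eligible")
```

Under this simplified rule, a first layer that takes 3-channel RGB input would not qualify, which is one way a CNN can fall short of the paper TFLOPS even when running in half precision.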
Although it is fun to play around with the latest tech and be an early adopter, if you are focused on doing AI research or building products on your personal GPU rig economically, I would suggest buying 1080 Ti’s as of this writing (until NVIDIA comes out with an even better alternative in the near future; they tend to surprise us with new releases.) Also consider that a single Titan V has only 12GB of memory, while a 1080 Ti has marginally less at 11GB. But if you buy 4 of them for the same money (at least on GPUs… you also need a beefier power supply, a motherboard that can support 4 GPUs, more RAM, better cooling, and so on), you will end up with way more GPU memory (44GB vs 12GB.)
Having said that, I think a more practical thing to do, if you want more than one GPU, is to get 2x 1080 Ti’s. This way, the GPUs do not have to be stacked right on top of one another (more airflow for cooling; when these cards thermal throttle, you can lose performance significantly… you can do water cooling, but that adds significantly to the cost and effort, plus some risk.) Also, with only two cards, the power draw from the GPUs is 250Wx2, so you can get away with a cheaper power supply. At any rate, I find it extremely valuable to have multiple GPUs and split experiments across them so that I can iterate quickly, so this is what I would recommend as a good compromise.
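The memory and power trade-off is simple multiplication, sketched here in Python with the 1080 Ti figures mentioned above (11GB and 250W per card). The function name is made up for illustration, and the totals cover the GPUs only, not the rest of the system.

```python
def rig_totals(n_cards, mem_gb_per_card, watts_per_card):
    """Total GPU memory and GPU-only power draw for n identical cards."""
    return {
        "memory_gb": n_cards * mem_gb_per_card,
        "gpu_watts": n_cards * watts_per_card,
    }


# 1080 Ti figures from this post: 11GB of memory, 250W per card.
for n in (2, 4):
    totals = rig_totals(n, mem_gb_per_card=11, watts_per_card=250)
    print(f"{n}x 1080 Ti: {totals['memory_gb']}GB, {totals['gpu_watts']}W for GPUs")
```

Whatever the GPUs draw has to be added on top of the CPU, drives, and fans when sizing the power supply, so leave some headroom.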
Another option for addressing the heat issue of stacked, air-cooled GPUs, if you must have 3–4 of them, don’t want to water cool, and don’t care about aesthetics or noise, is to get some PCIe extenders/risers and do what this winning Kaggler did:
I hope you find this post helpful. Please let me know if there are any particular results that you want to see on Titan V or 1080 Ti. Thank you for reading!