Longer term thoughts on Nvidia: the importance of algorithmic diversity in AI and DLSS in gaming
*Originally published most of this as a thread on Twitter on January 7, 2019 right after Nvidia’s CES event — seeing Nvidia print the quarter reminded me to publish it here.
The most critical variables to me as a long term investor in Nvidia in 2019 are/were 1) whether game developers would adopt Deep Learning Super Sampling (DLSS) and 2) whether TensorFlow would emerge as the “one framework to rule them all” in AI. The answers appear to be 1) yes, after this year’s CES event (January 6, 2019), and 2) no, given the growth of PyTorch. Both answers substantially increase the long term attractiveness of Nvidia as an investment. In this post, I will also touch on GPU utilization in the data center relative to the utilization of chips from specialized deep learning startups, on CUDA, and on the fact that Nvidia consistently makes the highest performance GPU silicon, all of which are important to the long term thesis on Nvidia.
DLSS and gaming:
Deep Learning Super Sampling or DLSS is important because without it, very few gamers will use ray tracing in the next few years and Nvidia just made an enormous bet on ray tracing. Ray tracing is *only* viable in the next few years if game developers adopt DLSS. And if they adopt DLSS, then Nvidia is on its way to having its deepest competitive advantage ever in its core graphics market.
Ray tracing makes games more beautiful — and is the future of graphics — but beautiful graphics are a secondary consideration to most buyers of leading edge GPUs. Most gamers are buying leading edge GPUs to have a faster frame rate in competitive FPS (first person shooter) games, which gives them a competitive advantage and improves their K/D. Essentially, a faster frame rate means that you see your opponent before they see you.
Excluding DLSS support, games only run at ~35% higher FPS (frames per second) on the RTX 2080 Ti relative to the GTX 1080 Ti, vs. prior new architectures, which offered a ~70% increase in FPS over their predecessor architecture. With DLSS, the RTX 2080 Ti runs games almost 70% faster. This means the RTX series is not a good “value” without DLSS. The reason is that Nvidia spent valuable silicon real estate in the RTX architecture on both ray tracing and DLSS at the expense of more traditional GPU silicon, an enormous bet they could afford to make given effectively zero competition from AMD at the high end.
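To make the frame rate numbers above concrete, here is a rough sketch of what a ~35% vs. a ~70% uplift means in time per frame. The 100 FPS baseline is a number I made up for illustration, not a benchmark result, and the function names are my own.

```python
# Illustrative arithmetic only. BASELINE_FPS is a hypothetical value,
# not a measured GTX 1080 Ti result.
BASELINE_FPS = 100.0


def uplifted_fps(baseline_fps: float, uplift_pct: float) -> float:
    """Frame rate after applying a percentage uplift to the baseline."""
    return baseline_fps * (1.0 + uplift_pct / 100.0)


def frame_time_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


# Uplift percentages are the approximate figures quoted in the text.
scenarios = {
    "GTX 1080 Ti (baseline)": 0.0,
    "RTX 2080 Ti without DLSS (~35%)": 35.0,
    "RTX 2080 Ti with DLSS (~70%)": 70.0,
}

for name, uplift in scenarios.items():
    fps = uplifted_fps(BASELINE_FPS, uplift)
    print(f"{name}: {fps:.0f} FPS, {frame_time_ms(fps):.2f} ms per frame")
```

The point of expressing it as frame time is that the competitive benefit comes from each frame arriving sooner, and the gap between the two uplift figures is what DLSS support is worth to a gamer who only cares about frame rate.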
If developers do not adopt DLSS and ray tracing, then AMD might be able to get back into high end GPU competition by focusing on traditional graphics technologies (TAA, etc.). However, if developers adopt DLSS, Nvidia will have their biggest advantage over AMD ever, as DLSS is a proprietary standard. Prior attempts by Nvidia to establish proprietary standards (PhysX, HairWorks) were largely unsuccessful. DLSS also lets Nvidia have their cake and eat it too by putting the deep learning focused tensor cores in the RTX series to use in graphics.
At this year’s CES event (January 6, 2019), Nvidia announced that both Anthem and Battlefield V, as rumored, would support DLSS. I suspect this will result in most AAA games following suit over the next year. Metro Exodus also supports DLSS and might be the single best implementation of ray tracing and DLSS yet, per reviews and my own experience. Both Anthem and Metro Exodus are top 10 games on Twitch right now, and given that they are not PvP centric, I think they will drive more demand for RTX GPUs than prior, more PvP focused games that supported ray tracing and DLSS.
The emerging game developer support for both ray tracing and DLSS makes life difficult for both AMD and Intel (on their second attempt at a discrete GPU following Larrabee) going forward. Not as important, but bringing G-Sync to FreeSync monitors eliminates AMD’s only current competitive advantage (much cheaper monitors with adaptive sync to eliminate tearing). Icing on the cake. The GeForce Experience software, which automatically keeps your drivers updated and games optimized, is also highly underrated; it wasn’t easy to “tune” gaming PCs prior to this. AMD does not really have a competitive answer to GeForce Experience. Ray tracing, DLSS and GeForce Experience are significant competitive advantages in addition to Nvidia’s consistent raw performance advantage.
As a sidenote, and as explained in a separate Medium post (and Twitter thread), AMD’s stock may work in a big way from here, but the outperformance will be driven by their CPU business. The fact that Intel has fallen behind in process development for the first time in decades is seismic. I literally cannot believe that AMD, Nvidia, Apple and Qualcomm have a manufacturing process technology advantage over Intel.
Diversity in both frameworks and algorithms used for deep learning/AI:
Diversity in both frameworks and algorithms is a significant part of Nvidia’s competitive advantage in training (and, to a lesser extent, data center inference). Diversity in both makes life difficult for all the deep learning semiconductor startups. It requires an immense amount of software support from a semiconductor company to make their chips work in a given framework, and then within each framework, different algorithms can and need to be optimized. Therefore, having one framework such as TensorFlow emerge as the “one framework to rule them all” is a risk to Nvidia: it makes life easier for the deep learning semiconductor startups, as they could focus all of their development efforts on that one framework, effectively reducing the software load to a manageable level.
TensorFlow’s emerging dominance (late 2017, early 2018) was a risk to the diversity cited above as a significant barrier to entry, and all of the startups were focusing their software development on it. However, the dynamics within TensorFlow are particularly complicated for both the startups and Nvidia because, to my knowledge, Google has never guaranteed backwards compatibility for TensorFlow. This is a significant risk to both Nvidia and the startups given that Google has its own deep learning semiconductors (TPUs).
Even though TPUs are inferior to Teslas (Nvidia’s chips, not the cars), Google does not have to pay Nvidia’s margins on top of the foundries’ margins, which meant they could make TPUs cost competitive with Nvidia’s Teslas on TensorFlow both internally and externally. And controlling both TensorFlow and TPU development gives Google significant advantages, especially if they ever broke backwards compatibility such that TensorFlow would only run on TPUs (Nvidia could engineer around this, but painfully). The AI community would’ve revolted, but this was still a risk.
Therefore, the emergence of PyTorch as the preferred framework for research after Facebook open sourced it was super powerful for Nvidia, both reducing Nvidia’s strategic vulnerability with respect to TensorFlow and increasing the barriers to entry for startups, which now absolutely need to support multiple frameworks. Google’s announcement of TPU support for PyTorch was effectively an admission of defeat. And given all of the ongoing AI research, I think algorithmic diversity (GANs, etc.) will continue to grow.
GPU utilization in the data center:
The effective cost of semiconductors in a data center is directly related to their utilization rate. One could argue that the heart of cloud computing’s advantage over on-premise computing is that it allows for a higher utilization rate. Variable costs are high, but so are the fixed costs of buying the semiconductors, the data center, etc. Ignoring variable costs, a data center (and the semiconductors within it) that is utilized 50% of the time costs 2x as much per unit of useful work as a data center that is utilized 100% of the time. The fact that GPUs, unlike TPUs or most other accelerators, can be used for a wide variety of applications leads to a higher utilization rate for GPUs and a lower effective cost.
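A minimal sketch of the utilization arithmetic, ignoring variable costs as in the paragraph above. The chip price and the three year lifetime are made-up placeholders, not real figures, and the function name is my own.

```python
def effective_cost_per_useful_hour(
    fixed_cost: float, lifetime_hours: float, utilization: float
) -> float:
    """Fixed cost spread over only the hours the chip does useful work.

    utilization is the fraction of lifetime hours in use, in (0, 1].
    Variable costs (power, cooling, staff) are deliberately ignored.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return fixed_cost / (lifetime_hours * utilization)


# Hypothetical inputs for illustration only.
HYPOTHETICAL_CHIP_COST = 10_000.0            # dollars
HYPOTHETICAL_LIFETIME_HOURS = 3 * 365 * 24   # three years of wall-clock time

fully_used = effective_cost_per_useful_hour(
    HYPOTHETICAL_CHIP_COST, HYPOTHETICAL_LIFETIME_HOURS, 1.0
)
half_used = effective_cost_per_useful_hour(
    HYPOTHETICAL_CHIP_COST, HYPOTHETICAL_LIFETIME_HOURS, 0.5
)

print(f"100% utilized: ${fully_used:.2f} per useful hour")
print(f" 50% utilized: ${half_used:.2f} per useful hour")
print(f"cost ratio: {half_used / fully_used:.1f}x")
```

Put differently, under these assumptions a specialized chip that sits idle half the time has to be priced at half of a fully utilized chip with the same throughput just to break even on fixed cost.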
GPUs can be utilized for high performance computing and, perhaps more importantly, traditional machine learning in addition to deep learning. Nvidia just released their RAPIDS set of software libraries, which should lead to much more utilization of their Tesla GPUs for traditional machine learning, and TensorRT, which should lead to higher utilization of their GPUs for inference. As a result, in several months GPUs that were previously used primarily for HPC and deep learning in data centers can also be used for traditional machine learning and inference with a much higher level of performance (optimization) than was previously available, all simply because of software libraries released by Nvidia. This utilization advantage for Nvidia GPUs is another underappreciated barrier to entry for startup deep learning semiconductor companies whose chips cannot do traditional machine learning or HPC. What they gain from specialization, they lose in flexibility and potential utilization. Some of the startups will succeed (my bet would currently be on Cerebras in data center training and MythicAI in edge inference and potentially data center inference), but many of them will fail.
CUDA (Compute Unified Device Architecture):
Per Wikipedia, “CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. The CUDA platform is a software layer that gives direct access to the GPU’s virtual instruction set and parallel computational elements, for the execution of compute kernels.” CUDA is a proprietary standard that only runs on Nvidia GPUs. Nvidia has created many different CUDA libraries that are optimized for specific applications (VRWorks, ComputeWorks), but the most strategic is cuDNN (deep neural networks). cuDNN is what led to Nvidia GPUs dominating all deep learning training to date, as it made it easy for programmers to take advantage of GPUs’ built-in parallelism, which is ideal for deep learning.
Basically, CUDA means that Nvidia has by far the best software toolchain for any non-gaming application for GPUs; Nvidia often says they have more software engineers than hardware engineers. CUDA allowed Nvidia to extend their traditional and well understood advantage in gaming drivers to every other general purpose application. CUDA enables the RAPIDS software libraries cited above. And as an example that shows the superiority of the CUDA toolchain to every competing toolchain, Uber recently open sourced a new GPU accelerated database called AresDB, and it runs on CUDA. In other words, one cannot use AMD GPUs to run AresDB without degrading its performance (via a translator).
Nvidia consistently makes the highest performance GPU silicon:
AMD’s just released Radeon VII is made on a 7 nanometer process vs. the cost comparable Nvidia RTX 2080 at 12 nanometer. The Radeon VII does not devote extensive silicon real estate to ray tracing or DLSS, again unlike the comparable Nvidia RTX 2080. Given all of this, I would’ve expected the Radeon VII to be at least ~25% faster than the RTX 2080 running games without ray tracing; instead it is 5–6% slower than the RTX 2080. This was AMD’s best chance to reinsert themselves into the high end gaming GPU market and they failed. Nvidia will have a 7 nanometer version of the RTX with an improved architecture at the end of this year. And as mentioned before, if game developers continue adopting DLSS, it is game over.
Nvidia is at the epicenter of almost every secular trend in technology:
Nvidia’s stock may go down near term due to the crypto overhang, but the 3–5 year outlook is powerful. It is at the epicenter of graphics, VR, self-driving cars and AI, and it is a free option on crypto ever recovering. And with respect to crypto, I would carefully monitor the upcoming Facebook stablecoin.
No investment advice, views are all my own!