Prefacing our deep-dive into TeraScale, GCN & RDNA…
This post has been split into a two-part series. Please find Part 2, An Architectural Deep-Dive into TeraScale, GCN & RDNA, here:
An Architectural Deep-Dive into AMD’s TeraScale, GCN & RDNA GPU Architectures
With an overview of AMD’s GPUs and supporting prerequisite information behind us, it’s time to delve into TeraScale…
Today we’ll look at AMD’s graphics architectures to gain a deeper understanding of how their GPUs work and of some factors that contribute to the real-world performance of these processors. Specifically, we’ll be examining the TeraScale, GCN and the recently announced RDNA architecture families.
Let’s start off by associating these names to actual products on a timeline:
2. What is an architecture anyway?
The term ‘architecture’ can be confusing. In the context of integrated circuits it’s termed ‘microarchitecture’, abbreviated to μarch or uarch for convenience (μ being the Greek symbol denoting ‘micro’). Microarchitecture refers to the physical layout of a chip’s silicon innards, i.e., the particular way a given instruction set is implemented in hardware.
For context, Intel’s & AMD’s CPUs implement the x86 instruction set and its 64-bit extension (AMD64), together called the x86–64 Instruction Set Architecture (ISA). They’ve done so for a while now, and yet every so often you’ll hear of a new ‘architecture’ such as Intel’s Skylake or AMD’s Zen. In these cases, the underlying instruction set stays the same (x86–64) while its physical implementation changes, with new enhancements focused on improving performance and reducing power consumption. So, while the set of instructions that a chip understands and decodes/executes comprises the chip’s ISA, the term architecture refers to both that ISA and its physical implementation.
ISAs are commonly categorized by their complexity, i.e., the size of their instruction space: chips implementing large ISAs such as x86–64 are called Complex Instruction Set Computers (CISC), while the chips powering smartphones and other portable, low-power devices follow a Reduced Instruction Set Computer (RISC) design. The huge instruction space of the typical CISC ISA necessitates equally complex and powerful chips, while RISC designs tend to be simpler and therefore less power-hungry.
ISAs don’t remain stagnant either: new instructions are added all the time to introduce new features, and entire extensions aren’t uncommon. Intel’s AVX extension to the x86–64 ISA added support for new parallel processing modes on CPUs, while Nvidia’s Turing brought along support for real-time ray tracing on their RTX GPUs. Hardware changes may accompany such significant extensions, as with the dedicated ray-tracing cores (RT cores) on Turing.
3. Overviewing AMD’s GPU Architectures
With that brief explanation, let’s overview AMD’s GPU architectures: TeraScale, GCN and RDNA. Starting way back with TeraScale may seem annoying and unnecessary but stick around & it’ll prove worthwhile.
TeraScale’s reign began back in 2007 and extended until late 2011, with some TeraScale GPUs released as late as 2013. TeraScale matured over three generations during this period, with the second generation being the most dominant and revered today. TeraScale is traced over a timeline below:
It’s hard to overstate the significance of TeraScale: AMD had completed its acquisition of the Canadian firm ATi Technologies, creators of the Radeon GPUs, just a year prior to TeraScale’s 2007 release. TeraScale thus served as the first GPU architecture released under AMD, though it’s reasonable to assume it was well under development before ATi’s acquisition. TeraScale was significant for several other reasons too: it was the first ATi GPU for which the underlying ISA & microarchitecture were publicly detailed, and it arrived at a significant time when the concept of the “GPGPU” was just starting to take hold:
The General-Purpose GPU or GPGPU concept looks at utilizing the significant computational power of GPUs for general workloads rather than just graphics, outside of which GPUs sat largely idle. This is significant because until this point, GPUs had existed purely for graphics workloads (as their name suggests), with every aspect of their design specialized accordingly.
Why do this when every system is already equipped with a general-purpose processor, the CPU? Because the specialized nature of the GPU meant that it could carry out a certain type of math really, really fast. Orders of magnitude faster than the CPU, in fact. It also turns out that while such math was typical of graphics workloads, many scientific and compute workloads relied on similar calculations and would therefore benefit greatly from access to the GPU, which is built up of thousands of cores performing said math as a massively parallel operation. While “thousands of cores” may sound absurdly large compared to the typical CPU core counts we’re used to, keep in mind that those CPU cores are general processors that are individually much more complex and capable than their corresponding GPU counterparts.
As the GPGPU concept began to take hold, AMD’s first foray into the territory came in the form of support for OpenCL on their TeraScale Gen1 GPUs. OpenCL (Open Computing Language) is the dominant open-source framework for compute on “heterogeneous” systems, i.e. systems combining different types of processors such as CPUs and GPUs. Further, AMD’s Fusion initiative looked to merge CPUs and GPUs onto a single package, further pushing heterogeneous system architectures (HSA) and resulting in the creation of the “Accelerated Processing Unit” or APU, a moniker that’s still used today.
Though AMD’s GPGPU foundations were thus first firmly laid within TeraScale’s architectural depths, it would be TeraScale’s successor, GCN, that would cement AMD’s commitment to the GPGPU initiative. TeraScale was thus the last of the purely graphics-focused, non-compute-centric GPU architectures from AMD/ATi. The GPGPU movement would eventually go on to become the central enabler of today’s machine learning revolution, powering the neural networks behind the self-driving cars and AI-enabled voice assistants that are so ubiquitous now.
At its core, TeraScale was a VLIW SIMD architecture (don’t let these terms scare you off, they’ll be adequately addressed soon) which contributed significantly to its gaming dominance at the time.
3.2. GCN — Graphics Core Next
GCN has been the dominant GPU architecture for AMD this decade and currently features on the ‘Polaris’ and ‘Vega’ families of GPUs, with Polaris comprising the fourth generation and Vega the fifth and final iteration of GCN. Polaris targets the low & mid-range segments of the market with the RX 400 & RX 500 series of GPUs, leaving Vega to target the upper-tier segments with the Vega 56, Vega 64 & the 7nm Radeon VII cards. In addition to these, Vega features on AMD’s ‘Instinct’ lineup of machine learning GPUs as well as on the ‘FirePro’ lineup of professional graphics GPUs.
Over the course of its years, GCN matured over five generations and saw the release of many product families spanning desktop & laptop GPUs, APUs, FirePro rendering GPUs & the MI series of machine learning accelerator cards. The major desktop gaming GPU families are traced over a timeline below:
Though Vega ushered in the last of the venerable GCN era of GPUs, GCN continues to assert a strong influence on its architectural successor RDNA, and it’s reasonable to expect this influence to continue into future generations as well. Besides, there are a lot of GCN cards out there today, and that will probably remain the case for a while. This current & near-future relevance alone makes deep dives such as this worthwhile; however, historic factors & current perception play an important role as well: having originally debuted back in 2012 on the Radeon HD 7700 series of the ‘Southern Islands’ family of GPUs, GCN is now viewed as an ancient workhorse, a product well past its prime with every drop of performance squeezed out of it. Indeed, AMD seems to think so as well, with GCN’s successor now finally out the door featuring significant changes at fundamental levels.
With GCN, AMD made it clear that general compute was going to be a big deal for GPUs going forward, and the many architectural changes reflect this. These remain a topic of discussion within the enthusiast community to this day and will remain a focus here as well.
3.3. RDNA — Radeon DNA
RDNA’s goal, purpose and central mantra can be summed up in two words: efficiency & scalability. Given the same compute resources as a GCN-based chip, RDNA manages to get more work done while requiring fewer threads in the pipeline to keep its resources adequately utilized and busy. RDNA is also set to feature in everything from mobile phones to supercomputer accelerators and, of course, in consoles and high-end graphics cards.
More on that scalability: Sony plans to use RDNA in its hotly anticipated PlayStation 5, Microsoft plans to do the same for its own hotly anticipated “Project Scarlett” Xbox and, perhaps most surprisingly, Samsung plans to use RDNA graphics in its next generation of Exynos smartphone chips.
And we’re not done yet: on the other end of the spectrum, Google announced that its upcoming cloud-based game streaming service ‘Stadia’ would make exclusive use of AMD’s GPUs, while supercomputing veterans Cray announced that the Frontier supercomputer for the US Department of Energy would be based entirely on AMD’s CPUs and GPUs, delivering 1.5 exaflops of compute power and making it the most powerful computer in the world, equaling the combined grunt of the top 160 supercomputers today. Wow.
Certainly big wins and nothing to scoff at; a darn good start for RDNA indeed!
4. Understanding the GPU’s Playground: The Display
Let’s preface our architectural deep dive with a review of the GPU’s fundamental output device, the monitor. All your digital adventures occur within the realm of your screen’s pixels and it’s your GPU that paints this canvas. To do so, it needs to draw or “render” visual data onto your screen’s individual pixels. Looking at a standard full-HD screen:
Over 2 million pixels, with 1920 pixels in each of the 1080 horizontal rows, make up the full-HD resolution
Image source: ViewSonic Corp
The GPU draws up an image (called a “frame” in graphics parlance) representing the current display state and sends it to the screen for display. The rate at which the GPU renders new frames is measured in FPS, or Frames Per Second. The screen is correspondingly refreshed several times a second, at a rate measured in Hertz (typically 60Hz), ensuring that screen updates appear smooth and natural rather than sudden & jarring. In this sense you can correctly think of the frame rendering & refresh cycle as akin to the cinema halls of yesteryear, wherein images on a spinning reel were projected onto a screen, creating the illusion of a video, aptly named a “motion picture”. It’s truly the same process today, just entirely digital & a lot more high-tech!
The takeaway here is that rendering content is a lot of work: over two million pixels must be updated several times a second on a full-HD screen, and exactly four times as many on a 4K screen. The good news is that each pixel can often be processed entirely independently of the others, allowing for highly parallel approaches to processing. And in this computational playground lies the key distinguishing factor between the CPU & the GPU:
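To put concrete numbers on that workload, here’s a quick back-of-the-envelope sketch in Python. The resolutions are the standard full-HD and 4K UHD figures, and the 60Hz refresh rate is simply the typical value mentioned above:

```python
# Back-of-the-envelope: how many pixels a GPU must update per second
# at common resolutions, assuming a typical 60 Hz refresh rate.

def pixels_per_frame(width, height):
    """Total pixels in one rendered frame."""
    return width * height

def pixels_per_second(width, height, refresh_hz=60):
    """Pixel updates per second when refreshing the whole screen."""
    return pixels_per_frame(width, height) * refresh_hz

full_hd = pixels_per_frame(1920, 1080)   # 2,073,600 pixels
uhd_4k = pixels_per_frame(3840, 2160)    # 8,294,400 pixels: exactly 4x full HD

print(f"Full HD: {full_hd:,} pixels/frame, "
      f"{pixels_per_second(1920, 1080):,} pixel updates/sec at 60 Hz")
print(f"4K UHD:  {uhd_4k:,} pixels/frame, "
      f"{pixels_per_second(3840, 2160):,} pixel updates/sec at 60 Hz")
```

That’s well over a hundred million pixel updates per second at full HD alone, which is exactly the kind of workload that rewards a massively parallel processor.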
5. CPUs vs GPUs — SISD vs SIMD
Any processor can fundamentally be described as a device that fetches data and instructions, executes said instructions against said data and produces an output which is then returned to the calling program.
A GPU does the same with one key distinguishing feature: instead of fetching one datapoint and a single instruction at a time (which is called scalar processing), a GPU fetches several datapoints (this group is called a vector) alongside a single instruction which is then executed across all those datapoints in parallel (thus called vector processing). The GPU is thus a vector processor said to follow a Single Instruction Multiple Data or SIMD design.
There are caveats of course: such a SIMD design works only with tasks that are inherently parallelizable, which requires a lack of interdependencies between datapoints: after all, operations cannot be executed in parallel if they depend on each other’s output! While graphics and some compute applications are highly parallelizable and thus suited to such a SIMD execution model, most applications are not. Therefore, in an effort to remain as general-purpose as possible, the CPU remains primarily a traditional scalar processor following a Single Instruction Single Data (SISD) design (though modern CPUs do bolt on SIMD extensions such as the AVX instructions mentioned earlier).
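The distinction can be sketched in plain Python. This is purely a conceptual model (real SIMD hardware applies the instruction to all data lanes in one clocked step rather than a software loop), and the function names here are illustrative, not drawn from any real API:

```python
# Conceptual sketch of SISD vs SIMD execution of one instruction,
# e.g. "add 5" applied to a group of pixel values.

def sisd_process(data, operand):
    """SISD (scalar): one fetch/decode/execute cycle per datapoint."""
    results = []
    for x in data:               # each iteration models a full instruction cycle
        results.append(x + operand)
    return results

def simd_process(vector, operand):
    """SIMD (vector): one instruction issued across the whole vector.

    The comprehension stands in for a single parallel step in which
    every hardware lane computes its result simultaneously.
    """
    return [lane + operand for lane in vector]

pixels = [10, 20, 30, 40]        # a "vector" of datapoints
print(sisd_process(pixels, 5))   # [15, 25, 35, 45] -- four instruction cycles
print(simd_process(pixels, 5))   # [15, 25, 35, 45] -- same result, one issue
```

Both paths produce identical results; the difference is that the SIMD path needs only one instruction issue for the whole vector, which is precisely why independent per-pixel work maps so well onto a GPU.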
And with that understanding, we’re now ready to move on.
We’ve now overviewed the GPU architectures released by AMD following their acquisition of ATi Technologies, as well as the humble monitor and the fundamental difference between the CPU & the GPU. We further observe that this is an exciting time, with GCN’s long-overdue successor finally here: while TeraScale was a very successful gaming architecture and GCN laid firm foundations for AMD’s foray into GPGPU, RDNA seems set to do it all better than before, in more devices than ever and at every possible scale. But what fundamentally distinguishes these architectures? What causes them to do the same things, i.e. crunching numbers and putting pixels on your screen, so differently? Enough background and prerequisites; it’s time to delve deep within.
Find Part 2, An Architectural Deep-Dive into TeraScale, GCN & RDNA, here: