Draw calls in a nutshell

Oxide Games® / Stardock® Ashes of the Singularity™, a game that takes advantage of modern API performance by issuing a massive number of draw calls per frame

While it is not the only thing that has changed, talk about modern graphics APIs usually comes down to one term, “draw calls”: one of the holy grails of modern rendering engines and a part of the rendering process that has been improved by Mantle, Vulkan and Direct3D 12.

To keep this story as short as possible, I’ll skip most of the details and the anatomy of the elements involved, but it is important to explain what a draw call is and what role it plays in what we may call the rendering pipeline. Basically, a draw call carries all the information that tells the GPU about textures, states, shaders, rendering objects, buffers and so on, and it comes with CPU work that prepares drawing resources for the graphics card. Converting these state vectors (all the information mentioned above) into hardware commands for the GPU is very “expensive” for the CPU, and API complexity adds further overhead on top.

Since a draw call is required for each different material, a game with a variety of unique objects and many different materials needs a correspondingly higher number of draw calls. Because the CPU work of translating this information into GPU hardware commands takes time, we sometimes see CPUs bottlenecking GPUs precisely when a high number of draw calls is involved.
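To make the link between material variety and draw call count concrete, here is a minimal Python sketch. It assumes a deliberately simplified toy model in which objects that share both mesh and material can be submitted in a single draw call. The `SceneObject` type and the `count_draw_calls` helper are hypothetical names used for illustration only, not part of any real graphics API.

```python
from collections import namedtuple

# Hypothetical scene object: only a mesh name and a material name.
SceneObject = namedtuple("SceneObject", ["mesh", "material"])

def count_draw_calls(objects):
    """Toy model (assumption): one draw call per unique (mesh, material) pair."""
    return len({(obj.mesh, obj.material) for obj in objects})

scene = [
    SceneObject("tank", "steel"),
    SceneObject("tank", "steel"),  # same mesh and material: no extra call
    SceneObject("tank", "rust"),   # new material: one more call
    SceneObject("tree", "bark"),   # new mesh and material: one more call
]

print(count_draw_calls(scene), "draw calls for", len(scene), "objects")
```

In this toy scene, adding a new material to an existing mesh adds a draw call, while adding another copy of an existing object does not.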

Perfectly balanced CPU and GPU work, resulting in frame times of 16.66 ms, which equals 60 frames per second

The illustration above shows how CPU and GPU work are related in an ideal situation, where CPU performance matches GPU performance and both are perfectly aligned for 60 FPS rendering. As you can see, CPU work is always done ahead of the GPU for each of the 3 rendered frames on our basic rendering loop timeline. The CPU completes its work for the first frame; while the GPU renders that frame, the CPU works on the next one (16.66 ms to 33.33 ms), then on the third frame from 33.33 ms to 49.99 ms, and so on.
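The timeline described above can be sketched in a few lines of Python. This is only an idealised model under the assumptions of the illustration: the CPU and the GPU each take exactly one 16.66 ms slot per frame, and the GPU renders a frame in the slot right after the CPU prepared it. The function names are made up for this example.

```python
def frame_time_to_fps(frame_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms

def pipelined_schedule(n_frames, frame_ms):
    """CPU prepares frame i in slot i; GPU renders it in slot i + 1."""
    schedule = []
    for i in range(n_frames):
        cpu_slot = (i * frame_ms, (i + 1) * frame_ms)
        gpu_slot = ((i + 1) * frame_ms, (i + 2) * frame_ms)
        schedule.append((cpu_slot, gpu_slot))
    return schedule

for frame, (cpu, gpu) in enumerate(pipelined_schedule(3, 16.66), start=1):
    print(f"frame {frame}: CPU {cpu[0]:6.2f}-{cpu[1]:6.2f} ms, "
          f"GPU {gpu[0]:6.2f}-{gpu[1]:6.2f} ms")

print(f"{frame_time_to_fps(16.66):.1f} FPS")
```

Each GPU slot starts exactly where the matching CPU slot ends, so in this model neither side ever waits for the other.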

The red square(ish) things are the time slots the CPU uses for each of the four draw calls per frame. We can think of the illustration as a DirectX 11 situation, but under ideal conditions and with no bottlenecks, since both the CPU and the GPU complete their work on time.

However, this is almost never the case, and our next illustration might give us a better idea of what things sometimes really look like:

A not so perfect situation where the CPU is too slow and bottlenecks a GPU that can actually handle 16.66 ms frames (60 FPS) but must wait on the CPU instead

Here we have a very slow CPU that can’t prepare all the material for the GPU on time. CPU work for our four draw calls takes 21.34 milliseconds per frame. Our GPU is fast enough for 16.66 ms frames (60 FPS, or it may be even faster), but since it must wait for the CPU, it sits idle after rendering the frame it already has data for. With the CPU taking 21.34 ms per frame, and if we imagine our GPU is exactly as fast as needed for 16.66 ms frames (60 FPS), GPU usage drops to 78% (16.66 / 21.34), which is effectively CPU bottlenecking.
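The arithmetic behind the bottleneck is simple enough to check in Python. The sketch below assumes the simplest possible model: the slower of the two processors sets the frame time, and GPU usage is the share of that frame time the GPU spends busy. It uses the 21.34 ms CPU figure, which is the one consistent with the 78% usage value; the helper names are hypothetical.

```python
def effective_frame_ms(cpu_ms, gpu_ms):
    """Assumption: the slower side of the pipeline dictates the frame time."""
    return max(cpu_ms, gpu_ms)

def gpu_usage(cpu_ms, gpu_ms):
    """Fraction of each frame during which the GPU is actually busy."""
    return gpu_ms / effective_frame_ms(cpu_ms, gpu_ms)

cpu_ms, gpu_ms = 21.34, 16.66

print(f"frame rate: {1000.0 / effective_frame_ms(cpu_ms, gpu_ms):.1f} FPS")
print(f"GPU usage:  {gpu_usage(cpu_ms, gpu_ms):.0%}")
```

With these numbers the GPU is busy for roughly 78% of each frame and the frame rate falls to about 47 FPS, even though the GPU alone could deliver 60.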

CPU bottlenecking can also happen for various other reasons, such as the time the CPU needs to process physics, load assets, run game mechanics, handle network data or compute AI: all the work that does not fall under the draw call umbrella but is still the CPU’s job. Whenever the CPU does not complete those tasks fast enough, we experience CPU bottlenecking that makes our GPU wait and sit idle, which shows up as lower-than-maximum GPU usage.

We can buy a faster CPU, but that might not always help, and if we already have the fastest CPU available, we might just have to reduce the game’s complexity and thus help our CPU catch up with the GPU.

One way to improve this is to tackle the draw call part: reduce API overhead, reorganise how things work, and enable developers to put multiple CPU cores to work, to maximise efficiency and deliver data to the GPU hardware as soon as possible.
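As a back-of-the-envelope illustration of why spreading draw call preparation over several cores helps, consider the Python sketch below. It is an idealised model, not a description of how any real API schedules work: it assumes every draw call costs the same, that calls can be split evenly across cores, and that there is no synchronisation cost. The per-call cost is derived from the 21.34 ms that four draw calls take on the slow CPU in the illustration.

```python
import math

def cpu_prep_ms(n_calls, cost_per_call_ms, cores=1):
    """Wall-clock CPU time if calls are split evenly across cores (idealised)."""
    calls_on_busiest_core = math.ceil(n_calls / cores)
    return calls_on_busiest_core * cost_per_call_ms

# Assumption: the slow CPU needs 21.34 ms for 4 equally expensive draw calls.
COST_PER_CALL_MS = 21.34 / 4

for cores in (1, 2, 4):
    total = cpu_prep_ms(4, COST_PER_CALL_MS, cores)
    print(f"{cores} core(s): {total:.2f} ms of CPU time per frame")
```

Under these assumptions, two cores already bring the CPU side below the 16.66 ms budget. Real workloads rarely split this cleanly.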

AMD, in cooperation with DICE, developed Mantle, an API that introduced better optimisations and removed API overhead in some places, mostly in the part that affects draw call cost, enabling many, many more draw calls in the same amount of time. Since the CPU work of preparing data for the GPU has been significantly reduced, slower CPUs can fit not only the old number of draw calls (4 in our illustration) but many times more.

With new APIs the CPU bottleneck is reduced: the CPU can do more in less time, which puts us back on track to support the GPU’s ability to output 60 or more FPS

We can now process more than double the number of draw calls in less time than before: within a single 16.66 ms frame we can process 2.5 times more, 10 draw calls instead of 4, thanks to the changes that new APIs like Direct3D 12 (DirectX 12), Mantle and Vulkan bring. In reality the gain is much higher than 2.5x.
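The relationship between per-call CPU cost and the number of draw calls that fit in a frame can be sketched as follows. The two per-call costs (4.0 ms and 1.6 ms) are made-up values chosen only so that the result matches the 4 and 10 draw calls of the illustrations; they are not measurements of any real API.

```python
FRAME_BUDGET_MS = 16.66  # one frame at roughly 60 FPS

def calls_per_frame(cost_per_call_ms, budget_ms=FRAME_BUDGET_MS):
    """How many whole draw calls fit into the CPU time budget of one frame."""
    return int(budget_ms // cost_per_call_ms)

# Hypothetical per-call CPU costs, chosen to reproduce the illustration.
old_api_calls = calls_per_frame(4.0)
new_api_calls = calls_per_frame(1.6)

print("old API:", old_api_calls, "draw calls per frame")
print("new API:", new_api_calls, "draw calls per frame")
print("ratio:  ", new_api_calls / old_api_calls)
```

The point of the sketch is that the frame budget stays fixed, so any reduction in per-call overhead translates directly into more draw calls per frame.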

If we don’t need the added graphics complexity and variety that come with the increased number of draw calls, we can keep the old 4 draw calls and let our CPU finish its work in less than 16.66 ms, enabling our GPU to output even more FPS for modern monitors with a 75 Hz refresh rate:

The CPU finishes its work in 12.66 ms per frame, which enables our powerful GPU to render 79 frames per second, perfect for our 21:9 wide 75 Hz IPS gaming monitor

We kept our 4 draw calls, the CPU works less and the GPU can render more frames per second, boosting our frame rate to more than 78 FPS, reducing perceived lag and latency and enhancing our gaming experience.
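A quick Python check of these numbers, assuming the GPU is fast enough that the CPU time per frame is the only limit (the helper names are hypothetical):

```python
def cpu_bound_fps(cpu_ms_per_frame):
    """Frame rate when the CPU is the only limit (assumes the GPU keeps up)."""
    return 1000.0 / cpu_ms_per_frame

def meets_refresh(fps, refresh_hz):
    """True if the frame rate is at least the monitor's refresh rate."""
    return fps >= refresh_hz

fast = cpu_bound_fps(12.66)  # CPU time per frame with the new API
slow = cpu_bound_fps(21.34)  # CPU time per frame in the bottlenecked case

print(f"{fast:.1f} FPS, enough for 75 Hz: {meets_refresh(fast, 75)}")
print(f"{slow:.1f} FPS, enough for 60 Hz: {meets_refresh(slow, 60)}")
```

At 12.66 ms per frame the CPU stays ahead of a 75 Hz display, while at 21.34 ms it cannot even sustain 60 Hz.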

This is how machines with slower CPUs that have been bottlenecking our GPUs in the draw call department can work great again. Enhancements in modern APIs are not limited to draw calls or CPUs, but this is the biggest change that will be directly visible in games, as either higher performance or better graphics. Those with slow CPUs will finally be able to get more performance out of their cutting-edge GPUs, and those with powerful CPUs will be able to enable more quality features that would otherwise be impossible even for them due to draw call limitations, among other things.

GPUs don’t sit idle only because of slow CPUs, but also because of the inherent nature of how rendering works at the moment. Currently GPUs can have idle time and high latency simply due to gaps between rendering and compute task submissions to the GPU cores. This has been tackled with asynchronous shader processing techniques, which help reduce latency by allowing multiple task queues to be submitted in parallel.

Modern APIs bring many things to the table in terms of performance and quality. Mantle and the APIs that have evolved from it or been influenced by it enable game developers to use many techniques that will further advance graphics fidelity on PCs and finally unleash the power that, even in mid-range machines, has been kept at bay by the inefficiencies of the API layer.

As the title screenshot illustrates, games built upon modern graphics APIs will be able to render with an unprecedented degree of visual uniqueness and quality that many earlier games craved.


I’m in no way affiliated with Oxide Games® or Stardock Entertainment®. The title screenshot and game mention are for illustrative purposes only.

Stardock, Ashes of the Singularity and title screenshot — Copyright © 2015 Oxide Games. Ashes of the Singularity is a trademark of Stardock Entertainment.