The Elusive Frame Timing
Finally, an explanation for why some games stutter on your PC (and a glimpse of hope that this might stop happening in the near future)
You’ve been waiting for the next installment of your favorite PC game series for so long, and now it’s finally here. This time, you want to make sure you can enjoy it fully right from the start, so you invested time and money to prepare meticulously. You’ve upgraded your CPU, installed a bleeding-edge GPU, added more RAM — heck, you even prepared an SSD RAID. It must be silk-smooth right from the intro screen.
Your pre-order has finally unlocked and has just finished installing. Nervously, you start it for the first time. So far, so good — the game is running at 60 frames per second. Or at least the frame counter from the latest GPU tuner overlay says so. But something is not right. You flick your mouse around in sharp, deliberate movements. You side-strafe left and right quickly… and… It stutters! IT FRIGGIN’ STUTTERS! Argh, how can it be? How can it stutter at 60 frames-per-bloody-second?
It may sound ridiculous if it has never happened to you. But if you’ve ever experienced it, you almost certainly hate stutter with a passion. Stutter in games. Not the plain old “lag”. Not the low frame rate. But just “stutter”, happening at high frame rates, on perfect, super fast configurations. What is it, where does it come from, and is there any way in this world to get rid of it? Let me tell you a story…
Stutter, smoothness, performance… it’s all the same, right?
Video games have been running at 60 fps since the days of the first arcade machines, back in the ‘70s. Normally, it was expected that a game ran at exactly the same frame rate as the display. It wasn’t until the popularization of 3D games that we first started accepting lower frame rates. Way back in the ‘90s, when “3D cards” (that was before we started calling them “GPUs”) started to replace software rendering, people used to play games at 20 fps and considered 35 fps fast enough for serious competitive netplay. I’m not kidding.
Nowadays, we have super fast machines, and “of course we can run at 60”. Yet… there seem to be more players unhappy with game performance than ever before. How can that be? Well, it’s not that the games are not actually running fast enough. It’s that they are stuttering even though they run fast!
If you look at a few game forums, you are likely to find something like this:
One might think those are isolated cases, but look at Google search stats:
(Note that these are relative values. It’s not that people are searching for stutter more than for frame rate in general. It’s that while frame rate query is stagnating, searches for stutter are growing, especially recently.)
A decade of search for the cause of inexplicable stutter
I first saw this issue way back around 2003. We were working on Serious Sam 2, and people started reporting cases where they were testing something on an empty level and mouse-look and movement wasn’t smooth. It was accompanied by a very specific pattern in the frame-rate graph, which we started calling “heartbeat”.
We thought we had a bug somewhere in our code, but couldn’t find it. The issue would come and go seemingly at random — when restarting the app, rebooting the machine… you’d change some performance option and it would be gone. Then you’d change that option back, but the problem wouldn’t come back. It was like a ghost.
Apparently, we were not the only ones with this issue. Seeing similar problems in other games we were playing, we started to think it was something in the drivers. But it happened across different GPU vendors. Even across different APIs (OpenGL, DirectX 9, DirectX 11…) — the only consistent thing was that it appeared on some machines, on some scenes… sometimes.
We released several more games while this weird thing kept coming and going. It bothered some users, and we would usually tell them to change some performance options — which sometimes helped, and sometimes didn’t. Guess that’s life, eh?
Then suddenly, on a nice winter day, early in 2013, my colleague Dean called me to come and see yet another instance of this issue, which he was, for the moment, able to reproduce relatively consistently — this time on a level from Serious Sam 3. We were tinkering around with the options on that scene when it suddenly occurred to me. I realized what was causing this! And it was so simple that it’s no wonder it had escaped everyone’s attention for a decade.
By changing just one very simple game engine option, we were able to make this problem go away or come back, in this particular scene. But it was immediately obvious to both of us that solving this for good would probably take much, much more effort. Effort not just by us, but from the entire PC gaming ecosystem — GPU driver writers, API maintainers, OS vendors — everyone.
Let me explain.
What’s been going on all along
I wish I could now show you an example based on the scene from Serious Sam 3 that Dean and I were looking at five years ago. Or even better, the original test scene in Serious Sam 2 where we first saw this. Unfortunately, this elusive beast moves from scene to scene as you change hardware. I do have a scene from The Talos Principle where I was able to reproduce this recently, and I took some videos, which we will now analyze in more detail.
But before we start, we must first make sure you can actually view 60 fps videos. In the below examples, make sure you set your viewer to 1080p60, as shown here:
If you set that correctly, and if your computer and web browser are capable of showing 60 fps videos, then the video below should play perfectly smoothly, without any stuttering whatsoever. If not, oh well — that’s why we are talking about this: many other applications manifest this same problem as well, not just games. For now, I can only tell you to try on some other machine. Or just read the text.
Now for the real thing. If you are experiencing the stutter, it most probably looks something like this:
Yes, that’s what it looks like when a game “stutters” even though it runs at 60 fps. You might have experienced something similar when playing any modern game, and you probably thought “the game is not optimized”. Well, let’s reconsider that theory (that such stutter is caused by the game rendering “slowly”). If a game is “too slow”, it means that at some points it will not be able to render one frame quickly enough, and the monitor will have to show the previous frame again. So, when we take a 60 fps video of it, the video must exhibit “dropped frames”. (These are the frames where the next frame wasn’t rendered in time, so the same frame was shown twice.)
Now open the previous stuttering video (the “heartbeat”) again, pause it, and use the . (dot) key in the YouTube player to move frame by frame. Try to find where the same frame is shown twice. Go on, try it out. I’ll wait here…
So, did you find it? No? Now that’s weird, isn’t it…
It looks very not-smooth when you play the entire animation as a whole, but when you go frame-by-frame, there are no discontinuities!
How can that be?
Let’s examine this in more detail. Here is a side-by-side comparison of the ideal smooth video and the one with the heartbeat stutter, played back at 1/20th of original speed, so you can see individual frames:
You can notice two things: First, that they indeed run at the same rate — whenever there’s a new frame in the top (correct), there is also a new frame in the bottom (heartbeat). Second, they seem to move a bit differently for some reason — there’s a noticeable “tear” in the middle of the image, and it is oscillating between being more and less apart.
A careful eye may observe one more curious detail: The bottom image — the stuttering one — which is supposedly “slower”… is actually going “ahead” of the correct one. Strange, huh?
If we take a look at a few consecutive frames and their timings (notice that the videos I’ve been showing so far all have precise timers, accurate to 1/10,000th of a second), we can observe something very interesting: the first two frames are perfectly in sync, but then the third one…
…on the third frame the tree on the “slower” video is significantly ahead of its counterpart on the correct video (circled in red). You can also notice that this frame apparently took a longer time (circled in yellow).
Wait, wait, wait… if a video is “slower”, and the frame “took more time” how can it be ahead?
Well, to explain this, you have to understand how games and other 3D interactive applications actually do their animation and rendering nowadays. (Experienced developers will excuse me if I’m boring them with things they know here, but I have to make sure all the gamers that might be interested in this can follow the text.)
A brief history of frame timing
A long time ago, in a galaxy far, far away… When developers made the first video games, they would normally design for the exact frame rate the display ran at. In NTSC regions, where TVs run at 60 Hz, that meant 60 fps; in PAL/SECAM regions, where TVs run at 50 Hz, it meant 50 fps. They would never even entertain the thought of “dropping a frame”.
Most games were very streamlined and simplified concepts, running on fixed hardware — usually an arcade console, or a well known “home micro-computer”, like ZX Spectrum, C64, Atari ST, Amstrad CPC 464, Amiga, etc. Basically, one designed, implemented and tested for a particular machine and particular frame rate, and was 100% sure that it would never drop a frame anywhere.
Velocities of objects were also stored in “frame” units. So you wouldn’t say how many pixels per second a character moved, but how many pixels per frame. In Sonic the Hedgehog for the Sega Genesis, for example, the rolling speed is known to be exactly 16 pixels per frame. Many games even had separate versions for PAL and NTSC regions, where animations were hand-drawn specifically for 50 fps and 60 fps respectively. Basically, running at any other frame rate was not an option.
As games started running on more varied machines — notably PCs with expandable and upgradeable hardware — one couldn’t be sure which frame rate the game would run at anymore. Compounding that was the fact that games became more complicated and unpredictable — most notably, 3D games can have large variances in scene complexity, sometimes even player-driven. E.g. everyone loves shooting at a stack of fuel barrels — causing a huge explosion, nice fireworks… and an inevitable frame drop. But we don’t mind the frame drop there — because it’s so much fun.
So it can be hard to predict how long it will take to simulate and render one frame. (Note that on consoles today, we still have fixed hardware, but the games themselves are often quite unpredictable and complex anyway.)
If you cannot be sure which frame rate the game will be running at, you have to measure the current frame rate and continually adapt the game’s physics and animation speed. If one frame takes 1/60th of a second (16.67 ms), and your character runs at 10 m/s, then it moves by 1/6th of a meter in each frame. But if the frame is not 1/60th anymore — if it suddenly starts taking 1/30th of a second (33.33 ms) — you have to start moving the character by 1/3rd of a meter per frame (two times “faster”), so that it continues moving at the same apparent speed on screen.
How does a game do this? Basically, it measures the time at the start of one frame, then at the start of the next one, and calculates the difference. It’s quite a simple method, but it works very well. Sorry, it used to work very well. Back in the ’90s (remember those “35 fps speeds for serious competitive netplay” from the beginning), people were more than happy with this method. But at that time, a graphics card (remember, they weren’t even called GPUs then) was a very “thin” piece of hardware, and the main CPU had direct control over when things got to the screen. If you didn’t have a 3D accelerator, the CPU was even drawing things directly. So it knew exactly when they ended up on screen.
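The delta-time scheme described above can be sketched in a few lines. (This is a minimal illustration in Python, not Serious Engine code; the function names are mine.)

```python
def run_frames(timestamps, speed_mps=10.0):
    """Classic delta-time animation: given the measured timestamps (in
    seconds) taken at the start of each frame, advance a position by
    speed scaled with each measured frame time."""
    position = 0.0
    for prev, now in zip(timestamps, timestamps[1:]):
        dt = now - prev               # measured frame time, in seconds
        position += speed_mps * dt    # 10 m/s -> 1/6 m per 60 Hz frame
    return position

# A character moving at 10 m/s over two 60 Hz frames covers 1/3 m:
run_frames([0.0, 1/60, 2/60])   # -> 0.3333...
# One 30 Hz frame covers the same distance, just in a single bigger step:
run_frames([0.0, 1/30])         # -> 0.3333...
```

Note that the apparent speed on screen stays constant regardless of frame rate; only the per-frame step size changes, which is exactly the adaptation the article describes.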
What is actually going on today
Over time, as we started having more complex GPUs, those GPUs became more and more “asynchronous”. That means that when the CPU gives a command to the GPU to draw something on the screen, the GPU just stores that command in a buffer, so that the CPU can go on with its own business while the GPU is rendering. That ultimately results in the situation where the CPU tells the GPU that “this is the end of the frame” and the GPU just stores this as a nice piece of data. But it doesn’t really treat it as something of much urgency. How could it — when it is still processing some of the previously issued commands. It will show the frame on the screen when it’s done with all the work it’s been given before.
So, when a game tries to calculate the timing by subtracting timestamps at the start of two successive frames, the relevance of that is, to be blunt… quite dubious. Let’s get back to our example from those short videos. We had those frames of a camera panning across some trees:
Now recall this thing with timing and movements. In the first two frames, the frame timing was 16.67 ms (which is 1/60th of a second), and the camera moves by the same amount in the top and bottom cases, so the trees are in sync. In the third frame (in the bottom, stuttering case), the game sees that the frame time is 24.8 ms (more than 1/60th of a second), so it thinks the frame rate has dropped and rushes to move the camera a bit more… only to find on the next, fourth frame that the timing is only 10.7 ms, so the camera moves a bit less there, and the trees are now more or less in sync again. (They don’t completely recover until about two frames later, when everything finally settles.)
What happens here is that the game measures what it thinks is the start of each frame, and those frame times sometimes oscillate due to various factors, especially on a busy multitasking system like a PC. So at some points, the game thinks it didn’t make 60 fps, and it generates animation frames slated for a slower frame rate at some of those points in time. But due to the asynchronous nature of GPU operation, the GPU actually does make it in time for 60 fps on every single frame in this sequence.
This is what we see as a stutter — animation generated for a varying frame rate (heartbeat) being displayed at actual correct fixed frame rate.
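To make the effect concrete, here is a small numeric sketch (plain Python; the frame times are modeled on the ones in the frames above). The game advances the camera by the measured CPU deltas, but the display actually flips every 16.67 ms, so the on-screen step size oscillates even though no frame is ever late:

```python
def on_screen_steps(measured_dts, speed=10.0):
    """Per-displayed-frame movement when animation uses measured CPU
    deltas, while every frame is actually shown at a steady 60 Hz."""
    return [speed * dt for dt in measured_dts]

# Measured CPU frame times (seconds): two clean frames, then the
# 24.8 ms / 10.7 ms "heartbeat" pair, then back to normal.
steps = on_screen_steps([1/60, 1/60, 0.0248, 0.0107, 1/60])
# A steady 60 fps would move 10/60 ~= 0.167 m every displayed frame;
# instead the steps are ~0.167, 0.167, 0.248, 0.107, 0.167 — the camera
# lurches ahead and then falls back, which the eye reads as stutter,
# even though the display never missed a single refresh.
```

The total distance covered stays roughly correct, which is why the average frame rate counter happily reports 60 fps while the motion looks wrong.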
So, basically, there’s no problem whatsoever — everything is running smoothly, it’s just that the game doesn’t know it.
This brings us to the point from the beginning of the article. When we finally figured out that this is what caused the problem (actually, it’s an illusion of a problem — there’s no problem in fact, right?), here’s what we did for a test:
In the first part of the video above, you can see the heartbeat issue from the beginning. Then we change a “magic” option and after that — everything becomes perfectly smooth!
What’s the magic option? In Serious Engine, we call it sim_fSyncRate=60. In layman’s terms it basically means: “completely ignore all these timing shenanigans and pretend that we are always measuring a steady 60 fps”. And it makes everything run smoothly — only because it was always running smoothly to begin with! The only reason it ever looked like it stuttered is that the timing used for animation was wrong.
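In spirit, the option does something like this. (A hypothetical sketch: sim_fSyncRate is the real Serious Engine variable, but the Python names and exact semantics here are my illustration of the idea, not the engine’s code.)

```python
SIM_SYNC_RATE = 60  # stand-in for sim_fSyncRate=60

def frame_delta(measured_dt, sync_rate=SIM_SYNC_RATE):
    """If a sync rate is forced, ignore the measured frame time entirely
    and pretend every frame took exactly 1/sync_rate seconds.
    With sync_rate=0, fall back to trusting the measurement."""
    if sync_rate:
        return 1.0 / sync_rate
    return measured_dt

# With the override on, a noisy 24.8 ms measurement still animates
# as a clean 16.67 ms step:
frame_delta(0.0248)      # -> 1/60 = 0.016666...
frame_delta(0.0248, 0)   # -> 0.0248 (measurement trusted again)
```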
So that’s it? We just do that and everything is great?
Is the solution that simple?
Unfortunately… nope. That was only a developer test. If we stopped measuring frame rate in real-world situations and just assumed it is always 60, then when it does drop below 60 — and on a PC it will drop sooner or later, for various reasons: the OS running something in the background, power-saving or overheating protection down-clocking the GPU/CPU… who knows — then everything would slow down.
So, if we measure frame time, it stutters; if we don’t, everything can slow down at some points. What then?
The real solution would be to measure not when the frame has started/ended rendering, but when the image was shown on the screen.
So, how can the game know when a frame’s image is actually shown on screen? You might be surprised to learn that, in the current situation — there’s no way to do it!
Shocking, I know. One would expect this would be a basic feature of every graphics API. But it turns out that as things have been changing slowly here and there, everyone basically dropped the ball on this issue. We all forgot about the fine details of what is going on, kept doing basically what we were doing all the time, and the graphics APIs have evolved in all other aspects but this one: There’s no way for the application to know for sure when a frame was actually displayed on the screen. You can know when it finished rendering. But not when it got displayed.
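In the meantime, one partial engine-side heuristic — not the real fix the article argues for, just a common stopgap I’m showing as an assumption — is to snap each noisy measured delta to the nearest whole number of display refresh intervals, on the theory that frames which weren’t actually dropped were presented on vsync boundaries anyway:

```python
REFRESH = 1 / 60  # assumed display refresh interval, in seconds

def snap_to_refresh(measured_dt, refresh=REFRESH):
    """Snap a noisy CPU-measured frame time to the nearest whole number
    of refresh intervals (minimum one), assuming the frame was actually
    presented on a vsync boundary."""
    n = max(1, round(measured_dt / refresh))
    return n * refresh

snap_to_refresh(0.0248)  # -> 1/60  (the 24.8 ms reading was timer noise)
snap_to_refresh(0.0310)  # -> 2/60  (a genuinely dropped frame)
```

This filters out the heartbeat in the common case, but it is still guessing: it needs to know the true refresh rate, and it can guess wrong near the boundary between one and two intervals — which is exactly why proper display-timing feedback from the API is the real solution.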
Worry not, it’s not all that grim. Many people in the graphics ecosystem are currently busily working on implementing support for proper frame timing, under various names for different APIs. The Vulkan API already has an extension called VK_GOOGLE_display_timing, which has been shown to be useful in a proof-of-concept implementation, but it is available only on a limited range of hardware, mostly on Android and Linux.
Work is now underway to provide such facilities — and better ones — hopefully in all the major graphics APIs. When? It’s hard to say, because the problem cuts quite deep into various OS subsystems.
I can promise you, though, that we at Croteam are advocating tirelessly for this problem to be fixed as soon as possible — and everyone in the interactive graphics ecosystem is very understanding and helpful.
We are looking forward to having this available to a broader public, and when that happens, we will provide an update for The Talos Principle that implements this feature.
Miscellaneous caveats and other details
Consider the above the end of the main text. The sections below are “bonus features”, mostly independent of one another and of the main text. I will probably be updating them as the situation changes, or if more complex questions pop up that require addressing in the near future.
One thing involved in all this behind the scenes is the concept called Compositing Window Manager, aka the compositor. It’s the system now present in every OS that makes it possible for windows to be transparent, have blurry backgrounds, shadows, Skype popups over them, etc. Compositors can even go really overboard and show your windows in 3D. To do that, a compositor takes over the control of the very last part of frame image presentation and decides what to do with it immediately before it ends up on the monitor. This kinda complicates things a bit more.
On some OSes, the compositor can be turned off in full-screen mode. But that’s not always possible, and even then — why wouldn’t we be able to run the game in windowed mode?
Power and thermal management vs rendering complexity
We also have to take into account the fact that modern CPUs and GPUs don’t run at fixed clocks; both have systems that throttle their speed up and down depending on the load and on the current temperature. So the game cannot just assume that the GPU and CPU will have the same speed from frame to frame. On the other hand, the OS and drivers cannot expect that the game will have the same amount of work in each frame. Complex systems for communication between the two sides have to be designed so that this is all taken into account.
Couldn’t we just…
Probably not. :) Measuring GPU time is usually suggested as an alternative to display timing. But that doesn’t take into account the presence of the compositor, and the fact that none of the GPU rendering timers actually synchronize with the display refresh. What we need for perfect animation is definitely the time when the image was shown, not the time when it finished rendering.