Fighting Desyncs in Melee Replays
For those who are completely new to Slippi, read the first release post for an introduction to what it is.
If you would like to support Slippi, please visit my Patreon or check out the Slippi merch at jisu.gg.
Intro
Slippi replays first became available to the public on January 17th, 2018. Since its release, an enormous amount of work has been put in to make Slippi as stable as possible. However, since the beginning we have been tested by playback desyncs of varying complexity and cause.
Many players familiar with Melee netplay have some understanding of desyncs. In the case of a netplay desync, one player will have the impression that their opponent is making nonsensical inputs, oftentimes SDing. This happens because inputs are still transmitted, but the two game states no longer match. Playing back Slippi replays works somewhat like netplay in that it operates primarily on player inputs. A desync in a Slippi replay happens if, during playback, something happens differently than when the game was originally recorded.
Desyncs can happen for a variety of reasons. In this post, we’re going to look into the desyncs encountered in the process of working on Slippi.
Resync Logic
Before we get into specific examples, it’s worth discussing how Slippi desyncs differ from netplay desyncs. If you haven’t seen a Slippi desync before, you might notice some strange behavior for a bit followed by the game continuing as normal. This is due to Slippi’s resync logic as can be seen in the following video.
Along with the player’s inputs, Slippi stores information about every character in the replay. This information includes character position, animation state, facing direction, and more. When playing back a replay with the recorded inputs, Slippi also forcibly updates this information. Doing this keeps any deviation from the recorded data from desyncing the game permanently.
You might think at this point, if you can overwrite character information every frame, why do you need inputs at all? The truth is Melee’s game engine was not written in a way that works well with being forcibly told what to do; it expects to be the arbiter that determines how states transition into one another. In other words, if I tell the game, “Hey I want this character to be exactly here”, the game would respond with, “Well okay, I’ll try to get them there but there’s like… you know, a wall in the way”.
Basically the game state doesn’t end up synchronizing as quickly as you might like. That said, overwriting that data does eventually succeed in getting the game state back to the right place almost every time. The period during which things are “resyncing” is generally what we refer to as a Slippi desync.
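As a rough illustration of the overwrite step (not Slippi’s actual code), here is a minimal Python sketch. The `CharState` fields and the `resync` helper are hypothetical names chosen for this example; the real game tracks far more data and does not snap into place this cleanly.

```python
from dataclasses import dataclass, fields


@dataclass
class CharState:
    # Hypothetical subset of the per-character data stored in a replay.
    x: float
    y: float
    action_state: int
    facing: int  # +1 for right, -1 for left


def resync(live, recorded):
    """Overwrite live with the recorded values.

    Returns the names of the fields that had drifted, which is also a
    cheap way to detect that playback has deviated from the recording.
    """
    drifted = []
    for f in fields(CharState):
        if getattr(live, f.name) != getattr(recorded, f.name):
            drifted.append(f.name)
            setattr(live, f.name, getattr(recorded, f.name))
    return drifted


live = CharState(x=10.0, y=0.0, action_state=14, facing=1)
recorded = CharState(x=12.5, y=0.0, action_state=14, facing=1)
first_pass = resync(live, recorded)   # the x position had drifted
second_pass = resync(live, recorded)  # nothing left to correct
```

When playback is in sync, every pass looks like `second_pass`: the overwrite replaces each value with what it already was.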
And with that out of the way, I hope you’re ready to dig into some examples!
Solved Desyncs
RNG Desyncs
One common question I get is how random number generator (RNG) events like Peach turnips, Luigi misfires, and similar are kept in sync. To answer this, we have to first understand how random events in the game are calculated.
For RNG, the seed is king. The RNG seed is a 32-bit number stored at a fixed location in Melee’s memory. Whenever the game wants to take a random action, the result of that action is 100% dependent on the value of the seed. For example, if the game calculates a number between 1 and 8 multiple times and the RNG seed is the same during each calculation, the result will always be the same. That said, the chance of the seed being the same is very low. Every time a random event occurs, the seed is modified such that the next RNG call will use a different seed.
So let’s get theoretical for a bit. Given we are playing back a specific game exactly as it originally happened, we should be able to just set the RNG seed at the start of the game and all the RNG should play out the same for the entirety of the game. This is because the same exact RNG events will get called at the same exact times as the original and the seed will always get modified exactly the same way.
Unfortunately, things aren’t so convenient in the real world. For example, if we encounter an event that changes some RNG calls or RNG call orderings, the seed would be incorrect for the remainder of the game. Due to this, a decision was made early in development to back up the RNG seed and restore it with every character’s player inputs. This logic implies that the RNG seed is actually restored multiple times every frame. This turned out to be a prudent decision, but our story with RNG certainly does not end here.
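To make the difference concrete, here is a small Python sketch of both approaches. It is a toy model under stated assumptions, not Slippi’s code: the `Rng` class uses a generic 32-bit linear congruential step with commonly used constants, and `record` and `play_back` are hypothetical helpers that make one RNG call per frame.

```python
MASK32 = 0xFFFFFFFF


class Rng:
    def __init__(self, seed):
        self.seed = seed & MASK32

    def rand_int(self, n):
        """Advance the seed, then return an integer in [0, n)."""
        # Generic LCG step; the constants are an assumption for illustration.
        self.seed = (self.seed * 214013 + 2531011) & MASK32
        return ((self.seed >> 16) * n) >> 16


def record(seed, frames):
    """Play a game, logging the seed alongside each frame's result."""
    rng = Rng(seed)
    log = []
    for _ in range(frames):
        seed_at_input = rng.seed  # backed up with the inputs
        log.append((seed_at_input, rng.rand_int(8)))
    return log


def play_back(log, start_seed, restore, stray_call_on_frame=None):
    rng = Rng(start_seed)
    rolls = []
    for frame, (seed_at_input, _) in enumerate(log):
        if frame == stray_call_on_frame:
            rng.rand_int(8)  # an RNG call the recording never made
        if restore:
            rng.seed = seed_at_input  # restored with the inputs
        rolls.append(rng.rand_int(8))
    return rolls


log = record(seed=1234, frames=10)
recorded_rolls = [roll for _, roll in log]
clean = play_back(log, 1234, restore=False)
drifted = play_back(log, 1234, restore=False, stray_call_on_frame=3)
healed = play_back(log, 1234, restore=True, stray_call_on_frame=3)
```

Setting the seed once is enough for `clean`, but a single stray call leaves `drifted` one step ahead of the recording for the rest of the game. With the per-frame restore, `healed` matches the recording despite the stray call.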
Spawn Location Desync
One thing that will become a theme in this post is having to add special handling to deal with codes other people have written for Melee. In this case, the code in question is the neutral spawn code written by Achilles. This one was actually fixed very early on in Slippi development as it was effectively always present.
Slippi needed to work regardless of whether the players played on a set up with neutral spawns enabled or not. The problem with this is that dynamically applying codes at playback time is hard.
The neutral spawn desync would happen, for example, in the case where the console game was played with neutral spawns, and then the replay was played without.
Solving this was relatively straightforward; we just had to find a good place to do it. We knew the character positions ahead of time starting at the first frame of the game, so we just needed to set them as the spawn points at the right time. Initially this was done in the actual spawning function with the help of Achilles, but eventually it was modified by UnclePunch to happen in the same place where we restore inputs.
Stock Steal Desync
Not too much interesting to tell about this one but it sure makes for a nice chaotic example. Stock steals in Melee happen when a player on a team runs out of stocks. The player with no stocks can press the start button to take a stock from their teammate, bringing themselves back to life.
The input that initiates the action is not backed up and restored by Slippi’s normal processing, so originally the stock steal would not happen during playback. The fallout from a missed stock steal is not something that is possible to resync from. We simply added special logic that could detect and trigger a stock steal at the point it happened in the replay.
UCF Desyncs
Slippi and UCF have had a tumultuous relationship at best. Many times with UCF I’ve gone through phases of relief that everything is finally fixed only to be disappointed later when a UCF desync inevitably resurfaces.
The first part of the story with UCF is actually just supporting it at all. Similar to the neutral spawn code from before, Slippi needs to support UCF being either on or off. The method for solving this was to back up the player’s settings at the start of a game and then gate the UCF functions based on these settings. If the option was not enabled, the code would be skipped for that player. So in short, for Slippi playback the UCF codes are always injected but are hidden behind toggles which are set at game start. This toggle concept has been used for other things such as supporting frozen Pokemon Stadium and PAL.
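The toggle idea can be sketched in a few lines of Python. Everything here is hypothetical (the `GameStartSettings` layout, the hook names, the trace list); it only illustrates code that is always present but gated per player by settings captured at game start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameStartSettings:
    # Hypothetical layout: one flag per port, captured once at game start.
    ucf_enabled: tuple


def run_input_hooks(port, settings, trace):
    """Record which (hypothetical) hooks would run for this port."""
    trace.append("vanilla_input_processing")
    # The UCF code is always injected; the toggle decides whether it runs.
    if settings.ucf_enabled[port]:
        trace.append("ucf_dashback")
        trace.append("ucf_shield_drop")
    return trace


settings = GameStartSettings(ucf_enabled=(True, False, False, False))
port1_trace = run_input_hooks(0, settings, [])
port2_trace = run_input_hooks(1, settings, [])
```

Because the flags come from the replay rather than from the playback setup, the same playback build can handle games recorded with or without the code enabled.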
After completing the toggles came my first moment of relief thinking UCF was “solved”. This was proven false very quickly. Shield drops worked perfectly fine, but dashbacks were still having problems. We had to do some research into how UCF worked to figure out the problem. This wouldn’t be the last time we deep dived on UCF.
It turns out that UCF dashback logic accesses data that is unused in vanilla engine input processing. The vanilla engine input processing generally works on what we refer to as “processed” inputs. These are inputs that are effectively generated from “raw” controller inputs and used to decide how to change character state. For example, CPUs will have processed inputs but will not have raw inputs.
Slippi works by backing up and restoring processed inputs. UCF, however, looks at the raw x analog input for both the current frame and from two frames ago to determine whether it should trigger a dashback. The reason it uses raw inputs is that processed inputs do not have history, whereas raw inputs are stored in a 5-frame circular buffer.
The solution was to back up and restore the raw analog x inputs to the correct location — kind of a pain, but it did the trick.
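For readers unfamiliar with circular buffers, here is a minimal Python sketch of a 5-slot history of raw x values. The `RawStickHistory` class and its method names are hypothetical, and the real memory layout in Melee is not shown here; the point is only that “two frames ago” is a slot lookup, and that playback has to write recorded values into the right slot.

```python
class RawStickHistory:
    SIZE = 5  # number of frames of raw input kept

    def __init__(self):
        self.buf = [0] * self.SIZE
        self.head = 0  # index of the most recently written slot

    def push(self, raw_x):
        """Store this frame's raw x, overwriting the oldest entry."""
        self.head = (self.head + 1) % self.SIZE
        self.buf[self.head] = raw_x

    def frames_ago(self, n):
        """Read the raw x from n frames ago (0 is the current frame)."""
        if not 0 <= n < self.SIZE:
            raise IndexError("history only covers the last 5 frames")
        return self.buf[(self.head - n) % self.SIZE]

    def restore(self, n, raw_x):
        """Overwrite the slot for n frames ago with a recorded value."""
        if not 0 <= n < self.SIZE:
            raise IndexError("history only covers the last 5 frames")
        self.buf[(self.head - n) % self.SIZE] = raw_x


history = RawStickHistory()
for raw_x in [0, 5, 10, -60, -100, -100, -100]:
    history.push(raw_x)

current = history.frames_ago(0)
oldest = history.frames_ago(4)
history.restore(2, -42)  # playback writes the recorded value
two_ago = history.frames_ago(2)
```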
Pokemon Stadium Transition Desync
Every desync up to this point has had a fairly obvious cause. With this desync, we’re going to explore a much more surprising interaction.
It turns out that PS transformation loads can play some tricks on Slippi. On Stadium, after the transformation timer expires, a random transformation is decided on and the game reaches out to disk to load it. The game does this disk read in an asynchronous fashion. In other words, it says “okay start fetching the data”, and then continues on its way playing back frames until the data is ready, at which point it starts the transition to the new transformation. What this means is that there can be, in theory, a variable amount of time between when the transformation is decided and when the transition actually starts.
At first, this wasn’t really a problem. Replays were being generated on Dolphin and played back on Dolphin with restricted disk speeds. Disk reads were consistent, causing the transition to start at the same time both during the recording and during playback. The problem started occurring for two primary reasons:
- Replays started being generated on console and mirroring was created. The disk load times on console were different than those on Dolphin.
- The “Speed up Disc Transfer Rate” option was enabled in Dolphin to help replays load faster. This meant that old replays recorded with this option disabled would now have transition desyncs. It’s also possible that different computers/hard drives might no longer play back replays consistently with this option enabled.
The solution that was decided on to solve these problems was to pre-load the transformations such that when it came to transition time, they could be started immediately. In order to do this, the next transformation is always loaded from disk at the start of the current transformation. This also meant that a new pre-load function had to be added to the start of the game to pre-load the first transformation. Keep this new function in mind because we will end up revisiting it later.
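The timing difference can be modeled with two small Python functions. This is a simplified sketch with made-up frame numbers, and both function names are hypothetical. It assumes the load always finishes well within one transformation’s lifetime.

```python
def async_transition_frame(timer_expires, load_frames):
    """Original behavior: the load starts when the timer expires, and the
    transition begins only once the load has finished."""
    return timer_expires + load_frames


def preloaded_transition_frame(current_started, timer_expires, load_frames):
    """Pre-load behavior: the load starts when the current transformation
    starts, so the transition only waits if the load is still running."""
    load_done = current_started + load_frames
    return max(timer_expires, load_done)


# Hypothetical numbers: the timer expires on frame 1800; the disk read
# takes 40 frames on one setup and 12 frames on another.
slow_async = async_transition_frame(1800, load_frames=40)
fast_async = async_transition_frame(1800, load_frames=12)
slow_preload = preloaded_transition_frame(0, 1800, load_frames=40)
fast_preload = preloaded_transition_frame(0, 1800, load_frames=12)
```

With the asynchronous load the two setups start the transition on different frames, while with pre-loading both start it on the frame the timer expires.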
Unfortunately we couldn’t just roll out this change and celebrate. This is because we prioritize maintaining playback backwards-compatibility as much as possible. Had we just rolled this out, all old replays would have started desyncing far worse than before because they would be loading the wrong transformations given that the random calculation was now happening at a different time. In order to solve this, similar to UCF, we added a toggle to tell the playback engine whether to use the new pre-load methodology or not such that old replays would not be impacted.
Shy Guy Desyncs
Some time after releasing mirroring we started to notice some desyncs on Yoshi’s. I started seeing that the shy guys on the mirrored instance were different than on console. The desync was obvious in this case, but we didn’t know the cause.
We had all kinds of theories including RNG calculating incorrectly, Dolphin inexplicably skipping frames, and/or some mirroring specific issue. Eventually UnclePunch noticed something about a relevant section of the game code.
Here is what the above code does, in English:
- It rolls a random number between 0 and 5
- It checks to see if the number of Shy Guys spawned last time is equal to this number
- If the number is equal to the last Shy Guy count, it will go back to the start and roll another random number
Fine, makes sense. The game doesn’t want to spawn the same number of Shy Guys twice in a row. The interesting question though is: what happens on the first Shy Guy spawn?
What UnclePunch figured out is that the memory being accessed to get the last spawn count is actually uninitialized stage data. What that means is that while the block of memory has been allocated to store stage information, it still has the values of whatever happened to be there before. In other words, the “last spawn count” on the first spawn is pseudo-random; it’s dependent on things that happened in the menus. We refer to uninitialized data accessed by the game as “garbage data”.
What would cause a desync, then, is that the recording and playback would have different values for “last spawn count”. This meant if the first random number of Shy Guys happened to match the garbage data on playback or recording, one of them would re-roll, causing a different number of Shy Guys to spawn.
Additionally, Shy Guy re-spawns work by starting a timer once all of them are off-screen. So if the first spawn was wrong, all future spawns would also be wrong.
The cleanest solution for this desync was to just fix the bug: initialize the stage data by setting everything to zero. After this change, the “last spawn count” field on first spawn would always be zero during both recording and playback, which meant that both would reject a roll of zero Shy Guys on the first spawn and would therefore agree on the result.
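Here is a Python sketch of the re-roll loop and of why garbage data breaks it. This is a paraphrase of the logic described above, not the game’s code; `roll_shy_guy_count` is a hypothetical name, and the rolls are passed in as a list to stand for the sequence that a synced RNG seed would produce on both sides.

```python
def roll_shy_guy_count(rolls, last_count):
    """Consume rolls (each 0 to 5) until one differs from last_count."""
    for roll in rolls:
        if roll != last_count:
            return roll
    raise ValueError("ran out of rolls")


# Both sides see the same RNG results because the seed is in sync.
rng_results = [3, 1, 4]

# Garbage data: recording happened to have 3 in memory, playback had 7.
recording_count = roll_shy_guy_count(rng_results, last_count=3)
playback_count = roll_shy_guy_count(rng_results, last_count=7)

# Zero-initialized stage data: both sides compare against 0.
fixed_recording = roll_shy_guy_count(rng_results, last_count=0)
fixed_playback = roll_shy_guy_count(rng_results, last_count=0)

# Side effect of the fix: a first roll of zero is always re-rolled.
never_zero = roll_shy_guy_count([0, 2], last_count=0)
```

With the garbage values the recording re-rolls and spawns 1 Shy Guy while playback spawns 3. With zeroed memory both spawn 3.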
Other Garbage Data
Shy Guy spawns are not the only thing affected by garbage data in vanilla Melee. Some other instances of garbage data impacting the game are actually fairly well known. The most prominent of these is the Luigi cyclone charge. For all the non-Luigis out there: it is possible for Luigi to start a game with his down-b already charged. It largely depends on selected characters, ports, and stage. The charge state is decided by the garbage data written to memory when selecting those things in the menus.
Another instance of garbage data is what are called “go mines”, discovered by taukhan. Go mines are shine mines that are pre-set somewhere on stage when the game starts, also determined by garbage data.
As mentioned in the previous section, the cleanest solution to these desyncs is to initialize the character/stage structs with zeros when the game starts. This does, however, introduce the following known changes from vanilla:
- Luigi always starts the game with charged cyclone
- Spacies cannot start the game with a shine mine pre-set
- The first spawn of Shy Guys on Yoshi’s cannot be zero
PS Wrong Transformation Desync
Pokemon Stadium makes a triumphant return and strap in because this one is a doozy! We received reports of desyncs while mirroring on Pokemon Stadium. It was pretty clear from the footage that playback was not going to the same transformation as recording, but as always, the question was “whyyyy” but with more y’s this time because we’re sick of desyncs.
Once again we had all kinds of crazy theories, went down some rabbit holes, came back out, decided we hated rabbits, and eventually came up with theories that helped us progress.
At some point during Slippi development, UnclePunch modified the playback code to help debug desyncs. It allows the game to output a series of debug messages when the game state is mismatched. Assuming no desyncs, our re-sync logic will always overwrite the game state values with what they already are. When this is not true, we know we have a desync.
We ran some replays with this debugging tool enabled and in some replays we noticed something interesting: the RNG seed was desynced for blocks of time even though there were no visible desyncs. This should not happen considering the recording and playback instances should be running the same exact RNG function calls in the same order. We actually still don’t know why this happens. Our best theory is that for some reason the particle effect calculations don’t always behave the same on console and on Dolphin. This may sometimes cause a different number of RNG calls, desyncing the RNG seed.
That said, this should be fine! Particle effects happen quite late in the game engine loop and we restore RNG every frame, so as long as all important RNG calls happen between when we restore and when particle effects might desync the seed, we’re in the clear. Tasks get scheduled by the game engine according to a priority number such as “pri3”. From here on, we will refer to this number to discuss when certain code is being run.
Player input processing, which is where we restore RNG, runs at pri3. Particle effects run at pri15. This implies that we have “safe zones” where RNG calls will always be safe but we also have “unsafe zones”. In the unsafe zones, any RNG call is at risk of calculating differently than the original game.
Getting back to Pokemon Stadium, the logic that calculates the stage transformation runs at pri4, within the safe zone. But remember that because of the previous desync, we started pre-loading transformations. In doing so, we added a function at the start of the game that pre-loads the first transformation. That function ran at pri0, outside the safe zone.
And that is the reason for the desync. In the case where “particle effects” desync the RNG seed during the early frames of the game while the first transformation is being pre-loaded, playback may send the game to a desynced transformation.
The fix that was decided on was to move the initial pre-load function into the safe zone, at pri4.
UCF + Fast-forwarding Desync
The concept of fast-forwarding was added to Slippi to allow for real-time mirroring to have very little delay. If ever the mirrored instance experienced a hiccup causing it to fall behind console, we could “fast-forward” to the most recent frame. Fast-forwarding works by selectively skipping the step that draws the frame. In other words, the game engine runs, inputs are processed, collision is checked, etc but then instead of drawing the frame and waiting, the next frame’s engine step is started immediately.
Interestingly, Melee sometimes does this without the help of mods. In vanilla Melee, when the game engine does a frame iteration, if there are two controller poll results in the buffer, it will process both frames immediately before displaying the frame. In other words, sometimes Melee will skip a frame and not display it at all. This happens every 1000 frames or so and it happens because controller polling happens at 60 Hz while the NTSC display frame rate is 59.94 Hz. This means that eventually, there will be two controller polls executed on a single frame.
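The “every 1000 frames or so” figure follows directly from the two rates. A quick check in Python, using only the numbers from the paragraph above:

```python
POLL_HZ = 60.0       # controller polling rate
DISPLAY_HZ = 59.94   # NTSC display frame rate

# Polls accumulate slightly faster than frames are displayed.
extra_polls_per_second = POLL_HZ - DISPLAY_HZ

# One whole extra poll builds up after this many seconds...
seconds_between_skips = 1 / extra_polls_per_second

# ...which corresponds to this many displayed frames.
frames_between_skips = seconds_between_skips * DISPLAY_HZ
```

That works out to one doubled-up frame roughly every 16.7 seconds, or about 999 displayed frames.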
With the original fast-forwarding code, raw controller inputs were not kept in sync with the number of frames that had been processed. What this meant is that when UCF tried to check the two frames of analog x input discussed in the previous UCF section, it would not read the same values as the original game. Courtesy of taukhan, this issue was fixed and now the location where the x values get written should always match recording, regardless of fast-forwarding.
Air Twirl Desync
First reported by Kadano, Marth in this clip is reminiscent of a figure skater. Turns out Melee does some weird stuff when deciding whether a character should be put into the air roll animation, DamageFlyRoll. The full requirements, according to UnclePunch, are as follows:
In other words, assuming a bunch of other conditions are true, there’s a 30% chance when knocked back that a character will enter the rolling animation instead of the normal knockback animation.
Turns out that the air roll code runs at pri1, outside the safe zone. That means that the RNG calculation that happens there is not safe. The only solution we could think of was to extend the safe zone. We added a restore and backup step to pri0 before anything else runs to extend the safe zone from pri0 to pri15.
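The safe zone idea can be written as a simple range check. The Python sketch below is a simplification under stated assumptions: lower priority numbers run earlier in the frame, a task sharing a priority with the restore step is treated as running after it, and `is_rng_safe` is a hypothetical helper. The priorities are the ones quoted in this post.

```python
PARTICLE_PRI = 15  # particle effects, suspected of desyncing the seed


def is_rng_safe(task_pri, restore_pri):
    """An RNG call is safe if it runs after the seed is restored and
    before particle effects get a chance to disturb it."""
    return restore_pri <= task_pri < PARTICLE_PRI


tasks = {
    "stage transformation roll": 4,
    "initial transformation pre-load": 0,
    "air roll check": 1,
}

# Original safe zone: the seed is restored with inputs at pri3.
original = {name: is_rng_safe(pri, restore_pri=3) for name, pri in tasks.items()}

# Extended safe zone: an extra restore runs at pri0.
extended = {name: is_rng_safe(pri, restore_pri=0) for name, pri in tasks.items()}
```

Under the original zone only the pri4 roll is safe; with the restore moved to pri0, all three are.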
Additional “Low Pri” RNG Desyncs
People don’t use Slippi with items on very often, so this one took a long time to show up. When we first saw it we thought maybe it was caused by the weird stages. It wasn’t until much later we realized what was actually happening.
Once we considered that maybe the wrong items were being spawned, it was easy to confirm: item spawns happen at pri2. This means that this desync is also fixed by having extended the safe zone.
Specifically what happened in the clip above is that the bunny hood should have been a warp star. It is worth noting here that most of Mango Axe during Summit actually did stay synced. This gives us some sense as to how often we expect RNG to get desynced by the supposed particle effects.
This desync is believed to have happened for a similar reason. It turns out that the RNG that decides whether puff should do the “bob” animation that makes her grab-able by Sheik and Marth runs at pri1. We think that during recording, Puff bobbed at a different time and was actually grabbed.
Potentially Solved Desyncs
This section includes desyncs that may be solved but that we are not 100% sure about.
UCF Desync — n0ne edition
It’s true! This desync happened at Dreamhack Montreal and showcases n0ne getting different turn around behavior in game than during playback.
Afterwards I did a bunch of detective work with regards to what n0ne’s analog x values were during the desync and how UCF dashbacks are really calculated. A UCF dashback will happen under the following conditions:
- Is the current processed x input greater than 0.8?
- Has the current direction been held for less than 2 frames?
- Compare the analog x value from 2 frames ago to the value from this frame, is the abs value difference greater than 75?
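Written as code, the three checks above look roughly like the following Python sketch. It is a restatement of the list, not the UCF source: the function and parameter names are hypothetical, and taking the absolute value of the processed x input (so that either direction qualifies) is an assumption.

```python
def ucf_dashback_triggers(processed_x, frames_held, raw_x_now, raw_x_two_ago):
    """Return True only if all three dashback conditions pass."""
    return (
        abs(processed_x) > 0.8                   # stick pushed far enough
        and frames_held < 2                      # direction only just pressed
        and abs(raw_x_now - raw_x_two_ago) > 75  # stick moved fast enough
    )


# A fast flick: neutral two frames ago, fully left now.
fast_flick = ucf_dashback_triggers(-0.95, 1, raw_x_now=-100, raw_x_two_ago=0)

# Same final position, but the stick was already partway there.
slow_push = ucf_dashback_triggers(-0.95, 1, raw_x_now=-100, raw_x_two_ago=-40)
```

Note that the third check depends entirely on the stored value from two frames ago, so a different value in that slot is enough to flip the outcome.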
During playback, all of these conditions pass, triggering a dashback. When the game was played on console, however, one of these conditions did not pass, causing a tilt turn. We tried to determine which one it might have been and that did not amount to much. That is, until UnclePunch realized that the UCF code was doing some unsafe writing to the stack, a region in memory where local variables are stored. This data, required for the calculation, was at risk of getting overwritten by a hardware interrupt.
Interrupts on console can happen at any time as they are the result of an external action, usually disk reads. So for example, when music is loaded from disk and ready to be played, an interrupt will fire, pausing the currently running code until it is handled. If the UCF code was interrupted at the wrong time, data it considered to be the “2 frames ago” value would get corrupted.
This was effectively a bug with UCF 0.73 and it is interesting to note that without Slippi, it may never have been fixed. Since then UnclePunch has fixed this issue with the UCF dashback code but it’s still unclear whether that was definitely what caused this dashback desync.
Link Spot Dodge Desync
This one may be the most baffling desync I’ve seen to date. We think that on console, the Link player held shield instead of spot dodged. This would have caused the stomp not to hit. Based on the audio and what happens during the re-sync, we think Link shielded the stomp and then up-b’d out of shield.
So why does the Link spot dodge during playback? We currently have one guess. It turns out that similar to the dashback UCF code discussed in the previous section, the shield drop code was also doing unsafe writes to the stack. Additionally, the shield drop code does run even when not on a platform.
Given this, it is plausible that some interrupt on console somehow kept Link from spot dodging when he really should have.
Unsolved Desyncs
This section is for desyncs that we still do not have a solution for or are not fully sure are solved. If you’re interested in helping to solve them, you can always join the development discussions on the Slippi Discord.
Widescreen PS desync
This desync sometimes happens when using the widescreen code and playing on Stadium. We think this desync is caused by some interaction with the jumbotron. It seems to display differently and for some reason the stage transition happens earlier on the widescreen version.
Dolphin Animation Freeze
The above clip is actually not from Slippi at all! It’s from FM 5.9. FM/Dolphin has long had a bug nobody understands where animations get frozen for no reason at all. We’ve slowly been accumulating clips of these instances and, for the most part, the issue is seen on Netplay. Since Slippi is built on top of FM, this bug is still present in the Slippi build.
We have long believed that this bug could happen during playback as well, since the game is still running on Dolphin. So far, we have seen one clip ever that we believe might have been a desync caused by this bug.
Frankly we’re at a loss as to what might be causing this bug. Let’s just chalk it up to cosmic rays? We hope that if we ever switch to mainline Dolphin, this bug will just magically go away.
Seek Desyncs
In the upcoming release of Slippi, it will be possible to skip around a replay by clicking on a seek bar under the game.
Seek works by continuously taking Dolphin save states at a fixed interval (15 seconds currently). To seek to a prior location, it will load the save state just prior and then fast-forward to the location.
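A seek therefore boils down to picking a save state and a number of frames to fast-forward. Here is a minimal Python sketch of that planning step. It assumes 60 frames per second, frames counted from 0, and a save state taken on frame 0 and every interval after it; `plan_seek` is a hypothetical name.

```python
FPS = 60
SNAPSHOT_INTERVAL_FRAMES = 15 * FPS  # one save state every 15 seconds


def plan_seek(target_frame):
    """Return (save state frame to load, frames to fast-forward)."""
    snapshot_frame = (
        target_frame // SNAPSHOT_INTERVAL_FRAMES
    ) * SNAPSHOT_INTERVAL_FRAMES
    return snapshot_frame, target_frame - snapshot_frame


mid_game = plan_seek(2000)    # load frame 1800, fast-forward 200 frames
on_snapshot = plan_seek(900)  # lands exactly on a save state
early = plan_seek(10)         # before the second save state exists
```

The interval is a trade-off: shorter intervals mean less fast-forwarding per seek but more save states to take and keep around.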
Unfortunately it seems that loading save states sometimes doesn’t restore the state of the game perfectly and causes a desync on load. This will be something to improve on in the future.
Closing
If you managed to get through this read, congratulations! I really hope you enjoyed it. I’ve been wanting to tell the story of Slippi development for a while and this is only the beginning. There is much more to tell yet so if you like this sort of post let me know so I can gauge whether it’s worth writing more!
The majority of the work described in this post was done by myself and UnclePunch, though I can’t leave out the occasional support from metaconstruct and also taukhan. I’ve said this before but Slippi wouldn’t be where it is without the support of the Melee community at large. Huge thanks and much love to all Slippi contributors and supporters!
Twitch prime? I mean… patreon.com/fizzi36