Fixing the Loading in Myst IV: Revelation
I’ve always been a big fan of Myst. It’s a series of puzzle adventure games where you explore mysterious worlds by pointing and clicking where you want to go, solving puzzles and discovering each world’s story along the way. Earlier this year, Cyan released a new remake of Riven, the sequel to Myst. Knowing this would likely reinvigorate interest in the several Myst sequels, I wanted to try my hand at fixing one of the most glaring issues with the fourth game in the series, Myst IV: Revelation.
Myst IV has a loading issue. Every time you click to navigate somewhere, it takes a solid two seconds to load — and this is not a compatibility bug; the game was always like this, even when it was newly released. This is especially noticeable coming off of the previous entry in the series, Myst III: Exile, a very similar game which doesn’t have any obvious loading throughout.
At the time, I saw this loading issue blamed on the fact that the game had to read assets from the DVD, which was too slow to keep up. However, for Myst’s 25th anniversary, Myst IV was released on Steam for the first time, allowing the game to be installed to a fast SSD — and yet, there is no significant speed improvement.
Because it was such an obvious problem with the game, I always assumed that someone would eventually dig into why it occurs and fix it. So, I waited. And then I waited some more. Finally, my friend Matt became interested in playing the game and didn’t want to put up with the loading times. So, as things tend to go, I decided: fine — I’ll do it myself.
And so began work on my new tool, Myst IV: Revolution! (clever name, right?) This is gonna be a long read — but that’s fine, I know Myst fans enjoy reading.
Quick Note
If you just want the tool I made to fix it and don’t care about how this was done, you can head to the Myst IV: Revolution GitHub to download it.
Learning From The Best
A few years ago, I read a blog post about how the loading times were fixed in GTA Online. In that post, the author explains how they used a tool called Luke Stackwalker to profile the game during its loading screen, discovering that almost 70% of the loading time was being spent on JSON parsing. Since I also wanted to fix a loading bug, I figured: why reinvent the wheel? I decided to try the same approach.
First, in order to eliminate any potential red herrings, I decided to run a profile during normal gameplay — simply idling around, not loading anything. In the results, I saw that the majority of the time was being spent on WaitForSingleObject. Looking up in the stack reveals that it is called from the Direct3D Present method, used to draw graphics to the screen. I assume that it calls WaitForSingleObject as a synchronization method, potentially for Vsync (but don’t quote me on that).
So we know that WaitForSingleObject is where the majority of CPU time should be spent during normal operation, and we can dismiss anything that appears in this first list as not the source of the problem. Next, I captured another profile while navigating.
I tried to time the start and end of the session so that it only caught the loading portion. This was somewhat nontrivial, since Luke Stackwalker doesn’t have a hotkey to stop recording, meaning I needed to Alt + Tab out of the game and click the stop button as quickly as possible after the loading completed.
In this profile, we can see that approximately 50% of the time is spent on WaitForSingleObject, but we know that is part of the game’s normal rendering loop, so we can dismiss it as background noise. However, some functions that were not even present in the idling profile have shot up to 2nd and 3rd place: namely, L_GetBitmapRow and L_PutBitmapRow.
Bear in mind that I did not time this profile perfectly, so these functions likely account for even more of the loading time than the roughly 12% shown. In fact, the profile is no longer dominated by D3D9; instead it is taken up by various modules with uppercase names, all beginning with “L.” So, what are these functions?
LEADTOOLS
As it turns out, these are image loading functions which are part of a middleware the game uses called LEADTOOLS. Lucky for us, LEADTOOLS is still around and we can still see the documentation for L_GetBitmapRow and L_PutBitmapRow on their current website. From this we can infer that the game is probably spending the majority of its loading time on loading images.
On the face of it, it makes sense. Image loading seems like the type of thing that could take a long time, because there is a large amount of data to process. On the other hand, CPUs are much faster now than they were when Myst IV released, yet the loading still caused a noticeable delay. So the question on my mind was: in the modern day of Threadrippers and Ryzens, what kind of throughput should I be expecting? Could it be improved? Was this as fast as I could reasonably ever expect to load however many images the game required, or was LEADTOOLS suboptimal in some way?
Look Up in The Stack
Before digging into the L_GetBitmapRow and L_PutBitmapRow methods to see if they had some obvious problem, I decided to first look up in the stack to get a sense of how the game intended to use them. So I looked for where they were referenced in a decompiler, and found the following loop.
while ( row < imageInfo.height )
{
    // Copy one decoded row out of the LEADTOOLS bitmap handle
    // into the game's own buffer.
    L_GetBitmapRow(bitmapHandlePointer, bufferPointer, row, bytes);
    bufferPointer += stride; // advance to the next row of the destination
    ++row;
}
By using the documentation, we can label the variables correctly. As we can see, this is a loop that loads the image row by row. This makes sense, as LEADTOOLS does not appear to have a method to load an entire image at once, except into certain specific formats (like a GDI bitmap handle) which come with their own caveats.
From a design perspective, when creating an image library, it makes sense to only include a function to load an image row by row. If you only want to load a particular row or set of rows, it offers that capability, while being trivial to wrap into a loop to load an entire image.
However, anecdotally, I’ve noticed that this is a slow pattern. I think it is because it is antithetical to CPU caching — now everything needs to come to a screeching halt at the end of every single row to return back to another DLL, perform some checks, then jump back to the code where the actual memory copying happens. Last year when working on my ExportFile Xtra, I had to convert an image using a similar API, and passing the image data row by row was by far the slowest part of the process. I am not the first to observe that this is slow, either. Foreshadowing.
In any case, I decided to now look up from this function to see where it was called. Upon doing so, I was immediately intrigued by an interesting string comparison.
It’s a file extension check, and interestingly, the code seems to call a different function if the extension is “dds.” So, why did this set off alarm bells?
What Is DDS?
DDS is an image file format created by Microsoft. The name is short for DirectDraw Surface; the format was originally part of the DirectDraw API, which has long been deprecated in favour of… well, no alternative really, but the important thing is that the DDS format was repurposed for use by Direct3D, the graphics rendering API for 3D games on Windows. The reason DDS is interesting is that it can be loaded by Direct3D without any processing — that is to say, you can basically just chuck it at the GPU, so DDS files load very fast.
The tradeoff is that although they load faster, DDS files have a larger filesize than standard compressed image formats. This can be somewhat remedied by using DXT compression, which allows DDS files to be made smaller, though still not as small as a JPEG or PNG.
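To make “chuck it at the GPU” concrete, here is a minimal sketch of loading a DDS file under Direct3D 9 using the stock D3DX helper. This is not necessarily the code path Myst IV uses; it just shows that the format needs no decode step: the header is parsed and the (possibly DXT-compressed) pixels are handed to the driver as-is.

#include <d3dx9.h>

// Minimal sketch: create a texture straight from DDS bytes in memory.
// Assumes "device" is an initialized IDirect3DDevice9 and "fileData"
// holds the raw bytes of a .dds file already read out of the archive.
IDirect3DTexture9* LoadDds(IDirect3DDevice9* device,
                           const void* fileData, UINT fileSize)
{
    IDirect3DTexture9* texture = nullptr;
    HRESULT hr = D3DXCreateTextureFromFileInMemory(device, fileData,
                                                   fileSize, &texture);
    return SUCCEEDED(hr) ? texture : nullptr;
}

Contrast this with a JPEG, where every pixel has to be decompressed on the CPU before the GPU can see it.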
This led me to wonder if the game had originally been written to use DDS, but at some point during development they realized the files would not fit on the DVD, and they had to write an adapter of sorts to use slower, more compressed formats. Of course, now that the game is on Steam, there is no such size limitation. I wanted to test this theory, but I am getting ahead of myself. I first had to figure out where the game was trying to load these images from.
Proprietary Archives
If you happen to look in the install directory for Myst IV, you will see the files are split into three folders: “bin,” “data,” and “save.” “save” is just for savegames, so that’s irrelevant for us. “bin” is short for binary, and contains the executable files for the game. The “data” folder is the most interesting to us at the moment. If you look inside, you will see several files with an M4B file extension. This is not a standard format — it’s a proprietary archive.
This is a pretty standard practice for games, even now but especially in the 2000s. Rather than have the game files sorted into folders, they are instead packed into an archive file — similar to a ZIP or RAR — though often proprietary and custom-made for the specific game or engine, and therefore unable to be opened by anything else. There are a variety of advantages to this practice. In fact, just last month Code Not Magic did a video about this subject (as it pertains to the Xbox, but it’s the same idea), so if you’re not familiar with this concept you can just go watch that.
Prior Work
I’m not actually the first to have been in this precise situation. There are already a couple of tools which can work with M4B files, such as myst4tools, a script to unpack and repack them, and MYSTER Asset Explorer, a viewer that can open M4B files to display the image assets within.
Before too long, I was able to extract the images, replace them with DDS files, and run a test to confirm that they were indeed recognized by the game. Notably, any uncompressed images were required to be 32-bit, not 24-bit as is common for colour images without alpha. However, I was planning on using DXT compression, so I probably wouldn’t need to worry about that, right?
Converting To DDS
The BigFile Format
Digging around in the decompiler revealed that the name of this archive file format is BigFile, meaning that M4B likely stands for Myst 4 BigFile. As far as I’m aware this format was not used again for any games afterward.
The format itself is fairly straightforward. It begins with the signature UBI_BF_SIG and a version number; in Myst IV, the version number is 1. This is immediately followed by the directories. Unlike, say, the ZIP format, where there is a flat list of paths (like “path/to/file”), the BigFile format stores a directory tree.
Each directory struct starts with the number of subdirectories it owns, up to a maximum of 255; then, for each subdirectory, there is another directory struct, and so on recursively. After the subdirectories comes the number of files in the directory, and each file entry has a size and a position in the archive.
Interestingly, names for directories and files are optional. Each name is a length followed by a string, but the length may be zero, indicating no name. In this case, the name is a wildcard. This feature is used for the top level directory of each BigFile, allowing the game to “mount” that BigFile’s directory so it can use any name it wants for it. For example, the data.m4b file is mounted at “gamedata,” and the game will recognize paths starting like “gamedata/path/to/file” as a path within data.m4b.
After the directory tree comes the data for all the files themselves, and that’s it. Pretty basic stuff. There wasn’t any padding or anything more complex like that, just directories and files, then data. So thankfully this format was fairly easy to read and write.
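Here is roughly what reading that tree looks like in C++. To be clear, this is a sketch based only on the description above: the integer widths and the exact position of the name fields are my guesses (the 255-subdirectory cap suggests a single byte for that count), not a verified spec.

#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

struct BigFileEntry { std::string name; uint32_t size = 0; uint32_t position = 0; };

struct BigFileDir {
    std::string name;              // empty = wildcard, used for mount points
    std::vector<BigFileDir> dirs;
    std::vector<BigFileEntry> files;
};

// Names are a length followed by characters; a zero length means wildcard.
static std::string ReadName(std::ifstream& in) {
    uint32_t length = 0;           // assumed width
    in.read(reinterpret_cast<char*>(&length), sizeof(length));
    std::string name(length, '\0');
    in.read(name.data(), length);
    return name;
}

static BigFileDir ReadDir(std::ifstream& in) {
    BigFileDir dir;
    uint8_t subdirCount = 0;       // capped at 255, so one byte fits
    in.read(reinterpret_cast<char*>(&subdirCount), sizeof(subdirCount));
    for (int i = 0; i < subdirCount; ++i)
        dir.dirs.push_back(ReadDir(in));   // recurse, "on and on"
    dir.name = ReadName(in);       // placement of the name is a guess
    uint32_t fileCount = 0;        // assumed width
    in.read(reinterpret_cast<char*>(&fileCount), sizeof(fileCount));
    for (uint32_t i = 0; i < fileCount; ++i) {
        BigFileEntry file;
        file.name = ReadName(in);
        in.read(reinterpret_cast<char*>(&file.size), sizeof(file.size));
        in.read(reinterpret_cast<char*>(&file.position), sizeof(file.position));
        dir.files.push_back(std::move(file));
    }
    return dir;
}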
All of the sizes and positions in the BigFile format are 32-bit, which means there is an implicit 4 GB limit on the archive’s filesize. This was slightly concerning for my purposes, because I knew I was going to be making the file larger, and I wasn’t sure by exactly how much. The input was 1.44 GB, so there was still plenty of slack, but I wasn’t sure how much of that was images and what exact filesize ratio to expect versus JPEG. I decided to work on the assumption it would be possible, and to fix it down the road if a problem arose.
There was one additional hurdle, and that’s the ZAP format. Many of the images in the game use a custom format specific to this game called ZAP, which consists of two JPEG images: one for colour, and another for alpha. The idea behind this is obvious: combining two JPEG images to get an alpha channel offers smaller filesizes than using a PNG, and from what I’ve heard this is a pretty common trick in old game engines. I didn’t really have to worry about this too much, as my friend Matt threw together a library called libzap to read them, and I just used that.
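The technique is simple enough to sketch. The following is not libzap’s actual API; it just illustrates the idea using stb_image as a stand-in JPEG decoder: decode the colour stream and the greyscale alpha stream, then interleave them into one RGBA buffer.

#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"

#include <cstdint>
#include <vector>

// Merge a ZAP-style pair of JPEGs (colour + alpha) into an RGBA buffer.
// Returns an empty vector on failure or mismatched dimensions.
std::vector<uint8_t> MergeZap(const uint8_t* colourJpeg, int colourSize,
                              const uint8_t* alphaJpeg, int alphaSize)
{
    int w = 0, h = 0, aw = 0, ah = 0, comp = 0;
    uint8_t* rgb   = stbi_load_from_memory(colourJpeg, colourSize, &w, &h, &comp, 3);
    uint8_t* alpha = stbi_load_from_memory(alphaJpeg, alphaSize, &aw, &ah, &comp, 1);

    std::vector<uint8_t> rgba;
    if (rgb && alpha && aw == w && ah == h) {
        rgba.resize(size_t(w) * h * 4);
        for (size_t i = 0; i < size_t(w) * h; ++i) {
            rgba[i * 4 + 0] = rgb[i * 3 + 0];  // R
            rgba[i * 4 + 1] = rgb[i * 3 + 1];  // G
            rgba[i * 4 + 2] = rgb[i * 3 + 2];  // B
            rgba[i * 4 + 3] = alpha[i];        // A from the second JPEG
        }
    }
    stbi_image_free(rgb);
    stbi_image_free(alpha);
    return rgba;
}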
Easy Bonuses: Disabling the Fade Transition
At this point, the goal was clear: read in the BigFiles, look through them for images, convert those images to DDS, and write them back, updating the sizes and positions in the directory tree. Naturally, this meant I needed to be able to both read and write the BigFile format.
One additional advantage of this approach is that, as it turns out, the archives contain settings in text files that can be easily manipulated. One such setting is the duration of the fade transition which plays every time you navigate. In Myst III: Exile, this setting can be adjusted in the options, but in Myst IV: Revelation it cannot. I personally prefer for this to be off entirely, so I threw in the ability to edit this duration.
Matt advised that I should write my program in C++ so that I could use NVIDIA Texture Tools 3 to create the DDS files. I was not aware of this, but as it turns out, converting to DXT-compressed DDS is actually quite CPU-intensive. NVIDIA’s solution, called NVIDIA Texture Tools, uses CUDA in order to GPU-accelerate the conversion.
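For a sense of what that conversion looks like, here is a rough sketch of compressing a single texture with NVIDIA Texture Tools. The call names follow the NVTT samples as I remember them, so treat the details as approximate rather than as my tool’s actual code.

#include <nvtt/nvtt.h>

// Compress one image file to a DXT-compressed DDS, using CUDA when available.
bool CompressToDds(const char* inputPath, const char* outputPath, bool hasAlpha)
{
    nvtt::Context context;
    context.enableCudaAcceleration(true);  // falls back to the CPU if unavailable

    nvtt::Surface image;
    if (!image.load(inputPath))
        return false;

    nvtt::CompressionOptions compression;
    // DXT5 keeps an alpha channel; DXT1 drops it (this matters later).
    compression.setFormat(hasAlpha ? nvtt::Format_DXT5 : nvtt::Format_DXT1);

    nvtt::OutputOptions output;
    output.setFileName(outputPath);

    // Write the DDS header, then the compressed pixel data (no mipmaps here).
    return context.outputHeader(image, 1, compression, output)
        && context.compress(image, 0, 0, compression, output);
}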
The headers are for C++, and as far as I know there are no bindings for other languages, so this unfortunately meant I couldn’t build on the work of the prior extraction utilities: myst4tools is written in Python, and MYSTER Asset Explorer in C#. Thankfully, given the relative simplicity of the format, I was able to do this myself without too much effort. In the first iteration of my tool, the resulting file was about 3 GB, still under the 4 GB limit.
In Other News, Water Is Wet
So, I wrote the code to read and write the BigFile format. I wrote code to read in the images and convert them to DDS using NVIDIA Texture Tools. I even added multithreading so that the conversion time wouldn’t be unbearably long in case you didn’t have an NVIDIA graphics card which supports CUDA. I ran it, I opened the game, and it worked — until I rode down an elevator, briefly saw some weird corruption, and then the game crashed.
Thankfully, the crashing function had a symbol, providing a clue as to the cause: WaterSlice::CopyToRaster. In fact, something I haven’t mentioned up to this point is that the game is split across multiple DLLs, all of which export the majority of their functions. The whole game is also written in a fairly idiomatic C++ style. If one wanted to decompile an entire game, this would likely be an easy beginner project; however, my goal was purely to fix the loading issue.
Anyway, as the name would imply, this crash was related to water. Inspecting the decompiled code soon made the cause clear. Throughout the game, there are lots of places that have a water effect with waves that move back and forth. In order to pull off this effect, the game directly edits the textures with water on them to basically “mosh them around.”
The code that does this assumes the image is in a 32-bit RGBA format: four bytes per pixel, one byte per channel. Normally, LEADTOOLS would be used to decompress the images into this format, but since I’m loading a DDS, it gets loaded directly in the DXT-compressed format instead. The function was now directly manipulating compressed data, which also sat in a much smaller buffer than it expected, so it would eventually write past the end of the buffer and crash.
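Some back-of-envelope math shows how far off the buffer sizes are. The dimensions here are hypothetical, but the ratio holds for any texture: DXT5 stores each 4×4 block of pixels in 16 bytes, a quarter of what uncompressed RGBA needs.

// What the water code assumes vs. what a DXT5 buffer actually holds,
// for a hypothetical 640x480 texture.
constexpr int w = 640, h = 480;
constexpr int expectedBytes = w * h * 4;              // 1,228,800 (32-bit RGBA)
constexpr int actualBytes   = (w / 4) * (h / 4) * 16; //   307,200 (DXT5 blocks)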
Well, without DXT compression the resulting BigFile was way over the 4 GB limit — especially because the textures had to be 32-bit, not 24-bit, as previously mentioned. So that wasn’t going to cut it. I needed to know which specific textures had water on them and convert only those as uncompressed.
To accomplish this, I needed to become familiar with the various BIN files the game uses. BIN is a generic file extension, short for binary, and whenever you see it, you know it’s probably some ad-hoc custom thing made for that one game. This is no exception. Myst IV uses these BIN files to define various “resources,” which can be virtually anything the game has: hotspots, puzzle states, lighting effects, animations, and indeed the main thing I cared about, water.
I didn’t implement every resource type that exists, only the bare minimum required to read the water resources. I found and reversed the functions that read the BIN files for water, which eventually allowed me to access the data specifying where on screen the water effect should appear; with that, I could ensure the corresponding textures were always converted uncompressed. This actually took quite some time, as the format was somewhat convoluted, but in the end all I cared about was knowing which textures were affected.
After converting the water uncompressed, the filesize increased further to 3.43 GB, uncomfortably close to (but still within) the 4 GB limit. However, at this point I realized I should use DXT1 instead of DXT5 for any textures without alpha, as they are essentially the same format, but in DXT1 alpha can be turned off entirely. This shaved almost a GB off the size of the converted data.m4b, resulting in a 2.67 GB file.
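The per-block math explains the saving. Both formats encode 4×4 pixel blocks; DXT5 spends half of each block on interpolated alpha, and DXT1 simply omits that half, so opaque textures compress to exactly half the size.

// Per 4x4 pixel block:
constexpr int dxt5BlockBytes = 16; // 8 colour bytes + 8 alpha bytes
constexpr int dxt1BlockBytes = 8;  // 8 colour bytes, no alpha at all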
It’s Still Slow
At this point, my theory that the loading issue was related to images was confirmed, because I could tell that movement was definitely much faster — but it still wasn’t enough. There was still what felt like about a second of delay. It was certainly faster than before, but I wasn’t satisfied. I wanted instant movement!
I knew that more could be done. Some of the images could not be converted. Firstly, there are the mask images: for certain effects such as lighting, mist, and so on, black-and-white mask images are used to indicate where the effect should be visible. These images are required to have a single 8-bit luminance channel, a format not supported by DDS (as of Direct3D 9 — later versions may have added support, but I didn’t have that luxury).
Additionally, some of the aforementioned BIN files could contain images, and these images did not go down the same code path that checked for DDS, instead going directly down the LEADTOOLS route. So all of this leads back to a question I really should have answered earlier…
What Are L_GetBitmapRow and L_PutBitmapRow?
They’re just memcpy.
That’s right. Technically there are some checks at the beginning of the functions that allow them to load from sources other than memory, such as a file on disk, but as far as Myst IV is concerned, they just call memcpy and that’s it. That’s the main performance bottleneck.
At first glance, this seemed to me like bad news. memcpy isn’t exactly complex. If I really did need to copy this amount of data, how could I possibly do it any faster?
And then I realized… why copy the image at all?
For this purpose, LEADTOOLS uses an object-oriented pattern where it doesn’t belong. It expects you to load the image into a “bitmap handle,” which implies an internal, private buffer that the library owns somewhere; you must then copy the data out of it in order to take ownership. In other words, the data is first loaded into an inaccessible location, and then a redundant memory copy has to happen before you can use it.
What really sucks about this is that you need to copy the uncompressed data, which is much more data than if you only needed to copy the compressed image. The ideal would be to discard the bitmap handle and simply acquire ownership of its buffer, essentially the equivalent of a std::move. But no: the only way to get the data is to copy the whole buffer, one singular row at a time. It’s just not good enough.
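To spell out the contrast, here is a sketch of the two patterns. The function names are illustrative, not real LEADTOOLS calls: the first is what the API forces (a per-row copy, exactly like the decompiled loop earlier), the second is the zero-copy handover I wanted.

#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// What the API forces: the decoded image lives in a buffer the library
// owns, and the only way out is a copy, one row at a time.
void CopyOutRowByRow(std::vector<uint8_t>& dest, const uint8_t* libraryBuffer,
                     int height, int stride)
{
    dest.resize(size_t(height) * stride);
    for (int row = 0; row < height; ++row)
        std::memcpy(dest.data() + size_t(row) * stride,
                    libraryBuffer + size_t(row) * stride,
                    size_t(stride));
}

// What I wanted: hand over the buffer itself. No bytes are touched.
std::vector<uint8_t> TakeOwnership(std::vector<uint8_t>&& libraryBuffer)
{
    return std::move(libraryBuffer);
}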
To Be Continued…
This is getting quite long! I decided to split this blog into two parts, both to be released at the same time, just because I felt it needed a natural break. So, see the conclusion to this story in Part 2!