How to compress atlases in Unity at runtime — but not too much


We’ll talk about our approach to getting a compressed atlas while avoiding excessive runtime pixel compression. But to get there, we’ll explain atlases, cover simple compression techniques, compare them, and analyze the results.

Hi, my name is Yuri Grachev, and I’m a programmer at MY.GAMES. In this article, we’ll focus on creating compressed texture atlases in Unity at runtime:

  • First, we’ll look at atlases in general. We’ll explain what they’re used for, and we’ll examine the constraints placed on the original textures.
  • We’ll then turn to the simplest technique for compiling an atlas at runtime and evaluate the outcome from a technical standpoint.
  • After that, we’ll discuss our runtime compression experiments.
  • Finally, we’ll look at what various image compression techniques have in common.

After this, we’ll get to the primary motivation behind this article — our alternative approach, where you’ll be able to get a compressed atlas while completely avoiding excessive pixel runtime compression.

With that introduction out of the way, let’s jump in!

In our project, each character is made up of a collection of eight skinned meshes that cannot be batched out of the box in Unity. Each mesh has two distinct textures: diffuse and normal. On top of that, each character has two interchangeable weapons, as well as at least one additional renderer.

As a result, each character in the game is rendered with at least 18 draw calls: 9 for the main frame and 9 for shadow maps. With eight characters in a match, that comes to as many as 144 draw calls in total, and that’s just for characters! Players have access to hundreds of replaceable pieces of equipment.

Atlases: what they are, why we need them

Since we support iPhone 6 (we even aimed for the 5s at the beginning of development), it was important for us to get rid of excessive draw calls. On weak devices, the bottleneck is usually the CPU, which puts these draw calls into the command queue, rather than the GPU, which then executes them.

To reduce the number of draw calls, we manually merge the geometry of the equipment elements that make up a character into a single mesh. And for this to make sense, you must merge not only the geometry but also the textures, so that you can then use a single material with a single set of textures.
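As a rough illustration of the geometry step, Unity’s built-in Mesh.CombineMeshes can merge several mesh parts into one. This is only a sketch: our real pipeline deals with skinned meshes and bone remapping, which this simplified snippet skips, and "parts" here is just a hypothetical array of MeshFilter components:

static Mesh MergeParts(MeshFilter[] parts)
{
    var combine = new CombineInstance[parts.Length];
    for (int i = 0; i < parts.Length; ++i)
    {
        combine[i].mesh = parts[i].sharedMesh;
        combine[i].transform = parts[i].transform.localToWorldMatrix;
    }

    var merged = new Mesh();
    // mergeSubMeshes: true collapses everything into one submesh, which is what
    // lets a single material (and a single draw call) cover the whole character.
    merged.CombineMeshes(combine, mergeSubMeshes: true, useMatrices: true);
    return merged;
}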

This is where atlases come into play: without them, even after integrating geometry, we’d still have to draw equipment elements with separate draw calls and switch textures between them.

Atlases are often manually created by artists during the production of static content, but we want to accomplish this at runtime since our characters are dynamically formed by players from predetermined elements.

When you first start working with atlases, you must keep in mind the requirements and limitations that were placed on the original textures:

  • We’re forced to deliver the original textures to a user’s device in a compressed format, otherwise they’d take up too much space in the end device’s memory.
  • If a material has two texture slots addressed by the same set of UV coordinates (say, diffuse and normal), we must ensure that the proportions of the corresponding textures match; otherwise, one of the atlases may not assemble correctly or line up with the other, and the original textures within the same atlas may differ in quality after upscaling or downscaling.

Every time we put together an atlas, it’s also crucial to keep color bleeding in mind. Below is an example with bilinear filtering enabled and disabled:

As you can see, when filtering is activated, texture boundaries within the atlas begin to blur, and colors within textures get blended, because bilinear filtering interpolates adjoining pixels on the border of two textures. You can easily fix this by adding a margin inside each texture so that the UV shell never comes close to the texture borders.
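For instance, a tiny helper along these lines (the name and exact margin policy are illustrative) can inset a texture’s normalized rect in the atlas so that sampling stays away from the borders:

// Shrink a normalized atlas rect by a pixel margin so bilinear filtering
// never samples pixels belonging to a neighboring texture.
static Rect InsetUVRect(Rect atlasRect, int atlasSize, int marginPixels)
{
    float margin = (float)marginPixels / atlasSize; // margin in UV space
    return new Rect(atlasRect.x + margin, atlasRect.y + margin,
                    atlasRect.width - 2f * margin, atlasRect.height - 2f * margin);
}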

Crude texture atlas implementation

Now that we’ve dealt with the original textures, let’s try to merge them and go over the easiest way to do it:

  • Take a texture pack
  • Create a layout of these textures within the atlas; the built-in Texture2D.GenerateAtlas method can help here
  • Create an ARGB32 RenderTexture
  • Assemble our textures into an atlas according to the prepared layout (see the sketch after this list)
  • Correct the UV coordinates of our combined geometry
  • Produce an assembled character
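Here is a minimal sketch of those steps, assuming a layout of normalized rects produced by a packer such as Texture2D.GenerateAtlas. The helper names are illustrative, and in a real project you may prefer Graphics.Blit or a CommandBuffer, plus care around the UV origin differing between platforms:

// Crude atlas: draw every source texture into an ARGB32 render texture.
public static RenderTexture GenerateCrudeAtlas(Texture2D[] sources, Rect[] layout, int size)
{
    var atlas = new RenderTexture(size, size, 0, RenderTextureFormat.ARGB32);
    atlas.Create();

    var previous = RenderTexture.active;
    RenderTexture.active = atlas;
    GL.Clear(false, true, Color.clear);

    // Set up a pixel-space projection so rects can be given in pixels.
    GL.PushMatrix();
    GL.LoadPixelMatrix(0, size, size, 0);
    for (int i = 0; i < sources.Length; ++i)
    {
        Rect r = layout[i]; // normalized [0..1] rect from the packer
        Graphics.DrawTexture(
            new Rect(r.x * size, r.y * size, r.width * size, r.height * size),
            sources[i]);
    }
    GL.PopMatrix();

    RenderTexture.active = previous;
    return atlas;
}

// Remap a mesh's UVs from its own texture space into its rect within the atlas.
public static void RemapUVs(Mesh mesh, Rect r)
{
    var uv = mesh.uv;
    for (int i = 0; i < uv.Length; ++i)
        uv[i] = new Vector2(r.x + uv[i].x * r.width, r.y + uv[i].y * r.height);
    mesh.uv = uv;
}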

The result is visually identical to that of a separate group of meshes and textures, but from a technological standpoint, the character becomes a single mesh that is drawn in a single draw call.

I ran a test scene with and without combining meshes to visualize how this affects performance. We received the following results:

The number of visible meshes was reduced several times over, and the number of batches was almost halved. The amount of time spent on the render thread also went down.

The generated ARGB32 render texture consumes a significant amount of memory (A LOT of memory): at 4 bytes per pixel, a single 4096×4096 atlas is 64 MB even before MIPs. You can, of course, reduce the resolution to use less memory, but you will lose image detail. On the plus side, this texture can be of any proportions and size, is widely supported, and works anywhere.

It should be noted that not all textures can be assembled into an atlas with this method. Problems arise when merging raw textures that contain color-encoded data: reinterpreting the color will almost certainly make that data impossible to decode again. On the other hand, the same color reinterpretation lets you build source textures of any format into the atlas; in other words, you can add textures of mixed formats to a single atlas.

Nonetheless, the amount of memory used by an atlas like this, and the way that amount scales with resolution, outweighs everything else. Seeing that such an outcome did not serve us well, we began to consider alternatives. The obvious first approach was to try runtime compression.

Runtime compression

First, we found a library called Unity.PVRTC on GitHub and played around with it a little. The library worked right out of the box, but it was very slow, and it was immediately evident from the source code that it was unfinished. We had to rewrite a lot of it, bringing in Burst and Unity Jobs along the way. As a result, we were able to lower the compression time for a single 2K texture on iPhone 6 from 4s to 220ms.

Ironically, this was still not enough. The producers were dissatisfied since, as a result of using ARGB32 atlases and this runtime compression, we increased the total mission start time by several seconds, which had a negative impact on UX.

In addition, we planned to support Player backfill, where a new player joins a game session in progress. To replace the “lost” character with a new one, the feature required the same compression to be performed in the middle of the game session on each user’s device.

Further, among other things, the library had relatively weak, head-on heuristics for selecting reference colors, which resulted in low compression quality. Just as importantly, we ship textures to player devices in a compressed format; we then built an ARGB32 atlas from them, which went through the compression procedure again at runtime. As a result, the original textures were compressed twice, which compounded the errors and artifacts.

So, after experimenting with this library, we continued to look for ways to create a standard atlas. Eventually, we thought: what if we examine compression algorithms from the other side? We had the idea to delve deeper into the intricacies of several compression algorithms such as ASTC, PVRTC, ETC, and BC (DXT). We hoped to find some clues on how to implement runtime compression more efficiently. And so, we began our investigation.

Various compression algorithms

ASTC, PVRTC, ETC, and BC (DXT) are all formats that work with blocks of pixels. Each block is encoded into one or two 64-bit values (long/int64). Blocks are laid out in memory linearly, line by line, for all formats except PVRTC, which uses Z-order (a Morton curve). MIP levels in all formats (including PVRTC) are laid out linearly, sorted from the largest to the smallest.
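To make the layout concrete, here is a small sketch of how many 64-bit values one MIP level occupies. The block footprints follow the format specs; the TextureFormat enum names reflect recent Unity versions and may vary, and PVRTC’s padding of tiny mips is simplified:

// How many longs (8-byte values) one mip level occupies for a few block formats.
static int LongsPerMip(TextureFormat format, int width, int height, int mip)
{
    int w = Mathf.Max(1, width >> mip);
    int h = Mathf.Max(1, height >> mip);
    switch (format)
    {
        case TextureFormat.DXT1:       // 4x4 block, 8 bytes  -> one long per block
            return ((w + 3) / 4) * ((h + 3) / 4);
        case TextureFormat.ASTC_4x4:   // 4x4 block, 16 bytes -> two longs per block
            return ((w + 3) / 4) * ((h + 3) / 4) * 2;
        case TextureFormat.ASTC_8x8:   // 8x8 block, 16 bytes -> two longs per block
            return ((w + 7) / 8) * ((h + 7) / 8) * 2;
        case TextureFormat.PVRTC_RGB4: // 4x4 block, 8 bytes; PVRTC pads tiny mips
            return Mathf.Max((w + 3) / 4, 2) * Mathf.Max((h + 3) / 4, 2);
        default:
            throw new System.NotSupportedException(format.ToString());
    }
}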

Let’s see what a block of pixels is using the example of DXT1/BC1:

The image is segmented into equal 4×4 pixel squares. From each block’s 16 pixels, two reference colors are chosen and encoded in 16 bits each. In addition to these two reference colors, an index matrix (2 bits per pixel) is formed, which makes it possible to reconstruct all 16 pixels from them with some approximation. In total, each block fits into a single 64-bit value.
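As a sketch of what is inside such a block, here is a decoder for one BC1/DXT1 block packed into a single long (assuming the block was read as a little-endian 64-bit value; the punch-through alpha mode is noted in the comments):

// Decode one 64-bit BC1 block into 16 RGB pixels (row-major 4x4).
static Color32[] DecodeBC1Block(ulong block)
{
    // Bits 0..15: reference color 0 (RGB565), bits 16..31: reference color 1.
    ushort c0 = (ushort)(block & 0xFFFF);
    ushort c1 = (ushort)((block >> 16) & 0xFFFF);
    uint indices = (uint)(block >> 32); // 2 bits per pixel, 16 pixels

    Color32 col0 = FromRGB565(c0);
    Color32 col1 = FromRGB565(c1);

    var palette = new Color32[4];
    palette[0] = col0;
    palette[1] = col1;
    if (c0 > c1)
    {
        // Opaque mode: two interpolated colors at 1/3 and 2/3.
        palette[2] = Lerp(col0, col1, 1, 3);
        palette[3] = Lerp(col0, col1, 2, 3);
    }
    else
    {
        // Punch-through mode: midpoint plus transparent black.
        palette[2] = Lerp(col0, col1, 1, 2);
        palette[3] = new Color32(0, 0, 0, 0);
    }

    var pixels = new Color32[16];
    for (int i = 0; i < 16; ++i)
        pixels[i] = palette[(int)((indices >> (2 * i)) & 3)];
    return pixels;
}

static Color32 FromRGB565(ushort c) => new Color32(
    (byte)(((c >> 11) & 31) * 255 / 31),
    (byte)(((c >> 5) & 63) * 255 / 63),
    (byte)((c & 31) * 255 / 31),
    255);

static Color32 Lerp(Color32 a, Color32 b, int num, int den) => new Color32(
    (byte)((a.r * (den - num) + b.r * num) / den),
    (byte)((a.g * (den - num) + b.g * num) / den),
    (byte)((a.b * (den - num) + b.b * num) / den),
    255);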

As previously stated, these blocks can be found either linearly or in the Z-order as follows:

The distinction (and likely an advantage) of PVRTC here is that Z-order increases the locality of a data region in the processor cache, resulting in more cache hits than cache misses when working with an image area, which is normally two-dimensional rather than one-dimensional. That is, a rectangular section of data is needed far more often than a single row of pixels or blocks.
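The Morton index itself is just the block coordinates with their bits interleaved. A standard bit-twiddling sketch (which of x or y takes the low bit depends on the format’s convention):

// Interleave the bits of x and y (each up to 16 bits) into a Morton/Z-order index.
static uint EncodeMorton2(uint x, uint y) => Part1By1(x) | (Part1By1(y) << 1);

// Spread the lower 16 bits of v out so each bit lands in an even position.
static uint Part1By1(uint v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}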

With this information in hand, we attempted to construct an atlas from these blocks by simply rearranging them in memory. The block nature of this data, and each block’s independence from its neighbors, worked in our favor here: the blocks can be read and written as regular longs (or pairs of longs).

Our PVRTC atlas implementation

To get everything really off the ground, we needed to add a few more requirements for the initial textures:

  • First, the textures must be square with power-of-two dimensions, both because of our layout technique and because Unity does not construct MIP levels if the texture is not power-of-two
  • Second, all source textures must be imported with the same settings, because joining blocks in the atlas this way is only possible when the incoming data is uniform
  • Third, we only supported the 4×4 and 8×8 ASTC block sizes. Our algorithm for arranging textures in the atlas was crucial here, but the fundamental issue was an unwillingness to deal with all kinds of borders. (With ASTC 10×10, a power-of-two texture size is not evenly divisible by the block size. As a result, ASTC blocks linger around the texture’s edge, only partially filled with relevant data, and it’s unclear what to do with them. Handling them correctly would have effectively meant recompressing the pixels, which is exactly what we were trying to avoid.)
  • Finally, enable the Read/Write Enabled checkbox in the importer of every source texture so that we can access the pixels on the CPU side (a simple validation sketch follows this list)
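A quick sanity check along these lines (illustrative, not from our actual codebase) can catch textures that violate these requirements before atlas assembly:

// Validate one source texture against the requirements listed above.
static bool IsValidSource(Texture2D tex, TextureFormat expectedFormat)
{
    bool squarePow2 = tex.width == tex.height && Mathf.IsPowerOfTwo(tex.width);
    bool sameFormat = tex.format == expectedFormat;  // uniform import settings
    bool readable = tex.isReadable;                  // Read/Write Enabled
    bool hasMips = tex.mipmapCount > 1;
    return squarePow2 && sameFormat && readable && hasMips;
}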

Let’s take a look at how to create such an atlas, using pseudocode as an example.

We have a function that takes a set of initial textures, format, and layout as input. Within the function, we build Texture2D of the required size and format with MIP support:

public static Texture2D GenerateAtlas(Texture2D[] sources,
                                      TextureFormat format,
                                      Layout layout)
{
    var atlas = new Texture2D(4096, 4096, format, mipChain: true, linear: false);

I would like to note that Texture2D is specifically created here, rather than RenderTexture, as in the crude implementation.

We then use the generic GetRawTextureData<T> method to access the memory containing this texture’s pixels, using long as the data type:

NativeArray<long> atlasData = atlas.GetRawTextureData<long>();

You can now add blocks to this array. Here, we look through all our source textures, obtaining references to the corresponding block arrays:

    for (int srcIndex = 0; srcIndex < sources.Length; ++srcIndex)
    {
        var source = sources[srcIndex];
        NativeArray<long> sourceData = source.GetRawTextureData<long>();

We calculate offsets and copy blocks with the original textures into the array of blocks in our atlas:

        Rect sourceRect = layout.GetRect(srcIndex);

        for (int mip = 0; mip < source.mipmapCount; ++mip)
        {
            MemoryRect memRect = GetMemoryRect(format, 4096, 4096, sourceRect,
                                               source.width, source.height, mip);
            CopyMemoryData(sourceData, atlasData, format, memRect);
        }

Here, Rect specifies the location of a single texture in the atlas, while MemoryRect is the object in charge of calculating all offsets, sizes, paddings, and strides.
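The pseudocode leaves GetMemoryRect and MemoryRect abstract, so here is a hedged sketch of what the linear case might hold. The field and method names simply mirror the pseudocode, and every offset is expressed in longs, matching the NativeArray<long> views:

public struct MemoryRect
{
    // Everything below is measured in longs (8-byte values), so one BC1/ETC/PVRTC
    // block is one unit and one ASTC block is two consecutive units.
    public int blocksX, blocksY; // long-columns and block-rows to copy from the source mip
    public int srcRowStride;     // longs per block row in the source mip
    public int dstRowStride;     // longs per block row in the atlas mip
    public int srcMipOffset;     // long index where the source mip starts
    public int dstMipOffset;     // long index where the atlas mip starts
    public int dstX, dstY;       // texture's position inside the atlas mip, in those units

    public int GetSliceOffsetSrc(int x, int y) => srcMipOffset + y * srcRowStride + x;

    public int GetSliceOffsetDst(int x, int y) =>
        dstMipOffset + (dstY + y) * dstRowStride + (dstX + x);
}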

For example, with a linear block layout, the function might look like this:

public static void CopyMemoryDataLinear(NativeArray<long> source,
                                        NativeArray<long> destination,
                                        MemoryRect memRect)
{
    for (int y = 0; y < memRect.blocksY; ++y)
    for (int x = 0; x < memRect.blocksX; ++x)
    {
        int srcOffset = memRect.GetSliceOffsetSrc(x, y);
        int dstOffset = memRect.GetSliceOffsetDst(x, y);
        destination[dstOffset] = source[srcOffset];
    }
}
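For PVRTC, where blocks live in Z-order rather than line by line, the same copy can swap the row-major offsets for Morton indices, reusing the EncodeMorton2 helper shown earlier. Again, a sketch, under the assumption of square power-of-two mips, block-aligned positions, and one long per block:

public static void CopyMemoryDataMorton(NativeArray<long> source,
                                        NativeArray<long> destination,
                                        MemoryRect memRect)
{
    for (int y = 0; y < memRect.blocksY; ++y)
    for (int x = 0; x < memRect.blocksX; ++x)
    {
        // Blocks are Z-ordered within the source mip...
        int srcOffset = memRect.srcMipOffset + (int)EncodeMorton2((uint)x, (uint)y);
        // ...and land Z-ordered at the texture's block position in the atlas mip.
        int dstOffset = memRect.dstMipOffset
            + (int)EncodeMorton2((uint)(memRect.dstX + x), (uint)(memRect.dstY + y));
        destination[dstOffset] = source[srcOffset];
    }
}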

Finally, be sure to execute the Apply method, which uploads the modified data to the graphics API:

atlas.Apply();

The complete code:

public static Texture2D GenerateAtlas(Texture2D[] sources, TextureFormat format, Layout layout)
{
    var atlas = new Texture2D(4096, 4096, format, mipChain: true, linear: false);
    NativeArray<long> atlasData = atlas.GetRawTextureData<long>();

    for (int srcIndex = 0; srcIndex < sources.Length; ++srcIndex)
    {
        var source = sources[srcIndex];
        NativeArray<long> sourceData = source.GetRawTextureData<long>();

        Rect sourceRect = layout.GetRect(srcIndex);

        for (int mip = 0; mip < source.mipmapCount; ++mip)
        {
            MemoryRect memRect = GetMemoryRect(format, 4096, 4096, sourceRect, source.width, source.height, mip);
            CopyMemoryData(sourceData, atlasData, format, memRect);
        }
    }

    atlas.Apply();
    return atlas;
}

public static void CopyMemoryDataLinear(NativeArray<long> source, NativeArray<long> destination, MemoryRect memRect)
{
    for (int y = 0; y < memRect.blocksY; ++y)
    for (int x = 0; x < memRect.blocksX; ++x)
    {
        int srcOffset = memRect.GetSliceOffsetSrc(x, y);
        int dstOffset = memRect.GetSliceOffsetDst(x, y);
        destination[dstOffset] = source[srcOffset];
    }
}

If you are certain that no more textures will fit in the atlas, or if you have logically finished adding textures to this atlas, then call the Apply method with an additional parameter:

atlas.Apply(false, makeNoLongerReadable: true);

As a result, the atlas texture will be uploaded to video memory while the copy in system memory is destroyed, saving half of the memory otherwise needed.

When we run this code, we get a composite Texture2D with the same output format as the original textures.

Here are some of the benefits of this solution:

  • We get atlases with higher resolutions
  • Atlases use significantly less memory space per pixel
  • We get rid of double compression artifacts
  • There is no bleeding inside MIPs (this can happen if MIPs are created based on an already prepared atlas)

In terms of cons, there are the pretty strict requirements for the initial textures that I mentioned above.

Now let’s look at the difference between the two atlases produced using different approaches:

Because the character’s original textures did not fit into 2K, our atlas has a resolution of 4K. Even so, it still weighs less than the crude ARGB32 atlas, and the higher resolution ultimately benefits us, which I will discuss in more detail later. You can check the ratio of resolutions here to determine the potential quality difference.

We can test the approach’s accuracy by comparing our updated version of the atlas to the crude implementation:

Just one more thing…

We tried combining several characters into one atlas at once, creating a “page” implementation. (This is because 4K atlases have a large amount of empty space.)

To accomplish this, we first collect all the textures that need to be added to the atlas across all characters. The textures are then divided into groups, each of which should fit into a single 4K texture. At the same time, a simple rule must be followed: each character must fit entirely on one page; otherwise, it is moved entirely to a new page. This approach also lets duplicate textures be reused, as long as the rest of the character’s textures are on the same page. A sketch of this grouping follows below.
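Here is a hedged sketch of that grouping rule: a first-fit pass over pages, with a simplified area-based capacity check standing in for real rect packing (which must, of course, also find an actual layout). All names are illustrative:

class AtlasPage
{
    public readonly HashSet<Texture2D> Textures = new HashSet<Texture2D>();
    public long UsedArea;
}

static List<AtlasPage> BuildPages(List<Texture2D[]> characters, long pageArea)
{
    var pages = new List<AtlasPage>();
    foreach (var character in characters)
    {
        AtlasPage target = null;
        foreach (var page in pages)
        {
            // Duplicates already on this page cost nothing extra.
            long extra = 0;
            foreach (var tex in character)
                if (!page.Textures.Contains(tex))
                    extra += (long)tex.width * tex.height;

            if (page.UsedArea + extra <= pageArea) { target = page; break; }
        }

        // The whole character moves to a fresh page if it doesn't fit anywhere.
        if (target == null)
        {
            target = new AtlasPage();
            pages.Add(target);
        }

        foreach (var tex in character)
            if (target.Textures.Add(tex))
                target.UsedArea += (long)tex.width * tex.height;
    }
    return pages;
}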

According to our estimates, we should have seen no more than 3–4 pages per atlas in the worst-case scenario, but the reality exceeded our expectations: we never saw more than two pages for all characters in a scene.

Example showing how multiple characters are placed on a page

The end results

What was the result of using this system for combining textures into atlases?

Let’s look at PVRTC/iOS as an example. Our atlases now take up a total of 21 MB of memory, compared to the previous 46 MB for ARGB32 atlases. The time required to generate two PVRTC pages has dropped to 70ms, as opposed to 8×220ms spent on compression alone (excluding the ARGB32 texture render preparation). Textures now have higher resolution, they no longer suffer from double compression, and they can be reused, which means we can get rid of some duplicates in video memory.

So, with that, we’ve covered everything we set out to touch upon in this article. Hopefully you found it enlightening and useful. Take care, and thanks for reading!
