One of the things I like the most about working with game technology is solving problems that are interesting and challenging, which is great when working at Wildlife since our teammates from the art department are experts in coming up with these kinds of tasks.
In today’s post, we take a look at the graphics programming and technical art challenges we faced when developing the bushes in ZOOBA, covering the core steps and techniques used, from the art director’s visual description to how they are implemented in the current game.
A visibility problem
Being a mix of MOBA and Battle Royale, ZOOBA is a game where both strategy and positioning are key to victory. For a player to succeed on the battlefield, hiding is one of the main mechanics available. Enter ZOOBA’s bushes.
Figure 1 illustrates one of the first concepts for the bushes’ visuals, consisting of simple and plain shapes, with small details to break the silhouette. The current version follows a slightly different art direction. Still, it keeps the very same elements, which makes it easy for us to prototype the visibility mechanics using simple geometric shapes like boxes and cylinders (please enjoy my programmer art :D):
This friendly aesthetic may make the visibility task seem pretty simple, and in fact, the core gameplay part is pretty straightforward:
- When a player is inside a bush, they are invisible to enemies outside the bush;
- When both player and enemy are inside the same bush, the player can see the enemy when the latter is within the player’s view range (also referred to as Field of View, or FoV);
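The two rules above can be sketched as a small visibility predicate. This is only an illustration and not ZOOBA’s gameplay code: the `Unit` type, its fields, the circular view range, and the default radius are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Unit:
    x: float
    y: float
    bush_id: Optional[int] = None  # None means the unit is out in the open
    view_range: float = 3.0        # hypothetical FoV radius in world units


def can_see(viewer: Unit, target: Unit) -> bool:
    """Return True if `viewer` can see `target` under the two bush rules."""
    if target.bush_id is None:
        return True  # assumption: units outside any bush are always visible
    if viewer.bush_id != target.bush_id:
        return False  # rule 1: hidden from anyone who is not in that bush
    # Rule 2: same bush, visible only inside the viewer's view range.
    distance = hypot(target.x - viewer.x, target.y - viewer.y)
    return distance <= viewer.view_range
```

The hard part discussed in the rest of the post is not this test, but drawing its result on screen.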
Things start to get interesting when we have to visually convey these points in an easy to read, lightweight, and preferably visually appealing manner.
The first problem, displaying the bush area (i.e., where the player can hide), doesn’t seem to be a problem at all; after all, we could just render the bush and be done. The catch here is Ollie, our panda character, who can eat the bushes and effectively remove a hiding spot from the match, as shown in Figure 3.
The second problem, rendering the player’s view range, also seems quite simple, but has some tricky corner cases such as visibility leaking and interaction with the panda bites:
Figure 4 shows most corner cases and necessities that our game designers have when thinking of how the player will experience the game in many different scenarios. It also exemplifies how tricky this can be to implement so that the visuals correctly translate into gameplay: the “Leak!” part shows an area inside the player’s view range that overlaps with another bush, meaning we must not reveal this area in the final render.
These were the major problems we had to overcome when designing a rendering solution for the bushes. And then it was...
Time to get things done
A few techniques were tested to accomplish all of the game design necessities, but only two reached the users’ hands:
- A dynamic heightmap combined with fragment discard operations;
- A ZCarving system using the stencil buffer;
The heightmap solution was the one we used at the time of release, and we had several reasons for that. First and most importantly, we were bound to a rendering pipeline that was already making use of this heightmap texture for other effects, so it was simpler and faster just to reuse the logic. It was also supported across all levels of hardware and produced a good enough result for our first iteration of this game mechanic.
The technique used a separate framebuffer to store information about the scene: we gathered all visible relevant geometry (i.e., bushes, players’ FoVs, and panda bites) and then projected the associated proxy geometry (i.e., simplified versions of bushes and FoVs) onto the horizontal plane.
These projections effectively create heightmaps containing some flags per projected pixel. The textures then were used to discard pixels when rendering the real geometry, mainly by height comparisons.
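To make the idea more concrete, here is a minimal CPU-side sketch of a projected heightmap and the discard test. It is not the shipped shader: the circular cutters, the single “cut height” stored per texel, and the function names are assumptions chosen to keep the example small (the real textures also carried flags that are not modeled here).

```python
import math


def build_heightmap(size, world_extent, cutters):
    """Project circular cutters onto a size x size top-down grid.

    Each cutter is (cx, cz, radius, cut_height). A texel stores the height
    above which geometry is discarded, or infinity when nothing cuts it.
    """
    texel = world_extent / size
    heightmap = [[math.inf] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # World-space centre of this texel on the horizontal plane.
            x = (col + 0.5) * texel
            z = (row + 0.5) * texel
            for cx, cz, radius, cut_height in cutters:
                if math.hypot(x - cx, z - cz) <= radius:
                    heightmap[row][col] = min(heightmap[row][col], cut_height)
    return heightmap


def should_discard(heightmap, world_extent, x, y, z):
    """Discard a fragment at world position (x, y, z) by height comparison."""
    size = len(heightmap)
    col = min(size - 1, max(0, int(x / world_extent * size)))
    row = min(size - 1, max(0, int(z / world_extent * size)))
    return y > heightmap[row][col]


# One cutter of radius 2 centred at (4, 4), cutting everything above 0.5.
heightmap = build_heightmap(8, 8.0, [(4.0, 4.0, 2.0, 0.5)])
```

Because every lookup snaps to a texel, the carved border in this sketch follows the grid rather than the circle: a fragment slightly outside the cutter can still land on a texel whose centre is inside it.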
You may find it strange to project all the information into a plane instead of using screen space coordinates to do all this work, and we would agree with you. Still, some context is necessary: at first, we ported an existing rendering pipeline designed to be used in a top-down view 2D game, and ZOOBA is not that different from a top-down 2D game.
Not only that, but it was a relatively cheap way to store and use illumination info, making it possible to use several point lights and occluders just by rendering a proxy geometry into the heightmap.
These perks seemed very interesting to ZOOBA at the time, until problems started to show up:
The first issue with the technique is aliasing, clearly seen in Figure 5. This kind of artifact is common when using textures to store projected information in rendering, as you have probably seen in games using classic shadow mapping techniques.
One of the solutions for the aliasing is simply to increase the heightmap resolution. Still, it comes at the cost of both memory and performance, which leads to our next problems: memory usage and data transfer, and both are aggravated when developing for mobile platforms.
Memory usage should be clear: we are storing a buffer containing all this information, and we are required to do so since the feature is part of the core game mechanic, meaning we cannot just disable it for low-end devices. The good news is that we can increase the heightmap resolution for better devices. The bad news is that low-end devices still have the issue, and even on high-end devices, users can choose the quality level they want to play at.
Data transfer might not seem like a problem, since all the data is in GPU memory and is both generated and consumed by the GPU itself. That would be the case if we were developing for game consoles or PCs, but the tiled architecture of mobile GPUs makes it a real problem here.
Mobile hardware naturally has less physical space when compared to PCs and consoles, which translates to a need for an architecture that uses its resources much more efficiently than the classic immediate rendering mode. Tiled rendering reduces data transfer by dividing the image into several chunks, which enables many optimizations such as per-tile depth tests and “memoryless” textures.
The heightmap-based technique requires us to follow the pipeline in Figure 6. This approach is not recommended for tiled architectures because it forces the GPU to transfer data between main memory and tile memory many times. That is bad for performance due to the reduced bandwidth of mobile hardware, and it also heats the device and increases battery consumption. In a nutshell: we don’t want it to happen.
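A back-of-the-envelope calculation shows how quickly these costs grow with resolution. The numbers below are illustrative assumptions, not measurements from ZOOBA: a square texture with 4 bytes per texel, 60 frames per second, and only one write-out plus one read-back of the heightmap per frame.

```python
def texture_bytes(resolution, bytes_per_texel=4):
    """Size of a square texture; 4 bytes per texel is an assumed format."""
    return resolution * resolution * bytes_per_texel


def transfer_bytes_per_second(resolution, transfers_per_frame=2, fps=60,
                              bytes_per_texel=4):
    """Main-memory traffic caused by the heightmap alone.

    transfers_per_frame=2 assumes one store after rendering the heightmap
    and one read when the main pass uses it; a real pipeline may do more.
    """
    return texture_bytes(resolution, bytes_per_texel) * transfers_per_frame * fps


for res in (256, 512, 1024):
    resident = texture_bytes(res) / (1024 * 1024)
    traffic = transfer_bytes_per_second(res) / (1024 * 1024)
    print(f"{res}x{res}: {resident:.2f} MiB resident, {traffic:.0f} MiB/s moved")
```

Under these assumptions, doubling the resolution to fight aliasing quadruples both the memory footprint and the traffic.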
With all that in mind, the idea was to develop a solution that made better use of the hardware while also eliminating the observed artifacts. On top of that, our render pipeline was becoming obsolete and cumbersome to use, which made us realize that it was…
Time to make things better
Because memory bandwidth is very limited on mobile, the pipeline from Figure 7 looks appealing, as it completely removes the need for load and store operations. The question is how to remove these operations.
A solution could be to use “memoryless” textures, buffers that exist only in tile memory and are recreated and discarded every frame, and use the same algorithm, but there are a few issues with that:
- Many devices don’t support this kind of texture, and even on devices that do, there is no guarantee that we can access it (some older iPhones are an example of this behavior);
- Other parts of the game needed the information from the heightmap texture, so in the old rendering pipeline we had to save the texture regardless of the device used;
The first point is a dealbreaker for us due to the platforms and devices we target. Bear in mind though that this kind of texture is not useless: memoryless textures are the reason why deferred rendering is becoming a reasonable approach to mobile.
The second point was addressed by us taking down our old render pipeline, meaning there would be no good reason to keep using this technique in the first place.
So, we are now using an algorithm based on a deprecated pipeline, and on top of that, memoryless textures don’t help us solve our problem. Or do they?
Nope, they don’t! At least not with our current algorithm, but they do give us a hint of where to aim. You see, even if a device does not provide a custom tile memory buffer, there is still one kind of texture that all of our target devices support: the depth texture, or, more specifically, the depth + stencil buffer. We will get back to this soon, but first, let’s take a look at what inspired our current technique.
If you know anything about Constructive Solid Geometry (CSG), both Figures 3 and 4 from earlier may resemble the subtraction operator from CSG, and that’s precisely where we went to find our final solution.
For those unfamiliar, CSG is a rendering technique where we create geometric forms by aggregating other, often simpler forms. The diagram below illustrates the concept:
When we take a look back at Figures 3 and 4, the player FoVs and panda bites behave very similarly to the last step of the tree in Figure 8, meaning we are carving the bush geometry with both player’s FoVs and panda’s bites. Hence, it seemed like an excellent approach to take, and it’s exactly where we started our second algorithm.
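For readers who prefer code to diagrams, CSG can be sketched with point-membership functions: a solid is a function that says whether a point is inside it, and the operators combine those functions. This is only a conceptual illustration (the shapes, sizes, and helper names are made up for the example) and not how the bushes are rendered.

```python
from math import hypot


def box(min_x, max_x, min_y, max_y, min_z, max_z):
    """Axis-aligned box as a point-membership function."""
    return lambda x, y, z: (min_x <= x <= max_x and
                            min_y <= y <= max_y and
                            min_z <= z <= max_z)


def cylinder(cx, cz, radius, min_y, max_y):
    """Vertical cylinder whose axis passes through (cx, cz)."""
    return lambda x, y, z: (hypot(x - cx, z - cz) <= radius and
                            min_y <= y <= max_y)


def union(a, b):
    return lambda x, y, z: a(x, y, z) or b(x, y, z)


def subtract(a, b):
    """CSG subtraction: inside a, but not inside b."""
    return lambda x, y, z: a(x, y, z) and not b(x, y, z)


# A "bush" carved by a hypothetical FoV cylinder and a hypothetical bite.
bush = box(0, 10, 0, 2, 0, 10)
fov = cylinder(3, 3, 2.0, 0, 2)
bite = cylinder(7, 7, 1.5, 0, 2)
carved = subtract(subtract(bush, fov), bite)
```

Subtracting the cutters one after another gives the same solid as subtracting their union in one go, which is the property a batched approach would like to exploit.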
Ok, but how does this CSG thing help us? Well, for starters, CSG is a pixel-perfect technique, meaning we would get rid of the artifacts presented in Figure 5. But there is more: our method is based on the work of Stewart et al., in which the authors manipulate both the depth and stencil buffers to implement the CSG algorithm, and guess what? Our game is already using these buffers, so we don’t have to use any additional textures :D
Before we continue, it’s good to know that there are many details about ZOOBA’s final implementation of the CSG subtraction technique. Still, since they are not as involved as what is being presented now and are also very specific to our game, we chose to cover just the core part this time, which should be transferable to other games and hopefully improved in the future.
With that out of the way, we now have a reference algorithm and all the resources we need, so let’s get started.
Requirements for a subtraction
From the CSG algorithm, all we really need is the subtraction operation, which makes it even simpler to adapt the paper to our needs (and we are thankful for that), but since nothing is perfect there are a few other issues to address:
First, the paper implements an incremental algorithm, meaning that for the example in Figure 9, it would first subtract one of the cylinders from the box, generating an intermediate shape, and then subtract the other cylinder from the intermediate shape to get to the final one. The issue is that we can’t batch our work, which is not a huge deal, but our previous algorithm was capable of doing so, and it’s a good property to keep. Now we have the first constraint of our approach.
Second, the method requires us to process the cutting geometry in a front-to-back order to produce the correct result. This is not all bad, except that we are using Unity and don’t have much control over how the objects are rendered (explained below), and then a few artifacts would occur with the vanilla algorithm.
The issue with the ordering comes from us using Unity 2018 for development. In that version, using the built-in forward rendering pipeline, all we can do is to tell in which render queue an object goes, and then Unity will decide on how to order that object based on that queue. So we just have to choose a queue that performs a front-to-back ordering, right?
The problem is that Unity’s ordering can get quite inconsistent when two or more objects are at the same distance from the camera, resulting in geometry popping in those cases.
Notice that this inconsistency is not a problem for standard opaque geometry since even if you change the order of two opaque objects, the final image will still be the same, only producing more overdraw.
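A tiny depth-buffer simulation makes this point visible. It is a sketch under simple assumptions: fragments are `(pixel, depth, color)` tuples, the depth test is “less than”, and the two fragments are at different depths.

```python
def draw_opaque(fragments_in_order):
    """Rasterize opaque fragments with a depth test.

    Returns the final color buffer and the number of color writes, which
    serves as a rough measure of overdraw.
    """
    depth = {}
    color = {}
    writes = 0
    for pixel, z, c in fragments_in_order:
        if z < depth.get(pixel, float("inf")):  # closer fragments win
            depth[pixel] = z
            color[pixel] = c
            writes += 1
    return color, writes


near = (0, 1.0, "red")
far = (0, 5.0, "blue")

front_to_back = draw_opaque([near, far])
back_to_front = draw_opaque([far, near])
```

Both orders end with the same image, and only the number of writes changes. The stencil-based subtraction has no such safety net, which is why the ordering matters there.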
As a final point in the ordering issue, we know that with Unity’s SRP the geometry ordering is no longer a black box. Still, we haven’t used it with ZOOBA yet, so we saw two alternatives:
- Making the ordering ourselves;
- Designing our solution taking Unity’s ordering inconsistencies into account;
We chose the second option, as the first one would make everything harder to maintain, debug, and develop. Besides, since SRP is something that we will probably take a look at in the future, we preferred to keep our approach as close to Unity’s way of doing things as possible.
Adapting the paper
The base of the subtraction operation presented in the paper is a smart trick called the Parity Test, a way of determining which part of a convex shape is inside another curved shape. Having that in mind, cutting the original mesh is as simple as rendering the cutting mesh using ZTest Greater and Front Face Culling. Here are the steps that the original algorithm takes to perform a subtraction, assuming the code is written in Unity:
- Draw original mesh (mesh to be cut) writing into the depth buffer;
- Draw the cutting mesh FRONT faces, flip the parity stencil bit (explained below), and DO NOT write into the depth buffer (you don’t need to write to the color buffer here);
- Draw the cutting mesh BACK faces, flip the parity stencil bit again, and DO NOT write into the depth buffer (you don’t need to write to the color buffer here);
- Draw the cutting mesh BACK faces again ONLY where the parity stencil bit is 1, now using ZTest Greater, and this time write into both depth and color buffers;
Here the algorithm assumes that the parity stencil bit, which can be any of the eight available stencil bits, starts with value 0 (so it may be a good idea to write it back to 0 in step 4).
Figure 10 illustrates the technique and should make it easier to see why it works. Parts of the cutting mesh that are visible from the camera angle AND are inside the original mesh will fail the depth test in step 3, so the bit that the front face has written stays at 1, whereas the parts that do pass the depth test flip the parity bit back to 0.
The main goal of step 3 is to define our region of interest, which step 4 simply uses to cut the original mesh, giving us the desired result.
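The four steps can be simulated for a single pixel to check the reasoning. This is a simplified sketch and not the paper’s or ZOOBA’s code: each mesh is reduced to the `(near, far)` depths where the view ray enters and leaves it, steps 2 and 3 are assumed to use a “less or equal” depth test, and the cutter’s back face is assumed to lie inside the original mesh wherever a cut happens.

```python
FAR = float("inf")


def subtract_pixel(original, cutter):
    """Run the four subtraction steps on one pixel.

    `original` and `cutter` are (near, far) depth spans along the view ray,
    or None when the mesh does not cover this pixel.
    Returns (color, depth) as left in the buffers.
    """
    depth, color, parity = FAR, "background", 0

    # Step 1: draw the original mesh, writing depth and color.
    if original is not None:
        depth, color = original[0], "original"

    if cutter is not None:
        c_near, c_far = cutter
        # Step 2: cutter front faces flip the parity bit, no depth write.
        if c_near <= depth:
            parity ^= 1
        # Step 3: cutter back faces flip the parity bit again, no depth write.
        if c_far <= depth:
            parity ^= 1
        # Step 4: where parity is 1, draw back faces with "greater" depth
        # test, writing depth and color, and reset the bit for later cutters.
        if parity == 1 and c_far > depth:
            depth, color = c_far, "cut surface"
            parity = 0

    return color, depth
```

The parity bit ends up set only where the cutter straddles the visible surface of the original mesh: a cutter entirely in front flips the bit twice, and a cutter buried behind the surface never flips it.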
Another thing to notice here is that the Parity Test is the reason why the algorithm only works with convex meshes. It is also why it proceeds iteratively through the cutters: if we batched the cutters, we would effectively be creating a concave mesh.
If the previous paragraph is not clear, suffice it to say that the Parity Test relies on the property that a given convex cutter can write into a pixel at most twice. On the other hand, a concave mesh can write into a pixel many more times, depending on the mesh topology and view angle.
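A small experiment shows what goes wrong when cutters are batched. In this sketch (same simplifications as before: one pixel, each cutter reduced to an assumed `(near, far)` depth span, “less or equal” depth test), all front faces are drawn first and all back faces second, sharing a single parity bit.

```python
def parity_after_batched_passes(surface_depth, cutters):
    """Parity bit left on a pixel when all cutters share the two flip passes."""
    parity = 0
    for c_near, _ in cutters:   # all front faces in one batch
        if c_near <= surface_depth:
            parity ^= 1
    for _, c_far in cutters:    # all back faces in one batch
        if c_far <= surface_depth:
            parity ^= 1
    return parity


single = parity_after_batched_passes(5, [(4, 7)])
overlapping = parity_after_batched_passes(5, [(4, 7), (3, 6)])
```

With one cutter straddling the surface the bit is 1, as expected. With two overlapping cutters straddling it, the two front faces cancel each other and the bit goes back to 0, even though the pixel should be cut. A single bit cannot tell “no cutter” apart from “two cutters”.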
To simplify the comparison when we introduce our modifications, here is a table of the process using Unity’s ShaderLab syntax. The steps are the same as before, and are taken from left to right in lockstep (although the 2nd and 3rd steps can be merged into a single pass for free):
With that in mind, our first and perhaps most significant change to the algorithm is to make it work with concave cutters. Still, it is not a trivial task to perform, especially when dealing with generic meshes.
What we do then is to take advantage of the geometry of ZOOBA’s bushes and “skip” the parity test altogether.
Since we know that our player FoVs and panda bites are only used inside the bushes, and both are cylinders (see Figure 11), we know that the shape of the region of interest is precisely the top cap of that cylinder, meaning we can just go ahead and use it. Not only that, but the camera in ZOOBA is also always at a fixed angle, making it easier to specialize the algorithm.
With that in mind, our first modifications to the subtraction are the following:
The first three steps are equivalent to the original algorithm, and the logic is straightforward:
- Draw original mesh, marking a stencil flag to represent the bush area;
- Draw ALL the cutter caps that are in the bush area (stencil == 128), incrementing the stencil buffer and effectively marking the area of interest;
- Draw ALL the cutter insides, decrementing the counter in both ZFail and stencil Pass;
For a single cutter, the algorithm behaves pretty much the same way as the original, using bit 1 as the parity bit, the only difference is that we also mark the bush area to prevent leaking.
For more than a single cutter, the area of interest is anywhere with a stencil value greater than 128, and the ZFail operation in step 3, combined with the last two steps, exists only to handle those cases, as shown in the lower row of Figure 12. More specifically, the ZFail part handles the inconsistencies in Unity’s ordering algorithm, preventing meshes from popping when the camera moves.
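The marking done by the first two steps can be sketched per pixel as follows. Only the bush flag (128), the increment per cap, and the “greater than 128” test come from the description above. How the comparison ignores the counter bits is an assumption here (the sketch tests only the flag bit), and the remaining steps are intentionally left out.

```python
BUSH_FLAG = 128  # stencil bit that marks "this pixel shows a bush"


def mark_pixel(shows_bush, caps_covering_pixel):
    """Stencil value of one pixel after steps 1 and 2."""
    stencil = 0
    # Step 1: the bush mesh writes the flag.
    if shows_bush:
        stencil |= BUSH_FLAG
    # Step 2: every cutter cap covering the pixel increments the counter,
    # but only inside the bush area (assumed: the test reads the flag bit).
    for _ in range(caps_covering_pixel):
        if stencil & BUSH_FLAG:
            stencil += 1
    return stencil


def in_area_of_interest(stencil):
    """A pixel is carved when at least one cap marked it inside a bush."""
    return stencil > BUSH_FLAG
```

Caps that fall outside the bush never touch the counter, which is what keeps the cut from leaking onto pixels that do not show a bush.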
Both the 4th and 5th steps are used to fix intersections between the cutters inside the original mesh. This is just an approximation given the camera angle and gameplay scenarios that we have in ZOOBA. However, if we wished to solve this problem for any camera angle and mesh, we would have to repeat these last two steps as many times as there were back faces covering a given pixel (Figure 13 should make things simpler to understand).
The last paragraph also shows the reason why making the CSG algorithm work for meshes that are not convex is so tricky (a general solution is really a chicken and egg situation), so our efforts to improve on this stopped when we fulfilled our game needs.
And if you are tired of my questionable drawings by now, we present the results of this core algorithm, a pretty nice pixel perfect blend between the cutters, again featuring good old programmer art, but now in 3D:
All of the above is rendered using only five draw calls “regardless” of how many cutters there are in the scene, while the original algorithm adds two draw calls to each new cutter. There are quotes in “regardless” because there is a limit on how many cutters can change a given pixel (127 to be exact), and the batching is done by Unity, so the engine limitations apply.
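Both numbers in the paragraph above follow from simple arithmetic, sketched here. The 8-bit stencil and the flag at 128 come from the text, while the single base draw call for the original mesh in the incremental variant is an assumption of this example.

```python
STENCIL_BITS = 8
BUSH_FLAG = 1 << (STENCIL_BITS - 1)                  # top bit: 128
COUNTER_MASK = (1 << STENCIL_BITS) - 1 - BUSH_FLAG   # remaining 7 bits: 127


def draw_calls_incremental(cutters, base=1, per_cutter=2):
    """Original algorithm: two extra draw calls for every cutter.

    base=1 assumes one draw call for the mesh being cut.
    """
    return base + per_cutter * cutters


def draw_calls_batched(cutters, passes=5):
    """Batched variant: a fixed number of passes, as long as batching holds."""
    return passes


for n in (1, 10, 100):
    print(n, draw_calls_incremental(n), draw_calls_batched(n))
```

With the top bit reserved as the bush flag, the other seven bits can count up to 127 overlapping cutters on a pixel before the 8-bit value runs out of room.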
With that said, our cutter geometries are so simple that Unity has no problem batching them, which keeps the number of draw calls at a constant minimum. Besides, keep in mind that this is a simplification of our algorithm, which makes the number of draw calls a little higher than that in the real game, and even then, this number remains constant, which is essential to avoid lag spikes.
Note that in this implementation, four of the five batches are specific to the cutter, and to render both FoVs and bites, we would need nine batches plus a slight adaptation of the stencil logic to handle the corner cases. These changes are not so challenging to come up with, but since they are very specific to how ZOOBA uses the CSG subtraction, we’re good for now.
Apart from that, we presented a logic that writes into the bit 128 to mark the region as a bush, but in the real game, we use a combination of bits to mark this region. We do this to prevent the visibility leaking pictured in Figure 4 when many bushes are being affected by the technique, and the reason we used 128 as the flag here was to keep things as generic and straightforward as possible.
Advantages of our approach
The method above, like the one from the CSG paper, works by making computations in screen space and using the stencil buffer. This is great for mobile platforms because, as mentioned before, the processing is done entirely in tile memory, improving performance, battery consumption, and memory usage. Not only that, since depth and stencil buffers usually are entirely recreated every frame, it’s also common to use those as memoryless textures. With this method, we get this benefit for free.
Since all the logic is done using stencil and depth operations, any shader that is not dependent on these operations can be used to produce the final image look. As examples of that in the final game, we have:
- The bite-related shaders use just a simple texture lookup;
- The FoV related ones use a scrolling texture lookup;
- The details use a texture lookup with alpha to discard pixels;
Nothing fancy, but the results are pretty, in my opinion.
Another unexpected improvement is that we significantly reduced the number of shader variants because all of our bush-related code stays in a single shader file. It uses a minimal number of keywords, mostly to handle instancing behavior and very specific configurations. This reduction in shader variants benefits build time, build size, and in-game shader compilation, which is a common cause of lag spikes in many games.
And last but not least, the algorithm solves the visual problems found in Figure 5, resulting in:
The most noticeable improvement is on the aliasing from the bite. And speaking of aliasing, since we are performing the whole logic on the screen’s framebuffer, we also get MSAA for free (if the game is using MSAA, of course, which fortunately is the case for ZOOBA).
Some other examples of artifacts from Figure 5 solved here are the FoV leaks into the bite area and a thin line between the player’s FoV inside the bush and its companion moving grass blades (this one pissed off our art director for a long time).
Wow, that was a long post! Thank you for reading, and I hope you’ve had a good time (and maybe even learned something today, who knows). Also, feel free to use and improve over our algorithm, just remember to share the improvements with the community :D
Any feedback or suggestion is much appreciated!
See you soon :)
 N. Stewart, G. Leach, S. John, “An Improved Z-Buffer CSG Rendering Algorithm,” July 1999