GPU Run-time Procedural Placement on Terrain

Kacper Szwajka
7 min readJan 22, 2024

--

In this article, I will explain the method for implementing run-time GPU placement of objects on terrain, using Unity engine. This technique may be adapted to any engine supporting compute shaders

Problem: Manually placing objects over large terrains, spanning several square kilometers, can be a daunting task during game development. This becomes particularly challenging when frequent iterations and adjustments are necessary. Moreover, consider the memory consumption: populating a 2x2 km map with 0.5 m density grass results in 16 million instances, requiring approximately 1024 MB just for storing their transformation data. This is where procedural placement becomes invaluable.

Procedural Placement: A Solution

Procedural placement offers a solution by allocating memory only for rendering instances within the camera’s range and dynamically scattering them as the camera moves. This approach draws inspiration from techniques used by Guerilla Games in “Horizon: Zero Dawn,” with adjustments tailored to my project’s specific needs.

Process Overview

The process involves creating a flat grid around the camera to represent ‘chunks’. Each cell in this grid, or chunk, contains pointers that will become object instances. As the camera moves, we only update chunks that transition from behind to front. Subsequently, we reshuffle pointers in these ‘dirty’ chunks and assign prototypes for each pointer. The final step is setting up pointer transforms based on the selected prototype parameters.

Most of the logic for this system is executed using Compute Shaders, with numerous indirect dispatches ensuring that only necessary data is updated.

Foot Print & Ecotype

The concept of a ‘footprint’ combined with ‘ecotypes’ is central to this system. A footprint defines the average distance between pointers and dictates the algorithm’s response for placing these pointers. It also includes a list of ecotypes, which are essentially groups of available prototypes with specific placement rules.

Chunks

Chunk logic is primarily handled through compute shaders. Each chunk roughly stores 512 pointers. Found this number to be the most efficient for computation per chunk. The grid size of these chunks is dependent on the maximum level of detail (LOD) distance from an ecotype. It’s crucial to avoid noticeable ‘popping’ when one chunk supersedes another, as demonstrated in the following gif:

Offseting chunks. Notice how chunks indexes change.
[numthreads(GPUI_THREADS, 1, 1)]
void ChunkPositionUpdate (uint3 id : SV_DispatchThreadID) {
if (id.x >= maxInstances) return;

Chunk chunk = chunks[id.x];
int2 spatial = chunk.spatialPos;

if (spatial.x < newGridMin.x || spatial.x > newGridMax.x || spatial.y < newGridMin.y || spatial.y > newGridMax.y) {
chunk.spatialPos = newGridMax - (chunk.spatialPos - oldGridMin);
chunk.BBPosition = float3(chunk.spatialPos.x * chunkSize, 0, chunk.spatialPos.y * chunkSize) + (float)(chunkSize) * 0.5f;

chunks[id.x] = chunk;
chunksAppend.Append(chunk); // Mark as dirty for indirect dispatch
}
}

Pointers

Pointers represent future prototype positions. Each chunk has the same number of pointers and we calculate pointers positions based on chunk index.

Real scale chunks grid + chunk pointers ( not all pointers visible due to different prototypes density parms )

Here`s an example code showing a way to scatter pointers. _SquareGrid or _VoronoiGrid is different settings response for scattering pointers.

float2 min = chunk.BBPosition.xz-chunk.BBSize.xz*0.5f;
float2 max = chunk.BBPosition.xz+chunk.BBSize.xz*0.5f;

float2 pos = 0;
#if _SquareGrid
pos = S_GridPos(id.x,uint2(gridSize,gridSize),min,max);
#elif _VoronoiGrid
pos = VoronoiPos(id.x,min,max);
#endif

// Apply jitter
uint rng = S_PosToHash(pos);
float angle = S_Random(rng)*K_TWO_PI;
float r = S_Random(rng);
float2 offset = float2(sin(angle),cos(angle))*jitter*r;
pos += offset;

Random numbers

The generation of random numbers is crucial for the natural distribution of pointers. The S_PosToHash() function is primarily based on techniques from "Efficient Random Number Generation and Application Using CUDA" found in GPU Gems 3. Here is an implementation example of a hashing function:

#define TAUS_STEP_1         ((z1 & 4294967294U) << 12) ^ (((z1 << 13) ^ z1) >> 19)
#define TAUS_STEP_2 ((z2 & 4294967288U) << 4) ^ (((z2 << 2) ^ z2) >> 25)
#define HYBRID_TAUS_N2 ((z1 ^ z2) & 268435455)

uint S_Hash(uint s)
{
s ^= 2747636419u;
s *= 2654435769u;
s ^= s >> 16;
s *= 2654435769u;
s ^= s >> 16;
s *= 2654435769u;
return s;
}
float S_Random(in out uint rngState) // 0-1
{
rngState = S_Hash(rngState);
return float(rngState) / 4294967295.0; // 2^32-1
}
uint S_PosToHash(float2 pos)
{
uint z1 = asuint(pos.x);
uint z2 = asuint(pos.y);
z1 = TAUS_STEP_1;
z2 = TAUS_STEP_2;

uint x = HYBRID_TAUS_N2;
return x;
}

Picking Prototype

Selecting the appropriate prototype for each pointer is a complex task, especially when using compute shaders. The idea is to generate a random number and map it to a list of prototypes. This involves creating a buffer to account for prototypes with varying weights. For each pointer, a random number is chosen and mapped to an index in this buffer. The larger the buffer, the more precise the mapping.

Mapping prototypes

For picking more space aware random number in most cases I relly on blue-noise which i scale to approximately match foot-print size.
In terms of blue noise highly recommend this article: http://momentsingraphics.de/BlueNoise.html

Left : UniformRandom Right: BlueNoise

Avoiding clumps of specific prototype is not always the case and sometimes uniform random or other noise function might look better.

Example of random weight offset
Assign voronoi noise to weight map and adjust random weight offset

Many ecotypes in 1 FootPrint

When I assign 2 or more ecotypes to 1 footprint I simply offset targeting ecotype ranges.

Example showing how 2 ecotypes distribute over pointers random numbers

Graph obove show one of the possibilities to mix ecotypes. Another is just simple overwrite previous ecotypes based on new density. ( I use this the most )

Density

The density of prototypes is primarily determined by the prototype picking algorithm. By utilizing masks or noise maps, we can control how densely prototypes are placed in different areas. This allows for the dynamic adjustment of vegetation or other objects based on terrain features or player actions.

Density based on mask
Grass Density Based on Terrain Material

Overlaping instances

In the context of procedural placement, managing overlapping instances is a nuanced challenge. This issue primarily arises when multiple footprints are used in tandem, each with different response distances and prototype densities. For example:

  • Footprint A: Designed with a 0.5 m response distance, primarily for scattering grass.
  • Footprint B: Has a 4.0 m response distance, used for scattering trees.

In such scenarios, grass from Footprint A might occasionally spawn in the same positions designated for trees in Footprint B. While this could be seen as an overlap, it often contributes to the natural and chaotic feel of an environment. Nature itself is rarely uniform, and such overlaps can inadvertently add to the realism of the scene.

However, for cases where precise control over prototype placement is necessary, we can address this issue with the implementation of precision density masks. These masks allow for more exact control over where specific prototypes can be placed, effectively reducing the likelihood of unwanted overlaps. By using these masks, we can ensure that grass doesn’t grow where trees are placed, or vice versa, thus maintaining the intended aesthetic and functional qualities of each area.

Transform Prototype

Once a prototype is selected, it undergoes a transformation process. This includes aligning to slopes, random rotation, and scaling, among other factors. This transformation pass is executed only when chunks move and for pointers needing updates.

Diffrent foot-print placement methods. From Left: UniformSquare, VoronoiMatch 4m, VoronoiMatch 7m
Different rotations parms

Example Application / Use cases

  1. Dynamic Terrain Generation: Use in games that require endlessly generating terrains, such as survival or exploration genres. Terrain elements like forests, rivers, and mountains can dynamically adjust as the player explores.
  2. Biome-Specific Features: Different biomes can have unique ecotypes, ensuring varied and realistic environments (e.g., dense forests, arid deserts, snowy mountains).
  3. Real-Time Environment Alteration: Ideal for games where player actions or events change the landscape. This case is something i need a lot in a game I`m currently working on. Because we can play in 3-rd person and at the same time build large-scale castle. We need a method to clear vegetation in castle positions in an efficient way.
  4. Growth Over Time: Vegetation can regrow in abandoned areas, adding a layer of realism to world dynamics.

--

--