
Using texture counters in the Android GPU Inspector

Francesco Carucci
Sep 14, 2020

Android GPU Inspector (AGI) lets us peek into the inner workings of the GPU on Android. Fetching and filtering texture data from within a shader is one of the most demanding tasks a GPU performs. We can use AGI to monitor texture-related GPU workloads by capturing texture GPU performance counters in three categories: bandwidth, cache behavior, and filtering.


I always begin by looking at texture bandwidth, because it indicates how much texture data is being transferred to the GPU during a frame and can quickly highlight potential texturing performance issues.

When it comes to texture bandwidth, a good rule of thumb is to make sure that the Texture Read Bandwidth is not much higher than 1 GB/s on average and that its peaks stay well under 5 GB/s.

This game, for example, is consuming a lot of texture bandwidth, with an average of more than 4 GB/s and peaks of more than 6 GB/s towards the end of the frame.
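To make the rule of thumb concrete, here is a minimal Python sketch that compares a series of Texture Read Bandwidth samples against the two guideline values. The function name, the sample values, and the idea of having the counter available as a plain list of numbers are all assumptions made for illustration; AGI itself presents these counters in its UI. The 1 GB/s figure is also a soft limit ("not much higher than"), so treat the average flag as a prompt to look closer rather than a hard failure.

```python
from statistics import mean

# Guideline values from the rule of thumb above, in GB/s.
AVG_BUDGET_GBPS = 1.0
PEAK_BUDGET_GBPS = 5.0


def check_texture_bandwidth(samples_gbps):
    """Summarize bandwidth samples against the average and peak guidelines."""
    avg = mean(samples_gbps)
    peak = max(samples_gbps)
    return {
        "average_gbps": avg,
        "peak_gbps": peak,
        # Soft limit: the guideline says "not much higher than 1 GB/s".
        "average_over_budget": avg > AVG_BUDGET_GBPS,
        # Peaks should stay well under 5 GB/s, so reaching it already counts.
        "peak_over_budget": peak >= PEAK_BUDGET_GBPS,
    }


# Hypothetical samples, loosely shaped like the game described above:
# heavy throughout, with a spike towards the end of the frame.
samples = [3.8, 4.1, 4.0, 4.3, 6.2]
report = check_texture_bandwidth(samples)
print(report)
```

With samples like these, both flags are raised, which matches the reading of the capture: the average and the peak are both well beyond the guideline.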

It’s expected that post-processing steps can be particularly heavy on texture bandwidth; you might be ok with spending a portion of your bandwidth budget towards the end of the frame on special effects like bloom and tone mapping. But if the color pass of your game has a high texture read bandwidth peak, you might have performance issues to investigate.

For this game, texture bandwidth consumption is very high and needs further investigation.

To investigate a potential texture bandwidth issue, I first look at texture cache behavior. My focus is on the percentage of texture stalls and on the percentages of L1 and L2 fetch misses. When the data for a texture fetch is not found in the L1 cache, the request is forwarded to the L2 cache and, if it misses there too, to system memory. Each step introduces more latency and consumes more power. Average L1 cache misses should be below 10%, with peaks below 50%.

The GPU system capture of this game shows an average percentage of L1 cache misses over 20% and peaks up to 80% or more.
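The same kind of check can be sketched for the L1 miss guideline. The snippet below assumes the L1 miss percentage is available as a list of per-sample values (a hypothetical export, with made-up numbers), and it also reports which samples cross the peak limit, since knowing where in the frame the misses spike helps narrow down the offending pass.

```python
def l1_miss_report(miss_percentages, avg_limit=10.0, peak_limit=50.0):
    """Compare L1 miss percentages with the average and peak guidelines.

    Returns the average, the peak, and the indices of the samples that
    reach or exceed the peak limit.
    """
    avg = sum(miss_percentages) / len(miss_percentages)
    peak = max(miss_percentages)
    hot_samples = [i for i, p in enumerate(miss_percentages) if p >= peak_limit]
    return {
        "average": avg,
        "peak": peak,
        "average_too_high": avg >= avg_limit,
        "hot_samples": hot_samples,
    }


# Hypothetical per-sample L1 miss percentages across one frame.
l1_misses = [12.0, 18.0, 25.0, 80.0, 15.0]
l1_report = l1_miss_report(l1_misses)
print(l1_report)
```

The limits are keyword arguments so the same helper can be reused with different thresholds; the defaults are the 10% and 50% values quoted above.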

These numbers are again very high.

Typical reasons for a high percentage of texture stalls are uncompressed textures, complex filtering such as anisotropic filtering, and textures that are not mipmapped.

To investigate potential causes of texture cache misses, I look at the percentage of texture fetches that use anisotropic filtering, which is very expensive on mobile, and at the percentage of Non Base Level texture fetches.

The percentage of Non Base Level texture fetches is an estimate of how efficiently texture mipmaps are being fetched. When this number is 0, it means that the GPU is always accessing the base level (the largest image) of the texture mipmap chain, or that textures are not mipmapped at all.
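A bit of arithmetic shows why always reading the base level is costly. The sketch below builds the mipmap chain for a texture and estimates which level gives roughly one texel per pixel when the texture is drawn smaller than its native size. It is a simplified, isotropic model with hypothetical sizes: real GPUs derive the level of detail from screen-space derivatives and may blend between adjacent levels.

```python
import math


def mip_chain(width, height):
    """Return (width, height) for each mip level, base level first."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


def ideal_mip_level(texture_size, on_screen_size, level_count):
    """Approximate level giving about one texel per pixel (simplified)."""
    if on_screen_size >= texture_size:
        return 0  # magnified or 1:1, so the base level is the right choice
    level = int(math.log2(texture_size / on_screen_size))
    return min(level, level_count - 1)


# Hypothetical case: a 2048x2048 texture covering about 256x256 pixels.
chain = mip_chain(2048, 2048)
level = ideal_mip_level(2048, 256, len(chain))
base_texels = chain[0][0] * chain[0][1]
level_texels = chain[level][0] * chain[level][1]
texel_ratio = base_texels // level_texels

print(f"levels in chain: {len(chain)}")
print(f"suggested level: {level}, size {chain[level]}")
print(f"base level holds {texel_ratio}x more texels than that level")
```

In this example the base level holds 64 times as many texels as the level that matches the on-screen size, so sampling it scatters fetches over a far larger memory footprint, which is the kind of access pattern that works against the texture caches.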

This can be an issue in most 3D games, while it’s usually acceptable in 2D games.

Accessing non-mipmapped textures is ok when rendering GUI or during post-processing, but in any other scenario it comes with a large performance penalty and is a cause of poor cache behavior.

In fact, fetching textures consumes a lot of system bandwidth and can potentially introduce latency, reduce battery life and cause thermal issues that will further degrade performance in the long run.

Analyzing GPU counters related to texturing behavior can help uncover potentially big low-hanging fruit that, when fixed, can substantially improve the user experience.

To find these kinds of GPU performance issues related to texturing, take a trace of your game using Android GPU Inspector and compare the values and trends of the GPU counters to the guidelines given here.