Optimization

Introduction to Data-Oriented Design

A shift from objects to data

Tamás Losonczi
Wunderman Thompson, Budapest

--

Object-oriented programming is considered the standard in the software industry. And although this approach has several benefits, it’s not a silver bullet.

  • Code written in classical OOP style forces developers to use patterns where the focus is on abstractions and on the encapsulation of data. Though these concepts have their place in programming, they can be a major obstacle when it comes to writing efficient and optimizable code.
  • The second major drawback is that it introduces extra machinery that need not exist as a part of the mental model you create. As a result, you have to solve not just the original problem but all these additional mental puzzles.

Data-oriented

  • Data-oriented means the focus is on the data layout and transformations of data. It is an approach in which designing your software is based on the real needs of your program, not on abstract world modeling.
  • In contrast to OOP, hardware plays a central role and the solution leverages the characteristics of it.

Case study: matrix traversal

To see why this approach is legitimate, let’s say we want to determine the sum of the non-diagonal elements of a square matrix. There are essentially two ways to traverse the matrix: row-wise and column-wise.

The C# code for a row-wise traversal comprises an outer loop that increments the row index and a nested loop that increments the column index. The sum is stored in a variable.

The formula for the column-wise traversal is almost identical, only the two loops are interchanged this time.

Benchmark chart: matrix dimension (horizontal axis) vs. execution time of the traversals (vertical axis).

Despite these similarities, the two versions perform very differently. On this particular machine, row-wise traversal generally performed better, and for larger dimensions it was 6–7 times faster.
Before explaining the reason, let me cover a few related concepts.

CPU cache

CPU caches are small amounts of very fast memory. They hold the contents of recently accessed memory locations. Modern hardware uses multi-level caches (L1, L2, L3, etc.); the smaller the cache, the faster the access.

Cache hit

When the CPU needs some data, it first looks in the closest memory location, the primary cache (L1). If the requested data is found there, it can be processed immediately. We call this a cache hit.

Cache miss

If the requested data is not in the primary cache, the search continues in L2, then L3, and finally in DRAM (incurring additional delays at each step). A failed attempt to read or write is called a cache miss. Retrieving data from main memory is significantly slower than retrieving it from a cache: an L1 fetch takes 1–2 cycles, while a main-memory fetch can take up to 300 cycles.

A process flow featuring an L1 cache miss and an L2 cache hit.

Cache line

Main memory is read and written in units of cache lines (typically 64 bytes). This means that when you read data from a memory location, you also get the content near that address. This is a tool we can use to maximize cache usage.

Cache line: a tool provided by the hardware

Why is row-traversal faster?

The reason row-wise traversal outperforms column-wise is that our matrix is represented as a single array, a contiguous chunk of memory. The code processes homogeneous data sequentially, making the most use of the cache line and the caches. Additionally, the CPU can easily detect the access pattern and perform prefetching.

Row-wise traversal memory access pattern

Meanwhile, the column-wise traversal jumps around in memory, resulting in cache misses and CPU stalls.

Column-wise traversal memory access pattern

Hardware doesn’t like objects

Unfortunately, all these characteristics of modern hardware make objects poor candidates for performance-critical code.

  • Objects by their nature encapsulate data. This means that if we are interested in only one particular aspect of them (say, we only want to iterate over and update the “age” member of an array of Person objects), we waste the rest of each cache line.
  • Heap-allocated objects are spread across memory. Iterating over a collection of such objects hits memory at effectively random locations, which is terrible for performance. In fact, the least cache-friendly thing you can do is iterate over a linked list of objects: linked lists, unlike plain arrays, use references/pointers, and chasing pointers all but guarantees cache misses.
  • For the same reason, any kind of indirection can hurt performance, including dynamic dispatch (virtual method tables are essentially pointers to pointers).

Data-orientation to the rescue

The idea behind DOD (data-oriented design) is to write code in a way the CPU can execute with maximum efficiency, or at least to write code that is optimizable by default. DOD forces you to think in simple layers of abstraction and to remove abstractions that do not help solve the problem. This mindset can be very helpful when you are designing any piece of software.
To be more concrete, here are some guidelines I have collected:

  • Use simple data structures, like structs of basic types the CPU can work with directly (floats, integers, etc.).
  • Organize your data so that frequently accessed data fits inside the caches. The smaller the frequently accessed data is, the more cache hits you get.
  • Try to arrange your data so that things used at the same time are close to each other.
  • Use linear, simple access patterns to take advantage of prefetching.
  • Use flat hierarchies and minimize indirection (virtual function calls, pointers, etc.).
  • Utilize multiple cores by parallelizing work, though you can still run into performance issues if you’re not careful (e.g., false sharing).

There are also techniques and patterns in DOD, for example:

  • Using an entity-component system to structure your “objects”. This architecture is like an in-memory database, where entities are identifiers that we can use to look up components. A component represents a particular aspect of an entity (only data, not behavior). A system queries components and performs operations on them.
    For example, let’s say you have 3D entities in your project. These entities can move in the virtual world. There are corresponding components that store data like position, heading, and speed. And there is a system that drives the related calculations.
  • Transforming your code from the array-of-structs (AoS) to the struct-of-arrays (SoA) form. This is a manual process in which the more intuitive array-of-structs arrangement is split into one array per field, resulting in a much more efficient data layout for feeding SIMD units. This shift pays off when you have to perform the same operation many times, for example adding some value to all points within a 3D object.
AoS vs. SoA layout

Adaptation

It can be problematic to apply DOD principles in domains where developers have limited control over memory layout. There may be ways to approximate it, but getting the maximum benefit requires manual control or a dedicated tool. It is also not straightforward to apply this mindset where things are event-driven by nature.

Thankfully, Unity realized this is something worth investing in and created a set of powerful utilities called DOTS (Data-Oriented Technology Stack), which makes the adaptation as easy as possible. DOTS comprises three components.

  • The Burst compiler, which produces highly optimized machine code from a subset of C# (called high-performance C#).
  • The entity-component system, which helps with transitioning from the default object-oriented approach to the data-oriented one.
  • And the job system, which makes parallelizing your code less troublesome.

Closing words

Both object-oriented and data-oriented principles and techniques are tools that can be used well or misused, depending on the needs of the work you do. However, it’s hard to deny that over the last several decades the main emphasis has been on the “mental” part of programming. Still, it’s the hardware that makes things happen, and it is driven not by abstractions but by data.
