Prioritizing Data-Oriented Design Paradigms in Your Code

A more mindful approach to software design

Sean Kennedy
Jun 12, 2020 · 6 min read
Photo by Safar Safarov on Unsplash

When software developers consider what good software design looks like when working with high-level programming languages, many minds instinctively wander toward the guiding principles of object-oriented programming (OOP).

We lean so heavily on this paradigm not only because most of us were taught this way, but also because an object-oriented approach — structuring code based on objects in our perceived model of the real world — is an intuitive and generally good way to design software when working with object-oriented programming languages. Encapsulation, abstraction, inheritance, and polymorphism are four very important principles in software design that work nicely within the object-oriented paradigm while providing high levels of code reusability and maintainability. The merits of object-oriented programming are numerous and well-documented.

While there is nothing inherently wrong with object-oriented design, developers can program themselves into a box because they have constrained themselves within the dimensions of an object-oriented world. Code that is modeled well from an object-oriented perspective can actually disguise the fact that the code is not solving the problem at hand in an optimal way, ignoring the purpose behind writing the code in the first place. Taking an approach centered around how the code is really working to transform program data — a data-oriented approach — can help programmers avoid the pitfalls that object-oriented code can conceal.

An Introduction to Data-Oriented Design (DOD)

When taking a strictly object-oriented approach to writing software, we tend to lose sight of the goals underlying our code.

Code is a step-by-step guide through which humans explain to a particular computer system how to solve a given problem efficiently. Some problems are obviously more complicated than others, but almost all problems solved by software boil down to transforming data from one form into another. Code is merely the tool that programmers use to solve this data transformation problem. To solve it as efficiently as possible, programmers must understand how their code affects the hardware platform they are targeting. Programmers often ignore this part in favor of prioritizing an object-oriented paradigm.

A common example is code written without considering how the program uses the CPU cache. Caches let a computer exploit data locality to greatly speed up memory access; however, a program that is not written with data locality in mind misses out on this key performance boost.
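To make the idea of data locality concrete, here is a minimal sketch (the function names are hypothetical and not taken from the article's repository). Both functions add up the same matrix, which is assumed to be stored row by row in a single contiguous vector; they differ only in the order in which they visit memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row by row in one contiguous vector.
// Consecutive iterations touch adjacent addresses, so every cache line
// that is fetched gets fully used before the loop moves on.
double sum_row_order(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Same arithmetic, but consecutive iterations are `cols` elements apart.
// For a large matrix, each access may land on a different cache line.
double sum_column_order(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```

The two functions return the same sum; on large inputs the row-order version is usually the faster one because it walks memory sequentially, although the size of the gap depends on the hardware and the compiler.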

Data-oriented design is the idea of writing code to create data structures optimized for efficient processing rather than writing code to structure data around real-world objects. This means that data and functionality are separated so that functions can act on data in a more general form. This makes it easier to optimize performance through cache utilization and parallelization.

For most applications, these performance advantages are relatively minor, but for problems where speed is of the essence, a data-oriented approach to problem solving is paramount. But what do these ideas look like in real code?

Data-Oriented Design: Example in C++

(The code I wrote for this example is available here on GitHub. Please note that it is not intended to be used for anything other than to showcase differences between software design paradigms.)

This example looks at two different implementations of a simple stock portfolio management system in C++. The first code snippet shows a class structure consistent with object-oriented design principles.
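A minimal sketch of what such a class might look like is shown here. The names (`StockPosition`, `Portfolio`, `add_position`, `at`) are hypothetical and the field types are assumptions; the code in the author's repository may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One record per holding. Hypothetical names and assumed field types.
struct StockPosition {
    std::string symbol;    // stock symbol
    double share_value;    // current value of one share
    unsigned shares;       // number of shares owned
    double average_cost;   // average cost paid per share
};

class Portfolio {
public:
    void add_position(StockPosition position) {
        positions_.push_back(std::move(position));
    }

    std::size_t size() const { return positions_.size(); }

    const StockPosition& at(std::size_t i) const {
        assert(i < positions_.size());
        return positions_[i];
    }

private:
    // Array of structures: each element carries every field of a position.
    std::vector<StockPosition> positions_;
};
```

With these assumed field types, a 64-bit GCC/libstdc++ build lays each position out in 56 bytes (a 32-byte `std::string` followed by three 8-byte-aligned slots), which is consistent with the size the article quotes.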

The key data member in this class is the stock portfolio, which consists of a vector of stock position structures. Each stock position consists of a stock symbol, the current value of a share of that stock, the number of shares owned, and the average cost paid per share. Modeling a stock portfolio as a list of stock position models is a very intuitive way to structure this code. However, performing computations on this data structure requires looping through the stock portfolio and accessing the 56-byte memory chunk for each stock position, even when the computation only needs a small piece of information about each stock position.

Alternatively, a data-oriented design of a stock portfolio system might look something like this:
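As a rough sketch under assumed names (`PortfolioSoA` and its accessors are hypothetical, not the author's code), the same four fields can be stored as four parallel arrays, where index `i` in every array refers to the same position:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class PortfolioSoA {
public:
    void add_position(std::string symbol, double share_value,
                      unsigned shares, double average_cost) {
        symbols_.push_back(std::move(symbol));
        share_values_.push_back(share_value);
        shares_.push_back(shares);
        average_costs_.push_back(average_cost);
        // The arrays must stay in lockstep so that one index
        // always describes one position.
        assert(symbols_.size() == share_values_.size() &&
               symbols_.size() == shares_.size() &&
               symbols_.size() == average_costs_.size());
    }

    std::size_t size() const { return symbols_.size(); }

    const std::vector<std::string>& symbols() const { return symbols_; }
    const std::vector<double>& share_values() const { return share_values_; }
    const std::vector<unsigned>& shares() const { return shares_; }
    const std::vector<double>& average_costs() const { return average_costs_; }

private:
    // Structure of arrays: one tightly packed array per field.
    std::vector<std::string> symbols_;
    std::vector<double> share_values_;
    std::vector<unsigned> shares_;
    std::vector<double> average_costs_;
};
```

The class still encapsulates its storage, but a computation that needs only prices and share counts can now read just those arrays.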

The key difference in this example is that rather than using an array of structures (AoS) like we had in the first example, this class uses a structure of arrays (SoA) to store all the data associated with the stock portfolio. This allows us to perform computations that iterate over arrays of smaller data members which can be more efficiently cached. This can significantly boost the speed of the application in which this class is used.

How does this affect the performance of computational methods?

An implementation of computational methods in the object-oriented version might look like this:
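The following is a sketch rather than the author's implementation. It assumes that "total return" means the sum over all positions of shares × (current share value − average cost), and it uses a hypothetical `StockPosition` struct with assumed field types.

```cpp
#include <string>
#include <vector>

// Hypothetical record type; field types are assumptions.
struct StockPosition {
    std::string symbol;
    double share_value;    // current value of one share
    unsigned shares;       // number of shares owned
    double average_cost;   // average cost paid per share
};

// Assumed definition of total return:
//   sum of shares * (share_value - average_cost) over all positions.
// Each iteration steps over a whole StockPosition, including the
// symbol string that the calculation never reads.
double total_return(const std::vector<StockPosition>& portfolio) {
    double total = 0.0;
    for (const StockPosition& position : portfolio) {
        total += position.shares *
                 (position.share_value - position.average_cost);
    }
    return total;
}
```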

Meanwhile, an implementation of computational methods in the data-oriented version might look like this:
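Again as a sketch under assumptions: `PortfolioArrays` is a hypothetical plain struct of parallel arrays, and "total return" is assumed to mean the sum over all positions of shares × (current share value − average cost).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical structure-of-arrays layout; index i in each array
// describes the same position.
struct PortfolioArrays {
    std::vector<std::string> symbols;   // not read by total_return
    std::vector<double> share_values;
    std::vector<unsigned> shares;
    std::vector<double> average_costs;
};

// Assumed definition of total return:
//   sum of shares * (share_value - average_cost) over all positions.
// The loop streams through three dense numeric arrays and never
// touches the symbols.
double total_return(const PortfolioArrays& p) {
    const std::size_t n = p.shares.size();
    assert(p.share_values.size() == n && p.average_costs.size() == n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p.shares[i] * (p.share_values[i] - p.average_costs[i]);
    }
    return total;
}
```

Because each array holds values of a single type packed back to back, this kind of loop is also an easier target for compiler auto-vectorization, although whether that happens depends on the compiler and flags.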

When the total return is computed on a stock portfolio consisting of 1500 stock positions, the data-oriented implementation outperforms the object-oriented implementation by a factor of 2 on average (GCC 7.5.0; Ubuntu 18.04 64-bit; g++ -O2). Even for such a small piece of code, changing the data layout produced a significantly more efficient system, because the computation reads only the fields it needs from contiguous arrays and so makes better use of each cache line. Based on this example, it is easy to see how object-oriented code written without regard for the underlying hardware can result in a slower system, even when the code structure seems sensible.
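A back-of-the-envelope calculation helps explain why the layout matters. The sketch below is only a rough model with assumed sizes: the 56-byte position from the text, 8-byte doubles, a 4-byte share count, and a 64-byte cache line. All names are hypothetical.

```cpp
#include <cstddef>

// Assumed sizes, for illustration only.
constexpr std::size_t kPositionBytes = 56;    // one AoS record (from the text)
constexpr std::size_t kDoubleBytes = 8;       // share value, average cost
constexpr std::size_t kShareCountBytes = 4;   // number of shares
constexpr std::size_t kCacheLineBytes = 64;   // common on x86-64

// Cache lines needed to hold `bytes` contiguous bytes, rounded up.
// Ignores where in a line the array happens to start.
constexpr std::size_t cache_lines(std::size_t bytes) {
    return (bytes + kCacheLineBytes - 1) / kCacheLineBytes;
}

// AoS: the loop walks over whole records, so the entire array
// passes through the cache.
constexpr std::size_t aos_bytes(std::size_t positions) {
    return positions * kPositionBytes;
}

// SoA: only the three arrays the computation reads are streamed.
constexpr std::size_t soa_bytes(std::size_t positions) {
    return positions * (kDoubleBytes + kShareCountBytes + kDoubleBytes);
}

constexpr std::size_t aos_cache_lines(std::size_t positions) {
    return cache_lines(aos_bytes(positions));
}

constexpr std::size_t soa_cache_lines(std::size_t positions) {
    return 2 * cache_lines(positions * kDoubleBytes) +
           cache_lines(positions * kShareCountBytes);
}

static_assert(aos_bytes(1500) == 84000, "1500 * 56 bytes");
static_assert(soa_bytes(1500) == 30000, "1500 * 20 bytes");
```

Under these assumptions, 1500 positions mean about 84,000 bytes (1,313 cache lines) for the array of structures versus about 30,000 bytes (470 cache lines) for the structure of arrays, a ratio of 2.8. That is in the same ballpark as the measured factor of 2, though the model ignores prefetching, vectorization, and which cache level the data fits in.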

Yes, DOD and OOP can (and should) coexist

This is because data-oriented design and object-oriented programming are not mutually exclusive. Data-oriented design can be approached as a programming philosophy rather than a specific design framework for code. It is characterized by programming with a primary focus on solving the data transformation problem before even considering code design. Once the programmer has decided on how the hardware should interact with the data, the code can then be designed around this data.

From here, the programmer can begin to analyze the trade-offs between different object-oriented models. Often, it is reasonable (and preferable) to sacrifice imperceptible performance differences for a more maintainable and/or reusable codebase, but the programmer can only understand the value of this trade-off if they understand how it affects the relationship between the hardware and the data. This can only be done if the programmer takes a data-oriented approach to writing their code.

Conclusion

When writing code with high-level programming languages, it is easy to get lost in the convenient levels of abstraction afforded to us, and this leads us to forget that our code is responsible for telling hardware how to transform data in a very specific way at a lower level. A data-oriented philosophy leads to code that is more fundamentally sound, and it forces developers to truly understand what their programming language is doing under the hood. Being mindful of data-oriented design may not radically change the way you write code, but having a better understanding of why it is important will almost certainly make you a better developer.

Thanks for reading! I also want to give a shout-out to everyone in my Advanced Topics in C++ class for reviewing the first draft of this article. Your feedback was super helpful in the creation of the final version.

If you want to learn more about data-oriented design, I highly recommend checking out some of the resources below.

Additional Resources

The Startup

Medium's largest active publication, followed by +755K people. Follow to join our community.

Sean Kennedy

Written by

CS Master’s Student @ Michigan State University | President @ Spartan Blockchain Solutions | interested in tech, code, finance, startups, math, and other stuff
