Simple but not easy software development

Johannes Foufas
Volvo Cars Engineering
5 min read · Jun 17, 2022

This post is a tribute to, in my view, one of the best YouTube clips about software development: Mike Acton's CppCon 2014 talk, "Data-Oriented Design and C++". There are so many aspects of his talk that my colleagues and I resonate with.

In order to solve new problems, we have to make space for them…performance is king. [06:33].

Well, the same goes for an embedded system, and especially for an autonomous core computer. Here we have a lot of high-resolution sensors like radars and cameras, and developers need to optimize the code for throughput, latency, correctness and determinism. So we need to write fast, memory-efficient low-level code to capture high-frequency, high-volume data from our sensors, and to share it with multiple consumer processes — without impacting central memory access latency or starving critical functional code of CPU cycles.
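Sharing high-volume sensor data with multiple consumers without copies usually ends up at some lock-free structure over shared memory. As a minimal sketch (the names and sizes are illustrative, not our actual stack), a single-producer/single-consumer ring buffer of the kind one might place in a shared segment could look like this:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Lock-free single-producer/single-consumer ring buffer. A sensor
// driver pushes frames; a consumer process pops them. No locks, so
// neither side can starve the other of CPU cycles while holding one.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    bool push(const T& item) {
        const auto head = head_.load(std::memory_order_relaxed);
        const auto next = (head + 1) & (N - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;  // full: drop or overwrite, per policy
        buf_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }
    std::optional<T> pop() {
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T item = buf_[tail];
        tail_.store((tail + 1) & (N - 1), std::memory_order_release);
        return item;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

A real multi-process variant would place the buffer in a named shared-memory segment and handle multiple consumers, but the acquire/release handshake above is the core of the idea.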

The future sensor setup

From the lowest levels of the stack, the software needs to be tightly integrated with the hardware. Mike points out that believing otherwise is behind the three lies of the software industry:

1. Software is a platform

2. Code should be designed around a model of the world

3. Code is more important than data

As Mike points out about the first lie, the hardware is obviously the platform, and with different hardware we will have different solutions to our problem. We can never forget this: abstractions too far away from the hardware will not work for performance-critical features.
Mike also states:

If you don’t understand the cost of solving the problem, you don’t understand the problem and if you don’t understand the hardware, you can’t reason about the cost of solving the problem.

One day in the shower, I tried to figure out a key performance indicator that would capture the cost of communicating in the stack versus performing what I call the end-customer-related functions. I called this middleware overhead. Of course, this metric works best when one compares the same or a similar code base, and it will also look lower and better if one implements some inappropriate CPU-load-consuming code in the module. Automotive engineers might react to the AUTOSAR case, which is in my view a good example of a standard that conforms heavily to the three lies. Adding software layers and abstracting away the hardware has a cost. Just to mention, the optimized hack had just three layers: the code, a hardware abstraction, and an Ethernet sniffer and dumper.

Middleware overhead: the time (or CPU) load it takes to communicate with the world, compared to the execution of the code that does the magic.
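The metric is just a ratio, but writing it down makes the definition concrete. A hypothetical sketch, with illustrative field names (real numbers would come from instrumentation):

```cpp
#include <cstdint>

// Per-cycle time budget of a module: time spent communicating
// (serialization, IPC, copies) versus time spent in the actual
// end-customer-related function. Names are illustrative.
struct CycleBudget {
    std::uint64_t comm_ns;  // middleware: talking to the world
    std::uint64_t work_ns;  // the code that does the magic
};

// Middleware overhead as a fraction in [0, 1]: communication time
// divided by total time. Lower is better.
double middleware_overhead(const CycleBudget& b) {
    const std::uint64_t total = b.comm_ns + b.work_ns;
    return total == 0 ? 0.0 : static_cast<double>(b.comm_ns) / total;
}
```

Note the caveat from above: padding `work_ns` with inappropriate CPU-load-consuming code makes the ratio look better without making the module any better.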

One of the most striking sentences in the video clip is:

Solving problems you probably don’t have creates more problems you definitely do.

This, I would argue, is one of the most dangerous problems. Sometimes software architects get carried away by a beautiful vision that does not solve any real engineering problem. The real engineering problem comes from analyzing data, perhaps with a task such as avoiding a collision, and if the problem is imaginary, the solution goes out the window. Mike makes the rather fun remark:

Software does not run in a fairy aether powered by the fevered dreams of CS PhDs.

One aspect I have seen that relates to this is the number of levels of abstraction. OK, I am not a complete fundamentalist; one or two layers can be useful on certain platforms, but maybe not twelve. They cost CPU cycles, and we need to reason about what they all cost. When it comes to large-scale sensor data, there are not many options: we need to make use of shared-memory mechanisms, and then we might get away with some layers.
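The cost of stacked layers is easy to underestimate because each one looks harmless in isolation. A toy illustration (the class names are made up for the example): every pass-through layer adds an indirect virtual call, and often a copy, on the data path, and twelve of them means twelve indirect branches per sample.

```cpp
#include <memory>

// A do-nothing abstraction layer: forwards each sample to the next
// layer via a virtual call, or returns it if it is the last layer.
struct Layer {
    virtual ~Layer() = default;
    virtual int forward(int sample) = 0;
};

struct PassThrough : Layer {
    explicit PassThrough(std::unique_ptr<Layer> next) : next_(std::move(next)) {}
    int forward(int sample) override {
        return next_ ? next_->forward(sample) : sample;
    }
    std::unique_ptr<Layer> next_;
};

// Stack `depth` layers on top of each other.
std::unique_ptr<Layer> make_chain(int depth) {
    std::unique_ptr<Layer> chain = nullptr;
    for (int i = 0; i < depth; ++i)
        chain = std::make_unique<PassThrough>(std::move(chain));
    return chain;
}
```

The sample comes out unchanged either way; the only thing the extra layers buy on this path is a dozen indirect branches the hardware has to predict.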

Anyway, sometimes when we analyze the data and discuss the solution, the simplest solution will just not solve the problem well enough. Consider a control theory problem where we want to control one of the most critical aspects of our product. In the past, a simple PID control algorithm might have been the only thing we could fit in our microcontroller, but today there is a wide range of control systems that would fit an embedded microcontroller or a larger core computer. So maybe we actually need a physical model of parts of our product, and maybe we need to invest in something like a Model Predictive Control algorithm, where we control not the sensor data but customer-related cost functions, provided the performance improvement justifies the CPU load and memory consumption. To make this call, we need to understand our data and our problem, and we need to weigh in aspects like robustness. If we are over-engineering a problem, like controlling a simple aspect of our product with a large-scale neural network that consumes a major part of our module's CPU load, then we might be doing it wrong.
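For reference, the "simple" end of that spectrum really is small. A minimal discrete PID step, with illustrative gains and timestep, fits in a handful of lines and a few bytes of state:

```cpp
// Minimal discrete PID controller: the kind of algorithm that once
// was all a small microcontroller could fit. Gains are illustrative.
struct Pid {
    double kp, ki, kd;        // proportional, integral, derivative gains
    double integral = 0.0;    // accumulated error
    double prev_error = 0.0;  // error from the previous step

    // One control step: setpoint vs measurement over timestep dt.
    double step(double setpoint, double measurement, double dt) {
        const double error = setpoint - measurement;
        integral += error * dt;
        const double derivative = (error - prev_error) / dt;
        prev_error = error;
        return kp * error + ki * integral + kd * derivative;
    }
};
```

An MPC controller, by contrast, carries a plant model and solves an optimization problem every cycle; that is exactly the CPU-load-versus-performance trade-off the paragraph above is about.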

At the end of Mike Acton's video, he shows some nice, simple examples of how to reason about data and its performance. In one of the examples (49:38) he looks at the information density over time of an is_spawn value by printing it out and then zipping it. Quick, dirty and efficient, yet the simple calculation still yields a fairly accurate estimate of the percentage of waste.
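A rough stand-in for that trick, without invoking an actual compressor, is to count runs in the boolean stream: a value that rarely changes has few runs, compresses almost to nothing, and so most of the stored bits are waste. This is my own approximation of the idea, not Acton's exact calculation:

```cpp
#include <cstddef>
#include <vector>

// Number of runs of equal values in a boolean stream. Few runs means
// the stream is highly compressible, i.e. mostly redundant.
std::size_t count_runs(const std::vector<bool>& bits) {
    if (bits.empty()) return 0;
    std::size_t runs = 1;
    for (std::size_t i = 1; i < bits.size(); ++i)
        if (bits[i] != bits[i - 1]) ++runs;
    return runs;
}

// Crude information density: fraction of samples that start a new run.
// Close to 0 means almost everything stored is waste.
double information_density(const std::vector<bool>& bits) {
    return bits.empty() ? 0.0
                        : static_cast<double>(count_runs(bits)) / bits.size();
}
```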

On a somewhat similar tangent, some of my friends use the free tool https://godbolt.org/, where one can choose a compiler and target, input C/C++, Go, Rust or D code, and check the assembler output. This way, one can analyze snippets of code, compare rough estimates of the instruction counts, and assess what is a good and what is a bad idea.
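As an example of the kind of snippet one might paste in there: a hand-rolled summation loop next to `std::accumulate`. Comparing the two under different compilers and optimization levels shows whether the abstraction costs anything; with optimizations enabled, mainstream compilers tend to emit near-identical (often vectorized) assembly for both.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hand-rolled indexed loop.
long sum_loop(const std::vector<long>& v) {
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// The standard-library algorithm saying the same thing.
long sum_accumulate(const std::vector<long>& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}
```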

On a higher level, I thought of mentioning other ways we look at our code. The most basic one is to instrument the time consumption of all the major calls, lightweight in the sense that it does not add a lot of overhead. Looking at these numbers and comparing them to the execution time of the actual value-adding code gives us a rough hint of where we are heading.
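One common lightweight shape for this kind of instrumentation, sketched here with made-up names rather than our production code, is an RAII scope timer: two clock reads per instrumented scope and a log line on exit.

```cpp
#include <chrono>
#include <cstdio>

// RAII scope timer: construct at the top of a major call, and the
// elapsed time is logged when the scope exits. Cost per scope is
// essentially two steady_clock reads plus the log call.
class ScopeTimer {
public:
    explicit ScopeTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    // Microseconds elapsed since construction.
    long long elapsed_us() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   std::chrono::steady_clock::now() - start_).count();
    }

    ~ScopeTimer() {
        std::printf("%s: %lld us\n", label_,
                    static_cast<long long>(elapsed_us()));
    }

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage is a one-liner at the top of each major call, e.g. `ScopeTimer t("radar_ingest");`, which keeps the instrumentation cheap enough to leave in.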

Then of course, for more brutal instrumentation, I do like the open-source tool http://kcachegrind.sourceforge.net/html/Home.html. Although some people argue that it is not as accurate as pure timing measurements, I think it is accurate enough for its purpose: to let us know how most of the tasks in our module perform. It does add a lot of overhead, so it is maybe not for evaluating real production performance.

The user interface is quite nice too; it is intuitive to look at the callee map view and the call graph. At times, we have used this tool both on target hardware and on servers running x86 images in the cloud. The cloud jobs run on different hardware, but still give a quick and dirty relative indication of what might have gone wrong. Comparing x86 runs with the appropriate target runs is interesting, because although the processes are to some extent different, the major problems are, as far as I have seen, visible in both.

Callee map


Sr Principal Engineer Sw, drives Zuul CI at Volvo Cars Corporation