Thou Shalt Not Use Struct

Emmanuel Stephan
Apr 14, 2016


Tales of Software Engineering — 2

Puzzle by George Hodan

Once upon a time, a Software Engineer had to refactor some code base and came across the following struct (in C++):

struct CompositeInfo
{
    size_t id;
    CompositeType type;
    Datum principal_datum;
    Datum sub_datum;
};

He saw at once that this struct was dangerous. Being of a methodical nature, he wanted to state invariants about the relationship between principal_datum and sub_datum. For example, there was the invariant that sub_datum could never be equal to principal_datum. Also, in an algorithm the Engineer was working on, some values in the range of the type Datum, such as Datum::Null, could never be assigned to principal_datum.

The Engineer wondered if he could really assert these invariants. He wanted to check whether they actually held throughout the code base, and so he came up with the following code:

struct CompositeInfo
{
    CompositeInfo(size_t _id, CompositeType _type,
                  const Datum& _principal_datum,
                  const Datum& _sub_datum)
        : id(_id), type(_type),
          principal_datum(_principal_datum),
          sub_datum(_sub_datum)
    {
        assert(principal_datum != Datum::Null);
        assert(principal_datum != sub_datum);
    }
    size_t id;
    CompositeType type;
    Datum principal_datum;
    Datum sub_datum;
};

This also made the code shorter, because a constructor could now be called, instead of spending 4 lines to set the 4 fields of CompositeInfo over and over again. The Engineer chased down all the places where a CompositeInfo was created, and called his constructor in each, hoping that the 2 assertions would survive the unit tests. Or not. At least, he would know if his new algorithm could rely on those 2 invariants.

It turned out, of course, that the invariants did not hold everywhere in the code base. There was a place where a smart developer had used a CompositeInfo as a sentinel in an array, and had precisely relied upon principal_datum having the value Datum::Null in the sentinel. And there was another algorithm in the code base that implicitly relied upon principal_datum never being equal to Datum::Null! That algorithm never used the array with the sentinel, so the conflicting assumptions on the possible values of principal_datum (undocumented, implicit assumptions) had never been a problem before. The Engineer’s heart sank. He was faced with multiple Software Engineering no-no’s which he catalogued as follows: reliance on implicit and undocumented assumptions, missing invariant checks, and finally: failure to encapsulate.

The Engineer also realized something else. There was a way to actually compute the sub_datum from the principal_datum and vice-versa, at very little runtime cost. At that point, the code to compute one field in terms of the other was repeated (cut-and-pasted) in several locations in the code base. He wanted very much to avoid the duplication, and to reduce the memory footprint of CompositeInfo. For that, he would need some sort of accessor that would do the computation appropriately, rather than letting clients directly access the members of the struct. He came up with this solution:

class CompositeInfo
{
public:
    CompositeInfo(size_t id, CompositeType type,
                  const Datum& principal_datum)
        : id_(id), type_(type),
          principal_datum_(principal_datum)
    {
        assert(principal_datum_ != Datum::Null);
    }
    size_t id() const { return id_; }
    CompositeType type() const { return type_; }
    Datum principal_datum() const { return principal_datum_; }
    Datum sub_datum() const { return f(principal_datum_); } // computed, not stored
private:
    const size_t id_;
    const CompositeType type_;
    const Datum principal_datum_;
};

It was a long fight to refactor the sentinel value in the array so the invariant on principal_datum could be asserted, but eventually it was removed. All the clients of CompositeInfo went through the constructor, and therefore the assert. There were no implicit assumptions anymore, all the clients of CompositeInfo were in sync again. And a good chunk of memory was saved too, because there were many instances of CompositeInfo…

What lessons did the Engineer take away from this painful refactoring exercise?

Implicit assumptions kill. Software Engineers should make a habit of asking themselves what assumptions they are making in their code, to challenge these assumptions, remove as many as possible, and make whatever ones are inevitable blatantly and glaringly explicit, via invariant checks and documentation. It takes discipline and work to systematically eliminate implicit assumptions, but it might well be the single practice that will improve your software the most, regardless of which language or paradigm you use. One can actually wonder whether most if not all bugs in software can be traced back to some implicit assumption(s).

Use encapsulation as much as possible. Now, the provocative title of this post obviously goes too far, and there are some cases where using private sections for small data types might be overkill (e.g. std::complex, std::pair). There is no reason to write page after page of getter/setter pairs when they are nothing but boilerplate. That said, some very good Engineers would auto-generate those pairs anyway, on the argument that the implementation could change in the future.

In general, encapsulation provides a way to structure the code and a unit of reasoning about the code. It sets a perimeter around a portion of the state of the program, inside which invariant properties can be guaranteed. It hides details — it is very much like having a Lego brick that performs some function and can be relied upon. You don’t need to keep all the details of all the code in your mind, you can rely on CompositeInfo doing the job of “CompositeInfo” correctly and move on to bigger and better things. In terms of structure, encapsulation helps decouple components of the program. In the struct example above, there was a “negative coupling” between the algorithm that relied upon principal_datum != Datum::Null and the array which relied upon principal_datum == Datum::Null.

Encapsulation also provides a unit of testing. Classes with private state can be validated in separate unit tests, so that the corresponding portion of the program is correct before it is used. If a struct is used, there is actually nothing to test (everything and anything is possible as far as the fields of the struct are concerned!), which means that the clients of a struct have no guarantees and can do anything they want to the fields of the struct (and even contradict themselves between different places of the code, as was the case initially with CompositeInfo). Encapsulation with systematic invariant checking also greatly helps debugging. Each class with private state and invariants establishes a fence of checks through which bugs cannot propagate. You will pay a high price to validate the correctness of a program that uses structs rather than classes with private state (and invariant checks).

Encapsulation provides a unit of re-use. With properly encapsulated classes, software building can turn into assembling well-understood and well-tested parts, which translates into greater velocity — not only because assembling parts is faster than rewriting them each time, but also because the parts have been independently validated, so that testing costs are reduced (and shifted to assembly testing, rather than part testing).

Encapsulation enables division of labor. Once the interface of a class has been clearly defined, there can be an owner for the inside of the class, and clients that just know the interface of the class. Of course, the trick is to spell out the assumptions of that interface very explicitly. But once that is done, the owner and the clients can go about their respective business in parallel, which is again a velocity booster for the overall project.

Encapsulation is a unit of refactoring: as long as you provide the same interface, you can safely change the inside of the perimeter, without disturbing the clients, and vice-versa (that is, if you’ve made all the properties and assumptions explicit). On the other hand, using a struct as first presented in this post makes the code extremely expensive to refactor: you have to track down all the usages of the struct first, validate which assumptions they each make on that struct… by the time you are just starting to understand what is going on, the Engineer with a proper class (and explicit assumptions) would be done refactoring.

A very good practice is to make the state in your class immutable (that is not always possible of course), and to spell out the invariants that hold on that immutable state in a single constructor. You can also separate the invariants in a method called e.g. “checkInvariants” that is called from the body of the constructor, to spell out and document that these are the invariants that hold for any instance of that class. Once the invariants are verified, one time, in the single constructor, they will automatically be true for the lifetime of the object, since the values of the fields are immutable. Immutability and invariants thus play well together and greatly enhance the correctness of programs. This is a very short step away from endorsing functional programming forever, but unfortunately, it looks like we still live in a world where manipulating state is the way to go for performance…

Again, there are cases where a simple struct will do (e.g. std::complex — there are no invariants on the real and imaginary parts of a complex number), but as soon as there are dependencies between the members of a struct, or if you want to make sure invariants hold and that your clients will not start making their own assumptions on the members of a struct, you are better off with a real class which has a public and a private section. Failure to encapsulate will lead to very high costs to understand the code, ensure its correctness, evolve and maintain it. All these are surefire ways to at least kill the velocity of your software project.
