D: Structs Don’t Work For Domain Data
--
Let me preface this by saying that I love the D programming language and I think it’s largely a well-designed and productive piece of engineering. However, it has a pernicious pattern that seems to come up again and again in actual use.
You use some feature. It works great, and everybody who uses it is happy. Then at some point you try to make it interact with some other feature that D advertises, and suddenly everything explodes: all your unittests fail, your builds go red, and the compiler starts emitting cryptic warnings that possibly presage the end times.
What happened? Like me, you’ve discovered that D has a lot of features, and some of them (like interfaces and contracts) simply do not work together and, in fact, explode. The problem is that because the feature was so great and fun to use, by the time it explodes you have already used it all throughout your codebase, in the mistaken (if reasonable) belief that it would work fine with the rest of the language.
In this article, I’ll highlight one such feature to make the case that you probably shouldn’t use structs to store domain data. My hope is that this article will reach you when you’re considering doing it and not, say, halfway through a rewrite of your codebase.
Everything starts out simple and happy. You want to be @nogc, possibly because you thought you could rely on the D garbage collector and now your processes are using 5GB and need to be restarted daily. So you start porting a few of your more common class types to small, compact value structs. It works fine, and you notice a marked reduction in null pointer crashes.
You spot something. A struct that you’ve just written has a field that stores a string. This string can never be null or empty in a valid dataset. Maybe it’s a username, or an address. Regardless, you would very much like to ensure that this field is only ever non-empty.
Trustingly and with an open heart, like Bambi’s mom walking curious and bright-eyed into an open clearing, you reach towards another D feature that you are familiar with from classes.
Invariants.
[Doom-doom dooooom!]
You just tried to combine two of D’s features. That was a mistake. The trap is laid.
Anyway, everything continues to go well for a while. You disable the parameterless struct constructor, ensuring that nobody can create a struct with an invalid username by accident. You define
```d
import std.range : empty;

struct MyDomainData {
    string username;
    @disable this(); // don't make a MyDomainData() by accident!
    this(string username)
    in (!username.empty) // only non-empty usernames please!
    do { this.username = username; }
    // let's formalise the restriction.
    invariant { assert(!username.empty); }
    string toString() { ... }
    ...
}
```
and happily go on using only valid instances of MyDomainData. The number of bugs from invalid data goes down further. “What a great and well-designed language,” you think to yourself.
You try to format an instance of your domain data. Everything explodes.
You try to stick your struct in a Nullable. Everything explodes.
Your code is on fire and your world does not make sense anymore. What happened? Where did you go wrong?
You went wrong when you decided to put data that could not be null into a struct. In doing so, you unknowingly violated a cardinal rule of D structs: T.init must always be valid.
To be fair, it’s not like this rule is documented anywhere. In fact, it’s even worse: the documentation states it exactly the wrong way around!
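To make the rule concrete, here is a trimmed-down, runnable copy of MyDomainData (toString and the other elided members are left out, and the helper initValue is hypothetical). It is a sketch, but it shows the gap: @disable this() rejects `MyDomainData x;` and `MyDomainData()` at compile time, yet MyDomainData.init is still available, and it holds exactly the empty username the invariant forbids.

```d
import std.range : empty;

struct MyDomainData {
    string username;

    @disable this(); // forbids `MyDomainData x;` and `MyDomainData()`

    this(string username)
    in (!username.empty)
    do { this.username = username; }

    invariant { assert(!username.empty); }
}

// Default construction is rejected at compile time...
static assert(!__traits(compiles, { MyDomainData x; }));
static assert(!__traits(compiles, MyDomainData()));

// ...but the init value is always reachable. Hypothetical helper for illustration.
MyDomainData initValue() {
    return MyDomainData.init;
}

void main() {
    auto good = MyDomainData("alice"); // invariant is checked after the constructor: passes
    auto bad = initValue();            // no constructor runs, so nothing checks this value
    assert(good.username == "alice");
    // Reading a field directly does not run the invariant, so this does not assert,
    // but it confirms T.init holds the value we tried to rule out.
    assert(bad.username.empty);
}
```

Nothing fails here only because no public method is called on `bad` and the struct has no destructor; library code that formats, moves or destroys such a value is where the invariant finally runs against T.init.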
The format and Nullable failures have the same root cause: a lot of D standard library code assumes that it can take the init value of a type and then call methods on it as usual. The format code, for instance, tries to call format(T.init) at compile time to check the format string for errors. Similarly, when you use a function like move(ref source, ref target) on a struct that has a destructor, its source argument is reset to T.init and later destroyed, and of course the destructor checks the invariant and fails because T.init is not a valid value. Oh, and just as an added gotcha, it’s completely impossible to bypass struct destructor calls.
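The move behaviour can be observed directly. This sketch uses a hypothetical Tracked struct that has a destructor; std.algorithm.mutation.move only resets the source to T.init when the type has a destructor or postblit. The invariant is deliberately left out so the program runs to completion, and a comment marks where, as described above, it would be checked against T.init.

```d
import std.algorithm.mutation : move;
import std.stdio : writeln;

// Hypothetical example type. Having a destructor makes move() reset the source.
struct Tracked {
    string username;

    // If this struct also declared `invariant { assert(username.length > 0); }`,
    // the destructor call on the moved-from source (now Tracked.init) is where
    // that invariant would be checked and fail.
    ~this() {}
}

void main() {
    auto source = Tracked("alice");
    Tracked target;

    move(source, target);

    writeln(target.username);         // alice
    writeln(source.username is null); // true: source was reset to Tracked.init
    // Both destructors run at the end of main; `source` is destroyed holding Tracked.init.
}
```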
So, what can be done?
Option 1. Don’t use structs to store data when T.init is not a valid instance of that data.
There’s a problem with that. I gathered some statistics on our codebase, and the three most common invariants, by a long shot, are “non-null” (for objects), “non-empty” (for arrays) and “non-zero” (for integers).
All three of those are violated in T.init. “Don’t use structs to store data when T.init is not valid in the domain” is equivalent to saying “Don’t use structs.”
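A quick check of what T.init holds for the three kinds of field behind those invariants. The Account and Owner types are made up for illustration; the point is only that the default value of each field is exactly the value its invariant would reject.

```d
import std.stdio : writeln;

class Owner {} // any class type: its init value is a null reference

// Hypothetical domain struct with the three most common invariant kinds.
struct Account {
    Owner owner; // invariant would be "non-null"
    int[] items; // invariant would be "non-empty"
    int count;   // invariant would be "non-zero"
}

void main() {
    auto a = Account.init;
    writeln(a.owner is null);     // true
    writeln(a.items.length == 0); // true
    writeln(a.count == 0);        // true
}
```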
Option 2. Don’t use invariants.
You know, there’s a pattern in D where it advertises some feature but it turns out you can’t actually use it because it doesn’t work! I think this is a cop-out. If the language claims to have Design by Contract, then I will damn well take it up on that, and lay the blame for any issues that arise from that squarely at D’s feet.
Plus, invariants are useful. The whole point of using them was to avoid bugs arising from invalid data, and now it seems like the language insists that actually, the type needs to tolerate invalid data after all. Then what exactly is the point?
Option 2b. bool isValid()
Oh wow, we have a function bool isValid(), and it checks for a bunch of conditions, and every function we write gets in { assert(isValid); } out { assert(isValid); }. Wow, this feature sure sounds familiar!
Sure would be great if it was integrated in the language or something.
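For comparison, the Option 2b workaround might look roughly like this sketch (the Username type and its name method are hypothetical). Because the check is an ordinary method, nothing calls it on T.init behind your back, so Nullable and format stop tripping over it. The price is that every function has to repeat the check in its own in and out contracts, which is precisely the job invariant was supposed to do.

```d
import std.format : format;
import std.range : empty;
import std.typecons : Nullable;

// Hypothetical type using the "bool isValid()" pattern instead of `invariant`.
struct Username {
    string value;

    // Hand-written stand-in for an invariant. Nothing calls it automatically.
    bool isValid() const { return !value.empty; }

    this(string value)
    in (!value.empty)
    out (; isValid) // every function has to repeat the check by hand...
    do { this.value = value; }

    string name() const
    in (isValid)    // ...on the way in...
    out (; isValid) // ...and on the way out.
    do { return value; }
}

void main() {
    // Username.init is still invalid, but since there is no invariant,
    // Nullable and format can handle it without asserting.
    Nullable!Username maybe;
    assert(maybe.isNull);
    maybe = Username("alice");
    assert(maybe.get.name == "alice");
    assert(!Username.init.isValid);
    assert(format("%s", Username("bob")).length > 0);
}
```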
Option 3. Use classes.
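Option 3 in sketch form, using a hypothetical Username class. The invariant is unproblematic here because a class type’s init value is a null reference rather than an invalid object. The trade-offs are the ones the move to structs was meant to escape: every instance is a GC allocation via new, and “non-null” becomes a property every reference has to satisfy.

```d
import std.range : empty;

// Hypothetical class version of the domain type.
final class Username {
    private string value;

    this(string value)
    in (!value.empty)
    do { this.value = value; }

    // No clash with T.init: Username.init is a null reference,
    // not an object holding an empty string.
    invariant { assert(!value.empty); }

    string name() const { return value; }
}

void main() {
    auto user = new Username("alice"); // GC allocation, not allowed in @nogc code
    assert(user.name == "alice");

    Username nobody;        // default-initialized to null
    assert(nobody is null); // the "non-null" rule now lives on the reference
}
```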
Where do we go from here?
I don’t know.
Some of this can be fixed. The format issue is a plain bug, though a symptomatic one, and there’s a cheap workaround for the destructor problem that involves disabling invariants for just that function. The deeper problem is that D doesn’t seem to have a clear concept of what structs are for. Are they for utility types like Nullable or RefCounted, with domain data relegated to classes? Are they for “plain old data”, like in C? Are they like classes, just without inheritance? This is compounded by the way struct semantics have evolved over the entire lifetime of the language into a somewhat confusing mess (move vs opAssign vs postblit, opDot vs alias this vs opDispatch, and now this T.init confusion) that is neither well specified nor very coherent.
But this is far from the first time that this sort of “surprise” feature interaction has happened to me, and for the time being, as much as I love it, I don’t know if D is a language that I can recommend.