C/C++ Self-Referential (Recursive) Unions

Alessandro Maclaine
4 min readSep 10, 2019

…What?

Although I’ve spent a fair bit of time in typed languages like C/C++ and Java, I still have a much stronger familiarity with dynamic languages like JavaScript. In JavaScript we readily take for granted our ability to ignore types and the power and flexibility Objects and unrestricted Object composition provides us.

Thankfully it is possible to bring a portion of this flexibility to statically typed languages like C or C++… but not without jumping through some hoops. In this article we’ll explore a fundamental construct of C and C++ that allows the creation of data structures that contain arbitrary composition and varied types: Self-Referential Unions.

Dynamically Typed: JavaScript

Consider this basic LinkedList implementation in JavaScript:

class LinkedList {
constructor() {
this.root = null
}
AddNode(value) {
const newNode = this.NewNode(value)
newNode.next = this.root
this.root = newNode
}
NewNode(value) {
return {
value,
next: null
}
}
}
const newLL = new LinkedList()
newLL.AddNode("123")
newLL.AddNode(123)
newLL.AddNode({val: 2})
console.log(newLL.root)

The output of the console log is:

//{
// value: { val: 2 },
// next: { value: 123, next: { value: '123', next: null } }
//}

Our LinkedList Nodes maintain a consistent structure, we have a value parameter holding the nodes value, and we have a reference to the next Node in our list, which also holds a value and a reference.

Take note that the types of value changes from Node to Node. In the head Node it is an Object, in the next Node it is a Number and in the tail Node it is a String.

In a dynamically typed language this is easy to achieve, we just add a new Node with whatever value type we want. Done.

Statically Typed: C/C++

Can we mimic this same functionality using statically typed languages? Fortunately the answer is yes, in fact there’s multiple ways to achieve this, but I’ve chosen self-referencing unions as the basis for my solution.

What is a Union?

A union is a special class type that can hold only one of it’s non-static data members at a time. The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member.

For Example:

#include <iostream>union test {
char ch;
int nt;
};
int main() {
test t;

t.ch = 'a';
std::cout << t.ch << std::endl; // a

t.nt = 113;
std::cout << t.nt << std::endl; // 113
}

Note: Accessing the unset property in C++ is undefined behavior. In C you would get the value of the set property casted to the type of the unset property, known as type punning.

Self-Referential (Recursive) Unions

The conceptual idea behind recursive unions is they provide a means of keeping track of a single value, either directly, or through a chain of connections. Compare this to a LinkedList which provides us with a chain of values.

In a LinkedList we follow a chain of connections and get a value at each step, in a recursive union we follow a chain of connections to a single value. At each step in a recursive union, we either have the Node with value we seek or a connection to another Node that can provide the value or continue routing us to our final destination.

#include <iostream>union test {
test * next;
int val;
};
int main() {
test test1;
test test2;
test test3;
test test4;
test test5;
test5.val = 2;
test4.next = &test5;
test3.next = &test4;
test2.next = &test3;
test1.next = &test2;
std::cout << test1.next->next->next->next->val << std::endl; // 2
}

Notice how our union has a pointer to itself, we have to use a pointer otherwise the union would recursively allocate itself on initialization, requiring infinite memory. If you compile and run this, you’ll see we successfully follow the chain of unions, eventually finding the end value.

But there’s a problem. Recall, accessing the unset property in C++ is undefined behavior. We had to know the exact number of times to access next before accessing val. This isn’t very useful for real-world examples where we have arbitrary length chains, like we usually do in a LinkedList.

To use recursive unions practically, you’ll need to keep track of more information about the interconnections. For this reason recursive unions are usually controlled from larger constructs in C++ like classes or structs.

Conclusion

Dynamically typed languages like JavaScript provide us the flexibility to create arbitrary data structures out of composed data structures and arbitrary types. This flexibility is useful in solving a number of problems without having to laboriously structure your code to satisfy the type system.

Thankfully it is possible to bring this functionality to statically typed languages, albeit at the cost of simplicity. While there are a number of ways to solve this issue in C++, I’ve chosen recursive unions as the basis for my solution. However, because unions are mutually exclusive properties with undefined behavior for unset access, they alone are not enough. To handle this we must add additional control constructs around the union to ensure correct usage and access.

--

--