C/C++ Tagged/Discriminated Union

Alessandro Maclaine
Sep 10, 2019

Organizing and updating union meta-data for programmatic use and access

This article independently builds on the topics presented here: https://medium.com/@almtechhub/c-c-self-referential-recursive-unions-22b334493eaa

Background

If you’re familiar with unions in C/C++, you probably know that unions alone are not robust data structures, especially in C++, where reading a member other than the one most recently written is undefined behavior. For this reason there’s a long-standing concept in computer science known as tagged unions.

Simply put, a tagged union is a union paired with a piece of data (the tag) that tracks which of the union’s members is currently set. In C++ you can encapsulate the union in a struct or a class, or develop your own solution based on a custom data structure (like a map). The eventual use case I have in mind is rather simple, so I’m going to use structs.

Simple Example

#include <iostream>

union vals {
    char ch;
    int nt;
};

struct tagUnion {
    char tag;   // 'c' when ch is set, 'i' when nt is set
    vals val;
};

int main() {
    tagUnion tu;
    tu.tag = 'i';
    tu.val.nt = 21;
    for (int i = 0; i < 5; i++) {
        if (tu.tag == 'i') {
            std::cout << tu.val.nt << std::endl;
            tu.tag = 'c';
            tu.val.ch = 'A';
            continue;
        }
        if (tu.tag == 'c') {
            std::cout << tu.val.ch << std::endl;
            tu.tag = 'i';
            tu.val.nt = i;   // write the int member so it matches the 'i' tag
            continue;
        }
    }
}

In this example we have a simple union that holds a char and an int. This union is wrapped in a struct that has a char variable and an instance of the union. We use a single letter char to represent each type, either i or c. In main we initialize the tagged union and then loop. In each iteration we check which property is set, log the value, flip the tag and set the associated property. The output from running it is:

21
A
1
A
3

This is the expected output: we’ve successfully created a tagged union and used it in a simple programmatic manner. Now let’s revisit the Self-Referential Union example from the last article.

Self-Referential Tagged Union

If we apply the lessons of this article to the recursive union example of the last article, we end up with something like:

union test {
    test * next;
    int val;
};

enum tagType { link, value };

struct taggedTest {
    tagType typ;
    test val;
};

However, with this setup, when we access taggedTest.val.next we’re pointing at another bare union and won’t have its tag. Instead we need the union to point at the taggedTest struct, so we use a forward declaration. Here is the full example with a simple test:

#include <iostream>

struct taggedTest;   // forward declaration so the union can point at it

union test {
    taggedTest * next;
    int val;
};

enum tagType { link, value };

struct taggedTest {
    tagType typ;
    test val;
};

int main() {
    taggedTest test1;
    taggedTest test2;
    taggedTest test3;
    taggedTest test4;
    taggedTest test5;

    // Build the chain test1 -> test2 -> test3 -> test4 -> test5 (value 2)
    test5.val.val = 2;
    test5.typ = value;
    test4.val.next = &test5;
    test4.typ = link;
    test3.val.next = &test4;
    test3.typ = link;
    test2.val.next = &test3;
    test2.typ = link;
    test1.val.next = &test2;
    test1.typ = link;

    // Follow links until a value node is reached
    taggedTest * tmpTT = &test1;
    int i = 1;
    while (true) {
        if (tmpTT->typ == link) {
            tmpTT = tmpTT->val.next;
            std::cout << "Crossing Link: " << i++ << std::endl;
        } else if (tmpTT->typ == value) {
            std::cout << tmpTT->val.val << std::endl;
            break;
        }
    }
}

Conclusion

Unions are of dubious value unless you keep track of which member is active. This is less of a concern in C, which supports type punning through unions, but in C++ reading an inactive member is undefined behavior. The common solution is a tagged/discriminated union, which in C++ is built with classes or structs. Since my eventual use case is simple, I decided to use structs.

Tagged unions are fairly simple to implement, but when combined with recursive unions there are additional complications. When we need self-referencing tagged unions, we have to forward declare our struct and use a pointer to that struct as the next member of the union.
