Code generation for C++

Emanuele Dalla Longa
Apr 23, 2023

Let’s talk about expressivity.

How do you reduce code size while keeping the code safe, flexible and readable in a strongly typed language like C++? One way is to use features that expand short, idiomatic code into more boilerplate.

I’ll explore some of these in this article, and hopefully, by the end of it, you’ll find out whether you might want to specify a Domain Specific Language for your framework’s types, and learn a possible way to do it.

The repository with the code is linked at the conclusion.


People need to write variations of the same code all the time. I’ve seen this turn into degeneracy in many different ways.

In increasing order of flexibility (and decreasing chance of abuse), people create variations by:

  • Copy pasting code
  • (Ab)using macros
  • (Ab)using templates: this one is particular, since many don’t even realize templates actually generate code
  • Defining a Domain Specific Language in JSON, Python and Jinja2: I’ve seen variations of this using XML, but in this article I would like to showcase the power of Jinja2 for this kind of problem

The common denominator between these features is that, under the hood, they just take a text input and generate a C++ output that is then fed into the compiler.

Let’s look at them in order.

Copy pasting code

This one will make many developers roll their eyes, or trigger their PTSD, as it’s a development practice that, if not strongly discouraged, is behind a lot of spaghetti code. There’s a reason why copy pasting code is generally considered bad practice.

Always feel a tinge of guilt when you press CTRL-C! I’ve always found that, when I thought I had found an exception that seemed to justify copy pasting code in two different places, there was an underlying issue of bad design.

There is no reason to keep the same code in two different places. If you need to perform the same procedure twice, move it out and make it a separate function. It will save time even in the short term, even if the short term is seconds or minutes. If it’s two different variations, think… how do the two pieces differ? If the code does the same thing except for a runtime parameter, make it a parametric function. If it needs different types, template it. If templates don’t work… see the last section.
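As a minimal sketch of that progression (the names `scale_all` and `sum_of` are made up for illustration): a difference in a runtime value becomes a function parameter, and a difference in type becomes a template parameter.

```cpp
#include <vector>

// Variation in a runtime value: make it a parameter instead of
// copy pasting the loop once per factor.
inline std::vector<int> scale_all(const std::vector<int>& values, int factor)
{
    std::vector<int> result;
    result.reserve(values.size());
    for (int v : values)
    {
        result.push_back(v * factor);
    }
    return result;
}

// Variation in a type: make it a template parameter instead of
// copy pasting the function once per element type.
template <typename T>
T sum_of(const std::vector<T>& values)
{
    T total{};
    for (const T& v : values)
    {
        total += v;
    }
    return total;
}
```

Calling `scale_all(values, 2)` and `scale_all(values, 3)`, or `sum_of` on a `std::vector<int>` and a `std::vector<double>`, replaces what would otherwise be four near-identical loops.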

Using macros

In my experience, I have found only one instance where you might need a macro in modern C++: when you need to stringify a type name, to use it at runtime.

#include <iostream>
#include <string>
#include <vector>

template <typename T>
class SomeTemplatedClass
{
public:
    SomeTemplatedClass(std::string _name, std::vector<T> _vals)
        : m_name(_name),   /* initializers follow declaration order */
          m_values(_vals)
    {}

    void print() const
    {
        std::cout
            << "I am an object with a payload of type "
            << m_name
            << std::endl;
        for (const auto& v : m_values)
        {
            std::cout << v << " ";
        }
        std::cout << std::endl;
    }

private:
    /* Assume these are inaccessible and outside of your control */
    const std::string m_name;
    const std::vector<T> m_values;
};

/* #type stringifies the macro argument: MAKE_TEMPLATED(int,v)
 * expands to SomeTemplatedClass<int>("int",v) */
#define MAKE_TEMPLATED(type,vals) SomeTemplatedClass<type>(#type,vals)

int main(int argc, char* argv[])
{
    std::vector<int> int_payload({1,2,3});
    auto obj_int = MAKE_TEMPLATED(int,int_payload);
    obj_int.print();
    std::vector<float> float_payload({1.0f,2.0f,3.0f});
    auto obj_float = MAKE_TEMPLATED(float,float_payload);
    obj_float.print();
    return 0;
}

…do you really need that macro to save half a dozen characters in the function body?
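For comparison, here is a sketch of a macro-free way to get a type name at runtime. The `TypeName` trait and the `describe` helper are hypothetical names of mine, not something from the example above, and the approach costs one specialization per type, which is exactly the kind of boilerplate the macro avoids.

```cpp
#include <string>

// Hypothetical trait: maps a type to a printable name.
// The primary template is declared but not defined, so asking for an
// unregistered type is a compile-time error.
template <typename T>
struct TypeName;

template <>
struct TypeName<int>
{
    static std::string get() { return "int"; }
};

template <>
struct TypeName<float>
{
    static std::string get() { return "float"; }
};

// Builds the same kind of label the macro produces with #type.
template <typename T>
std::string describe()
{
    return "payload of type " + TypeName<T>::get();
}
```

With this in place, `describe<int>()` yields the label without the preprocessor, but every new payload type needs its own `TypeName` specialization.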

You might want to do more than one thing with the type name, something like this (I have seen this countless times):

#include <set>

/* Same class definition as above */

/* Now keep track of what has been generated…
 * and cause someone a headache in a few months / years */
#define MAKE_TEMPLATED(type,vals) SomeTemplatedClass<type>(#type,vals); type_set.insert(#type)

int main(int argc, char* argv[])
{
    std::set<std::string> type_set;
    std::vector<int> int_payload({1,2,3});
    auto obj_int = MAKE_TEMPLATED(int,int_payload);
    obj_int.print();
    std::vector<float> float_payload({1.0f,2.0f,3.0f});
    auto obj_float = MAKE_TEMPLATED(float,float_payload);
    obj_float.print();

    /* Print all types that have been generated */
    std::cout << "Generated types:" << std::endl;
    for (auto t : type_set)
    {
        std::cout << t << std::endl;
    }
    return 0;
}

Don’t you hate it? Why not make a separate function, if you have to? It reduces the amount of code the preprocessor generates, and so increases the amount of code you control and can extend easily.


#include <set>
#include <utility>
/* Same class definition as above */

/* Args&& makes these forwarding references, so std::forward
 * passes lvalues and rvalues through unchanged */
template<typename T, typename... Args>
SomeTemplatedClass<T> make_registered(std::set<std::string>& type_set, std::string type, Args&&... args)
{
    auto obj = SomeTemplatedClass<T>(type,std::forward<Args>(args)...);
    type_set.insert(type);
    return obj;
}

#define MAKE_TEMPLATED(type,vals,set) make_registered<type>(set,#type,vals)

int main(int argc, char* argv[])
{
    std::set<std::string> type_set;
    std::vector<int> int_payload({1,2,3});
    auto obj_int = MAKE_TEMPLATED(int,int_payload,type_set);
    obj_int.print();
    std::vector<float> float_payload({1.0f,2.0f,3.0f});
    auto obj_float = MAKE_TEMPLATED(float,float_payload,type_set);
    obj_float.print();

    /* Print all types that have been generated */
    std::cout << "Generated types:" << std::endl;
    for (auto t : type_set)
    {
        std::cout << t << std::endl;
    }
    return 0;
}

Passable, but I’m still not a fan. Macros are hard to debug and take control away from you, since the preprocessor rewrites your code in ways that are not easy for you to verify.

Every time I’ve seen large macros in use, there was at least one case where the macro didn’t work and had to be expanded manually.
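To make the failure mode concrete, here is a small sketch with a hypothetical two-statement macro, `RECORD_AND_COUNT`, shaped like the second `MAKE_TEMPLATED` above. The names are mine; the point is only that everything after the `;` is a separate statement once the preprocessor is done.

```cpp
#include <string>
#include <vector>

// Hypothetical two-statement macro: the text after the ';' expands
// into a second, independent statement.
#define RECORD_AND_COUNT(log, counter, msg) (log).push_back(msg); ++(counter)

// Returns the counter value after a call that looks fully guarded
// by a condition that is false.
inline int count_when_disabled()
{
    std::vector<std::string> log;
    int counter = 0;
    const bool enabled = false;

    // Reads as if both statements were guarded by the 'if', but only
    // the push_back is: '++(counter)' lands outside of it.
    if (enabled)
        RECORD_AND_COUNT(log, counter, "event");

    return counter;
}
```

The usual `do { ... } while (0)` wrapper fixes this for statement-like macros, but it cannot be applied to a macro that also has to yield a value, as `MAKE_TEMPLATED` does on the right-hand side of `auto obj = ...`.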

Maybe there is a better way to enable great expressivity with few keystrokes… follow till the last section!

Templates

Some people seem to think templates are some form of inheritance. They’re not: they’re a way to generate code from, well, a template. All generated classes are unrelated and share absolutely nothing at runtime.
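A quick way to convince yourself, sketched with a hypothetical `Counted` class template (not part of the examples in this article): each instantiation is its own type and gets its own copy of every static member.

```cpp
#include <type_traits>

// Hypothetical class template with a static counter.
template <typename T>
struct Counted
{
    static int instances;
    Counted() { ++instances; }
};

// Written once in the source, but the compiler emits one separate
// variable per instantiation.
template <typename T>
int Counted<T>::instances = 0;

// The two instantiations are distinct, unrelated types.
static_assert(!std::is_same<Counted<int>, Counted<float>>::value,
              "each instantiation is its own type");
```

Constructing two `Counted<int>` objects and one `Counted<float>` leaves `Counted<int>::instances` at 2 and `Counted<float>::instances` at 1: nothing is shared between them.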

Consider this:

#include <iostream>

template<typename T>
struct Message
{
    const T payload;

    void print()
    {
        std::cout << payload << std::endl;
    }
};

int main(int argc, char* argv[])
{
    auto msg_int = Message<int>{420};
    msg_int.print();

    auto msg_float = Message<float>{420.69f};
    msg_float.print();
}

All good and well, right?

But what if this is used throughout your codebase, the class has grown more complicated, and at some point you want to do something like this:

/* Message class same as above */

void process_message(/* What goes here…? */ msg)
{
    msg.print();
}

int main(int argc, char* argv[])
{
    auto msg_int = Message<int>{420};
    auto msg_float = Message<float>{420.69f};

    std::vector< /* What goes here…? */ > vec = {msg_int, msg_float};
    for (auto& msg : vec)
    {
        process_message(msg);
    }
}

If you instantiate a gazillion templated objects with different template arguments that are supposed to be interchangeable, you’re generating a gazillion different class types, and the moment you need to handle several of them together (like putting them in a container, or iterating over them)… you can’t.

Well, you can type erase them, but you need an abomination like this, where you combine templates and inheritance, when you could have used inheritance to begin with:

#include <iostream>
#include <vector>
#include <memory>

/* Message class still needs to print() as above, but is now
 * an unmaintainable tangle of degeneracy and can't be easily
 * extended / modified */

/* Give message an interface */
struct AbstractMessage
{
    /* Virtual destructor: safe deletion through a base pointer */
    virtual ~AbstractMessage() = default;
    virtual void print() = 0;
};

/* Wrap all messages constructions behind the interface */
template<typename T>
struct MessageImpl : public AbstractMessage
{
    void print() override
    {
        m_impl.print();
    }

    /* In the most degenerate case, this might
     * need a factory method or receive a complete copy
     * of Message<T> directly… */
    MessageImpl(T _val) : m_impl(Message<T>{_val}) {}

private:
    Message<T> m_impl;
};

/* Now this works. */
void process_message(AbstractMessage& msg)
{
    msg.print();
}

int main(int argc, char* argv[])
{
    /* For inheritance we need to use pointers */
    auto msg_int = std::make_shared<MessageImpl<int>>(420);
    auto msg_float = std::make_shared<MessageImpl<float>>(420.69f);

    std::vector<std::shared_ptr<AbstractMessage>> vec = {msg_int,msg_float};
    for (auto& msg : vec)
    {
        process_message(*msg);
    }
}

Not a great idea after all… Could have done it right from the beginning:

#include <iostream>
#include <vector>
#include <memory>

struct AbstractMessage
{
    /* Virtual destructor: safe deletion through a base pointer */
    virtual ~AbstractMessage() = default;
    virtual void print() = 0;
};

template<typename T>
struct Message : public AbstractMessage
{
    const T m_payload;

    void print() override
    {
        std::cout << m_payload << std::endl;
    }

    /* Not a POD anymore, not trivially constructible… */
    Message(T _arg) : m_payload(_arg) {}
};

void process_message(AbstractMessage& msg)
{
    msg.print();
}

int main(int argc, char* argv[])
{
    std::shared_ptr<AbstractMessage> msg_int = std::make_shared<Message<int>>(420);
    std::shared_ptr<AbstractMessage> msg_float = std::make_shared<Message<float>>(420.69f);

    std::vector<std::shared_ptr<AbstractMessage>> vec = {msg_int,msg_float};
    for (auto& msg : vec)
    {
        process_message(*msg);
    }
}

But of course, interfaces need to be identified before the code is in production, which is not always trivial.
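As an aside, when the set of payload types is closed and known up front, the standard library offers a different route to the "What goes here…?" question: `std::variant` with `std::visit` (C++17). This is not the approach this article builds on, just a sketch for comparison; `AnyMessage` and `render_all` are names I made up, and it returns a string instead of printing so the result is easy to check.

```cpp
#include <sstream>
#include <string>
#include <variant>
#include <vector>

template <typename T>
struct Message
{
    const T payload;
};

// Closed set of message types allowed to share a container.
using AnyMessage = std::variant<Message<int>, Message<float>>;

// Renders every message in order, one per line.
inline std::string render_all(const std::vector<AnyMessage>& messages)
{
    std::ostringstream out;
    for (const auto& msg : messages)
    {
        // std::visit calls the generic lambda with whichever
        // alternative the variant currently holds.
        std::visit([&out](const auto& m) { out << m.payload << '\n'; }, msg);
    }
    return out.str();
}
```

The catch is the same one as with interfaces: the list of alternatives in `AnyMessage` has to be known and maintained somewhere, so it does not remove the need to understand your domain first.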

Jinja2

Now… what does Jinja2 have to do with C++ at all?

Well, you could find yourself in an interesting case where you want the comfort of type safety, but also the flexibility of not having to maintain hundreds of types / factories in your headers by hand.

Let’s go back to the macro case, where you want to generate a set of objects to use at runtime (in that case, strings) that depend in some way on static types (in that case, the name of the type). Let’s say you don’t have, or don’t want to use, reflection: it’s best for you to keep every type explicit and defined at compile time, to minimize the chances of runtime errors or complexity in your runtime code. But then, you get a lot of boilerplate.

Consider the message example above, but with a degree of variability that exceeds what templates can express:

#include <iostream>
#include <vector>
#include <memory>

static const unsigned LINE_SIZE = 5;

struct AbstractMessage
{
    /* Virtual destructor: safe deletion through a base pointer */
    virtual ~AbstractMessage() = default;
    virtual void print() = 0;
};

/* Just some utility function: prints LINE_SIZE values per line */
template <typename T>
inline void print_vector(const std::vector<T>& vec)
{
    unsigned print_cnt = 0;
    bool first = true;
    for (const auto& v : vec)
    {
        std::cout << v << " ";
        if (++print_cnt % LINE_SIZE == 0 && !first)
        {
            std::cout << std::endl;
        }
        first = false;
    }
    if (print_cnt % LINE_SIZE)
    {
        std::cout << std::endl;
    }
}

struct MessageA : public AbstractMessage
{
    /* You want these to be accessed by name,
     * e.g. msg.m_int_0, and be public const */
    const int m_int_0;
    const int m_int_1;
    const float m_float;

    /* Boring! */
    void print() override
    {
        std::cout
            << m_int_0
            << " "
            << m_int_1
            << " "
            << m_float
            << std::endl;
    }

    /* Boring! */
    MessageA(int _int_0, int _int_1, float _float)
        : m_int_0(_int_0),
          m_int_1(_int_1),
          m_float(_float)
    {}
};

struct MessageB : public AbstractMessage
{
    const std::vector<int> m_vec_int;
    const std::vector<float> m_vec_float;

    /* Boring! */
    void print() override
    {
        print_vector<int>(m_vec_int);
        print_vector<float>(m_vec_float);
    }

    /* Boring! */
    MessageB(std::vector<int> _vec_int,
             std::vector<float> _vec_float)
        : m_vec_int(_vec_int),
          m_vec_float(_vec_float)
    {}
};

void process_message(AbstractMessage& msg)
{
    msg.print();
}

int main(int argc, char* argv[])
{
    std::shared_ptr<AbstractMessage> msg_int =
        std::make_shared<MessageA>(420,777,69.0f);
    std::shared_ptr<AbstractMessage> msg_float =
        std::make_shared<MessageB>(
            std::vector<int>({1,2,3,4,5,6,7}),
            std::vector<float>({7.0f,6.1f,5.2f,4.3f,3.4f,2.5f,1.6f,0.7f}) );

    std::cout << std::dynamic_pointer_cast<MessageA>(msg_int)->m_int_0 << std::endl;
    std::cout << std::dynamic_pointer_cast<MessageB>(msg_float)->m_vec_float[0] << std::endl;
}

So much boilerplate! Look at that main() function. Those lines are so long! Maybe some factories could help…

#include <stdexcept>
#include <utility>

/* Same classes as before */

/* Wrapper class with convenient factories and cast methods */
class PolymorphicMessage : public AbstractMessage
{
public:

    void print() override
    {
        m_msg->print();
    }

    /* Args&& makes these forwarding references for std::forward */
    template<typename T, typename... Args>
    static PolymorphicMessage make(Args&&... args)
    {
        auto wrapped = std::make_shared<T>(std::forward<Args>(args)...);
        return PolymorphicMessage(wrapped);
    }

    /* Checked downcast: throws if the wrapped message is not a T */
    template<typename T>
    std::shared_ptr<T> as()
    {
        auto ptr = std::dynamic_pointer_cast<T>(m_msg);
        if (ptr.get() == nullptr)
        {
            throw std::logic_error("Invalid cast!");
        }
        return ptr;
    }

private:
    PolymorphicMessage(std::shared_ptr<AbstractMessage> _msg)
        : m_msg(_msg)
    {}
    std::shared_ptr<AbstractMessage> m_msg;

};


int main(int argc, char *argv[])
{
    auto msg_int = PolymorphicMessage::make<MessageA>(420, 777, 69.0f);
    auto msg_float = PolymorphicMessage::make<MessageB>(
        /* std::initializer_list is needed to infer the template arguments types */
        std::initializer_list<int>{1, 2, 3, 4, 5, 6, 7},
        std::initializer_list<float>{7.0f, 6.1f, 5.2f, 4.3f, 3.4f, 2.5f, 1.6f, 0.7f});

    std::cout << msg_int.as<MessageA>()->m_int_0 << std::endl;
    std::cout << msg_float.as<MessageB>()->m_vec_float[0] << std::endl;
    // msg_float.as<MessageA>(); // Fails predictably: throws std::logic_error

    process_message(msg_int);
}

main() looks better now, but we still have so many classes, members and print() functions to maintain!

What if maybe…

  • You have a schema for those messages already.
  • You would like to query what types are there
  • You would like to query who uses those types
  • You just don’t want to let your team write those message classes by hand, after all!

Then maybe, maybe… you found a case for making your own domain specific language.

You can specify your types in a declarative language, like JSON:

{
    "types": [
        {
            "name": "MessageA",
            "payload": [
                {
                    "name": "int0",
                    "type": "int"
                },
                {
                    "name": "int1",
                    "type": "int"
                },
                {
                    "name": "float",
                    "type": "float"
                }
            ]
        },
        {
            "name": "MessageB",
            "payload": [
                {
                    "name": "vec_int",
                    "type": "int",
                    "is_vector": true
                },
                {
                    "name": "vec_float",
                    "type": "float",
                    "is_vector": true
                }
            ]
        }
    ]
}

Then, you can generate your types with Python + Jinja2.

Jinja2 template (could use some syntax highlighting support!):

#pragma once

#include "common.hpp"

#include <vector>
#include <string>
#include <iostream>
#include <sstream>

{%- macro prepend_comma(idx) -%}
{%- if idx != 1 -%}, {% endif -%}
{%- endmacro -%}

{%- macro make_constructor_arguments(payload) -%}
{%- for attr in payload -%}
{{prepend_comma(loop.index)}}
{%- if attr.is_vector -%}
std::vector<{{attr.type}}>
{%- else -%}
{{attr.type}}
{%- endif %} _{{attr.name}}
{%- endfor -%}
{%- endmacro %}

template <typename T>
inline std::string print_vector(std::vector<T> vec)
{
static const unsigned LINE_SIZE = 5;

unsigned print_cnt = 0;
bool first = true;
std::ostringstream output;
for (auto v : vec)
{
output << v << " ";
if (++print_cnt % LINE_SIZE == 0 && !first)
{
output << std::endl;
}
first = false;
}
if (print_cnt % LINE_SIZE)
{
output << std::endl;
}
return output.str();
}

{% for msgtype in types -%}
struct Message{{msgtype.name}} : public AbstractMessage
{
{%- for attr in msgtype.payload %}
{% if attr.is_vector -%}
const std::vector<{{attr.type}}> m_{{attr.name}};
{%- else -%}
const {{attr.type}} m_{{attr.name}};
{%- endif -%}
{%- endfor %}

void print() override
{
std::cout
{%- for attr in msgtype.payload %}
<< "{{attr.name}}: "
{%- if attr.is_vector %}
<< std::endl
<< print_vector<{{attr.type}}>(m_{{attr.name}})
{%- else %}
<< m_{{attr.name}}
<< std::endl
{%- endif -%}
{%- endfor -%};
std::cout << std::endl;
}

Message{{msgtype.name}}(
{{ make_constructor_arguments(msgtype.payload) }}
)
: {% for attr in msgtype.payload -%}{{prepend_comma(loop.index)}}m_{{attr.name}}(_{{attr.name}}){%- endfor %}
{}

};

{% endfor -%}

It is admittedly very hard to read Jinja2 code without being able to run it, but I can assure you that, once you have syntax highlighting and the code output, it’s quite intuitive and flexible to use.

To generate your code, use this Python snippet:

#!/usr/bin/env python3

import jinja2
import json

# Look for templates in the current directory
env = jinja2.Environment(
    loader=jinja2.FileSystemLoader(searchpath="./")
)
msg_template = env.get_template("messages.jinja2")

with open("schema.json", "r") as jsonfile:
    schema = json.load(jsonfile)

# The schema's top-level keys (here, "types") become template variables
rendered_header = msg_template.render(schema)
with open("messages.hpp", "w") as outfile:
    print(rendered_header, file=outfile)

Find the code for this example in the repository linked at the conclusion.

Then, for example, you could make another schema for the message users and specify the ports they write to, which messages those support, and generate factories to connect users to ports at runtime! Then, your users would get compile time safety when sending messages through your framework.

Conclusion

It’s not easy to understand your domain before you start writing code, but making sure code stays short, simple and expressive is of paramount importance in every codebase.

There are different reasons to use different types of code generation; as a best practice, one should steer away from solutions that limit future flexibility (especially copying code and macros!).

Templates should be used carefully and one should make sure things that are related can be treated as such.

Finally, when there is the necessity to manage a large set of different but somewhat related types, with the requirement of compile time safety, it’s worth considering using a template engine to generate your headers from a declarative schema (for example, in JSON).

You can find the code in https://github.com/infinitesnow/code_generation_article.

I hope you find this article helpful! Please leave feedback and suggestions if you wish.

P.S.: While writing this article, I found an exception that excuses copy pasting code… can you guess? 😄
