Compile-time JSON deserialization in C++

Abdul Ghani
11 min read · Jul 5, 2024


(Discussion on HackerNews)

Recently, in writing a webserver, I thought it would be cool to provide some JSON de/serialization utility for my hypothetical users. Along the way I spent many hours missing things I take for granted in other languages but that are apparently absent from C++, primarily static reflection and pattern matching.

But it turns out that despite popular opinion, both of these features are (sort of) present in modern C++ — you just need to know where to look! Here is how you could use similar ideas to deal with JSON in a type-safe manner.

  • Pattern matching in C++
  • Atomic types
  • Compound types
  • Making it constexpr

Pattern matching in C++

The idea of pattern matching is a powerful one. It massively increases the conciseness and safety of your code. Let us take Wolfram Language (~= Mathematica) as one of the neatest demonstrations of this idea:

squareAllReals[x_Real] := x^2; (* x_Head matches stuff with that 'Head' *)
squareAllReals[x : (_List | _Association)] := Map[squareAllReals, x];
squareAllReals[x_] := x; (* x_ matches a single instance of anything *)
squareAllReals[___] := $Failed; (* ___ matches any number of anything at all *)

So we can already say things like

squareAllReals[{"string", 2.0, <|"key1" -> "value1", "key2" -> 4.0|>}]
(* -> {"string", 4., <|"key1" -> "value1", "key2" -> 16.|>} *)

This seems very far from C++, but I think it seems further than it really is. You just need to look between the angled brackets.

The core principle driving Mathematica is this: you provide an expression, and then it just keeps matching patterns and applying the associated transformations for as long as it can. Sometimes several patterns could work — Mathematica breaks ties by picking the ‘tightest’ pattern possible.

So to begin to make things relatable, if we give Mathematica the two rules

isVariantV[___] := False;
isVariantV[stdVariant[___]] := True;

and then ask about the expression isVariantV[stdVariant[a,b]], it will find that both patterns match, but the second pattern is tighter, and we are told the answer we hoped for.

Now how does template specialisation work in C++? If we give the compiler the two definitions (plus the utility)

template <typename... Ts>
struct is_variant : std::false_type {};

template <typename... Ts>
struct is_variant<std::variant<Ts...>> : std::true_type {};

template <typename T>
constexpr bool is_variant_v = is_variant<T>::value;

and then ask it to

static_assert(is_variant_v<std::variant<int, double>>);

it will find that the primary template could match — but luckily, so does the tighter partial specialisation, and the compilation can go on.
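
The analogy stretches a little further: a partial specialisation can also bind the "pattern variables" it matches, much like x_ binds a subexpression. A minimal sketch, where element_of is a hypothetical name chosen for illustration and not something from the standard library:

```cpp
#include <optional>
#include <type_traits>
#include <vector>

// Primary template: the catch-all pattern, deliberately left without a `type`.
template <typename T>
struct element_of {};

// Matches std::vector<T, Alloc> and binds T.
template <typename T, typename Alloc>
struct element_of<std::vector<T, Alloc>> { using type = T; };

// Matches std::optional<T> and binds T.
template <typename T>
struct element_of<std::optional<T>> { using type = T; };

template <typename T>
using element_of_t = typename element_of<T>::type;

static_assert(std::is_same_v<element_of_t<std::vector<int>>, int>);
static_assert(std::is_same_v<element_of_t<std::optional<double>>, double>);
```

Asking for element_of_t<int> fails to compile, which is roughly the C++ spelling of "no pattern matched".
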
With this analogy in mind, we are ready to handle JSON!

The aim

JSON is in its pure form anarchic and we would like to structure it automatically as it enters our codebase. As a consequence, we will only deal with a restricted subset of JSON (where e.g., arrays are homogeneous). In practice this sort of restriction is hardly restrictive at all.

For example, we may say that a user has a name and an age, a list of users is a list of users, and a user group is a list of users plus some nullable exclusive flag.
It’s not super hard to parse such a thing, but requirements change — the exclusive flag is no longer nullable, even if that is a breaking change, and now each user has a list of active feature toggles — we want to avoid rewriting the parser each time this happens. I would rather say something like

using User = JSON<Pair<"name", std::string>, Pair<"age", double>>;
using UserList = JSON<ListOf<User>>;
using UserGroup
= JSON<Pair<"exclusive", Nullable<bool>>, Pair<"users", UserList>>;

and have the compiler generate the parser, so I can just say

std::string_view responseBody {R"(
[
{
"exclusive": false,
"users": [
{"name": "toddler", "age": 2},
{"name": "baby", "age": 1}
]
},
{
"exclusive": true,
"users": [
{"name": "user1", "age": 30},
{"name": "user2", "age": 25}
]
},
{
"exclusive": null,
"users": []
}
]
)"};
JSON<ListOf<UserGroup>> userGroups {responseBody};

and have it just work. I also want this to be type-safe. Unsurprisingly this is possible — what is more surprising is that, with only a little contortion, it is possible to statically assert on JSON.

Atomic types

First, humble beginnings:

template <typename... Ts>
struct JSON {};

This is the template from which we will specialise. A big pro and a big con of template specialisation is that different specialisations are only nominally related, so we will introduce a base class to capture some common boilerplate. Here is the big picture of the strategy:

  • Each specialisation has a ContentType, which is the ‘value’ of the JSON type. For example, a bool stores a bool, a string stores a std::string, an array of strings stores a std::vector of std::strings, and a general object contains a generally heterogeneous std::tuple of its values (the keys are considered part of the type).
  • Each specialisation will also provide a static function ContentType consumeFromJSON(std::string_view&), which peels off from the front of the provided string view the value stored in the returned JSON type.

A skeleton of JSONBase could be

template <typename Me, typename ContentType>
requires std::is_default_constructible_v<ContentType>
struct JSONBase
{
ContentType contents;

JSONBase() = default;
JSONBase(std::string_view& str): contents{ Me::consumeFromJSON(str) } {}
//some other constructors relevant for my usecase...

//initialise contents directly instead of default-constructing and then assigning
JSONBase(const ContentType& other): contents(other) {}
JSONBase(ContentType&& other): contents(std::move(other)) {}
//operator=, operator==, ...

//const, so that the conversion also works on const objects
operator ContentType() const {
return contents;
}
};

The details of this class are mostly boring convenience and boilerplate, but it’s worth pointing out that

  • The CRTP is close to deprecated by deducing this, but not yet completely — there is still at least the unlikely case where you want to inherit constructors which call a function you provide, and
  • operator T() for T a template type is pretty cool!

Let us quickly agree on what we mean by JSON before we go on — JSON is either

  • Some atomic thing: a double, a string, a boolean, or null, or
  • Some compound thing: a list of JSON, or an object of string keys each pointing to yet more JSON.

We can codify in a concept what we mean by ‘some atomic thing’:

struct Null {};

template <typename... Ts>
struct Member;
template <typename T, typename... Ts>
struct Member<T, std::variant<Ts...>>:
std::disjunction<std::is_same<T, Ts>...> {};

template <typename T>
concept JSONAtom =
Member<T, std::variant<std::string, double, bool, Null>>::value;

Pack expansions are a lot like FP-style maps.
You can read std::is_same<T, Ts>... as std::is_same<T, #>& /@ {Ts...}, kind of like Mathematica and C++ had a child. std::disjunction then ORs the results together.
Anyway now we can provide a preliminary partial specialisation for these atomic types:

template <JSONAtom ContentType>
class JSON<ContentType>: public JSONBase<JSON<ContentType>, ContentType> {
public:
using JSONBase<JSON<ContentType>, ContentType>::JSONBase;
using JSONBase<JSON<ContentType>, ContentType>::operator=;

static ContentType consumeFromJSON(std::string_view&) = delete;
std::string toString() const = delete;
};

In C++, you need to explicitly inherit base class constructors (and for good reason), but you also need to explicitly inherit operator=, because the implicitly declared assignment operator of the derived class would otherwise hide the base class version. Now for each atomic type, we only need to specialise consumeFromJSON and toString, for example:

template <>
inline bool JSON<bool>::consumeFromJSON(std::string_view& json) {
stripWhitespace(json);
//substr does bounds checking
if (json.substr(0, 4) == "true") {
json.remove_prefix(4);
return true;
}
else if (json.substr(0, 5) == "false") {
json.remove_prefix(5);
return false;
}
else throw HTTPException(UNPROCESSABLE_ENTITY, "expected boolean");
}

template <>
inline std::string JSON<bool>::toString() const {
if (contents) return "true";
return "false";
}

(I won’t go into toString further, because it won’t be constexpr, and you can imagine how it works already.)
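
The helper stripWhitespace is used above but never shown. Here is a minimal sketch of what it might look like, together with a free-function version of the boolean parser so the snippet stands alone. The names consumeBool and leavesRemainder, and the use of std::invalid_argument in place of HTTPException, are assumptions made for illustration:

```cpp
#include <stdexcept>
#include <string_view>

// Drops leading JSON whitespace (space, tab, newline, carriage return).
constexpr void stripWhitespace(std::string_view& json) {
    const auto first = json.find_first_not_of(" \t\n\r");
    json.remove_prefix(first == std::string_view::npos ? json.size() : first);
}

// Peels a boolean off the front of `json`, leaving the rest in place.
constexpr bool consumeBool(std::string_view& json) {
    stripWhitespace(json);
    if (json.substr(0, 4) == "true") {
        json.remove_prefix(4);
        return true;
    }
    if (json.substr(0, 5) == "false") {
        json.remove_prefix(5);
        return false;
    }
    throw std::invalid_argument("expected boolean");
}

// Checking helper: parse once and compare both the value and what is left over.
constexpr bool leavesRemainder(std::string_view input, bool expected, std::string_view rest) {
    return consumeBool(input) == expected && input == rest;
}

static_assert(leavesRemainder("  true, 1]", true, ", 1]"));
static_assert(leavesRemainder("false", false, ""));
```

The important property is that the parser consumes only its own token, so the caller can carry on from wherever it stopped.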

Compound types

Before we move on to the compound types properly, let’s introduce a utility. Our previous JSON<double> type is different (I think necessarily) from the classic double, but our types are already looking verbose, and I don’t want to write JSON<ListOf<JSON<double>>> too often. So let’s write a small tool that lets us treat double like JSON<double>:

template <typename... Ts>
struct IdempotentJSONTag {
using type = JSON<Ts...>;
};

template <typename... Ts>
struct IdempotentJSONTag<JSON<Ts...>> {
using type = JSON<Ts...>;
};

template <typename... Ts>
using IdempotentJSONTag_t = typename IdempotentJSONTag<Ts...>::type;

If we are not a JSONed type, we match the primary case and become a JSONed type, and otherwise we match the other case and stay as we are. This smells a lot like pattern matching to me!
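
A quick sanity check of the idempotence, with an empty stand-in for the real JSON class template so that the snippet compiles on its own:

```cpp
#include <type_traits>

// Stand-in for the real class template; only its name matters here.
template <typename... Ts>
struct JSON {};

template <typename... Ts>
struct IdempotentJSONTag { using type = JSON<Ts...>; };

template <typename... Ts>
struct IdempotentJSONTag<JSON<Ts...>> { using type = JSON<Ts...>; };

template <typename... Ts>
using IdempotentJSONTag_t = typename IdempotentJSONTag<Ts...>::type;

// A bare type gets wrapped once...
static_assert(std::is_same_v<IdempotentJSONTag_t<double>, JSON<double>>);
// ...and an already wrapped type is left alone, however often we apply the tag.
static_assert(std::is_same_v<IdempotentJSONTag_t<JSON<double>>, JSON<double>>);
static_assert(std::is_same_v<IdempotentJSONTag_t<IdempotentJSONTag_t<double>>, JSON<double>>);
```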

Arrays.
As suggested already we can use ListOf as a tag for the compiler. So, our class will look like JSON<ListOf<T>> , and starts off like

template <typename T>
struct ListOf {};

template <typename T>
struct JSON<ListOf<T>>:
public JSONBase<JSON<ListOf<T>>, std::vector<IdempotentJSONTag_t<T>>> {
using IdempotentJSONType = IdempotentJSONTag_t<T>;
using VectorType = std::vector<IdempotentJSONType>;
...
};

consumeFromJSON basically just looks for commas, recursively passing the stuff in between to the constructor of the type IdempotentJSONType via emplace_back:

static VectorType consumeFromJSON(std::string_view& json) {
...
while (json.size() != 0 && json[0] != ']') {
parsedContents.emplace_back(json);
stripWhitespace(json);

if (json.size() == 0) {
throw HTTPException(UNPROCESSABLE_ENTITY, "expected , in array");
}
else if (json[0] == ',') json.remove_prefix(1);
}
...
return parsedContents;
}

We index it by indexing contents:

IdempotentJSONType& operator[](size_t index) {
return this->contents[index];
}

Objects.
We will specify an object by specifying its keys and the types of their values. To get something like strings as non-type template parameters, we can use a simple wrapper struct:

template <size_t N>
struct StringLiteral {
char contents[N] = {};

// constexpr, so that a string literal can initialise a template argument
constexpr StringLiteral(const char(&str)[N]) {
std::copy_n(str, N - 1, contents);
contents[N - 1] = '\0';
}

constexpr operator std::string_view() const { return std::string_view(contents); }
};

We specify the contents of our object by writing, as above, JSON<Pair<"name", std::string>, Pair<"age", double>>. Our tuple will be a tuple of std::optionals in order to deal with undefined entries. So the ‘variables’ are the contents of the pairs, and the compiler is able to infer them by, amazingly, ‘inverting’ a pack expansion:

template <StringLiteral... Ks, typename... Vs>
class JSON<Pair<Ks, Vs>...>: public JSONBase<JSON<Pair<Ks, Vs>...>, std::tuple<std::optional<IdempotentJSONTag_t<Vs>>...>> {
public:
using ValueTupleType = std::tuple<std::optional<IdempotentJSONTag_t<Vs>>...>;
static constexpr std::array<std::string_view, sizeof...(Ks)> keys = { Ks.contents... };
...
};

Note that, as a consequence, our type is ordered where JSON would not be, in the sense that reordering the pairs gives a different type. We will still be able to parse the JSON in any order, so if we define our types once it shouldn’t matter. It’s like how std::variant<int, double> is a different type from std::variant<double, int>.

Parsing works like this: we find a key, we find the corresponding value, and we look up the index of the key (in Ks..., conveniently expanded into the array keys) in order to set the corresponding tuple element to the result of parsing by the corresponding type:

template <size_t index = 0> [[maybe_unused]]
static bool set(ValueTupleType& tuple, std::string_view key, std::string_view& value) {
if constexpr (index == sizeof...(Ks)) return false;
else if (keys[index] == key) {
std::get<index>(tuple) = std::tuple_element_t<index, ValueTupleType>{value};
return true;
}
else return set<index + 1>(tuple, key, value);
}
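
The same dispatch, from a key known only at run time to a tuple index known at compile time, can be tried out in isolation. This is a toy under stated assumptions: Record, parseAs and filled are made-up names, and the tuple holds plain values rather than std::optionals to keep the sketch short.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <tuple>
#include <type_traits>

// Toy per-type parsers standing in for each JSON type's constructor.
template <typename T>
constexpr T parseAs(std::string_view text) {
    if constexpr (std::is_same_v<T, bool>) return text == "true";
    else return text.empty() ? '?' : text.front();
}

struct Record {
    static constexpr std::array<std::string_view, 2> keys = {"flag", "letter"};
    std::tuple<bool, char> values{};

    // One instantiation per index; the key comparison happens when called.
    template <std::size_t index = 0>
    constexpr bool set(std::string_view key, std::string_view value) {
        if constexpr (index == keys.size()) {
            return false;  // ran off the end: unknown key
        } else {
            if (keys[index] == key) {
                using Field = std::tuple_element_t<index, decltype(values)>;
                std::get<index>(values) = parseAs<Field>(value);
                return true;
            }
            return set<index + 1>(key, value);
        }
    }
};

// Keys may arrive in any order.
constexpr Record filled() {
    Record r{};
    r.set("letter", "x");
    r.set("flag", "true");
    return r;
}

static_assert(std::get<0>(filled().values));
static_assert(std::get<1>(filled().values) == 'x');
static_assert(!Record{}.set("missing", "true"));
```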

We have sustained a fairly major injury in our weird battle to make this constexpr — a linear scan through the keys every time — there are such things as compile-time hashmaps, but for now we move on.

Indexing the object.
This turned out to be tricky and I am not 100% happy with the solution I landed on. We could follow the std::get approach like

template <StringLiteral str>
auto& get() {
constexpr size_t index = findIndex<str>();
return std::get<index>(this->contents);
}
...
template <StringLiteral key>
static constexpr size_t findIndex() {
for (size_t i = 0; i < sizeof...(Ks); ++i) {
if (key.contents == keys[i]) return i;
}
throw std::out_of_range("unknown key");
}

this time less worried about the linear for loop, but now we have to write myJSON.get<"myKey">() which feels like… there must be room to do better? Intuitively you would index the JSON with operator[] . However, what should be the return type of auto& operator[](std::string_view&) ? It would need to depend on the value of the parameter, as different keys return different types, which isn’t a thing in C++. Other JSON libraries deal with this by returning a variant…

What we do instead is build new types so as to overload operator[] separately for each key. The StringLiteral above will not work on its own, as for example "hello" and "world" have the same type StringLiteral<6>, but we can use a wrapper type:

template <StringLiteral strlit> struct Key {};

This lets us overload operator[] , almost as desired:

template <StringLiteral str>
constexpr auto& operator[](const Key<str>&) {
constexpr size_t index = findIndex<str>();
return std::get<index>(this->contents);
}

I say almost, because now we have to say something like myJSON[Key<"myKey">{}]. Still, it’s nice not to have to add a .toDouble() at the end.

Nullable types.
Our nullable type-modifier is quite simple: we just store a variant between the given type and JSON<Null>, defined above along with our other atomic types, delegating the parsing to the nested type when appropriate.

template <typename T>
struct Nullable {};

template <typename T>
struct JSON<Nullable<T>>: public JSONBase<JSON<Nullable<T>>, std::variant<IdempotentJSONTag_t<T>, JSON<Null>>> {
using NestedType = IdempotentJSONTag_t<T>;

static std::variant<NestedType, JSON<Null>> consumeFromJSON(std::string_view& str) {
stripWhitespace(str);
if (str.size() == 0 || str[0] != 'n') return NestedType{str};
else return JSON<Null>{str};
}

//operator bool, operator->, etc...
};

We also define some ‘arbitrary’ MapOf JSON type, but we won’t go into it here as there is no obvious way (modulo the compile-time hashmap mentioned above) to make it constexpr. Speaking of which:

Making it constexpr

Actually, we are already very close! A lot of the work left is just to type constexpr a few times. It is amazing how much you can do at compile time in C++. However, there are two main exceptions:

Making consumeFromJSON pure.
We do a lot of sv.remove_prefix(n) as we consume the JSON, which is constexpr. Still, I had issues getting things to compile as they stand. In the end I made the functions ‘pure’ in the sense that they don’t modify the JSON directly, instead they take and return the current position of the parsing:

template <>
constexpr inline std::pair<bool, size_t> JSON<bool>::consumeFromJSON(const std::string_view& json, size_t from) {
from = stripWhitespace(json, from);
//substr does bounds checking
if (json.substr(from, 4) == "true") {
return {true, from + 4};
}
else if (json.substr(from, 5) == "false") {
return {false, from + 5};
}
else fail("expected boolean");
}
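
Neither fail nor the two-argument stripWhitespace is shown in this post, so here is a self-contained sketch of the position-passing style with assumed definitions for both. The free function consumeBool stands in for the member specialisation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <utility>

// Assumed shape of the helper: first non-whitespace position at or after `from`.
constexpr std::size_t stripWhitespace(std::string_view json, std::size_t from) {
    const auto pos = json.find_first_not_of(" \t\n\r", from);
    return pos == std::string_view::npos ? json.size() : pos;
}

// Assumed shape of `fail`. It is not constexpr, so reaching it during constant
// evaluation is a compile error; at run time it throws.
[[noreturn]] inline void fail(const char* message) {
    throw std::invalid_argument(message);
}

// Takes a start position and returns the value plus the position just past it.
constexpr std::pair<bool, std::size_t> consumeBool(std::string_view json, std::size_t from) {
    from = stripWhitespace(json, from);
    if (json.substr(from, 4) == "true") return {true, from + 4};
    if (json.substr(from, 5) == "false") return {false, from + 5};
    fail("expected boolean");
}

static_assert(consumeBool("  true]", 0).first);
static_assert(consumeBool("  true]", 0).second == 6);
static_assert(!consumeBool("[false]", 1).first);
```

Because nothing is mutated, the same input can be parsed from several positions, which is what lazy indexing relies on later.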

The use of std::vector.
As of C++20, std::vector is constexpr!
… but not in the way we have used it. Essentially, memory allocated during constant evaluation must be freed before that evaluation ends; it is not allowed to survive into runtime. Our JSON<ListOf<… above breaks this rule, as it holds a vector and could be used at runtime if we wanted. So what to do?

My solution was to parse things lazily. In retrospect, you could probably get a constexpr size_t counting the number of commas in one pass and then use that for the size of an array. But here is the lazy way:
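
For what it is worth, the counting idea might look roughly like this for a flat list of booleans. It is only a sketch of the alternative, not what I ended up doing; countElements and parseBools are hypothetical names, and nesting and strings are deliberately ignored.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// First pass: number of elements in a flat list such as "[true, false]".
// Assumes no nesting and no strings, so every comma separates two elements.
constexpr std::size_t countElements(std::string_view list) {
    const auto first = list.find_first_not_of(" \t\n\r", 1);
    if (first == std::string_view::npos || list[first] == ']') return 0;
    std::size_t count = 1;
    for (char c : list) {
        if (c == ',') ++count;
    }
    return count;
}

// Second pass: fill a fixed-size array, which is allowed to live on into
// run time. Anything other than "true" is read as false in this toy.
template <std::size_t N>
constexpr std::array<bool, N> parseBools(std::string_view list) {
    std::array<bool, N> out{};
    std::size_t pos = 1;  // skip '['
    for (std::size_t i = 0; i < N; ++i) {
        pos = list.find_first_not_of(" \t\n\r,", pos);
        out[i] = list.substr(pos, 4) == "true";
        pos = list.find_first_of(",]", pos);
    }
    return out;
}

constexpr std::string_view flags = "[true, false, true]";
constexpr auto parsed = parseBools<countElements(flags)>(flags);
static_assert(parsed.size() == 3);
static_assert(parsed[0] && !parsed[1] && parsed[2]);
```

The catch is that the size must be a constant expression, so the input has to be a constexpr variable or a template argument rather than an ordinary function parameter.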

When we are first given a string, we don’t do anything apart from check the bracket balance (using a transient, and thus allowed, std::vector; funnily, the std::stack container adaptor is not yet constexpr) and save the string for later use:

constexpr static std::pair<JSON<ListOf<IdempotentJSONType>>, size_t> consumeFromJSON(const std::string_view& incoming, size_t from) {
from = stripWhitespace(incoming, from);
if (from >= incoming.size() || incoming[from] != '[') {
fail("expected array beginning with '['");
}
size_t closingBracket = from + 1;
std::vector<char> stack {};
stack.push_back('[');
while (!stack.empty()) {
if (closingBracket >= incoming.size()) {
fail("didn't find a closing bracket");
}
else if (incoming[closingBracket] == '"') {
...
}
else if (incoming[closingBracket] == '[' || incoming[closingBracket] == '{') {
stack.push_back(incoming[closingBracket]);
}
else if (incoming[closingBracket] == stack.back() + 2) {
// in ASCII, ']' == '[' + 2 and '}' == '{' + 2
stack.pop_back();
}
++closingBracket;
}

// the loop leaves closingBracket one past the matching ']'
return {incoming.substr(from, closingBracket - from), closingBracket};
}

Now, if we are asked at compile time for the third (zero-indexed) element, we count three top-level commas, extend a string view between the third and fourth commas, and pass that string view for parsing by the array’s element type:

constexpr IdempotentJSONType operator[](size_t index) const {
if (contents.size() <= 2) fail("out of range");

size_t commas = 0;
auto findComma = [this](size_t head) -> size_t {
size_t balance = 1;
...
if (head >= contents.size()) return std::string_view::npos;
return head;
};

// size_t rather than int, so that npos from findComma is not truncated
size_t head = stripWhitespace(contents, 1);
while (commas < index) {
head = findComma(head);
...
}

size_t nextComma = findComma(head);
if (nextComma == std::string_view::npos) {
nextComma = contents.size();
}

return IdempotentJSONType::consumeFromJSON(contents.substr(head, nextComma - head), 0).first;
}
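
As the comma search is elided above, here is a self-contained sketch of the same lazy lookup. The names nthElement and trimmed are hypothetical, and for brevity strings are assumed not to contain brackets or commas:

```cpp
#include <cstddef>
#include <string_view>

constexpr std::string_view trimmed(std::string_view text) {
    const auto first = text.find_first_not_of(" \t\n\r");
    if (first == std::string_view::npos) return {};
    const auto last = text.find_last_not_of(" \t\n\r");
    return text.substr(first, last - first + 1);
}

// Returns the text of the index-th top-level element of a bracketed list such
// as "[1, [2, 3], 4]", or an empty view if there are not enough elements.
constexpr std::string_view nthElement(std::string_view list, std::size_t index) {
    std::size_t depth = 0;
    std::size_t start = 1;  // just past the opening '['
    for (std::size_t pos = 1; pos < list.size(); ++pos) {
        const char c = list[pos];
        if (c == '[' || c == '{') {
            ++depth;
        } else if ((c == ']' || c == '}') && depth > 0) {
            --depth;
        } else if (depth == 0 && (c == ',' || c == ']')) {
            // A top-level separator, or the bracket that ends the list.
            if (index == 0) return trimmed(list.substr(start, pos - start));
            --index;
            start = pos + 1;
        }
    }
    return {};
}

static_assert(nthElement("[1, [2, 3], 4]", 1) == "[2, 3]");
static_assert(nthElement("[1, [2, 3], 4]", 2) == "4");
static_assert(nthElement("[1]", 1).empty());
```

The returned view would then be handed to the element type's consumeFromJSON, and nothing is ever stored beyond the original string view.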

And, just like that, we are finished!

using UserGroup = JSON<Pair<"exclusive", Nullable<bool>>, Pair<"users", UserList>>;
constexpr std::string_view responseBody {R"(
[
{
"exclusive": false,
"users": [
{"name": "toddler", "age": 2},
{"name": "baby", "age": 1}
]
},
{
"exclusive": true,
"users": [
{"name": "user1", "age": 30},
{"name": "user2", "age": 25}
]
},
{
"exclusive": null,
"users": []
}
]
)"};

constexpr JSON<ListOf<UserGroup>> userGroups {responseBody};

static_assert(!(*userGroups[0][Key<"exclusive">{}]));
static_assert((*userGroups[1][Key<"users">{}])[0][Key<"name">{}] == "user1");

Thanks for reading, and please let me know if you see any improvements or issues!
