json11 bindings for Lua: behind the scenes

JSON document example

Recently I published a GitHub repository with a module that lets Lua scripts use Dropbox's json11 library. Even though one can read the source and understand how it works (the module is very simple), I decided to write this article to explain the module's internals in more detail, along with the reasons behind its design decisions.

Motivation

In some cases a script needs to export its internal state and import it later. When a Lua script handles complex objects, it uses tables for them. A table can play two roles: an array, which holds a sequence of values in strict order that are accessed by index; and a dictionary, where each value is associated with a key and there is no strict order, because values are accessed by key rather than by index.

Here is an example of a relatively complex table:

{
    menu={
        id='file',
        value='File',
        popup={
            menuitem={
                {value='New', onclick='CreateNewDoc()'},
                {value='Open', onclick='OpenDoc()'},
                {value='Close', onclick='CloseDoc()'},
            }
        }
    }
}

As one can see, its structure has much in common with JSON, which makes JSON a convenient format for exchanging such data. That is why I decided to use JSON.

Implementation

JSON supports several data types and structures: null, booleans, strings, numbers, arrays and objects. In Lua, a table can hold almost anything as a key or a value; the only rule is that nil can be neither a key nor a value.
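The correspondence between the two type systems fits in a small lookup function. This is only a sketch with a hypothetical JsonKind enum (neither json11 nor Lua defines it); what it shows is that JSON arrays and objects both end up as the single Lua type table, which is why the serializer later has to work out which of the two a given table is.

```cpp
#include <string>

// Hypothetical enumeration of the JSON value kinds listed above.
enum class JsonKind { Null, Boolean, Number, String, Array, Object };

// Name of the Lua type (as reported by Lua's type()) that represents each JSON kind.
std::string lua_type_for(JsonKind kind) {
    switch (kind) {
        case JsonKind::Null:    return "nil";
        case JsonKind::Boolean: return "boolean";
        case JsonKind::Number:  return "number";
        case JsonKind::String:  return "string";
        case JsonKind::Array:   // both JSON containers collapse into one Lua type
        case JsonKind::Object:  return "table";
    }
    return "unknown";  // not reachable for valid enumerators
}
```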

There are multiple implementations of JSON parsers and serializers for different languages, and Lua has plenty of them too. I decided to create my own module just because I can. I wanted it to be a native module, not pure Lua.

I did not want to write my own JSON parser/serializer, so I decided to use an existing C++ library: the module converts Lua values to C++ objects and then serializes those objects to JSON. Deserialization works in the reverse order: first, parse the JSON and build C++ objects; second, convert those objects to Lua values. I chose json11, released by Dropbox, because it has a simple API and no external dependencies.

Lua interface

Compiling the module produces a shared library. When Lua loads an external native module, it loads the library into memory and then looks up a function named luaopen_<name>, where <name> is the name of the module being loaded. That is, if we load our module with the following code:

local json = require 'json'

Lua will look up the address of the luaopen_json function (we will call it the entry point). This function must also be exported from the library (it must appear in the library's export table). Lua is written in C and looks the entry point up by its plain C name, so the function must have C linkage; in C++ we declare it extern "C" so that the compiler does not mangle its name:

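#include <lua.hpp> // Lua's C headers, wrapped in extern "C" for C++

extern "C" {
int luaopen_json(lua_State* vm) {
    // body shown below: push the module table and return 1
}
}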

This is an ordinary native function of the kind that can be made available to Lua scripts (a lua_CFunction). Like any other function of this type, it accepts a single argument, a pointer to the state of the Lua VM running the script that called it, and returns an integer: the number of Lua values it pushed onto that VM's stack as results. A function can return 0, which means that it returns nothing; negative return values make no sense.
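As for how Lua finds the entry point: the Lua reference manual's description of the C searcher builds the name as "luaopen_" followed by the module name with every dot replaced by an underscore, so require 'json' looks for luaopen_json. The helper below is a hypothetical illustration of that rule, not part of Lua's API; the extra rule for module names containing a hyphen differs between Lua versions and is left out.

```cpp
#include <string>

// Hypothetical helper: derives the entry-point symbol Lua looks up for a
// module name ("luaopen_" + name, with '.' replaced by '_').
std::string entry_point_name(const std::string& module_name) {
    std::string name = "luaopen_";
    for (char c : module_name)
        name += (c == '.') ? '_' : c;
    return name;
}
```

For example, entry_point_name("json") yields "luaopen_json", and a submodule such as "a.b.c" maps to "luaopen_a_b_c".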

When Lua loads a module and calls its entry point, it expects at most one return value (the entry point may also return nothing), and that value becomes the result of require. Our module has two functions: ToJson, which converts the given object to a JSON string, and FromJson, which parses the input string as JSON and reconstructs Lua values. As a result, the entry point should return a table with two fields, 'ToJson' and 'FromJson', whose values are these functions. Here is the body of the luaopen_json function:

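static const luaL_Reg methods[] = {
    {"ToJson", json_tojson},
    {"FromJson", json_fromjson},
    {nullptr, nullptr} // sentinel: marks the end of the array
};

luaL_newlib(vm, methods); // pushes a new table filled with the functions
return 1;                 // 1 value: the table on top of the stack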

We create an array of luaL_Reg structures, each of which associates a function with its name. The last element, {nullptr, nullptr}, is a sentinel: it marks the end of the array, so Lua does not read past it into memory that does not belong to us. luaL_newlib() creates a new library from this array: it pushes a new table onto the stack of the given Lua VM and fills it with the array's functions. The table is left on the stack, so we return 1 from the entry point; Lua takes the value on top of the stack (the table with functions) and returns it from require. As a result, scripts can call these functions.
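The role of the {nullptr, nullptr} sentinel is easiest to see in a loop that consumes such an array. The sketch below uses hypothetical Reg, Function and make_library names instead of the real Lua types; it mimics the way luaL_setfuncs() (which luaL_newlib() calls in Lua 5.2 and later) walks a luaL_Reg array until it reaches an entry whose name is NULL.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins: a function pointer type and a record shaped like
// luaL_Reg (a name plus a function).
using Function = int (*)();
struct Reg {
    const char* name;
    Function func;
};

// Walks a sentinel-terminated array: no length is passed, the loop simply
// stops at the entry whose name is nullptr.
std::map<std::string, Function> make_library(const Reg* regs) {
    std::map<std::string, Function> library;
    for (; regs->name != nullptr; ++regs)
        library[regs->name] = regs->func;
    return library;
}

// Hypothetical functions to register.
int to_json() { return 1; }
int from_json() { return 2; }
```

Without the sentinel, the loop would have no way to know where the array ends and would keep reading whatever memory follows it.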

From Json

Parsing JSON is the easiest part: Lua supports all data types that JSON uses, so all we need to do is convert json11's data structures to Lua values. Here is the FromJson(str) entry point:

int json_fromjson(lua_State* vm) {
    // raises a Lua error itself if argument 1 is missing or not a string
    auto input = luaL_checkstring(vm, 1);

    do {
        std::string err;
        auto obj = json11::Json::parse(input,
                                       err,
                                       json11::JsonParse::COMMENTS);
        if (!err.empty() && obj.is_null()) { // something went wrong
            lua_pushlstring(vm, err.c_str(), err.length());
            break; // destroys err and obj before lua_error() runs
        }

        /**
         * Lua supports every JSON data type, so the conversion
         * itself cannot fail.
         */
        return json_push_value(vm, obj);
    } while(false);

    assert(lua_type(vm, -1) == LUA_TSTRING); // error expected
    return lua_error(vm);
}

First of all, we fetch the first argument with luaL_checkstring(). If the function received no arguments, or the argument is neither a string nor a number (numbers are converted to strings), luaL_checkstring() raises an error. Lua C API functions that report errors (like lua_error()) never return: when Lua is compiled as C, they use setjmp/longjmp for that, so destructors of C++ objects that are alive at that point never run. If we allocate some memory (a std::string, for example) and then call such an error function, that memory is never freed. To avoid this, never call an error function in a scope where objects owning dynamically allocated memory still exist. That is why we use the do {} while(false); trick: unlike a plain {} scope, a do {} while(false) block can be left explicitly with break.

Within this safe scope we parse the given string as JSON and build json11's data object. If json11 fails to parse the input, it stores an error message in its second argument, err. We push this message onto the stack and leave the scope with break, which destroys the std::string. Just after the scope ends, we assert that the value on top of the stack is a string and then call lua_error(), which uses that value as the error message and passes it to the script that called us. If parsing succeeded, we call our function json_push_value(), which converts the given object to Lua values and pushes them onto the stack.

json_push_value() simply pushes the converted value onto the stack. If the given entity is an array or a JSON object, it creates a table and calls itself recursively to convert the array's elements or the object's keys and values. It tells arrays from objects and handles each accordingly.
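json_push_value() itself is not listed in this article, so the following is only a model of the recursion described above, not the module's code. It uses a hypothetical JsonNode type instead of json11::Json and renders Lua table-constructor text instead of pushing values onto a Lua stack. It assumes the natural mapping of JSON array position i to Lua index i + 1; scalars map directly, while arrays and objects recurse.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for json11::Json: a tagged tree node.
struct JsonNode {
    enum class Kind { Null, Boolean, Number, String, Array, Object } kind = Kind::Null;
    bool boolean = false;
    double number = 0;
    std::string text;
    std::vector<std::string> keys;   // Object only: keys[i] names children[i]
    std::vector<JsonNode> children;  // Array elements or Object values
};

// Recursively renders a node as Lua source text, in place of the pushes a real
// json_push_value() would perform. Strings are not escaped in this sketch.
std::string to_lua(const JsonNode& node) {
    std::ostringstream out;
    switch (node.kind) {
        case JsonNode::Kind::Null:    out << "nil"; break;
        case JsonNode::Kind::Boolean: out << (node.boolean ? "true" : "false"); break;
        case JsonNode::Kind::Number:  out << node.number; break;
        case JsonNode::Kind::String:  out << '\'' << node.text << '\''; break;
        case JsonNode::Kind::Array:
            out << '{';
            for (std::size_t i = 0; i < node.children.size(); ++i)  // JSON i -> Lua i + 1
                out << (i ? "," : "") << '[' << i + 1 << "]=" << to_lua(node.children[i]);
            out << '}';
            break;
        case JsonNode::Kind::Object:
            out << '{';
            for (std::size_t i = 0; i < node.children.size(); ++i)
                out << (i ? "," : "") << "['" << node.keys[i] << "']=" << to_lua(node.children[i]);
            out << '}';
            break;
    }
    return out.str();
}
```

For example, a JSON array holding 1 and "New" renders as {[1]=1,[2]='New'}.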

To Json

JSON serialization is a bit trickier, because JSON supports only a subset of Lua data types, so we need to verify that the given object contains only values of types that JSON supports. The following Lua types cannot be represented as JSON values: functions, threads (coroutines), userdata and light userdata. Such values are essentially references to memory. We cannot simply save a pointer, because the next time the script runs the memory will be different and saved pointers will be invalid. Nor can we save the entire object, because we usually do not know its internal structure: userdata and light userdata, for example, typically refer to objects owned by the host application that created the Lua VM our script runs in, and Lua itself has no access to their contents. As a result, a table can be serialized to JSON if and only if it contains only numbers, strings, booleans and other tables that, in turn, contain only values of these types. Apart from array indices, keys must be strings (or values that can be converted to strings, as discussed below).
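The type rule above boils down to a simple predicate. This sketch uses a hypothetical LuaType enum mirroring Lua's nine basic types (the real API reports them as the LUA_T* constants returned by lua_type()); a real implementation still has to recurse into tables and check their keys.

```cpp
// Hypothetical mirror of Lua's basic types (LUA_TNIL ... LUA_TTHREAD in lua.h).
enum class LuaType { Nil, Boolean, LightUserdata, Number, String, Table, Function, Userdata, Thread };

// True for types that have a JSON counterpart; functions, coroutines and
// (light) userdata are opaque references with no portable representation.
bool json_representable(LuaType type) {
    switch (type) {
        case LuaType::Boolean:
        case LuaType::Number:
        case LuaType::String:
        case LuaType::Table:  // contents must still be checked recursively
            return true;
        default:
            return false;
    }
}
```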

Here is ToJson(obj) function available to Lua:

int json_tojson(lua_State* vm) {
    luaL_checktype(vm, 1, LUA_TTABLE); // argument 1 must be a table

    do {
        std::string err;
        auto obj = json_tojson(vm, 1, err); // internal overload, discussed below
        if (!err.empty()) {
            lua_pushlstring(vm, err.c_str(), err.length());
            break; // destroys err and obj before lua_error() runs
        }

        auto dump = obj.dump();
        lua_pushlstring(vm, dump.c_str(), dump.length());
        return 1;
    } while(false);

    assert(lua_type(vm, -1) == LUA_TSTRING); // error expected
    return lua_error(vm);
}

This function simply calls the internal overload json_tojson(lua_State*, int, std::string&), which creates a json11::Json object from the value at the given stack index of the given VM. If something goes wrong, it fills its third argument with an error message. Here we use the previously discussed do {} while(false) method to emulate goto. If the given Lua value cannot be represented in JSON, we push the error message onto the stack and leave the scope with break, which destroys the string and deallocates its memory. At the end of the function we call lua_error(), which uses the value on top of the stack (the error message we pushed earlier) as the description of the error. If we successfully converted the Lua value to json11::Json, we call the dump() method of this class to get a string with a JSON document that represents the original Lua value. Finally, we push this string onto the stack and exit the function, returning 1, which means that the result is the single value on top of the stack.

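It is important to note that under the original JSON specification (RFC 4627), the top-level value of a document must be either an array or an object; a standalone string or number is not a valid JSON text there (later revisions, RFC 7159 and RFC 8259, relaxed this rule). That is why we check that the value the function received is a table.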
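The do {} while(false) pattern used by both FromJson and ToJson can be tried out without Lua. In the sketch below, std::longjmp() stands in for lua_error(), a global buffer plays the role of the top of the Lua stack, and every name (raise_error, checked_half, Owned and the globals) is hypothetical. In C++, a longjmp that skips the destructor of a live object is undefined behavior, which is exactly what the pattern prevents: by the time raise_error() runs, the object owning the message has already been destroyed.

```cpp
#include <csetjmp>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the Lua VM: a jump target for errors, a buffer
// acting as "the value on top of the stack", and a live-object counter.
std::jmp_buf g_error_jump;
char g_top_of_stack[128];
int g_live_objects = 0;

struct Owned {  // owns heap memory (a std::string) and counts itself
    std::string text;
    Owned() { ++g_live_objects; }
    ~Owned() { --g_live_objects; }
};

// Like lua_error(): never returns, jumps straight back to the setjmp() point.
[[noreturn]] void raise_error() { std::longjmp(g_error_jump, 1); }

int checked_half(int value) {
    do {
        Owned err;
        if (value % 2 != 0)
            err.text = "value must be even";
        if (!err.text.empty()) {
            // "Push" the message, then leave the scope so ~Owned() runs.
            std::snprintf(g_top_of_stack, sizeof g_top_of_stack, "%s", err.text.c_str());
            break;
        }
        return value / 2;
    } while (false);

    // Nothing that owns memory is alive here, so jumping away leaks nothing.
    raise_error();
}
```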

The json_tojson(lua_State*, int, std::string&) function is interesting in itself. Lua uses the same data structure, the table, both for indexed arrays, where elements are ordered and have integer indices, and for dictionaries, where elements have no particular order and are accessed by keys. Given a Lua table, it is not possible to determine whether it is an array or a dictionary without iterating over all of it with the lua_next() function. JSON has distinct data types for arrays and dictionaries (in JSON, dictionaries are called objects), so we must choose the correct type for a table depending on its contents. A dictionary-table can have integer keys (like an array-table) as well as non-integer keys, such as strings, booleans and anything else except nil. JSON supports only string keys for object members, so if the given table uses any other kind of key that cannot be converted to a string, we must raise an error.

The module's solution is to assume that a table is an array and to switch to a dictionary once a key shows that it is not one. Iterating through a table we get both a key and a value; if the key is an integer greater than 0, the table can still be an array; otherwise it is a dictionary. The following code converts the array collected so far into an object:

for (std::size_t i = 0; i < array.size(); ++i) {
    // array[i] came from Lua index i + 1; keep that index as the object key
    auto key = std::to_string(i + 1);
    object[key] = std::move(array[i]);
}

Note the i + 1: Lua array indices start at 1, while positions in a JSON array (and in the std::vector behind json11::Json::array) start at 0, so the shift keeps each element under its original Lua index once the table becomes an object.

It is important that the function checks not only the key types but also the key values. As I have already said, arrays are ordered: index 2 follows index 1, and index 3 follows index 2, so an array cannot produce indices in the order 3, 1, 2. Here "order" means the order in which lua_next() returns the keys; the Lua manual does not specify that order, so the check relies on Lua returning the indices of a sequence in ascending order, as it does in practice for tables built as lists. Meanwhile, a dictionary may contain integer keys, so the table {[1]=2, [3]=4} is not an array but a dictionary. As a result, if a key is an integer but is not the previous key plus 1, the table is a dictionary, and in this case we also convert the array to an object.

It is not possible to convert an object back to an array, because objects may contain non-integer keys. As a result, once we have converted an array to an object, we no longer check integer keys; we only check that a key is a string or can be converted to a string. Pro tip: when iterating over a table with the lua_next() API function, never modify the key on the stack and never call functions that may change its type in place, otherwise the next call to lua_next() will fail. lua_tostring() (lua_tolstring()) is such a function: if the key is a number, it converts the value on the stack to a string in place. Note also that lua_isstring() returns true for numbers as well as for strings, so it cannot tell a string key from a numeric one. That is why we use the expression lua_type() == LUA_TSTRING to check whether the key really is a string. When we need the string form of a key (we are discussing objects with string keys), we must work on a copy (for example, one pushed with lua_pushvalue()) so that the original key stays untouched.

For every value in the table we call json_tojson recursively, converting it to json11's data structures. As a result, the module supports nested tables (tables as values of other tables).
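The classification rules from the last few paragraphs can be modelled without the Lua API. In this sketch a hypothetical Entry (an integer-or-string key plus an already-converted value, here just a string) stands for each pair returned by lua_next(), in the order it was returned; classify() is likewise a hypothetical name, and floating-point or boolean keys, which the module has to either convert or reject, are left out. The table starts as an array, stays one while keys arrive as 1, 2, 3, ..., and switches permanently to an object (keeping the original Lua indices as string keys) at the first key that breaks the sequence.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical model of one lua_next() result: an integer or string key and
// a value that has already been converted (represented here as a string).
using Key = std::variant<long long, std::string>;
using Entry = std::pair<Key, std::string>;

struct Converted {
    bool is_array = true;
    std::vector<std::string> array;             // meaningful while is_array
    std::map<std::string, std::string> object;  // meaningful once !is_array
};

Converted classify(const std::vector<Entry>& entries) {
    Converted out;
    for (const auto& [key, value] : entries) {
        if (out.is_array) {
            const auto* index = std::get_if<long long>(&key);
            // Still an array only if this key is exactly the next index.
            if (index && *index == static_cast<long long>(out.array.size()) + 1) {
                out.array.push_back(value);
                continue;
            }
            // Switch to an object for good, keeping the original Lua indices.
            for (std::size_t i = 0; i < out.array.size(); ++i)
                out.object[std::to_string(i + 1)] = std::move(out.array[i]);
            out.array.clear();
            out.is_array = false;
        }
        // Object mode: integer keys are stringified, string keys copied as-is.
        const std::string name = std::holds_alternative<long long>(key)
            ? std::to_string(std::get<long long>(key))
            : std::get<std::string>(key);
        out.object[name] = value;
    }
    return out;
}
```

With this model, the table {[1]=2, [3]=4} from the example above comes out as an object with keys "1" and "3", while an empty table stays an array.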

Conclusion

We have discussed the internals of a module that lets Lua scripts serialize data to JSON and deserialize it back. I hope this article makes the design decisions clearer.
