Speeding up incoming message parsing by 3- to 10-times by switching from serde to the nom library

Juraj Selep
TezEdge
Published in
9 min readJun 16, 2021

Computer systems are designed to process information in the form of a binary stream of data. These strings consist of numbers and letters that must be processed by parsers into an intelligible form that the system can use.

The same is true with data sent across the Tezos network. For example, with a ConnectionMessage we have port number, nonce, public key and proof of work stamp all in the form of one continuous string. To make sense of this, we must separate and decode the various pieces of information from this string of data, for which we use a parser.

Parsing is a necessary process in the TezEdge node, but it also introduces overhead in the form of additional code and memory. We want to minimize this overhead in order to make the process as efficient and secure as possible.

The serde-based encoding is the de facto standard for Rust and it has its own layout for correspondence between serialized bytes and Rust structures (encoding and decoding). However, the Tezos encoding we use has a different layout.

To use the serde library with this Tezos encoding, we had to introduce an intermediate layer to store generic data that we either obtained from binary strings or Rust structure. To maintain that intermediate layer required more CPU and more memory. We wanted a more efficient solution, which is why we decided to use a different library.

The nom library was chosen because it provides the tools to build safe parsers without having to sacrifice speed or increase memory consumption. Additionally, nom is very robust in building arbitrary parsers.

By using nom, we no longer need an intermediate layer that was required with serde. This reduces the amount of code that could potentially introduce problems and improves the security and stability of the parsing process.

We also specify the encoding schema — the description of how Rust data is encoded and decoded. Previously we used to specify it as a separate declaration. Now it is specified as attributes on the Rust data structures themselves.

When we receive messages from the network in the form of Tezos binary encoding (as a stream of bytes), we then create Rust structures from them.

For example, we have a Rust structure — we need to read each of these fields. Having nom-based functions to decode each of the fields it is quite easy to combine them into a sequence using nom’s seq function and then map the sequence of parsed values to the structure instance using nom’s map function. In the same way, we can use nom’s parsers as building blocks for all Tezos messages.

The general idea

Currently, there is a top-level decoding trait, BinaryMessage, with a single method performing decoding:

The method constructs the specific message from incoming bytes, checking that all bytes are consumed, and return the resulting message, or report an error.

To be able to decode arbitrary data type from a bytes stream, we should have a trait that is capable of that:

A naive approach

The most straightforward approach to using nom would be to just read bytes with nom and build message parts in accordance with encoding.

Although this is straightforward, it requires a lot of routine work of implementing nom_read methods for each type that has an encoding schema. This process is error-prone (e.g. it is quite easy to get the order of fields wrong), and it makes the decoding code independent of the encoding schema definition, which means that both encoding schema and decoder code should be updated synchronously.

A derive-based approach

As we can see, the decoding code strictly follows the encoding schema that is defined via has_encoding macro. The most natural way would be to use that encoding definition to decide which nom construct should be used. For each message, its encoding schema is accessible at run-time via Encoding::encoding() interface, but constructing the message itself could be tricky.

The solution is to use a derived procedural macro to generate the NomReader::nom_read method at compile time. However, that means that the encoding schema should also be available at compilation time, so we need to move the encoding definition from that stand-alone macro to macro attributes:

That way, procedural macro functions will be able to construct the same structure that is currently used to define encoding schema, and both Encoding::encoding() and NomReader::nom_read implementations.

Deriving Encoding from Rust Data

We should define an unambiguous mapping between Rust data declaration (e.g. struct, or enum) and its Tezos encoding.

For some types, like u8 or i16, we can directly generate corresponding Tezos encoding (Uint8 and Int16). When a Rust primitive type is used to represent a different encoding type (e.g. i64 for Timestamp), a special attribute should be used to specify that (e.g. #[encoding(Timestamp)]).

There is a straightforward way to map Rust struct declaration to the corresponding Obj encoding. For each field of the structure, corresponding Field instances will be generated.

For Enum encoding, we should specify the size of the tag that is used to specify which enumeration variant is encoded, and for each Rust enum’s variant the corresponding tag value should also be specified:

Tezos encoding supports bounded versions for some encoding primitives, like string, list and dynamic encoding, that specify the maximum number of elements or bytes. That information also should be specified in the encoding attribute: #[encoding(dynamic = “SIZE”, list)]

For data typed with user-defined type, it is assumed that the type implements the HasEncoding trait, so its encoding schema is already defined. In the case of the type alias that cannot have separate traits implementation, encoding of the corresponding data should be specified using a special attribute (e.g. #[encoding(composite(dynamic, list, string = “SIZE”))])

Deriving Nom Decoding from Rust Data

The way how Rust data is encoded/decoded is fully defined by its Rust declarations along with required encoding attributes, so we just need to find out the proper Nom parsers to generate decoders.

For primitive types nom::number::complete, a module is used, providing parsers like u8 and u16(Endianness::Big) that will decode a byte and a 2-byte number in big-endianness form.

For user-typed data, like with encoding schema case, we assume that the decoding trait is implemented, and we just call its function that will take care of decoding binary data into corresponding values.

For struct data nom’s seq and map parsers can be used, to read each field one by one and then construct the whole struct instance from the sequence of its fields.

For enum, nom’s tag parsers might be used along with alt combinator. However, that means that each alternative will be tried out until one succeeds, and in case of an enum with a lot of alternatives that might not be optimal. Instead, the variant descriptor can be read first and then matched against constant values to decide what variant to parse.

For composite encodings, like list, dynamic etc., special parsers are created that extend the nom library. That way, generating a decoder function for such encoding is very simple, like dynamic(list(…)).

Testing for correctness

The first obvious set of tests would be to check if the derived encoding is exactly the same as the one that we are using right now. It should be pretty straightforward to generate a single test of that kind per each type.

For some transient period we might also use a single entry for decoding (e.g. BinaryMessage) that would call both implementations and assert that the results are the same.

Fuzzing

TezEdge decoders are extensively tested through fuzzing. Existing fuzz tests are now running against the new implementation. See the fuzzing status page here: http://fuzz.tezedge.com/develop/

Additionally, while the old implementation still co-existed with the new one, differential fuzzing has been performed over these two implementations to discover any differences in results decoding the same input data. With this fuzzing we ensure that the results are the same in case of success (or otherwise both signal errors):

Comparing performance with the old implementation

While implementing the new encoding, we developed benchmarks to prove that we will get better performance from it. According to these benchmarks, the new implementation is from 3x to 10x times faster than the old one (for some messages, we also implemented for reference a “raw” decoder that does not use any parser library):

connection-decode-serde time: [1.8282 us 1.8327 us 1.8386 us]connection-decode-nom time: [144.51 ns 175.05 ns 214.00 ns]connection-decode-raw time: [84.692 ns 85.542 ns 86.682 ns]ack-decode-serde time: [4.3648 us 4.3959 us 4.4332 us]ack-decode-nom time: [1.3510 us 1.3603 us 1.3716 us]ack-decode-raw time: [936.27 ns 946.48 ns 959.18 ns]current-branch-decode-serdetime: [106.72 us 107.50 us 108.39 us]current-branch-decode-nomtime: [10.662 us 10.742 us 10.835 us]

These benchmarks are available here: https://github.com/tezedge/tezedge/blob/nom-poc/tezos/messages/benches/decoders_benchmark.rs.

Launch them by typing these commands:

$ git clone https://github.com/tezedge/tezedge.git -b nom-poc$ cd tezedge$ git checkout fa6845$ cargo bench — bench decoders_benchmark

Dynamic Memory Consumption Comparison

In order to measure the difference in memory consumption between the new and old decoder implementations, we developed two simple unit tests that just decode several different messages in a row.One uses the old implementation and the other one uses the new implementation. These unit tests are then run using valgrind’s dhat tool that collects dynamic memory allocations and reports its summary:

Serde-based decoding:Total: 16,181,115 bytes in 30,594 blocksAt t-gmax: 7,877,633 bytes in 8,330 blocksAt t-end: 26,657 bytes in 234 blocksReads: 4,009,081 bytesWrites: 14,397,789 bytesNom-based decoding:Total: 1,475,072 bytes in 8,657 blocksAt t-gmax: 605,404 bytes in 2,601 blocksAt t-end: 0 bytes in 0 blocksReads: 529,888 bytesWrites: 733,391 bytes

To run these tests, enter these commands:

$ git clone https://github.com/tezedge/tezedge.git -b nom-poc$ cd tezedge$ git checkout fa6845$ cargo build — test dhat$ export CARGO_MANIFEST_DIR=tezos/messages$ valgrind — tool=dhat target/debug/deps/dhat-… all_decode_serde$ valgrind — tool=dhat target/debug/deps/dhat-… all_decode_nom

CI Tests

To be sure that in the future we do not introduce extra memory consumption to the message parsing, we added a CI test that uses valgring’s dhat tool to compare the amount of memory allocations made by parsing a sequence of bootstrapping messages grabbed from a real node. If allocation in the new implementation increases by more than 10%, the test fails. Here is the sample build where the error is reported:

http://ci.tezedge.com/tezedge/tezedge/358/1/2

Encoding

For encoding, the same information might be used to generate byte streams based on message instances. Nom’s approach with combining parsers has proven to be very effective in application to Tezos encoding, so a similar approach will be used to generate encoders for Tezos messages.

We thank you for your time and hope that you have enjoyed reading this article. If you have any questions regarding the garbage collection implementation used in the TezEdge node, feel free to contact me. To read more about Tezos and the TezEdge node, subscribe to our Medium, follow us on Twitter or visit our GitHub.

--

--