5 Simple Binary Encoding Gotchas

tommyqq
4 min read · Sep 20, 2020

Have you spent hours in frustration debugging Simple Binary Encoding (SBE) issues in your application? You aren’t alone; I’ve been there too. This post aims to ease some of that pain by covering what I believe are roughly 80% of the common SBE usage issues.

Code examples: https://github.com/tommyqqt/sbe-gotchas.git

Let’s recap:

SBE is an ultra-fast codec commonly used in low latency financial applications such as FIX engines, pricing engines, etc. This post assumes that you are familiar with the basics.

If you are new to SBE, visit https://github.com/real-logic/simple-binary-encoding

This post refers to the specific SBE implementation in Java (version 1.19.0) developed by Real-Logic. It is not about the SBE FIX standard.

Structure of an SBE message

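An SBE message is laid out as a block of fixed-length fields, followed by repeating groups, followed by var-length fields. The same structure (block fields, repeating groups, var-length fields) can also be nested inside each repeating group.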

Fields in an SBE message have to be encoded and decoded sequentially, in schema order. The one exception is the fixed-length members of a block: while the limit is at the beginning of that block, they can be accessed in any order.
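To make the "limit" idea concrete, here is a minimal sketch using plain java.nio.ByteBuffer. It is a toy model, not the code the SBE tool generates: the class name, the 12-byte block and the 4-byte length prefix are all assumptions made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Toy model (not generated SBE code): fixed fields by offset, var-length data at a moving limit. */
final class LimitCursorSketch {
    static final int BLOCK_LENGTH = 12; // hypothetical block: int64 orderId + int32 qty

    private final ByteBuffer buffer;
    private int limit = BLOCK_LENGTH;   // var-length data starts right after the block

    LimitCursorSketch(ByteBuffer buffer) {
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
    }

    // Fixed-length block members sit at constant offsets, so they can be accessed in any order.
    void orderId(long value) { buffer.putLong(0, value); }
    long orderId() { return buffer.getLong(0); }
    void qty(int value) { buffer.putInt(8, value); }
    int qty() { return buffer.getInt(8); }

    // Var-length data is written wherever the limit currently points, then the limit moves on.
    void putVar(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putInt(limit, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + 4 + i, bytes[i]);
        }
        limit += 4 + bytes.length;
    }

    // Reading must follow the same order, because each length prefix locates the next field.
    String getVar() {
        int length = buffer.getInt(limit);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(limit + 4 + i);
        }
        limit += 4 + length;
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    void rewind() { limit = BLOCK_LENGTH; }

    public static void main(String[] args) {
        LimitCursorSketch msg = new LimitCursorSketch(ByteBuffer.allocate(64));
        msg.qty(100);          // block fields out of order: fine
        msg.orderId(42L);
        msg.putVar("trader");  // var-length fields: strictly in order
        msg.putVar("order");
        msg.rewind();
        System.out.println(msg.orderId() + " " + msg.qty() + " " + msg.getVar() + " " + msg.getVar());
    }
}
```

The point of the sketch is only that block fields have fixed addresses while everything after the block is located by walking the limit forward.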

Now let’s jump to the common gotchas!

1. When encoded length isn’t encoded length

There are times when we want to know the encoded length of an SBE message, for example when sending it over the wire or persisting it to a file.

How would you get the encoded length of an SBE message? If we have just finished encoding the message and still have the encoder on hand, isn’t it simply a matter of calling encoder.encodedLength()? Let’s try it out.

java.lang.IndexOutOfBoundsException: index=262 length=22 capacity=276

We get an exception because encoder.encodedLength() excludes the header length. Decoding needs the whole message, header included, not just the body, so the length to copy is the header length plus encoder.encodedLength().
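Here is a minimal sketch of that arithmetic with plain java.nio.ByteBuffer. It is a toy model rather than generated SBE code: the class and method names are made up, and the 8-byte header (four uint16 values) is an assumption based on the standard SBE message header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy model (not generated SBE code) of a message with an 8-byte header in front of the body. */
final class EncodedLengthSketch {
    // Assumed standard header: blockLength, templateId, schemaId, version, each a uint16.
    static final int HEADER_LENGTH = 8;

    /** Writes header and body at offset 0; returns the body length only, mirroring encodedLength(). */
    static int encode(ByteBuffer buffer, byte[] body) {
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort(0, (short) body.length); // blockLength
        buffer.putShort(2, (short) 1);           // templateId
        buffer.putShort(4, (short) 1);           // schemaId
        buffer.putShort(6, (short) 1);           // version
        for (int i = 0; i < body.length; i++) {
            buffer.put(HEADER_LENGTH + i, body[i]);
        }
        return body.length;
    }

    /** Copies the complete message: header length plus body length. */
    static byte[] copyWholeMessage(ByteBuffer buffer, int bodyLength) {
        byte[] out = new byte[HEADER_LENGTH + bodyLength];
        for (int i = 0; i < out.length; i++) {
            out[i] = buffer.get(i);
        }
        return out;
    }
}
```

With the real generated classes the same sum would be MessageHeaderEncoder.ENCODED_LENGTH + encoder.encodedLength(), assuming the message was encoded with wrapAndApplyHeader.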

How do we determine the encoded length if we only have the encoded buffer? Unfortunately, the decoder has to traverse to the end of the message to find out. The alternative is to remember the encoded length at the time the message was encoded and pass it along with the encoded buffer, for example as a method parameter.

2. The moving repeating group

One habit we Java programmers often fall into: when we need a value returned by a method several times, and that value is supposed to stay the same between calls, we simply call the method again each time instead of assigning its result to a local variable.

What happens when we try to obtain a reference to the start of a repeating group multiple times, as in the code below? Note that we haven’t even attempted to traverse the repeating group yet (by calling next()).

Common sense says we should get the same count and limit both times, but it doesn’t work that way for SBE. Each call to the group accessor re-reads the group header at the current limit and then moves the limit past it, so the second call interprets the bytes of the first entry as a group header.

Number of allocations: 2
Current limit: 30
Number of allocations: 20291
Current limit: 34
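The numbers above can be reproduced with a small toy model in plain java.nio.ByteBuffer. This is a sketch, not the generated decoder: the class and method names are hypothetical, and the offsets (group header at byte 26, first entry starting with the text ACCOUNT-1) are assumptions chosen to match the output shown. Under those assumptions, 20291 is 0x4F43, i.e. the ASCII bytes "CO" of ACCOUNT-1 misread as a uint16 count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Toy model (not generated SBE code) of a group accessor that moves the parent's limit. */
final class GroupWrapSketch {
    static final int GROUP_HEADER_SIZE = 4; // uint16 blockLength + uint16 numInGroup

    private final ByteBuffer buffer;
    private int limit;

    GroupWrapSketch(ByteBuffer buffer, int limit) {
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
        this.limit = limit;
    }

    int limit() { return limit; }

    /** Reads the group header at the current limit, then steps the limit past it. */
    int wrapGroupAndCount() {
        int count = buffer.getShort(limit + 2) & 0xFFFF;
        limit += GROUP_HEADER_SIZE;
        return count;
    }

    /** Builds a buffer whose group header sits at offset 26, followed by the first entry. */
    static ByteBuffer sample() {
        ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort(26, (short) 24); // blockLength of each entry
        buffer.putShort(28, (short) 2);  // numInGroup
        byte[] account = "ACCOUNT-1".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < account.length; i++) {
            buffer.put(30 + i, account[i]);
        }
        return buffer;
    }

    public static void main(String[] args) {
        GroupWrapSketch msg = new GroupWrapSketch(sample(), 26);
        int count = msg.wrapGroupAndCount(); // call once and keep the result in a local
        System.out.println(count + " entries, limit " + msg.limit());
        System.out.println("second call: " + msg.wrapGroupAndCount() + ", limit " + msg.limit());
    }
}
```

The practical takeaway is the one in main: obtain the group once, hold it in a local variable, and work from that reference.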

3. Mutating var length field

What if there is a field whose value we want to mutate after we have encoded the message? If we know in advance which field we will need to backtrack to, we can remember the limit just before encoding it, then reset the limit to that value later and encode the field again.

Unless the mutated field is a fixed-length field, every field after it needs to be encoded again, because a var-length value of a different size shifts everything that follows.
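A minimal sketch of the remember-and-rewind approach, again as a toy model in plain java.nio.ByteBuffer rather than generated SBE code. The class name, the field values and the 4-byte length prefix are assumptions for illustration; the limit() getter and limit(int) setter are modelled on the accessors of the same name that the generated encoders expose.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Toy model (not generated SBE code) of backtracking to a var-length field via the limit. */
final class BacktrackSketch {
    private final ByteBuffer buffer;
    private int limit;

    BacktrackSketch(ByteBuffer buffer) {
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
    }

    int limit() { return limit; }
    void limit(int newLimit) { limit = newLimit; }

    void putVar(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putInt(limit, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + 4 + i, bytes[i]);
        }
        limit += 4 + bytes.length;
    }

    String getVar() {
        int length = buffer.getInt(limit);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buffer.get(limit + 4 + i);
        }
        limit += 4 + length;
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        BacktrackSketch msg = new BacktrackSketch(ByteBuffer.allocate(128));
        msg.putVar("TRADER-1");
        int mark = msg.limit();               // remember the limit before the mutable field
        msg.putVar("old description");
        msg.putVar("trailing field");

        msg.limit(mark);                      // backtrack
        msg.putVar("a much longer new description");
        msg.putVar("trailing field");         // must be re-encoded: its position has shifted

        msg.limit(0);
        System.out.println(msg.getVar() + " | " + msg.getVar() + " | " + msg.getVar());
    }
}
```

Skipping the second putVar of the trailing field would leave the decoder reading a length prefix from the middle of the new description.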

4. The semi-forbidden schema evolution

Suppose “orderDescription” is a new var-length field that has just been added to the end of the schema like this:

If we are not using that field yet, can we get away without changing our code to encode/decode it? It’s at the end of the message anyway, and surely the regression tests won’t pick up anything!

What happens when the code below runs?

It explodes.

java.lang.IndexOutOfBoundsException: index=265 length=926299444 capacity=384

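Code that uses SBE also tends to reuse buffers to reduce allocations. Even though we don’t care about the last field, the buffer may still contain bytes from a previous message where the new field should be. If our encoder never writes the new field, a decoder built from the new schema reads those stale bytes as the field’s length prefix, hence the absurd length in the exception. The new field therefore has to be encoded every time, even if only as an empty value.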
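As a side note, the length in the exception, 926299444, is 0x37363534, which is the ASCII text "4567" read as a little-endian 32-bit integer, so it does look like leftover text from an earlier message. The sketch below reproduces that effect with plain java.nio.ByteBuffer. It is a toy model, not generated SBE code; the class, the method names and the sample strings are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Toy model (not generated SBE code) of stale bytes in a reused buffer. */
final class StaleBufferSketch {
    static ByteBuffer newReusableBuffer() {
        return ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Writes a uint32 length prefix plus ASCII bytes at limit; returns the new limit. */
    static int putVar(ByteBuffer buffer, int limit, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.putInt(limit, bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            buffer.put(limit + 4 + i, bytes[i]);
        }
        return limit + 4 + bytes.length;
    }

    /** What a decoder built from the new schema does next: read a length prefix at limit. */
    static int peekVarLength(ByteBuffer buffer, int limit) {
        return buffer.getInt(limit);
    }

    public static void main(String[] args) {
        ByteBuffer reused = newReusableBuffer();
        putVar(reused, 0, "0123456789");        // an earlier, longer message
        int limit = putVar(reused, 0, "abcd");  // new message; encoder unaware of the new field

        System.out.println(peekVarLength(reused, limit)); // stale bytes read as a length

        putVar(reused, limit, "");              // fix: always encode the new field, even empty
        System.out.println(peekVarLength(reused, limit));
    }
}
```

Zeroing the buffer before every encode would also hide the problem, but writing the field explicitly keeps the encoder in step with the schema.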

5. Debugging and testing with Base64 encoding

How do we troubleshoot SBE issues in production?

A couple of ways:

  1. Replay the SBE messages in your test harness (if you employ the event sourcing pattern)
  2. Trawl through log files for problematic SBE messages

Let’s talk about (2). Below is a string representation of an SBE message.

[NewOrderSingle](sbeTemplateId=1|sbeSchemaId=1|sbeSchemaVersion=1|sbeBlockLength=18):orderId=ORDER-001|tradeDate=15613|allocations=[(allocAccount=ACCOUNT-1|allocQty=100.0|nestedParties=[(nestedPartyID=Party-1|nestedPartyRole=|nestedPartyDescription='Party-1')]|allocDescription='ALLOCATION WITH ACCOUNT ACCOUNT-1'),(allocAccount=ACCOUNT-2|allocQty=200.0|nestedParties=[(nestedPartyID=Party-2|nestedPartyRole=|nestedPartyDescription='Party-2')]|allocDescription='ALLOCATION WITH ACCOUNT ACCOUNT-2')]|traderDescription='TRADER-0123456789'|orderDescription=''

Besides eyeballing it, is there a way to turn this text back into SBE bytes (similar to the Protobuf TextFormat parser)? You can write a parser of your own or look hard enough for one on the internet. The problem is that the SBE string representation is somewhat arbitrary; it is not a well-defined language like JSON, so a parser can stop working if there is an extra pipe or parenthesis somewhere in the text.

Java 8’s Base64 encoding comes to the rescue. We can print an SBE message’s bytes as a string anywhere, be it in log files or even in JUnit test cases, and easily reconstruct the bytes later on. We no longer need to worry about storing SBE messages as binary files. Yay!

SBE Base64 encoding string: EgABAAEAAQBPUkRFUklELTAwMQAAAAAA/TwYAAIAQUNDT1VOVC0xAAAAAAAAAAAAAAAAAFlAIAABAFBhcnR5LTEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwAAAFBhcnR5LTEhAAAAQUxMT0NBVElPTiBXSVRIIEFDQ09VTlQgQUNDT1VOVC0xQUNDT1VOVC0yAAAAAAAAAAAAAAAAAGlAIAABAFBhcnR5LTIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABwAAAFBhcnR5LTIhAAAAQUxMT0NBVElPTiBXSVRIIEFDQ09VTlQgQUNDT1VOVC0yCAAAAFRSQURFUi0xFgAAAERVTU1ZIE5FVyBPUkRFUiBTSU5HTEU=
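A minimal sketch of the round trip using java.util.Base64 from the JDK. The helper class and its method names are made up for illustration, and the sample bytes are just an 8-byte header (blockLength=18, then templateId, schemaId and version all 1) rather than a full message.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helpers for logging encoded bytes as text and restoring them later. */
final class Base64LogSketch {
    /** Encodes exactly the bytes of one message (header included) as a Base64 string. */
    static String toLogString(byte[] buffer, int offset, int length) {
        return Base64.getEncoder().encodeToString(Arrays.copyOfRange(buffer, offset, offset + length));
    }

    /** Restores the original bytes from a string copied out of a log file or a test case. */
    static byte[] fromLogString(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample bytes: four little-endian uint16 values, standing in for a real message.
        byte[] header = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
            .putShort((short) 18).putShort((short) 1).putShort((short) 1).putShort((short) 1)
            .array();

        String logged = toLogString(header, 0, header.length);
        byte[] restored = fromLogString(logged);

        System.out.println(logged);
        System.out.println(Arrays.equals(header, restored));
    }
}
```

In a JUnit test, the restored byte array would then be wrapped in whatever buffer type your decoders expect and decoded as usual. Note that the length passed to toLogString has to cover the header as well as the body.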

Best of luck on your SBE adventure and don’t forget to share the tips if you think they make your life easier!
