Okio.Options

Jerzy Chałupski
5 min readSep 30, 2019

--

There’s a piece of the Okio library I didn’t describe in my Closer look at the okio library post. I skipped it because at that time it didn’t seem that important and it did seem weird. Only when I started reading through the code of another Square’s library did I realize how important that piece is. The class I’m talking about is Options.

Usage

First, you need to create an Options from some ByteStrings:

val fields = Options.of(
ByteString.encodeUtf("depth"),
ByteString.encodeUtf("width"),
ByteString.encodeUtf("height")
)

Then you can query the BufferedSource to see if the next bytes match any of the ByteStrings we passed to the Options. If the match is found, the index of the match is returned and the matching data is consumed from the source, otherwise the -1 is returned and the source stays as it was before the call:

val buffer = Buffer().writeUtf("width=1500,breadth=superwide")buffer.select(fields) // returns 1, matching the first option
buffer.readByte() // skip the '='
buffer.readDecimalLong() // returns 1500
buffer.readByte() // skip the ','
buffer.select(fields) // returns -1, "breadth" was not listed as one of the options we expect

The API seems weird and a bit clunky, and it probably would be both if you’d use it directly in your code you write by hand. It’s not a problem though for a generated code or internals of a reflective calls. And that’s exactly the kind of code in which I stumbled upon Okio.Options: inside the ClassJsonAdapter in Moshi.

It’s probably not a big surprise because Moshi is a JSON library, and when you think about it for a second, finding a match within a limited set of options is something you do all the time during JSON parsing: you need to figure out what to do with the next field in JSON object, which adapter to use to deserialize it, etc. I strongly suspect that this is exactly the use case for which the Options were created in the first place.

But then, why is the Options a part of Okio and not a part of Moshi? The short answer is that because of performance optimizations the Options need access to the internal structure of Okio’s Buffers. The long answer is, let’s take a deep dive into Options’ internals!

Trie

The good data structure for the task at hand is a trie, sometimes called a prefix tree. This data structure can be used to encode the set of words in a memory-efficient way. In such tree the root node is empty and the words are represented as strings of child nodes. If the words start with the same prefix, the nodes representing this prefix are shared between words. Since a word can be a prefix of another, longer word, the nodes also need to carry this information. Here’s the trie with the following set of words: “user”, “username”, “email”, “accounts”, “is_active”, and “is_premium”.

Okio trie

One thing you may notice about this trie is that while we’re traversing it we’re performing one of two actions: we either choose which child branch should we follow, or we’re following a chain of nodes with a single child each. This property is directly reflected in the trie implementation in Okio, which contains two types of nodes:

One type is a “select” node, which contains a list of characters of the child nodes. Each option is pointing either to another node or to a word from the input set.

The other type is a “scan” node, which represents a string of nodes in the simple trie. It points to another node or an input word.

Both types contain also a “node prefix”, which holds the information whether some word from an input set was encountered earlier during traversing the trie (for example when we’ve already traversed the nodes for the word “user” and now we’re checking if that’s just “user” or “username”).

Here’s the Okio representation of the same trie:

Okio trie encoding

If you take a look at the Okio code, you won’t find any ScanNode or SelectNode classes. In fact, the internals of Options class consists of these two fields:

final ByteString[] byteStrings;
final int[] trie;

The first one is the set of the input words, the second, a more surprising one, is a trie encoded as an IntArray. The encoding format is described in details in javadocs for a private Options.buildTrieRecursive method. This representation is very memory and CPU efficient, but it’s not the most pleasant representation to work with. Fortunately, these details are fully encapsulated in Okio and, as the saying goes, “what happens in the library, stays in the library”.

The last ounce of performance is squeezed out by keeping this class within Okio. The Options class might need to scan more data than it will consume, and this kind of operation requires either creating a snapshot of the Buffer or accessing Buffer internals. The former option is prohibitively expensive. The second option has two possible implementations: embedding the Options within Okio or using the UnsafeCursor API. The first implementation is probably a tad more efficient, so that’s what the Square devs did.

Summary

I guess that the lesson for today is that high-performance, low-allocation code requires some funky tricks. I find this kind of tricks fascinating and enjoy reading through the code that makes it tick. On the other hand, I realize that the chances that you’ll ever need to use the technique presented in this post are very slim. So think of it as this weird screwdriver that lies at the bottom of your toolbox. You’ll use it maybe twice in your lifetime, but on these occasions, it’s going to be a godsend.

--

--