A closer look at the Okio library
A common piece of advice given to experienced programmers who want to improve their skills is to read someone else’s code. It’s easier said than done though. The idea of diving into a complex codebase is intimidating. This summer, I finally decided to put this advice into practice.
The first task was to figure out which codebase I’d read. I’m an Android developer, so I decided to limit my choice to one of the Square libraries we all love and use in our apps: Retrofit, OkHttp, Moshi, Dagger (even though it’s maintained by Google now, it was created by Square). There’s one more thing these libs have in common besides the company that created them. All of them use Okio under the hood.
Okio is a library that simplifies and simultaneously enriches the usage of the InputStream and OutputStream classes from the standard java.io package. I used it a couple of times when writing custom OkHttp interceptors, but I’ve never really understood how it works under the hood. All these factors made it a perfect candidate for the codebase to read.
This blog post is a summary of my dive into Okio sources.
Source, Sink, Buffer
The two Okio interfaces you’ll see all over the library’s API are Source and Sink. The names are quite self-descriptive, but just in case you’re having a really bad day, let me tell you that the Source is something you can read data from (like an InputStream) and the Sink is where you push the data (like the OutputStream).
If you’ve ever worked with java.io streams, you probably wrapped them in Buffered{Input,Output}Stream. The main reason is that you very rarely need to read the data byte-by-byte, and reading the data in chunks of a few kilobytes at a time is much more performant. This observation probably guided Okio’s design, because all the implementations of Source and Sink either delegate the calls to some other Sources or Sinks, or use Buffers under the hood.
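To make the buffering argument concrete, here is a small sketch that uses only java.io. CountingInputStream and BufferingDemo are hypothetical helpers written for this post, not part of Okio or the JDK. The idea is to count how many read calls reach the underlying stream when a caller consumes it one byte at a time, with and without a BufferedInputStream in between.

```java
// Toy stream that counts how often the underlying source is asked for data.
final class CountingInputStream extends java.io.InputStream {
    private final byte[] data;
    private int position = 0;
    int readCalls = 0;

    CountingInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        readCalls++;
        return position < data.length ? (data[position++] & 0xff) : -1;
    }

    @Override
    public int read(byte[] target, int offset, int length) {
        readCalls++;
        if (length == 0) return 0;
        if (position >= data.length) return -1;
        int count = Math.min(length, data.length - position);
        System.arraycopy(data, position, target, offset, count);
        position += count;
        return count;
    }
}

final class BufferingDemo {
    // Consumes the whole input one byte at a time and reports how many
    // read calls reached the underlying stream.
    static int underlyingReads(byte[] data, boolean buffered) {
        CountingInputStream raw = new CountingInputStream(data);
        java.io.InputStream in =
                buffered ? new java.io.BufferedInputStream(raw, 8192) : raw;
        try {
            while (in.read() != -1) {
                // Just drain the stream.
            }
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
        return raw.readCalls;
    }
}
```

Calling BufferingDemo.underlyingReads(new byte[20000], false) hits the underlying stream once per byte plus once more for the end-of-stream check, while the buffered variant needs only a handful of chunked reads.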
Buffer implements both the Source and Sink interfaces (or, more precisely, it implements the BufferedSink and BufferedSource interfaces, which are like Sources and Sinks enriched with a bunch of convenience methods). Unlike the java.io streams, the Buffer doesn’t use a fixed byte array to hold the buffered data; it uses Okio Segments instead. Together, the Buffer and the Segment are the secret sauce of this library.
Segments
A Segment is your typical data structure for a doubly-linked list: it has next and prev references to other Segments. It also carries around the actual ByteArray buffer. No surprises so far; the data has to be kept somewhere. What’s interesting are the two “pointers” called pos and limit and two flags: shared and owner. More on those later. First, you need to understand how the Segments are used by the Buffers.
Internally the Buffers keep Segments as a circularly-linked list. That way you need to maintain only a single Segment reference to have access to both ends of the Buffer, for reading and writing the data respectively.
This means that the Buffer can safely grow without affecting other Sinks and Sources piped into it.
Compare that with the behaviour of BufferedInputStream and BufferedOutputStream, where writing some data might cause a cascade of flushing.
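The circular list is easier to picture with a bit of code. What follows is a simplified model, not Okio’s actual source: RingSegment and RingBuffer are names I made up, and the segment carries only the links plus the pos and limit markers. It shows how a single head reference gives access to both the reading end (head) and the writing end (head.prev).

```java
// Simplified stand-in for a segment: list links plus the read/write window.
final class RingSegment {
    final byte[] data;
    int pos;    // index of the first readable byte
    int limit;  // index of the first writable byte
    RingSegment next;
    RingSegment prev;

    RingSegment(int capacity) {
        data = new byte[capacity];
    }
}

final class RingBuffer {
    // The only reference the buffer keeps. Because the list is circular,
    // head.prev is always the tail.
    private RingSegment head;

    // Appends a segment at the writing end.
    void addLast(RingSegment segment) {
        if (head == null) {
            head = segment;
            segment.next = segment;
            segment.prev = segment;
            return;
        }
        RingSegment tail = head.prev;
        tail.next = segment;
        segment.prev = tail;
        segment.next = head;
        head.prev = segment;
    }

    // Removes and returns the segment at the reading end, or null if empty.
    RingSegment removeFirst() {
        if (head == null) return null;
        RingSegment first = head;
        if (first.next == first) {
            head = null; // that was the last segment
        } else {
            first.prev.next = first.next;
            first.next.prev = first.prev;
            head = first.next;
        }
        first.next = null;
        first.prev = null;
        return first;
    }

    RingSegment first() {
        return head;
    }

    RingSegment last() {
        return head == null ? null : head.prev;
    }
}
```

Growing the buffer is just a matter of linking one more segment after the tail. Nothing that was already written has to be moved or flushed.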
As I mentioned before, the Sinks and Sources are often connected into a pipe. Smart folks at Square realized that there’s no need to copy the data between such pipe components the way the java.io buffered streams do. All Sources and Sinks use Buffers under the hood, and Buffers keep the data in Segments, so quite often you can just take an entire Segment from one Buffer and move it to another.
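Here is a rough sketch of that hand-over, under a few assumptions of mine: Chunk and ChunkBuffer are invented names, and an ArrayDeque stands in for the circular linked list to keep the example short. The point is that moving data between two buffers relinks the chunks and never touches the bytes themselves.

```java
// Invented stand-in for a segment: a byte array plus its readable window.
final class Chunk {
    final byte[] data;
    int pos;
    int limit;

    Chunk(byte[] data) {
        this.data = data;
        this.limit = data.length;
    }
}

final class ChunkBuffer {
    // An ArrayDeque replaces the circular linked list for brevity.
    final java.util.ArrayDeque<Chunk> chunks = new java.util.ArrayDeque<>();

    // Stores a private copy of the given bytes as a new chunk.
    void write(byte[] bytes) {
        chunks.addLast(new Chunk(bytes.clone()));
    }

    // Total number of readable bytes across all chunks.
    long size() {
        long total = 0;
        for (Chunk chunk : chunks) {
            total += chunk.limit - chunk.pos;
        }
        return total;
    }

    // Moves every chunk from source into this buffer. Only references
    // change hands; no byte is copied.
    void moveAllFrom(ChunkBuffer source) {
        if (source == this) {
            throw new IllegalArgumentException("source must be a different buffer");
        }
        while (!source.chunks.isEmpty()) {
            chunks.addLast(source.chunks.removeFirst());
        }
    }
}
```

After downstream.moveAllFrom(upstream), the upstream buffer is empty and the downstream buffer holds the very same byte arrays, so the cost depends on the number of chunks rather than on the number of bytes.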
But what if I don’t want to transfer the whole Segment? Okio uses another trick to limit the CPU-intensive ByteArray copying: it shares the data buffer between multiple Segments. That’s where the pos and limit “pointers” come into play. They delimit the data range in a shared ByteArray used by each Segment. Going back to our use case: instead of copying part of the ByteArray into another one, Okio may create another Segment backed by the same ByteArray, which is a much lighter operation than copying the data itself.
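A minimal sketch of that sharing trick could look like the code below. SharedSegment and splitPrefix are hypothetical names of mine, a plain byte[] plays the role of the ByteArray, and the real library performs more checks than this. Splitting creates a second segment over the same array and only adjusts pos and limit on both sides.

```java
final class SharedSegment {
    final byte[] data;
    int pos;              // first readable byte of this segment's range
    int limit;            // one past the last readable byte of this range
    boolean shared;       // true once another segment uses the same array
    final boolean owner;  // true if this segment may append to the array

    SharedSegment(byte[] data, int pos, int limit, boolean shared, boolean owner) {
        this.data = data;
        this.pos = pos;
        this.limit = limit;
        this.shared = shared;
        this.owner = owner;
    }

    // Splits off the first byteCount readable bytes into a new segment that
    // is backed by the same array. No bytes are copied.
    SharedSegment splitPrefix(int byteCount) {
        if (byteCount <= 0 || byteCount > limit - pos) {
            throw new IllegalArgumentException("byteCount out of range");
        }
        shared = true;
        SharedSegment prefix = new SharedSegment(data, pos, pos + byteCount, true, false);
        pos += byteCount; // this segment keeps only the remaining bytes
        return prefix;
    }

    // Decodes this segment's own range, for demonstration purposes.
    String readableText() {
        return new String(data, pos, limit - pos, java.nio.charset.StandardCharsets.UTF_8);
    }
}
```

Splitting a segment that holds "hello world" after five bytes yields a new segment that reads "hello", while the original now reads " world". Both point at the same array.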
In the scenario described above, both Segments are marked with the shared flag mentioned earlier. This way Okio knows that the data is used by more than one Segment and that it cannot perform certain operations. For example, you cannot write to the “header” Segment because you’d overwrite the beginning of the data in the “trailer” Segment. On the other hand, it should be possible to write past the last used region of the shared ByteArray, but you need to make sure that only one Segment is allowed to do so. That’s what the owner flag is used for. Each ByteArray used by the Segments is “owned” by exactly one Segment.
There’s one more write operation performed on the owner Segments: data shifting. Imagine that you’re using the trailing half of the ByteArray and you want to write more data into the Segment. If the Segment is not shared, you can move the existing data to the beginning of the ByteArray and then append new data. If the ByteArray is shared with other Segments, you can’t perform this operation as the first half of the ByteArray might be used by other Segments.
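The write rules from the last two paragraphs can be condensed into one method. This is my own simplified model, not the library’s code: WritableSegment and append are hypothetical names, and SIZE is tiny on purpose so that the shifting case is easy to trigger.

```java
final class WritableSegment {
    static final int SIZE = 8; // deliberately tiny for the example

    final byte[] data;
    int pos;
    int limit;
    final boolean shared;
    final boolean owner;

    WritableSegment(byte[] data, int pos, int limit, boolean shared, boolean owner) {
        this.data = data;
        this.pos = pos;
        this.limit = limit;
        this.shared = shared;
        this.owner = owner;
    }

    // Appends bytes after limit. If the tail of the array is too small, the
    // existing data is shifted to the front first, but only when no other
    // segment shares the array.
    void append(byte[] bytes) {
        if (!owner) {
            throw new IllegalStateException("only the owner may write to the array");
        }
        if (limit + bytes.length > SIZE) {
            if (shared) {
                throw new IllegalStateException("cannot shift data in a shared array");
            }
            if (limit - pos + bytes.length > SIZE) {
                throw new IllegalStateException("not enough room even after shifting");
            }
            System.arraycopy(data, pos, data, 0, limit - pos);
            limit -= pos;
            pos = 0;
        }
        System.arraycopy(bytes, 0, data, limit, bytes.length);
        limit += bytes.length;
    }
}
```

With four bytes stored in the trailing half of an unshared eight-byte array, appending two more bytes first moves the existing data to index 0 and then writes at index 4. The same call on a shared or non-owner segment is rejected.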
Keeping all these invariants requires a lot of checks within the Segment and Buffer code, but the optimizations provide enough of a performance improvement to justify the added complexity.
Segment pools
There’s one more optimization performed by Okio: the owner Segments are pooled to reduce the allocation and zero-initialization of the ByteArrays. The current implementation has a Segment cache with a fixed maximum size. It’s implemented as a linked list of Segments. I find it very cool that the same basic element is used in two slightly different data structures!
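To illustrate the pooling idea, here is a bare-bones, single-threaded sketch. PooledSegment and TinySegmentPool are names I invented, and the segment size and byte limit are assumptions chosen for the example. Skipping shared segments on recycle is also my assumption, based on the sharing rules above. Free segments are chained through their own next reference, so the pool needs no extra collection.

```java
final class PooledSegment {
    static final int SIZE = 8192; // assumed segment size for this sketch

    final byte[] data = new byte[SIZE];
    int pos;
    int limit;
    boolean shared;
    PooledSegment next; // reused by the pool to chain free segments
}

final class TinySegmentPool {
    private final int maxBytes;
    private PooledSegment top; // singly-linked stack of free segments
    private int byteCount;

    TinySegmentPool(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns a pooled segment if one is available, otherwise allocates.
    PooledSegment take() {
        if (top == null) {
            return new PooledSegment();
        }
        PooledSegment result = top;
        top = result.next;
        result.next = null;
        byteCount -= PooledSegment.SIZE;
        return result;
    }

    // Puts a segment back unless it is shared or the pool is already full.
    void recycle(PooledSegment segment) {
        if (segment.next != null) {
            throw new IllegalArgumentException("segment is still linked");
        }
        if (segment.shared) return; // its array may still be in use elsewhere
        if (byteCount + PooledSegment.SIZE > maxBytes) return; // pool is full
        segment.pos = 0;
        segment.limit = 0;
        segment.next = top;
        top = segment;
        byteCount += PooledSegment.SIZE;
    }

    int pooledBytes() {
        return byteCount;
    }
}
```

A recycled array is handed out again as-is, so there is no fresh allocation and no zero-initialization. Stale bytes are harmless because pos and limit are reset and delimit what counts as data.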
Summary
I enjoyed reading Okio’s code. It’s well-documented, the API is very good, and the internals have some interesting twists. I now also have a much better understanding of the performance characteristics of the different APIs offered by Okio.
Finally, I feel like I’m ready to dive into the next Square library that uses Okio under the hood.