Buffering data with Okio

The motivation behind our new I/O library.

Written by Jesse Wilson.


One of the most challenging projects I’ve worked on is OkHttp, Square’s HTTP client for Java. It offers interesting problems for API design (how does the application configure caching?), performance (how long should we hold idle connections?) and code structure (how eagerly do we parse headers?). These decisions interact like Rock-Paper-Scissors:

  • A performance win can cause implementation complexity.
  • Simplifying the internals may limit the API.
  • A friendlier API can prevent certain optimizations.

The most satisfying improvements are the ones that simultaneously speed up the code, simplify the implementation, and improve the API. In this post I’ll describe an ugly situation that led to one of these solutions.

A producer-consumer problem

In recent months we’ve been implementing drafts of http/2, which multiplexes all HTTP messages over a single socket. One thorny challenge is concurrency: we have many application threads but they can’t all do I/O on the same socket simultaneously. Our strategy is to restrict network access to a dedicated producer thread that delivers data to consumer threads.
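The shape of that strategy can be sketched in a few lines. This is a hypothetical illustration, not OkHttp's actual code: the `Demultiplexer` class and its method names are invented here, and real http/2 framing, flow control, and error handling are all omitted.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the single-reader strategy: one dedicated thread owns the
// socket and hands each frame's payload to the stream that wants it.
final class Demultiplexer {
  // One queue per http/2 stream id; application threads block on these.
  private final Map<Integer, BlockingQueue<byte[]>> streams = new ConcurrentHashMap<>();

  BlockingQueue<byte[]> openStream(int streamId) {
    BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(16);
    streams.put(streamId, queue);
    return queue;
  }

  // Called only from the dedicated network thread.
  void deliver(int streamId, byte[] payload) throws InterruptedException {
    BlockingQueue<byte[]> queue = streams.get(streamId);
    if (queue != null) queue.put(payload); // blocks if the consumer lags
  }
}
```

Because only the network thread touches the socket, the consumer threads never contend for I/O; they just wait on their own queues.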

Our first implementation used a fixed-size, circular byte array as a place for network data to land while waiting for an application thread to consume it. Here’s a (slightly simplified) snapshot of that code, including my original ASCII art:

// Store incoming data bytes in a circular buffer: pos is the first
// byte to read and limit is the first byte to write.
//
// { - - - X X X X - - - }
//         ^       ^
//        pos    limit
//
// { X X X - - - - X X X }
//         ^       ^
//       limit    pos
//
private final byte[] buffer = new byte[64 * 1024];
private int pos;
private int limit;

void receive(InputStream in, int byteCount) throws IOException {
  if (byteCount > buffer.length - available()) {
    throw new IOException();
  }

  // Fill [limit..buffer.length).
  if (pos < limit) {
    int firstCopyCount = Math.min(byteCount, buffer.length - limit);
    Streams.readFully(in, buffer, limit, firstCopyCount);
    limit += firstCopyCount;
    byteCount -= firstCopyCount;
    if (limit == buffer.length) {
      limit = 0;
    }
  }

  // Fill [limit..pos).
  if (byteCount > 0) {
    Streams.readFully(in, buffer, limit, byteCount);
    limit += byteCount;
  }
}

I was not proud of this code. The fixed-size buffer meant that every http/2 connection allocated a 64 KiB buffer, whether it was reading 1 byte or 1 gigabyte. The buffer also forced every byte of network data to be copied twice: once from the network into the buffer, and again from the buffer to the application.
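The second copy happens when an application thread drains the ring. The post doesn't show that side, so here is a hedged stand-in: a minimal self-contained ring-buffer class (with a `size` field added to distinguish empty from full, and a simplified `write` for the demo) whose `read` mirrors the wrap-around logic of `receive` above.

```java
// Toy ring buffer illustrating the consumer-side copy: read() moves
// bytes out of the circular array into the caller's array, so every
// byte is copied once in and once out. Not the original OkHttp code.
final class Ring {
  final byte[] buffer = new byte[8]; // tiny, for illustration
  int pos;    // first byte to read
  int limit;  // first byte to write
  int size;   // bytes currently buffered

  int available() { return size; }

  // Simplified producer for the demo; assumes data fits.
  void write(byte[] data) {
    for (byte b : data) {
      buffer[limit] = b;
      limit = (limit + 1) % buffer.length;
      size++;
    }
  }

  int read(byte[] sink, int offset, int byteCount) {
    int copied = 0;
    while (copied < byteCount && size > 0) {
      // Readable bytes run from pos either to limit or to the array's end.
      int end = pos < limit ? limit : buffer.length;
      int copyCount = Math.min(byteCount - copied, end - pos);
      System.arraycopy(buffer, pos, sink, offset + copied, copyCount);
      pos = (pos + copyCount) % buffer.length; // wrap around
      size -= copyCount;
      copied += copyCount;
    }
    return copied;
  }
}
```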

All that copying

Imagine our network thread as a mustached mailman. Each delivery is in an InputStream containing a long paper scroll with printed data. Its recipient is a mailbox containing a blank scroll that’s ready to receive that data. The mailman transcribes data from his scroll to the mailbox scroll character-by-character, being careful to wrap around when he reaches the bottom.

It’s a lot of work to copy data from one scroll to another! Our obvious-in-hindsight solution to this problem was to replace scrolls, which were tied to their owners, with pages that can be exchanged. The mailman removes pages from his bag, adds them to the mailbox, and continues on his route.

Move, don’t copy

We created a new datatype, Buffer. Like ArrayList, you don’t need to size your buffer in advance. Each buffer works like a queue: write data to the end and read it from the front.

Internally, Buffer is implemented as a linked list of segments. When the producer writes to the consumer, it reassigns ownership of the segments rather than copying the data across.
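The segment handoff can be illustrated with a toy model. This is a sketch of the idea only: Okio's real `Buffer` is more elaborate (segment pooling, partial segment reads, byte-level offsets), and the names below are invented for this example.

```java
import java.util.ArrayDeque;

// Toy illustration of segment handoff: each buffer is a queue of
// byte-array segments, and a bulk transfer moves whole segments from
// one buffer to another instead of copying their contents.
final class ToyBuffer {
  final ArrayDeque<byte[]> segments = new ArrayDeque<>();

  void write(byte[] segment) {
    segments.addLast(segment); // take ownership of the page; no copy
  }

  // Move up to segmentCount whole segments from source into this buffer.
  void moveFrom(ToyBuffer source, int segmentCount) {
    for (int i = 0; i < segmentCount && !source.segments.isEmpty(); i++) {
      segments.addLast(source.segments.removeFirst()); // pointer move only
    }
  }
}
```

A transfer is O(number of segments) rather than O(number of bytes): the producer's pages simply become the consumer's pages.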

private final Buffer buffer = new Buffer();

void receive(BufferedSource in, long byteCount) throws IOException {
  in.require(byteCount);
  in.read(buffer, byteCount);
}

By using a Buffer instead of byte arrays, we write less code that runs faster and has a more natural API.

Announcing Okio

Okio is Square’s new open source library that complements java.io and java.nio to make it easier to access, store, and process your data. It includes Buffer and a few other simple-yet-powerful classes for common problems like gzip. The name Okio follows the trend in Java I/O library names: IO, NIO, and now OKIO. Though it started in OkHttp, we’re eager to use it everywhere it fits.

See the Okio GitHub page for a full walkthrough of the API, code examples, and downloads.