Protocol Buffers — Just how fast are they?

David Turner
5 min read · Feb 16, 2020



Fast.

Sorry to spoil it, but oh man do these messages process fast.

So what are Protocol Buffers?

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data — think XML, but smaller, faster, and simpler. — Google

If you are streaming large data sets, a heavy cost in memory and performance is I/O and the serialization up and down the pipeline. While milliseconds may not seem like much at first, when you are streaming millions of records an hour, saving a few of them here and there reaps large dividends. That, combined with protobuf's language-agnostic, backward-compatible model, makes it a pretty attractive serialization mechanism.

As an obvious disclaimer — I don't want to make this out to be the be-all end-all serialization solution. Like anything else, the right solution to your problem depends on your situation. There are several serialization tools on the market, they are constantly evolving, and they may perform a bit differently depending on your platform, data model, or rate of consumption. But don't let analysis paralysis set in either!

Besides additional research, sometimes the best way to see if something is the right fit is to simply try it out. Run a small, isolated experiment and draw real conclusions from your findings. Doing so will give you something tangible to work with, to talk to, along with the experience that comes in working with the product. So let’s do just that.

A Quick Overview

Google offers a straightforward overview and tutorial on how to get things up and running in many of its supported languages, and Protocol Buffers makes it pretty simple. The idea is that you compose .proto files to match your data model, run them through the compiler (protoc), and it automagically generates some fairly complex code to read and write data against your defined schema.

Here’s an example of what a .proto file looks like. I pulled this straight from Google’s documentation.

Example proto message
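The embedded snippet didn't carry over here, but the message in question is the AddressBook example from Google's protobuf tutorial, which looks roughly like this (shown here in proto3 form; the tutorial also has a proto2 variant):

```proto
syntax = "proto3";

package tutorial;

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
```

Running something like `protoc --java_out=. addressbook.proto` against this file produces the generated Java classes used throughout the rest of this test.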

When encoded into its binary format, the data is intended to be much faster to parse and to take up less disk space, all while providing a low-risk model for release management with backward and forward compatibility.
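For intuition on why the binary format is compact: protobuf encodes integers as base-128 varints, so small values take one byte instead of a fixed four. Here's a minimal sketch of that encoding — my own helper for illustration, not the protobuf API:

```java
import java.io.ByteArrayOutputStream;

public class Varint {
    // Encode a non-negative int as a protobuf-style base-128 varint:
    // 7 bits per byte, least-significant group first, with the high bit
    // set on every byte except the last.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 1 fits in a single byte; 300 takes two bytes (0xAC 0x02),
        // versus four bytes for a fixed-width int32.
        System.out.println(encode(1).length);    // 1
        System.out.println(encode(300).length);  // 2
    }
}
```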

After following their documentation and testing out how the objects were built, compiled, and parsed on a simple scale I decided to run a load test to measure protobuf’s serialization performance and how it compared to Java’s native mechanisms. There are plenty of more comprehensive tests out there, but there’s something about doing it yourself that provides that extra nudge of reassurance in what someone is trying to sell you.

What was the test? What were the results?

For my testing purposes, I used the AddressBook class generated by protobuf’s compiler for my data model. It’s part of the same message described above. In a separate text document, I created a list of 10,000 Person messages to parse into a single AddressBook object.

1 Lilly lilly.abraham@email.com email 111-111-1111 mobile
2 Bob bob.johnson@email.com 111-111-1112 home

With this object in hand, I iterated over it 100 times across 4 separate test scenarios:

  • Java Serialization
  • Java Deserialization
  • Protocol Buffers Serialization
  • Protocol Buffers Deserialization

The chart below represents those findings, while also giving visibility to the consistency and reliability of each execution on the same data set.

Java and Protocol Buffer Serialization Performance

Let’s get the boring stuff out of the way first.

  • The results remained more or less flat when serializing and deserializing the same object repeatedly.
  • And while subtle, it looks like serialization was marginally more time-consuming on the first pass, before the object settled into cache.
  • It also looks like deserialization is generally a more taxing effort than serialization. In Java, that margin is pretty wide: at least 700 milliseconds.

So what’s the biggest takeaway?

Without question, serialization using Protocol Buffers was by far more advantageous. There were quite a few runs where protobuf could hardly make it onto the grid — falling into the realm of nanoseconds per message while Java ranged from 250–1000 ms.

Now, while the writing is on the wall, let’s take a look at a range of statistical values comparing the two, for serialization and deserialization separately.

Serialization Performance Measurements
Deserialization Performance Measurements

Not to beat a dead horse, but protobuf rarely even makes it onto the grid.

The most powerful values to take away from this, in my opinion, are the max and the mode. When developing a solution, you’ll usually want to test and plan around the worst results as a soft buffer; there’s a general consensus that once things become more complex and are thrown into the wild, there’s a greater chance they won’t perform at the level you expect. The mode, meanwhile, gives reassurance that most of the time your performance will be near optimal, as shown here.

Even with the reuse of a single file and static data as a baseline — which probably skewed the results toward the bottom end, albeit marginally — it is still painfully obvious which tool comes out on top.

  • Protobuf Serialization: 45 times faster than Java
  • Protobuf Deserialization: 165 times faster than Java

So if the question was whether you should use protobuf or Java’s native serialization library to increase performance, then I think the answer is pretty clear. The winner is Protocol Buffers. Guess I can see why they use it at my workplace now.

Below is the simple Java app I used to capture these findings. Thought I would share it in case you want to see the results for yourself or extend its behavior to compare against another serialization model.
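The embedded source didn't survive the export, so here is a minimal sketch of the Java half of the harness. The `Person` POJO and field names are my own stand-ins; the original test used the protoc-generated `AddressBook`/`Person` classes for the protobuf side, as noted in the comments:

```java
import java.io.*;
import java.util.*;

public class SerializationBenchmark {
    // Stand-in for the protoc-generated Person message; the real test
    // parsed 10,000 Person entries from a text file into an AddressBook.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        final String email;
        Person(int id, String name, String email) {
            this.id = id; this.name = name; this.email = email;
        }
    }

    // Java native serialization: object -> bytes.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Java native deserialization: bytes -> object.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a 10,000-person "address book", mirroring the test data above.
        ArrayList<Person> book = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            book.add(new Person(i, "Person" + i, "person" + i + "@email.com"));
        }

        // Iterate 100 times, timing each pass separately.
        for (int run = 0; run < 100; run++) {
            long t0 = System.nanoTime();
            byte[] bytes = serialize(book);
            long serMs = (System.nanoTime() - t0) / 1_000_000;

            t0 = System.nanoTime();
            deserialize(bytes);
            long deserMs = (System.nanoTime() - t0) / 1_000_000;

            // The protobuf scenarios swap these two calls for the generated
            // equivalents, roughly: book.toByteArray() and
            // AddressBook.parseFrom(bytes).
            System.out.println("run " + run + ": serialize " + serMs
                    + " ms, deserialize " + deserMs + " ms");
        }
    }
}
```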

For a deeper dive and walk through on capturing these results first hand you can check out my video on the subject.

If there are any questions or you feel I’ve missed something important, please leave me a comment. Thanks!



David Turner

I like to write about web dev, big data, and all things geeky.