What’s all the buff about PROTOBUF

Vishwa Mohan
4 min read · Sep 8, 2018


The most crucial thing everything depends on today is “data”. And even more crucial than that is how we share that data. Whether we share it in person, by manual transmission, by electronic transfer, or through machine-to-machine communication, every channel has to be efficient and convenient. The faster and easier we can communicate, the better our coordination and understanding with our surroundings will be.

In this article, I will mainly focus on machine-to-machine communication. Service-oriented architecture (SOA) has been a buzzword for quite some time now. In SOA, services talk to each other to solve one common problem, and at the root of all that communication is the sharing of data. The faster the data can be shared, the better the system architecture is! We have seen the transfer of data evolve over time: CSV -> serialized objects -> XML -> JSON. And the latest entry is Protobuf.

I feel that before adopting any new buzzword in the IT namespace, we need clarity on three things: WHY? WHAT? and HOW? Trust me, this not only helps you become a better developer, but helps you crack your next interviews as well :P

So, let’s jump to the WHY part… Why Protobuf?

We have always used some form of data transfer, and the most common of those forms is CSV. CSV offers a very simple, easy-to-use data format: it’s handy to wrap data in CSV and share it with other systems. But using CSV has its own challenges:

  • It’s difficult to infer the type of the data; every value arrives as a string.
  • Headers may not be available all the time.
  • Fields that themselves contain commas need quoting, which trips up naive parsers (see the sketch after this list).
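To make that concrete, here is a minimal Python sketch (the row contents are made up for illustration) showing how an unquoted comma corrupts a record, and how even properly quoted CSV still hands everything back as strings:

import csv, io

# A hypothetical record whose address field itself contains a comma.
raw = "1,Vishwa,221B Baker Street, London"

# A naive split misreads the address as two separate fields:
print(raw.split(","))
# ['1', 'Vishwa', '221B Baker Street', ' London']

# Quoting the field fixes the split, but every value is still a string;
# nothing in the format says the first column is an integer.
quoted = '1,Vishwa,"221B Baker Street, London"'
print(next(csv.reader(io.StringIO(quoted))))
# ['1', 'Vishwa', '221B Baker Street, London']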

Another popular way was to serialize objects and transfer them over the wire. Some languages, like Java, have built-in support for serialization, and it works beautifully within that language. But interoperability becomes a challenge: for example, an object serialized by Java can be deserialized and understood only by another Java application.
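Python’s built-in pickle module has exactly the same lock-in, so here is the problem shown in Python (the class and values are made up for illustration):

import pickle

class Person:
    def __init__(self, id, name):
        self.id = id
        self.name = name

# The resulting bytes embed Python-specific class information,
# so only another Python program can make sense of them.
data = pickle.dumps(Person(1, "Vishwa"))
print(pickle.loads(data).name)  # fine in Python, opaque to a Java or Go service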

To the rescue came the XML data format. Its tree-like structure is human readable and can be parsed by almost every language. This format ruled the market for quite a long time and is still used at many firms for passing information. However, XML too has its limitations. Parsing an XML file is a time-intensive task: building and walking the DOM of a dense XML document consumes a lot of time for any service.
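As a rough illustration (the document contents are made up), even a tiny XML payload has to be parsed into a full tree before a single value can be read, and every value comes back as text:

import xml.etree.ElementTree as ET

doc = "<person><id>1</id><name>Vishwa</name></person>"

root = ET.fromstring(doc)        # the whole document is parsed into a tree first
print(root.find("name").text)    # 'Vishwa'
print(root.find("id").text)      # '1' -- a string, not an int; typing is on us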

With the popularity of JavaScript growing on both the browser and the server side, JSON was pushed as the next major language-neutral data transfer medium. It was lightweight and was supposed to solve all the problems of the existing formats. It got a further boost with REST ruling SOA and using JSON as the standard way of transferring data. But JSON, too, faced the test of time. It has its own challenges:

  • When the payload is big, it gets expensive to transfer. JSON passes key-value pairs all the time, and repeating so many keys makes large payloads inherently heavy (the sketch after this list illustrates it).
  • Making any change is also difficult. One has to take a lot of effort to maintain backward compatibility, since there is no enforced schema.
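A quick Python sketch of that overhead, with made-up records: in a JSON array, every object repeats the same key strings, so the keys alone can dominate the payload:

import json

# 1,000 records, all carrying the same three keys.
people = [{"id": i, "name": f"user{i}", "email": f"user{i}@example.com"}
          for i in range(1000)]

payload = json.dumps(people)
print(len(payload))  # the literals "id", "name", "email" appear 1,000 times each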

So, Protobuf comes to the rescue for all these problems. The WHAT section below shows how it solves them.

What is Protobuf?

Protocol Buffers, or protobufs, were introduced by Google, which uses the format for communication in almost all of its internal services.

In very simple terms, it provides a fixed schema that is encoded to a compact binary form and can be understood by any language. It’s inherently lightweight. A sample message schema may look like the one below (note that proto3 has no required label, and fields need no label at all):

syntax = "proto3";

message Person {
  int32 id = 1;
  string name = 2;
  string email = 3;
}
  • It offers a very flexible schema, with type checks.
  • Backward compatibility comes for free. Since each attribute in the message is tagged with a field number, backward compatibility is ensured out of the box.
  • It has its own validation checks at the message level, which saves us from writing that code ourselves.
  • No need to write heavy boilerplate code for handling the schema. It gets autogenerated from the .proto file with a compiler available for almost every language.
  • Language interoperability. Two systems can talk to each other irrespective of their implementation language.
  • Messages get binary encoded and are inherently super light. Also, with the field tags available, there’s no need to keep the key names in the binary payload.
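As a taste of what that autogenerated code looks like in practice, here is a minimal Python sketch. It assumes the Person schema above is saved as person.proto and that the protoc compiler is installed; protoc then generates a person_pb2 module:

# Generate the Python bindings once:
#   protoc --python_out=. person.proto

from person_pb2 import Person   # module generated by protoc

person = Person(id=1, name="Vishwa", email="vishwa@example.com")

data = person.SerializeToString()   # compact binary: field tags, no key names
print(len(data))                    # far smaller than the equivalent JSON

decoded = Person()
decoded.ParseFromString(data)
print(decoded.name)                 # 'Vishwa'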

However, there are still some use cases where JSON makes more sense:

  • You want/need the data to be human readable.
  • Data has to be transferred only between browser and server.
  • You are not prepared to tie the data model to a schema.
  • You don’t have the bandwidth to add another tool to your arsenal.

In my next article, I will provide the implementation details of Protobuf in both Java and Python. Please stay tuned…
