Protobuf : What? and Why?

Banujan Balendrakumar
SLIIT FOSS Community
4 min read · Nov 7, 2021

Heyo ✋,

After JSON arrived, data representation and interchange became pretty easy and clean. Even though we had XML before, JSON lets us get rid of closing tags, which reduces the file size significantly when it comes to large data sets. But today we are going to look at a newer thing called Protocol Buffers, aka Protobuf, that beats JSON in a lot of places.

In this story, I am going to give an intro to Protobuf, and we will generate and compare JSON and Protobuf data in a simple Python app. Let’s jump into the topic 🚀.

⚡️ What are Protocol Buffers?

Protobuf is a mechanism for serializing structured data, developed by Google. Unlike XML/JSON, Protobuf stores data in a binary format that is small, fast and simple. It’s also language-neutral and platform-neutral, which means you can use it across many languages and platforms.

🙄 How does Protobuf work?

As I mentioned before, Protobuf stores serialized data in a binary format. To read and write that format, your code needs classes generated from your data definitions, and that is the job of the Protobuf compiler (protoc). The compiler is written in C++ and is available for all major platforms.

Once you have installed the compiler, you need to define messages. Basically, a message is a definition that describes what data we are going to store and in which format (think of a class in OOP). These messages should be stored in a file with the extension .proto

Example: format

message <name> {
<type> <field_name> = <field_number>;
}

.proto files have a special syntax for defining messages. It comes in 2 versions, proto2 and proto3. For now, proto2 is the default syntax, so if you want to use proto3 you need to declare it explicitly at the top of the file.

Example: proto2

message User {
optional int32 id = 1;
optional string name = 2;
}

Example: proto3

syntax = "proto3";

message User {
int32 id = 1;
string name = 2;
}

proto3 looks much the same as proto2, but there are some differences; see the official Protocol Buffers language guides for details.

Once you have defined all messages in .proto files, you need to compile them. When you compile .proto files, the compiler generates class files for the specified language, which handle the rest of the operations in your code, such as serializing and deserializing data.
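To make the field numbers a bit less abstract, here is a minimal Python sketch of how a User message could end up as bytes. It hand-rolls the rules from the Protocol Buffers encoding documentation (varints and length-delimited fields) instead of using generated classes, so treat it as an illustration only. The helper names encode_varint and encode_user are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_user(user_id: int, name: str) -> bytes:
    """Hand-encode `message User { int32 id = 1; string name = 2; }` (proto3)."""
    out = bytearray()
    if user_id:  # proto3 leaves out fields that hold their default value
        out += encode_varint((1 << 3) | 0)  # field number 1, wire type 0 (varint)
        out += encode_varint(user_id)
    if name:
        raw = name.encode("utf-8")
        out += encode_varint((2 << 3) | 2)  # field number 2, wire type 2 (length-delimited)
        out += encode_varint(len(raw))
        out += raw
    return bytes(out)


if __name__ == "__main__":
    # Prints: 08 96 01 12 03 41 6e 6e
    print(encode_user(150, "Ann").hex(" "))
```

Notice that the field names id and name never appear in the output. Only the field numbers are written, which is a big part of why the binary form stays small.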

Example: compile command

protoc --proto_path={IMPORT_PATH} --java_out={DST_DIR} path/to/file.proto

proto_path = The directory in which the compiler looks for .proto files when resolving import statements.

java_out = The directory in which to generate the class files for Java. If you are using a different language, use the matching flag instead, such as --cpp_out or --python_out.
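If you would rather drive the compiler from a script, the same command can be assembled in Python. This is only a sketch: build_protoc_command is a hypothetical helper, the paths are placeholders, and it assumes protoc is installed and on your PATH.

```python
import os
import shutil
import subprocess


def build_protoc_command(proto_file, import_path, out_dir, lang="python"):
    """Return the protoc argument list for one .proto file (nothing is run here)."""
    return [
        "protoc",
        f"--proto_path={import_path}",
        f"--{lang}_out={out_dir}",
        proto_file,
    ]


if __name__ == "__main__":
    cmd = build_protoc_command("proto/User.proto", "proto", "proto_code")
    print(" ".join(cmd))
    # Only run the compiler if it is installed and the input file exists.
    if shutil.which("protoc") and os.path.exists("proto/User.proto"):
        os.makedirs("proto_code", exist_ok=True)  # make sure the output directory exists
        subprocess.run(cmd, check=True)
```

Passing lang="java" or lang="cpp" switches the output flag to --java_out or --cpp_out.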

⚖️ Comparing JSON and Protobuf

This is the practical part. I am going to create a simple Python script to store 100 user records with id, firstName, lastName, username, and email details. We will store the data in both Protobuf and JSON formats and compare the size each one consumes.

Project Structure

|- proto/
|  |- User.proto
|- proto_code/
|- main.py

proto/ — Directory that contains .proto file(s).
proto_code/ — Directory that contains the generated class files.
main.py — The main python script to execute

1. Create .proto file and define messages

Let’s create the User.proto file inside the proto/ directory.

2. Compile the User.proto file

$ protoc --proto_path=proto --python_out=proto_code proto/User.proto

The above command generates a Python module, User_pb2.py, inside the proto_code/ directory.

3. Create Python Program

This is the main Python program. If you don’t know Python, that’s okay, because I am not going to explain everything here; it’s just to give an idea of how to use Protobuf.
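A main.py along these lines might look like the sketch below. It rests on several assumptions: User.proto defines a User message with the fields id, firstName, lastName, username and email, plus a wrapper message Users with `repeated User users = 1;`, the generated module is proto_code/User_pb2.py, and the protobuf Python package is installed. The output file names users.json and users.bin are also just placeholders.

```python
import json


def make_records(count=100):
    """Build `count` dummy user records as plain dictionaries."""
    return [
        {
            "id": i,
            "firstName": f"First{i}",
            "lastName": f"Last{i}",
            "username": f"user{i}",
            "email": f"user{i}@example.com",
        }
        for i in range(1, count + 1)
    ]


def to_json_bytes(records):
    """Serialize the records as UTF-8 encoded JSON."""
    return json.dumps(records).encode("utf-8")


def to_protobuf_bytes(records):
    """Serialize the records with the generated classes.

    Assumes proto_code/User_pb2.py exists and defines User and Users
    (with `repeated User users = 1;`); these message names are assumptions.
    """
    from proto_code import User_pb2  # imported lazily: needs the generated module

    users = User_pb2.Users()
    for record in records:
        user = users.users.add()  # append a new User to the repeated field
        user.id = record["id"]
        user.firstName = record["firstName"]
        user.lastName = record["lastName"]
        user.username = record["username"]
        user.email = record["email"]
    return users.SerializeToString()


if __name__ == "__main__":
    records = make_records()
    with open("users.json", "wb") as fh:
        fh.write(to_json_bytes(records))
    try:
        binary = to_protobuf_bytes(records)
    except ImportError:
        print("Generated module or protobuf package not found; run protoc first.")
    else:
        with open("users.bin", "wb") as fh:
            fh.write(binary)
```

The generated module is imported inside the function so that the JSON half still runs before you have compiled anything.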

4. Execute and Observe the results

Once I executed the above script, I got 2 new files: one contains Protobuf binary data and the other contains JSON data. Let’s have a side-by-side view of them.

Protobuf vs JSON

Let’s see how much disk space these 2 files consumed.

File Size | Protobuf vs JSON

As you can see, the Protobuf file took almost half of the disk space taken by the JSON file. Researchers say they could sometimes achieve sizes 6 times smaller than JSON. Because of this, Protobuf binaries can be exchanged over the network faster than JSON. Anyway, now you understand that Protobuf is a really good way to store and transfer data compared to JSON.

Pros & Cons of Protocol Buffers

Pros

  • Much smaller size
  • Fast Serialization/Deserialization
  • Supports RPC
  • Structure and Type validation

Cons

  • Smaller Community (for now)
  • Lack of Resources (for now)
  • Not human-readable format

Conclusion

Protobuf will really boost applications that handle large data flows. Because of its compact encoding, data takes less time to exchange over the network. Also, Protobuf data can be shared across platforms and used in applications written in different languages.

But it stores data as binary, so humans cannot interact with it directly. Because of that, if you want to store data that is managed directly by the user, such as configurations and settings, you should pick JSON/XML over Protobuf.

Hope you got an idea about Protocol Buffers. Play with Protobuf in your favourite language.

Banujan Balendrakumar
SLIIT FOSS Community

Senior Software Engineer | AWS Certified Solution Architect | Auth0 Ambassador