Protobuf: What and Why?
Heyo ✋,
Since JSON came along, data representation and interchange have become pretty easy and clean. We had XML before, but JSON lets us get rid of opening and closing tag names, which reduces the file size significantly for large data sets. Today we are going to look at something newer: Protocol Buffers, aka Protobuf, which beats JSON in several areas, especially data size and serialization speed.
In this story, I am going to give an intro to Protobuf, and then we will generate and compare JSON and Protobuf data in a simple Python app. Let’s jump into the topic 🚀.
⚡️ What are Protocol Buffers?
Protobuf is a mechanism, developed by Google, for serializing structured data. Unlike XML/JSON, Protobuf stores data in a binary format that is small, fast and simple. It’s also language-neutral and platform-neutral, which means you can use it across the many languages and platforms it supports.
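To see why binary can be so compact, here is a minimal sketch in plain Python (my own illustration, not code from the protobuf library) of the base-128 "varint" encoding Protobuf uses for integer fields: each byte carries 7 bits of the number, and the high bit says whether another byte follows. The function name encode_varint is just a name I picked.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


print(encode_varint(1).hex())    # 01
print(encode_varint(150).hex())  # 9601
```

JSON spells 150 out as three text characters, while the varint needs two bytes, and any number below 128 needs only one. In real Protobuf a negative int32 is written as a 10-byte varint (the sint32 type exists to avoid that), so this sketch simply rejects negatives.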
🙄 How does Protobuf work?
As I mentioned before, Protobuf stores serialized data in a binary format. To work with that format, you describe your data in a schema and run it through the Protobuf compiler, protoc, which generates code for reading and writing it. The compiler is written in C++ and is available for all major platforms, so you can install it wherever you develop.
Once you have installed the compiler, you need to define messages. Messages are definitions that describe what data we are going to store and in which format (think of a class in OOP). These messages are stored in a file with the .proto extension.
Example: format
message <name> {
<type> <field_name> = <field_number>;
}
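The field number on the right of each definition is what actually goes into the binary data; the field name never does. As a rough sketch (field_key and the WIRE_* constants are my own illustrative names), each field is prefixed by a key that packs the field number together with a 3-bit wire type, where 0 means varint (int32 and friends) and 2 means length-delimited (strings, bytes, nested messages):

```python
# Wire types from the Protobuf encoding spec (only the two used in this sketch)
WIRE_VARINT = 0            # int32, int64, bool, enum, ...
WIRE_LENGTH_DELIMITED = 2  # string, bytes, nested messages, ...


def field_key(field_number: int, wire_type: int) -> int:
    """Pack a field number and wire type into the key written before each field."""
    return (field_number << 3) | wire_type


# e.g. "int32 id = 1;" and "string name = 2;"
print(hex(field_key(1, WIRE_VARINT)))            # 0x8
print(hex(field_key(2, WIRE_LENGTH_DELIMITED)))  # 0x12
```

Because the key is itself written as a varint, field numbers 1 to 15 fit in a single byte, which is why the official docs recommend them for frequently used fields.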
.proto files have a special syntax for defining messages, and it comes in two versions: proto2 and proto3. If a file doesn’t declare its syntax, the compiler treats it as proto2, so if you want to use proto3 you need to declare it explicitly.
Example: proto2
message User {
optional int32 id = 1;
optional string name = 2;
}
Example: proto3
syntax = "proto3";
message User {
int32 id = 1;
string name = 2;
}
proto3 looks much the same as proto2, but there are some differences. For example, proto3 drops the required label and custom default values, and plain fields don’t need a label at all. The official Protobuf language guide covers the full list.
Once you have defined all your messages in .proto files, you need to compile them. The compiler generates classes for the language you choose, and your code uses those classes (together with the Protobuf runtime library for that language) to serialize and deserialize data.
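One difference that directly affects size: in proto3, a plain scalar field that holds its default value (0 for numbers, an empty string for strings) is not written to the binary output at all. The sketch below hand-encodes a hypothetical User with just id = 1 and name = 2 to illustrate that rule; encode_user_proto3 is an illustrative name, not generated code.

```python
def encode_varint(value: int) -> bytes:
    """Protobuf-style base-128 varint (non-negative values only in this sketch)."""
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_user_proto3(user_id: int, name: str) -> bytes:
    """Encode a hypothetical User {int32 id = 1; string name = 2;} the way proto3
    treats plain (implicit-presence) fields: default values are not written."""
    out = bytearray()
    if user_id != 0:  # int32 default is 0, so 0 is skipped
        out += bytes([0x08]) + encode_varint(user_id)  # key 0x08: field 1, varint
    if name:  # string default is "", so "" is skipped
        data = name.encode("utf-8")
        # key 0x12: field 2, length-delimited (length prefix, then UTF-8 bytes)
        out += bytes([0x12]) + encode_varint(len(data)) + data
    return bytes(out)


print(encode_user_proto3(0, ""))          # b''
print(encode_user_proto3(150, "").hex())  # 089601
```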
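To give a feel for what the generated code does when deserializing, here is a hand-written sketch that parses the binary form of a hypothetical User {int32 id = 1; string name = 2;} back into a Python dict. It is not the generated code (which is far more complete); decode_varint and decode_user are my own names, and only wire types 0 and 2 are handled.

```python
def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # add the low 7 bits
        if not byte & 0x80:               # continuation bit clear: done
            return result, pos
        shift += 7


def decode_user(buf: bytes) -> dict:
    """Parse the bytes of a hypothetical User {int32 id = 1; string name = 2;}."""
    user = {"id": 0, "name": ""}  # proto3 defaults for fields that are absent
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number == 1 and wire_type == 0:
            user["id"] = value
        elif field_number == 2 and wire_type == 2:
            user["name"] = value.decode("utf-8")
        # anything else is ignored here; real parsers also accept unknown fields
    return user


# 08 96 01 -> field 1 = 150; 12 03 "Ann" -> field 2 = "Ann"
print(decode_user(bytes.fromhex("089601") + b"\x12\x03Ann"))
# {'id': 150, 'name': 'Ann'}
```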
Example: compile command
protoc --proto_path={IMPORT_PATH} --java_out={DST_DIR} path/to/file.proto
proto_path = the directory in which the compiler looks for .proto files when resolving import statements. It can be given more than once; if it is omitted, the current directory is used.
java_out = the directory where the generated Java classes are written. For other languages, use the matching option, such as cpp_out or python_out.
⚖️ Comparing JSON and Protobuf
This is the practical part. I am going to create a simple Python script that stores 100 user records, each with id, firstName, lastName, username, and email fields. We will store the data in both Protobuf and JSON formats and compare how much space each one consumes.
Project Structure
|- proto/
   |- User.proto
|- proto_code/
|- main.py
proto/: directory that contains the .proto file(s).
proto_code/: directory that contains the generated class files.
main.py: the main Python script to execute.
1. Create .proto file and define messages
Let’s create the User.proto file inside the proto/ directory.
2. Compile the User.proto file
$ protoc --proto_path=proto --python_out=proto_code proto/User.proto
The above command generates a Python module, proto_code/User_pb2.py, containing the generated message classes.
3. Create Python Programme
This is the main Python program. If you don’t know Python, that’s okay: I am not going to explain everything here, since it’s just meant to give an idea of how to use Protobuf.
4. Execute and Observe the results
Once I executed the above script, I got 2 new files: one contains the Protobuf binary data and the other contains the JSON data. Let’s have a side-by-side view of them.
Let’s see how much disk space these 2 files consumed.
As you can see, the Protobuf file took up almost half the disk space of the JSON file. Some sources report that Protobuf can be up to 6 times smaller than JSON, depending on the data. Smaller payloads also mean Protobuf data can be transferred over the network faster than the equivalent JSON. So for storing and transferring data, especially in large volumes, Protobuf is often a better choice than JSON.
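Your numbers will depend on the data, so here is a self-contained way to run the same kind of comparison yourself. This sketch swaps the generated classes for a tiny hand-written encoder, so it runs without protoc or the protobuf package. It assumes a User message with id = 1 (int32) and firstName, lastName, username and email as string fields 2 to 5, wraps the records in a hypothetical Users message with a repeated User field numbered 1, and uses made-up record values.

```python
import json


def encode_varint(value: int) -> bytes:
    """Protobuf-style base-128 varint (non-negative values only in this sketch)."""
    if value < 0:
        raise ValueError("negative values are out of scope for this sketch")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key (wire type 2) + length + payload: used for strings and nested messages."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_user(user: dict) -> bytes:
    """Hypothetical User: id = 1 (varint), then four string fields numbered 2..5."""
    out = encode_varint((1 << 3) | 0) + encode_varint(user["id"])
    for number, name in enumerate(["firstName", "lastName", "username", "email"], start=2):
        out += length_delimited(number, user[name].encode("utf-8"))
    return out


# 100 made-up records
users = [
    {
        "id": i,
        "firstName": f"First{i}",
        "lastName": f"Last{i}",
        "username": f"user{i}",
        "email": f"user{i}@example.com",
    }
    for i in range(1, 101)
]

# A repeated message field is written as one length-delimited entry per element.
proto_bytes = b"".join(length_delimited(1, encode_user(u)) for u in users)
json_bytes = json.dumps(users).encode("utf-8")

print("Protobuf:", len(proto_bytes), "bytes")
print("JSON:    ", len(json_bytes), "bytes")
```

On this made-up data the Protobuf bytes come out well under the JSON size, mostly because each field name is replaced by a one-byte key and the integers are stored as varints.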
Pros & Cons of Protocol Buffers
Pros
- Much smaller size
- Fast Serialization/Deserialization
- Supports RPC
- Structure and Type validation
Cons
- Smaller Community (for now)
- Lack of Resources (for now)
- Not human-readable format
Conclusion
Protobuf can really boost applications that handle large data flows. Because of its compact binary encoding, data takes less time to exchange over the network. Also, Protobuf data can be shared across platforms and used by applications written in any supported language.
But it stores data in binary, so humans cannot read or edit it directly. Because of that, if you want to store data that users manage directly, such as configurations and settings, you should pick JSON/XML over Protobuf.
I hope you now have an idea of what Protocol Buffers are. Play with Protobuf in your favourite language!