Protobufs: Introduction to the binary blob serialization format

Jaspreet Singh
TestVagrant
5 min read · Aug 28, 2023


Protobufs, just another name to add to the ever-growing list of serialization formats? Or is it something more? Today, we embark on a journey to unlock the hidden wonders of Protobufs, peeling back the layers to reveal its secrets. Behind the scenes of byte-packed structures lies a powerful tool that promises efficiency and flexibility.

So, get ready to dive headfirst into the mystical realm of binary blobs, and discover why understanding Protobufs might just be your next enchanting endeavour.

Protocol buffers, also known as Protobufs, were developed by Google for internal use. They serve as the basis for Google's custom remote procedure call (RPC) system, which is used for nearly all inter-machine communication at Google.

What are Protocol Buffers?

As per the Google Protobuf Documentation,

Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data — think XML, but smaller, faster, and simpler.

In simple words, Protobuf is a data serialization format that uses a binary encoding scheme, producing compact serialized messages, and is forward- and backward-compatible. Because the serialized output is a tightly packed sequence of bytes, it takes less space, and when transferred over the wire it is much quicker than text formats such as XML or JSON. This efficiency makes it ideal for scenarios where reduced bandwidth usage and optimized storage are paramount.
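To see where that compactness comes from, here is a minimal sketch in plain Java (no Protobuf library) that hand-encodes a single int64 field the way the wire format does; the field number and value are illustrative:

```java
public class VarintDemo {
    // Encode an unsigned value as a Protobuf varint:
    // little-endian, 7 bits per byte, high bit set on all but the last byte.
    static byte[] varint(long value) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // continuation bit set
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Field key for productId (field number 1, wire type 0 = varint): (1 << 3) | 0
        byte[] key = varint((1 << 3) | 0);
        byte[] value = varint(300); // 300 encodes as 0xAC 0x02
        System.out.printf("key=%02x value=%02x %02x%n",
                key[0] & 0xFF, value[0] & 0xFF, value[1] & 0xFF);
        // The whole field `productId = 300` is just 3 bytes on the wire,
        // versus the 17 bytes of the JSON text {"productId":300}.
    }
}
```

Field names never appear in the encoded bytes, only the field numbers do, which is a large part of the size win over JSON.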

How to work with Protocol Buffers?

Use case: There are two Java services, Product and Cart. The Cart service calls the Product service's API to get product details. This flow of data from Product to Cart can be implemented using Protobuf as follows.

The first step is to define the messages and services in a file with .proto extension. It is common to maintain all the proto files together in a separate repository and then import the proto repo in the service repos.

syntax = "proto3";
package productData;

service ProductService {
  // returns Product by ID
  rpc GetProduct (GetProductRequest) returns (GetProductResponse);
}

message GetProductRequest {
  int64 productId = 1;
}

message GetProductResponse {
  Product product = 1;
}

message Product {
  int64 id = 1;
  string name = 2;
  string description = 3;
  double price = 4;
}

  • The first two lines declare the syntax version being used and the package in which the compiled code will be generated.
  • Next, the ProductService service is defined, with an rpc method that specifies its request and response types.
  • The request and response messages follow, with the fields productId and product respectively.
  • Finally, the message Product is defined with four fields (id, name, description and price) of types int64, string and double, each assigned a unique field number. These numbers identify fields in the binary-encoded data, so they must not change between versions of your service; this is what provides backward and forward compatibility.
  • Clients and services will ignore field numbers that they don’t know about, as long as the possibility of missing values is handled.
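The compatibility points above can be sketched in plain Java (no Protobuf library): a hand-rolled reader for a message containing only varint fields that skips field numbers it does not recognize, just as generated Protobuf parsers do. The bytes and field numbers are illustrative.

```java
public class UnknownFieldDemo {
    // Reads field 1 (id) from a buffer of varint-only fields, skipping any
    // field number it does not recognize -- mimicking how Protobuf parsers
    // tolerate fields added by newer writers.
    static long readId(byte[] wire) {
        long id = 0;
        int i = 0;
        while (i < wire.length) {
            int fieldNumber = wire[i++] >> 3;  // low 3 bits are the wire type (0 = varint here)
            long value = 0;
            int shift = 0;
            while ((wire[i] & 0x80) != 0) {    // continuation bit set: more varint bytes follow
                value |= (long) (wire[i++] & 0x7F) << shift;
                shift += 7;
            }
            value |= (long) wire[i++] << shift;
            if (fieldNumber == 1) id = value;  // known field; anything else is skipped
        }
        return id;
    }

    public static void main(String[] args) {
        // Written by a hypothetical *newer* Product service: field 1 (id) = 7,
        // plus a field 5 = 42 that this older reader knows nothing about.
        byte[] wire = {0x08, 0x07, 0x28, 0x2A};
        System.out.println(readId(wire)); // prints 7; field 5 is simply ignored
    }
}
```

Real parsers also handle the other wire types and retain unknown fields rather than dropping them, but the skipping behaviour is the essence of forward compatibility.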

The next step is to compile the .proto file with the protoc compiler to generate code in the required language in each service's repository.

The protoc compiler is invoked at build time on .proto files to generate code in various programming languages for manipulating the corresponding protocol buffer messages. It is an open-source code generator provided by Google with support for multiple languages.
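For the Java services in this example, the invocation might look like the following; the file name and directory layout are illustrative, not prescribed by the article:

```shell
# Generate Java classes from the proto definition into the service's source tree
protoc --proto_path=proto/ --java_out=src/main/java/ proto/product.proto
```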

The Product service fetches the data from a DB, wraps it in the response message, and serializes it as below:

Product prod = Product.newBuilder()
        .setId(id)                   // product id from DB
        .setName(name)               // product name from DB
        .setDescription(description) // product description from DB
        .setPrice(price)             // product price from DB
        .build();

GetProductResponse response = GetProductResponse.newBuilder()
        .setProduct(prod)
        .build();

response.writeTo(outputStream); // any OutputStream used to transfer data

The Cart service receives the data and deserializes it as below:

GetProductResponse.Builder builder = GetProductResponse.newBuilder();
GetProductResponse responseObj = builder.mergeFrom(inputStream).build();
// any InputStream used to receive data

Product product = responseObj.getProduct();

To understand the process of serialization/deserialization you can refer to: https://protobuf.dev/programming-guides/encoding

What are the benefits of Protocol Buffers?

The following are the major benefits of using Protobufs:

Efficiency and Compactness: The binary format used makes protobufs highly optimized and compact, resulting in reduced network bandwidth and faster serialization/deserialization compared to XML or JSON.

Language Interoperability: The Protobuf compiler can generate code in a wide range of programming languages, allowing seamless integration with existing codebases.

Versioning and Compatibility: Protobufs are forward- and backward-compatible, which addresses the problem of evolving data schemas across versions.

When are Protocol Buffers not a Good Fit?

Protocol Buffers are not a good fit for every use case:

  1. Protocol Buffers serialize data in a binary format, so they are unsuitable where data must be human-readable or is consumed directly, as in web browsers.
  2. The same data can have many different valid binary encodings, so two serialized messages cannot be compared for equality without fully parsing them.
  3. Protocol Buffers are not well supported in non-object-oriented languages such as Fortran, which are popular in scientific computing.
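Point 2 can be illustrated in plain Java (no Protobuf library): hand-encode the same two fields of a hypothetical Product message in two different orders, both valid encodings of identical data, and compare the raw bytes.

```java
public class EncodingOrderDemo {
    public static void main(String[] args) {
        byte[] idField   = {0x08, 0x07};           // field 1 (id), varint value 7
        byte[] nameField = {0x12, 0x02, 't', 'v'}; // field 2 (name), length-delimited "tv"

        // Both orderings decode to the same message content.
        byte[] a = concat(idField, nameField);
        byte[] b = concat(nameField, idField);

        // Byte-level comparison fails even though the data is identical.
        System.out.println(java.util.Arrays.equals(a, b)); // prints false
    }

    static byte[] concat(byte[] x, byte[] y) {
        byte[] r = new byte[x.length + y.length];
        System.arraycopy(x, 0, r, 0, x.length);
        System.arraycopy(y, 0, r, x.length, y.length);
        return r;
    }
}
```

This is why checksumming or deduplicating serialized Protobuf blobs by their bytes is unreliable: equality must be checked on the parsed messages.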

How are Protocol Buffers different from JSON?

  • Encoding: Protobuf is a binary format; JSON is human-readable text.
  • Schema: Protobuf requires a shared .proto schema between sender and receiver; JSON is self-describing and schema-less.
  • Size and speed: Protobuf messages are smaller and faster to serialize and deserialize; JSON is larger and slower but trivially debuggable.
  • Tooling: JSON is natively supported by browsers and most languages; Protobuf requires code generated by protoc.

Where are Protocol Buffers used?

Protobufs are widely used in various domains, including:

  1. Communication protocols between microservices or client-server systems.
  2. Storing and exchanging data in distributed systems.
  3. Logging and auditing in large-scale applications.
  4. Data transmission in resource-constrained environments like IoT devices.

Conclusion

We can conclude that this binary data format outperforms text-based formats like XML or JSON for inter-service communication, resulting in faster serialization and deserialization and reduced network bandwidth usage. Its support for versioning and backward compatibility makes it well suited to large-scale applications where data schemas evolve over time. By adopting Protobuf, you can future-proof your applications and streamline your development process. So why not give it a try and unlock the full potential of Protobuf in your projects? Happy coding!

PS : If you find yourself captivated by this binary blob serialization format you can discover more about it from its official documentation or keep an eye out for my next blog, where I will delve into even more details about this intriguing serialization format. Stay tuned for an enriching and enlightening read! Until then, take care and see you soon!
