gRPC Explained, Part 2: Protobuf

Ankit Dwivedi
7 min read · Oct 8, 2023


In the previous blog we got a comprehensive introduction to gRPC. In this installment, we will cover Protocol Buffers, a.k.a. Protobuf.

protobuf logo

The name “Protocol Buffers” has a unique origin.
In its early days, it referred to a class called “ProtocolBuffer,” acting as a buffer for individual method calls. Users could add tag/value pairs to this buffer, and the raw bytes were stored until they were written out after constructing the message. Although the “buffers” part of the name lost its original meaning, it endured. Today, we commonly use “protocol message” to refer to a message in an abstract sense, “protocol buffer” for a serialized message, and “protocol message object” for an in-memory representation of the parsed message.

What are Protocol Buffers?

Protocol Buffers is a simple language-neutral and platform-neutral Interface Definition Language (IDL) for defining data structure schemas and programming interfaces. It supports both binary and text wire formats, and works with many different wire protocols on different platforms.

For example, have a look at this simple proto file (person.proto) that defines a message called Person. This message describes the properties of a person: a name, an id, and an optional email address.

syntax = "proto2"; // the "required" label below exists only in proto2

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

This person.proto file serves as a contract between the server and the client. If you ever want to change how the Person entity is structured, or how your requests and responses should look, you need to modify the proto file.

The Protobuf compiler, protoc, is maintained by Google, although alternative implementations are available. The generated code is efficient and optimized for fast serialization and deserialization of data.

Why Choose Protocol Buffers (Protobuf) Over JSON?

Proto vs JSON

You might be wondering: when JSON, a widely used serialization format, already exists, why should we go with Protocol Buffers (Protobuf)?

Let’s delve into why Protobuf is a fantastic choice and how it compares to JSON in solving common data serialization challenges:

A comparison of Protobuf vs. JSON, presented in a table. (Why a table? Because it makes side-by-side comparisons easier.)

In summary, Protobuf and JSON each have their own unique strengths. Protobuf shines in situations where efficiency, cross-platform compatibility, and structured data are crucial.

JSON, on the other hand, is still an excellent option when you need human-readable data or when the simplicity of a lightweight format is more suitable.

Protobuf Syntax

This quick intro provides you with a taste of Protobuf’s syntax and core concepts. If you want to explore more, I encourage you to check out the official Protocol Buffers documentation.

Messages: The Data Blueprints

Think of Protobuf messages as the blueprints for your data structures. They tell you how your data should be organized.

message Recipe {
  string dish_name = 1;
  repeated string ingredients = 2;
  double preparation_time_minutes = 3;
}

In this example, we’ve created a Recipe message with three fields: dish_name for the name of the dish, ingredients for the list of ingredients (you can have as many as you want), and preparation_time_minutes for how long it takes to make the dish. Each field has a unique number (e.g., 1, 2, 3) for organization.

Field Types

Protobuf supports various field types, such as strings, integers, floats, enums, and more. You can even nest messages within other messages to create complex data structures. These field types make sure your data is well-structured and correctly typed.
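As an illustration of nesting (the message names here are hypothetical, not part of the article’s running example), one message type can be used as the type of a field in another:

```proto
// A nested-message sketch: Address is used as a field type inside Company.
message Address {
  string street = 1;
  string city = 2;
}

message Company {
  string name = 1;
  Address headquarters = 2;      // a field whose type is another message
  repeated int32 office_ids = 3; // a repeated scalar field
}
```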

Field Labels

Fields in messages can have labels that determine whether they are required, optional, or repeated (for lists). Note that the required label exists only in proto2; proto3 removed it, and a plain proto3 field simply falls back to its default value when unset:

  • Required Fields: These fields must always be present in a message of this type. If a required field is missing when you serialize the message, it will result in an error.
  • Optional Fields: They can be included in a message, but they are not required. If you omit an optional field when serializing a message, it’s treated as if it has a default value.
  • Repeated Fields: Repeated fields allow you to have multiple values of the same type in a single field. They are used for lists or arrays of data.

Enums

Enums allow you to define a set of named constant values. They’re handy when you have a field with a predefined set of options, such as days of the week or product categories.

enum DayOfWeek {
  DAY_OF_WEEK_UNSPECIFIED = 0; // proto3 requires the first enum value to be zero
  MONDAY = 1;
  TUESDAY = 2;
  // ...
}

Comments

You can include comments in your Protobuf definitions to explain your messages and fields better. Comments can start with // or be wrapped in /* ... */.
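For example (a hypothetical message, shown only to illustrate the two comment styles):

```proto
/* Units are metric throughout this file. */
message Measurement {
  double value = 1; // the measured quantity
  string unit = 2;  // e.g. "kg" or "m"
}
```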

Syntax Versions: Rules and Features

Protobuf offers different syntax versions, with proto2 and proto3 being the most common. These versions define the rules and features you can use in your Protobuf definitions.

Note: It is recommended that gRPC APIs use Protocol Buffers version 3 (proto3) for API definitions.
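To see the difference, here is a sketch of how the earlier person.proto could be rewritten under proto3 (note that the optional label for explicit field presence requires protobuf 3.15 or later):

```proto
syntax = "proto3";

message Person {
  string name = 1;            // singular proto3 fields carry no "required" label
  int32 id = 2;
  optional string email = 3;  // explicit presence, supported since protobuf 3.15
}
```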

Importing Other Files: Staying Organized

For bigger projects, you can split your Protobuf definitions into multiple files and bring them together using the import statement.
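For instance, a hypothetical menu.proto could reuse the Recipe message from earlier by importing the file that defines it (the file names here are assumptions for illustration):

```proto
syntax = "proto3";

import "recipe.proto"; // hypothetical file containing the Recipe message

message MenuEntry {
  Recipe recipe = 1; // type pulled in via the import
  double price = 2;
}
```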

Serialization and Deserialization

The Protobuf wire format is a binary encoding, and is therefore faster to process than text formats. It uses some clever tricks to minimize the number of bytes used to represent messages. Knowledge of the binary encoding format isn’t necessary to use Protobuf.

To truly grasp the power of Protocol Buffers (Protobuf), let’s walk through an example of how data is serialized and encoded, and then subsequently decoded back.

Consider the following data (using the person.proto defined in the previous section):

{
  "name": "Ankit",
  "id": 21,
  "email": "username@gmail.com"
}

Serialization and Encoding

Protobuf takes this JSON data and converts it into a binary format that’s both efficient and space-saving. In this case, the Protobuf encoding looks like this:

0a 05 41 6e 6b 69 74 10 15 1a 12 75 73 65 72 6e 61 6d 65 40 67 6d 61 69 6c 2e 63 6f 6d
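To see where those bytes come from, here is a minimal sketch that hand-encodes the three fields using Protobuf’s wire rules (each field is a tag byte, then a varint or length-prefixed payload). Real code would use protoc-generated classes; the helper names below are my own, not the official API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_tag(field_number: int, wire_type: int) -> bytes:
    # Tag = field number shifted left 3 bits, OR'd with the wire type.
    return encode_varint((field_number << 3) | wire_type)

def encode_string(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    # Wire type 2 = length-delimited: tag, length, then the raw bytes.
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data

def encode_int(field_number: int, value: int) -> bytes:
    # Wire type 0 = varint.
    return encode_tag(field_number, 0) + encode_varint(value)

person = (
    encode_string(1, "Ankit")                 # name = 1
    + encode_int(2, 21)                       # id = 2
    + encode_string(3, "username@gmail.com")  # email = 3
)
print(person.hex(" "))
# 0a 05 41 6e 6b 69 74 10 15 1a 12 75 73 65 72 6e 61 6d 65 40 67 6d 61 69 6c 2e 63 6f 6d
```

Reading the output: 0a is the tag for field 1 (string), 05 is the length of “Ankit”; 10 15 is field 2 with varint value 21; 1a 12 is field 3 with an 18-byte string.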

Decoding

Now, let’s reverse the process and decode this Protobuf data back into its original form:
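A minimal decoder sketch (again with hypothetical helper names, not the official protobuf API) walks the bytes, reads each field’s tag and wire type, and rebuilds the values keyed by field number:

```python
def decode_varint(data: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7

def decode_message(data: bytes) -> dict:
    fields, pos = {}, 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint (int32, int64, enum, ...)
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited (string, bytes, nested message)
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise NotImplementedError(f"wire type {wire_type}")
        fields[field_number] = value
    return fields

raw = bytes.fromhex(
    "0a05416e6b697410151a12757365726e616d6540676d61696c2e636f6d"
)
print(decode_message(raw))  # {1: 'Ankit', 2: 21, 3: 'username@gmail.com'}
```

Note that the wire format carries only field numbers, not names; the schema in person.proto is what maps field 1 back to “name”, field 2 to “id”, and field 3 to “email”.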

This decoding process is what makes Protobuf so efficient and powerful. It ensures that data remains consistent and structured, even after encoding and decoding, making it a preferred choice for data transmission in various scenarios.
This was just a simple example; if you’re interested, you can learn more about the encoding on the Protocol Buffers website.

Protocol Buffers in gRPC

Protocol Buffers (Protobuf) are vital in gRPC, providing efficient and consistent communication between clients and servers. Here’s why they’re crucial:

  • API Contract Definition: Protobuf defines message structures for gRPC, ensuring efficient and error-free data transmission.
  • Efficient Serialization: Protobuf’s binary format speeds up data serialization and deserialization, enhancing gRPC’s performance.
  • Language Neutrality: Protobuf’s language-agnostic nature enables seamless integration across various programming languages.
  • Efficiency: Protobuf’s binary format reduces network usage, making data transmission faster.
  • Interoperability: Protobuf acts as a universal translator, enabling gRPC services to communicate effortlessly across languages and platforms.
  • Backward Compatibility: Protobuf’s versioning support allows API evolution without breaking existing clients.
  • Code Generation: Protobuf simplifies message structure code generation, streamlining development.
  • Performance: Efficient serialization and deserialization by Protobuf boost gRPC services’ overall performance.

Let’s include an example of a gRPC service defined using Protocol Buffers (Protobuf). Imagine we’re building a chat application with user authentication.

Here’s a Protobuf definition for our service:

syntax = "proto3";

message User {
  string id = 1;
  string username = 2;
}

message Message {
  string id = 1;
  string text = 2;
  User sender = 3;
}

service ChatService {
  rpc SendMessage(Message) returns (Message);
  rpc GetMessages(User) returns (stream Message);
}

In this example, we define two message types, User and Message, and a ChatService that allows sending and receiving messages. With Protobuf, this service definition is clear, concise, and can be easily generated into code for various programming languages.

In conclusion, Protocol Buffers (Protobuf) have revolutionized the way data is serialized, transmitted, and understood across diverse systems. Their efficiency, cross-platform compatibility, and structured data handling make them a powerful choice for modern applications.

By understanding Protobuf’s syntax, core concepts, and its role in gRPC, you’re well-equipped to harness its capabilities. As you explore the world of Protobuf, remember that the official Protocol Buffers documentation is your comprehensive guide for diving deeper into this technology.

Connect with me on LinkedIn. If you have any questions or topics you’d like me to cover, please feel free to reach out.

References:

  1. https://protobuf.dev/
  2. https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
  3. https://auth0.com/blog/beating-json-performance-with-protobuf/



Ankit Dwivedi

Engineer at Stripe | ex- Google | Building largescale systems