Introduction to protobuf in python

Kathy Tang

Published in

DeepQ Research Engineering Blog

3 min read · Dec 16, 2019

Introduction

Protocol Buffers (protobuf) is Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data.
It currently supports generated code in multiple languages, e.g. Java, Python, Objective-C, and C++.
It is similar to XML and JSON, but smaller, faster, and simpler.
A protobuf message is serialized to a compact binary format, which makes it size-efficient.

How a protobuf message is serialized

Example:

message Test {
   optional int32 price = 1;
}
// price = 150 is encoded as 08 96 01 => 0000 1000 1001 0110 0000 0001

Key:

(field_number << 3) | wire_type

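field_number = 1, wire_type = 0 (varint) => key = (1 << 3) | 0 = 08

value = 96 01 = 150 (a varint: drop the high bit of each byte, then join the 7-bit groups with the least significant group first: 000 0001 ++ 001 0110 = 1001 0110 = 150)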
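The key and value arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that does not use the protobuf library; the helper names (encode_varint, decode_varint, encode_int_field) are made up for illustration, and it only covers non-negative integers with wire type 0.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit stays clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode one varint that occupies all of `data`."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)  # groups arrive least significant first
    return result


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode key + value for an integer field (wire type 0)."""
    wire_type = 0
    key = (field_number << 3) | wire_type
    return encode_varint(key) + encode_varint(value)


encoded = encode_int_field(1, 150)
print(" ".join(f"{b:02x}" for b in encoded))  # 08 96 01
print(decode_varint(encoded[1:]))             # 150
```

The first byte is the key and the remaining two bytes are the varint for 150, matching the example message.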

Size of a protobuf message compared with XML and JSON

XML (49 bytes)

<some><name>price</name><value>150</value></some>

JSON (13 bytes)

{"price":150}

protobuf (3 bytes)

08 96 01
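The sizes can be measured directly rather than estimated. The sketch below counts the bytes of each representation using only the standard library; note that it quotes the JSON key, as valid JSON requires, and that the protobuf bytes are the hand-encoded 08 96 01 from the serialization example rather than output from the protobuf library.

```python
import json

xml_text = "<some><name>price</name><value>150</value></some>"
# Compact separators avoid the spaces json.dumps adds by default.
json_text = json.dumps({"price": 150}, separators=(",", ":"))
protobuf_bytes = bytes([0x08, 0x96, 0x01])

sizes = {
    "xml": len(xml_text.encode("utf-8")),
    "json": len(json_text.encode("utf-8")),
    "protobuf": len(protobuf_bytes),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

For this one-field message the protobuf encoding is roughly a quarter the size of the JSON and a sixteenth the size of the XML, mostly because the field name is replaced by a one-byte key.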

Install

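brew install protobuf

This installs the protoc compiler on macOS (Homebrew). The generated Python code also needs the protobuf runtime package:

pip install protobuf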

Steps

Define message formats in a .proto file.
Use the protocol buffer compiler (protoc) to generate Python classes from the .proto file.
Use the Python protocol buffer API to write and read messages.

Example

addressbook.proto: define the format of data

Data is structured as messages, where each message is a small logical record of information containing a series of name-value pairs called fields.
Each field has a data type: bool, int32, float, double, string, ...
Each field also has a rule that specifies how many times it may appear:
1. required: a well-formed message must have exactly one of this field.
2. optional: a well-formed message can have zero or one of this field (but not more than one).
3. repeated: this field can be repeated any number of times (including zero) in a well-formed message.

syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;  // multiple PhoneNumber data
}

message AddressBook {
  repeated Person people = 1;  // multiple Person data
}

cmd:

protoc --python_out=. addressbook.proto

The protocol buffer compiler protoc generates data access classes in your preferred language(s) from your proto definition.

This generates addressbook_pb2.py.

The classes in addressbook_pb2.py can be used in your application to populate, serialize, and retrieve protocol buffer messages.

write.py

import addressbook_pb2
import sys

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

peopleIDs = [1, 2]
peopleName = ["Selina", "Hebe"]
peopleEmail = ["selina@gmail.com", "hebe@gmail.com"]

for i in range(len(peopleIDs)):
    person = address_book.people.add()
    person.id = peopleIDs[i]
    person.name = peopleName[i]
    person.email = peopleEmail[i]

with open(my_pb_file, "wb") as f:
    f.write(address_book.SerializeToString())
    # (See Complete API for Message for more information)

This generates my_addr_book.pb.

read.py

import addressbook_pb2
import sys

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

with open(my_pb_file, "rb") as f:
    address_book.ParseFromString(f.read())

print(address_book)
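To connect the address book example back to the wire format, it can help to decode a few bytes by hand. The sketch below is a deliberately tiny decoder written without the protobuf library; read_varint and parse_fields are hypothetical helper names, only wire types 0 (varint) and 2 (length-delimited) are handled, and the input bytes are built by hand for one Person with name "Selina" and id 1 rather than read from my_addr_book.pb.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: this was the last byte
            return result, pos
        shift += 7


def parse_fields(data: bytes):
    """Yield (field_number, value) pairs; wire types 0 and 2 only."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:       # varint: int32, bool, enum, ...
            value, pos = read_varint(data, pos)
        elif wire_type == 2:     # length-delimited: string, bytes, nested message
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        yield field_number, value


# Hand-built bytes (an assumption for illustration, not read from the file):
# Person.name = 1 (string) -> key 0x0a, Person.id = 2 (int32) -> key 0x10.
person = b"\x0a\x06Selina" + b"\x10\x01"
# AddressBook.people = 1 (nested message) -> key 0x0a, then length, then payload.
address_book = b"\x0a" + bytes([len(person)]) + person

for number, value in parse_fields(address_book):
    print("AddressBook field", number)
    for inner_number, inner_value in parse_fields(value):
        print("  Person field", inner_number, "=", inner_value)
```

A nested message is stored the same way as a string: a key, a length, and then the raw bytes, which is why the decoder has to know from the .proto definition that field 1 of AddressBook should be parsed again as a Person.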

Difference between proto2 & proto3

Proto3:

simplifies the protocol buffer language (e.g. removal of required fields and custom default values)
makes protocol buffers available in a wider range of programming languages (e.g. Go, Ruby, JavaNano)