Introduction to protobuf in Python

Kathy Tang
DeepQ Research Engineering Blog
Dec 16, 2019

Introduction

  • Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data
  • currently supports generated code in multiple languages, e.g. Java, Python, Objective-C, and C++…
  • similar to XML and JSON, but smaller, faster, and simpler
  • protobuf messages are serialized to a compact binary format, which makes them size-efficient

How a protobuf message is serialized

Example:

message Test {
  optional int32 price = 1;
}
// price = 150 is encoded as 08 96 01 => 0000 1000 1001 0110 0000 0001

Key:

(field_number << 3) | wire_type 

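field_number = 1, wire_type = 0 (varint) => key = (1 << 3) | 0 = 8 = 0x08

Value:

96 01 is a varint. Each byte carries 7 bits of data in its low bits, and its most significant bit (MSB) is set when another byte follows. Dropping the MSBs gives 001 0110 and 000 0001; the groups are stored least-significant first, so the value is 000 0001 001 0110 = 1001 0110 = 150.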
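The key and varint rules above can be checked with a few lines of Python. This is a minimal standard-library sketch: encode_varint, decode_varint and make_key are illustrative names, not part of the protobuf library, and the sketch handles non-negative integers only (negative int32 values use a 10-byte encoding that it does not attempt).

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits carry data
        value >>= 7
        if value:
            out.append(byte | 0x80)  # MSB set: another byte follows
        else:
            out.append(byte)         # MSB clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a varint; 7-bit groups are stored least-significant first."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


def make_key(field_number: int, wire_type: int) -> bytes:
    # key = (field_number << 3) | wire_type, itself written as a varint
    return encode_varint((field_number << 3) | wire_type)


# Test { price = 150 }: field 1, wire type 0 (varint)
encoded = make_key(1, 0) + encode_varint(150)
print(encoded.hex(" "))                    # 08 96 01
print(decode_varint(bytes([0x96, 0x01])))  # 150
```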

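Size of a protobuf message compared with XML and JSON

XML (49 bytes)

<some><name>price</name><value>150</value></some>

JSON (13 bytes; strict JSON requires the key to be quoted)

{"price":150}

The protobuf encoding of the same data, 08 96 01, takes only 3 bytes.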
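These byte counts can be measured directly. The sketch below encodes each representation and takes its length; it uses compact json.dumps output, which quotes the key as strict JSON requires, and reuses the three bytes 08 96 01 from the Test example for protobuf.

```python
import json

xml_text = "<some><name>price</name><value>150</value></some>"
json_text = json.dumps({"price": 150}, separators=(",", ":"))  # compact JSON, no spaces
proto_bytes = bytes([0x08, 0x96, 0x01])  # Test { price = 150 } from the example above

sizes = {
    "xml": len(xml_text.encode("utf-8")),
    "json": len(json_text.encode("utf-8")),
    "protobuf": len(proto_bytes),
}
print(sizes)  # {'xml': 49, 'json': 13, 'protobuf': 3}
```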

Install

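On macOS, install the protocol buffer compiler (protoc) with Homebrew:

brew install protobuf

The Python runtime library is installed separately:

pip install protobuf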

Steps

  • Define message formats in a .proto file.
  • Compile the .proto file with the protocol buffer compiler (protoc).
  • Use the Python protocol buffer API to write and read messages.

Example

addressbook.proto: defines the format of the data

  • Data is structured as messages, where each message is a small logical record of information containing a series of name-value pairs called fields.
  • data types: bool, int32, float, double, string…
  • each field is annotated with one of these modifiers, which says how many times the field may appear:
    1. required: a well-formed message must have exactly one of this field.
    2. optional: a well-formed message has zero or one of this field.
    3. repeated: this field can be repeated any number of times (including zero) in a well-formed message.
syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4; // any number of PhoneNumber entries
}

message AddressBook {
  repeated Person people = 1; // any number of Person entries
}
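To make the three labels concrete, here is a conceptual sketch that checks how often each Person field was set against its label. PERSON_LABELS and is_well_formed are hypothetical helpers written for illustration; they count values set on a logical record, not raw wire occurrences, and the generated classes do their own checking (for example through a message's IsInitialized() method) rather than anything like this.

```python
from collections import Counter

# Hypothetical, hand-written table of the Person labels from addressbook.proto:
# 1 = name, 2 = id, 3 = email, 4 = phones
PERSON_LABELS = {1: "required", 2: "required", 3: "optional", 4: "repeated"}


def is_well_formed(field_numbers, labels):
    """Hypothetical checker: compare how often each field was set with its label.

    `field_numbers` lists the field number of every value set on a record,
    e.g. [1, 2, 4, 4] means name, id and two phone numbers.
    """
    counts = Counter(field_numbers)
    for number, label in labels.items():
        n = counts[number]  # Counter returns 0 for fields never set
        if label == "required" and n != 1:
            return False    # required: exactly one
        if label == "optional" and n > 1:
            return False    # optional: zero or one
        # repeated: any number, including zero
    return True


print(is_well_formed([1, 2, 4, 4], PERSON_LABELS))  # True: name, id, two phones
print(is_well_formed([1, 3], PERSON_LABELS))        # False: id is missing
```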

cmd:

protoc --python_out=. addressbook.proto

This runs the protocol buffer compiler, protoc, which generates data access classes in your preferred language(s) from the .proto definition; --python_out=. writes the generated Python module to the current directory.

This generates addressbook_pb2.py.

  • The classes in addressbook_pb2.py can be used in your application to populate, serialize, and retrieve protocol buffer messages.

write.py

import addressbook_pb2

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

peopleIDs = [1, 2]
peopleName = ["Selina", "Hebe"]
peopleEmail = ["selina@gmail.com", "hebe@gmail.com"]

# Add one Person to the repeated `people` field for each entry
for person_id, name, email in zip(peopleIDs, peopleName, peopleEmail):
    person = address_book.people.add()
    person.id = person_id
    person.name = name
    person.email = email

# Serialize the whole AddressBook to bytes and write them in binary mode
with open(my_pb_file, "wb") as f:
    f.write(address_book.SerializeToString())  # (See Complete API for Message for more information)

Running python write.py generates my_addr_book.pb.
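To see roughly what SerializeToString() writes into my_addr_book.pb, the sketch below hand-encodes the same two people using only the standard library. It assumes fields are written in field-number order, which is what the official implementations generally do, and it handles only non-negative ids. Strings and embedded messages both use wire type 2 (length-delimited), which is why each Person is prefixed with its own key and length. All helper names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (hypothetical helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def varint_field(field_number: int, value: int) -> bytes:
    return key(field_number, 0) + encode_varint(value)  # wire type 0: varint


def bytes_field(field_number: int, payload: bytes) -> bytes:
    # wire type 2: length-delimited (strings and embedded messages)
    return key(field_number, 2) + encode_varint(len(payload)) + payload


def encode_person(person_id: int, name: str, email: str) -> bytes:
    # Assumed order: name (1), id (2), email (3)
    return (bytes_field(1, name.encode("utf-8"))
            + varint_field(2, person_id)
            + bytes_field(3, email.encode("utf-8")))


people = [(1, "Selina", "selina@gmail.com"), (2, "Hebe", "hebe@gmail.com")]
# AddressBook.people is field 1; each Person is an embedded message
book_bytes = b"".join(bytes_field(1, encode_person(*p)) for p in people)
print(book_bytes.hex(" "))
print(len(book_bytes))  # 56
```

Under those assumptions, comparing open("my_addr_book.pb", "rb").read() == book_bytes after running write.py is a quick way to test them.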

read.py

import addressbook_pb2

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

# Read the raw bytes and parse them back into the AddressBook message
with open(my_pb_file, "rb") as f:
    address_book.ParseFromString(f.read())

print(address_book)
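ParseFromString() has to do the reverse walk: read a key, split it into field number and wire type, then read the value. The sketch below is a simplified raw decoder in the spirit of protoc --decode_raw, handling only wire types 0 and 2. read_varint and decode_raw are illustrative names, and unlike the generated classes the decoder knows nothing about field names or types.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, new_pos) for the varint starting at data[pos]."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(data: bytes):
    """Split a serialized message into (field_number, wire_type, value) triples.

    Only wire types 0 (varint) and 2 (length-delimited) are handled.
    """
    fields, pos = [], 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(data, pos)
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


# One Person (name="Hebe", id=2) wrapped as AddressBook.people (field 1)
person = b"\x0a\x04Hebe" + b"\x10\x02"
book = b"\x0a" + bytes([len(person)]) + person

(outer,) = decode_raw(book)
print(outer[:2])             # (1, 2): field 1, length-delimited
print(decode_raw(outer[2]))  # [(1, 2, b'Hebe'), (2, 0, 2)]
```

Pointed at the file from write.py, e.g. decode_raw(open("my_addr_book.pb", "rb").read()), it should list two field-1 entries, one per Person.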

Difference between proto2 & proto3

Proto3 :

  • simplifies the protocol buffer language (e.g. removal of required fields and of custom default values…)
  • makes the language available in a wider range of programming languages (e.g. Go, Ruby, JavaNano)
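One visible consequence of these changes is how a default value is handled on the wire. In proto2, an optional field records whether it was set, so an explicit price = 0 is still serialized; in proto3, a plain scalar field (declared without the optional keyword) has no presence, and a field holding its zero default is simply omitted. The hypothetical functions below hand-code that behaviour for the Test message's price field; they are an illustrative sketch, not library calls.

```python
from typing import Optional

PRICE_KEY = b"\x08"  # field 1, wire type 0 (varint)


def encode_varint(value: int) -> bytes:
    """Base-128 varint for non-negative integers (hypothetical helper)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_price_proto2(price: Optional[int]) -> bytes:
    """Hypothetical: proto2 `optional int32 price = 1;`.

    None means "never set"; any explicitly set value, even 0, is written.
    """
    if price is None:
        return b""
    return PRICE_KEY + encode_varint(price)


def encode_price_proto3(price: int) -> bytes:
    """Hypothetical: proto3 `int32 price = 1;` with no presence tracking.

    The zero default is not written at all.
    """
    if price == 0:
        return b""
    return PRICE_KEY + encode_varint(price)


print(encode_price_proto2(0).hex(" "))    # 08 00
print(encode_price_proto3(0).hex(" "))    # empty: nothing is written
print(encode_price_proto3(150).hex(" "))  # 08 96 01
```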

