Introduction to Protobuf in Python
Published Dec 16, 2019
Introduction
- Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data
- currently supports generated code in multiple languages, e.g. Java, Python, Objective-C, and C++…
- similar to XML and JSON, but smaller, faster and simpler
- protobuf messages are serialized to a compact binary format, so they are size-efficient
How a protobuf message is serialized
Example:
message Test {
optional int32 price = 1;
}
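// price = 150 is encoded as 08 96 01 => 0000 1000 1001 0110 0000 0001
Key:
(field_number << 3) | wire_type
field_number = 1, wire_type = 0 (varint) => key = (1 << 3) | 0 = 8 = 08
Value (varint: 7 bits per byte, least significant group first; the MSB of each byte says whether more bytes follow):
96 01 => 1001 0110 0000 0001 => drop the MSBs => 001 0110, 000 0001 => reverse the groups => 000 0001 001 0110 = 1001 0110 = 128 + 16 + 4 + 2 = 150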
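The key and value arithmetic above can be checked with a small standard-library sketch of varint encoding. The function names encode_varint, decode_varint and encode_key are illustrative and not part of the protobuf package, and the sketch only handles non-negative integers (real protobuf writes a negative int32 as a 10-byte varint).

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (least significant group first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        group = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # MSB set: more bytes follow
        else:
            out.append(group)         # MSB clear: this is the last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # drop the MSB, place the 7-bit group
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Field key = (field_number << 3) | wire_type, itself written as a varint."""
    return encode_varint((field_number << 3) | wire_type)


# Test.price = 150: field number 1, wire type 0 (varint)
encoded = encode_key(1, 0) + encode_varint(150)
print(encoded.hex(" "))           # 08 96 01
print(decode_varint(encoded, 1))  # (150, 3)
```

Because small numbers fit in a single byte, field numbers 1 to 15 with any wire type produce a one-byte key, which is part of why the encoding is so compact.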
Size of the same message in XML, JSON and protobuf
XML (49 bytes)
<some><name>price</name><value>150</value></some>
JSON (13 bytes)
{"price":150}
protobuf (3 bytes)
08 96 01
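Measuring the UTF-8 length of each representation gives a quick check of this comparison. The protobuf bytes here are the hand-written 08 96 01 from the encoding example, not output of the protobuf library; the main saving comes from protobuf storing only the field number (inside the key) instead of the field name.

```python
import json

xml_text = "<some><name>price</name><value>150</value></some>"
json_text = json.dumps({"price": 150}, separators=(",", ":"))  # compact form: {"price":150}
protobuf_bytes = bytes([0x08, 0x96, 0x01])  # key 08, then varint 96 01 for 150

sizes = {
    "xml": len(xml_text.encode("utf-8")),
    "json": len(json_text.encode("utf-8")),
    "protobuf": len(protobuf_bytes),
}
for fmt, size in sizes.items():
    print(f"{fmt:8} {size:3} bytes")
```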
Install
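brew install protobuf    # protoc compiler (macOS, Homebrew)
pip install protobuf     # Python runtime library used by the generated code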
Steps
- Define message formats in a .proto file.
- Use the protocol buffer compiler.
- Use the Python protocol buffer API to write and read messages.
Example
addressbook.proto: defines the format of the data
- Data is structured as messages, where each message is a small logical record of information containing a series of name-value pairs called fields.
- Data types: bool, int32, float, double, string, …
- Each field has a unique field number (= 1, = 2, …), which is what appears in the key on the wire.
- Each field has a rule that says how many times it may appear:
1. required: a well-formed message must have exactly one of this field.
2. optional: a well-formed message can have zero or one of this field.
3. repeated: this field can be repeated any number of times (including zero) in a well-formed message.
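The three field rules can be modeled as a simple count check. This is a hypothetical helper (is_well_formed and PERSON_RULES are illustrative names, not protobuf APIs) that encodes the rules exactly as stated; real parsers are more lenient on the wire, for example keeping the last value when a non-repeated scalar field appears more than once.

```python
from collections import Counter

# Hypothetical rule table mirroring the Person message in addressbook.proto (proto2)
PERSON_RULES = {"name": "required", "id": "required", "email": "optional", "phones": "repeated"}


def is_well_formed(rules: dict, present_fields: list) -> bool:
    """Check how many times each field appears against its proto2 field rule."""
    counts = Counter(present_fields)
    for field, rule in rules.items():
        n = counts[field]
        if rule == "required" and n != 1:
            return False  # required: exactly one
        if rule == "optional" and n > 1:
            return False  # optional: zero or one
        # repeated: any count, including zero
    return True


print(is_well_formed(PERSON_RULES, ["name", "id", "phones", "phones"]))  # True
print(is_well_formed(PERSON_RULES, ["name"]))                            # False: id missing
```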
syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;  // multiple PhoneNumber data
}

message AddressBook {
  repeated Person people = 1;  // multiple Person data
}
cmd:
protoc --python_out=. addressbook.proto
This uses the protocol buffer compiler protoc to generate data access classes from the proto definition (--python_out=. writes Python code to the current directory; other flags such as --java_out or --cpp_out target other languages).
This generates: addressbook_pb2.py
- The classes in addressbook_pb2.py can be used in your application to populate, serialize, and retrieve protocol buffer messages.
write.py
import addressbook_pb2

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

people_ids = [1, 2]
people_names = ["Selina", "Hebe"]
people_emails = ["selina@gmail.com", "hebe@gmail.com"]

# Add one Person to the repeated "people" field for each (id, name, email)
for person_id, name, email in zip(people_ids, people_names, people_emails):
    person = address_book.people.add()
    person.id = person_id
    person.name = name
    person.email = email

# Serialize the whole AddressBook to bytes and write it in binary mode
with open(my_pb_file, "wb") as f:
    f.write(address_book.SerializeToString())  # (See Complete API for Message for more information)
Running write.py generates my_addr_book.pb.
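To see roughly what write.py puts into my_addr_book.pb, the sketch below hand-encodes the same two people with only the standard library. The helper names (key, length_delimited, encode_person, encode_address_book) are hypothetical and not part of the generated addressbook_pb2 module. Strings and nested messages use wire type 2 (key, length, payload), and each entry of the repeated people field is written as its own length-delimited record. The result should correspond to SerializeToString output assuming fields are written in field-number order, as protobuf implementations normally do, but treat it as an illustration rather than a byte-for-byte guarantee.

```python
def encode_varint(value: int) -> bytes:
    # Base-128 varint, non-negative values only in this sketch
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    # Wire type 2: key, then payload length as a varint, then the payload
    return key(field_number, 2) + encode_varint(len(payload)) + payload


def encode_person(person_id: int, name: str, email: str) -> bytes:
    # Person fields in field-number order: name (1), id (2), email (3)
    return (
        length_delimited(1, name.encode("utf-8"))
        + key(2, 0) + encode_varint(person_id)
        + length_delimited(3, email.encode("utf-8"))
    )


def encode_address_book(people) -> bytes:
    # AddressBook.people is field 1 and repeated: one length-delimited record per Person
    return b"".join(length_delimited(1, encode_person(*p)) for p in people)


data = encode_address_book([(1, "Selina", "selina@gmail.com"), (2, "Hebe", "hebe@gmail.com")])
print(len(data))  # 56
print(data.hex(" "))
```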
read.py
import addressbook_pb2

my_pb_file = "my_addr_book.pb"
address_book = addressbook_pb2.AddressBook()

# Read the raw bytes back and parse them into the AddressBook message
with open(my_pb_file, "rb") as f:
    address_book.ParseFromString(f.read())

print(address_book)  # printed in protobuf text format
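ParseFromString does the reverse of serialization. As a rough illustration of the wire-level work (not the library's actual implementation), the hypothetical decode_fields below splits raw bytes into (field_number, wire_type, value) tuples. It handles only wire types 0 and 2 and does not check for truncated input. Without the .proto schema it also cannot know field names, or whether a length-delimited value is a string or a nested message; that information comes from the generated addressbook_pb2 classes.

```python
def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at data[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(data: bytes):
    """Split raw protobuf bytes into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(data):
        field_key, pos = decode_varint(data, pos)
        field_number, wire_type = field_key >> 3, field_key & 0x07
        if wire_type == 0:      # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:    # length-delimited: string, bytes or nested message
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


print(decode_fields(bytes.fromhex("089601")))  # [(1, 0, 150)]

person = b"\x0a\x06Selina\x10\x01\x1a\x10selina@gmail.com"
print(decode_fields(person))
```

Calling decode_fields again on a length-delimited value unpacks a nested message, such as each Person record inside an AddressBook.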
Difference between proto2 & proto3
Proto3:
- simplifies the protocol buffer language (e.g. removal of required fields and of custom default values…)
- makes it available in a wider range of programming languages (e.g. Go, Ruby, JavaNano)
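One visible consequence of dropping custom default values: in proto3 every scalar defaults to its type's zero value, and a scalar declared without the optional keyword (no explicit presence) is simply not written when it holds that value. A proto2 optional field that has been explicitly set is always written. The sketch below models both cases for the Test.price field from the encoding example; encode_price and its proto3_implicit flag are hypothetical illustrations, not protobuf APIs.

```python
def encode_varint(value: int) -> bytes:
    # Base-128 varint, non-negative values only in this sketch
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_price(price: int, proto3_implicit: bool) -> bytes:
    """Encode Test.price (field 1, int32, wire type 0).

    proto3_implicit=True models a proto3 scalar without `optional`: the zero value is skipped.
    proto3_implicit=False models an explicitly set proto2 `optional` field: always written.
    """
    if proto3_implicit and price == 0:
        return b""
    return encode_varint((1 << 3) | 0) + encode_varint(price)


print(encode_price(0, proto3_implicit=False).hex(" "))   # 08 00
print(encode_price(0, proto3_implicit=True))              # b''
print(encode_price(150, proto3_implicit=True).hex(" "))  # 08 96 01
```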