Easy C++/Java socket communication? Let’s try Protobuf!

Dmytro Chaban
Jul 3, 2017

Just as these two banks need a bridge, programs written in different languages need a good way to communicate, quickly and elegantly

Well, actually not only C++ with Java, but also Python, Objective-C, C#, JavaScript, Ruby, Go, PHP and even Dart. That’s the whole list of languages you can easily connect using just one command and one file. I’m talking about protobuf, or Protocol Buffers. Protobuf is similar to JSON or XML: you can serialize data on one machine and deserialize it on another. But it has a lot of advantages.

The first one is speed, and if you have ever written messaging or streaming applications, you know that every bit has a price. Consider what it would be like if every streaming protocol were defined in JSON or (God forbid) XML. It’s not just stupid, it’s very costly in terms of communication. Even though this kind of technology already exists (think of HLS, where every chunk of a stream is sent over HTTP with a lot of unused header information), sometimes the amount of data you can send is tightly restricted. What can you do about it? Write your own transport on top of TCP/UDP? Of course. But that means a lot of repetitive work. Protobuf handles the serialization for you; you just have to send the bytes.

That’s what you look like when you write your own transport for each language with your own serialization

Also, notice that you’re not restricted to UDP/TCP with protobuf. Protobuf converts your data to bytes; everything else is up to you. Protobuf is faster than XML and JSON because it converts data to a compact binary form, not to text. It’s up to 10 times smaller than XML and up to 100 times faster (these are the figures Google has quoted in comparison with XML; the real gain depends on your data and on how you send it).
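To get a feeling for why the binary form is smaller, here is a minimal, hand-written C++ sketch of protobuf’s base-128 varint encoding, which is how integer fields such as int64 go over the wire. It doesn’t use the protobuf library at all, and the function names are hypothetical; it only illustrates the format. A field with number 1 and value 150 takes 3 bytes (08 96 01), while the JSON text {"id":150} takes 10.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned integer as a protobuf base-128 varint:
// 7 payload bits per byte, least-significant group first,
// with the high bit set on every byte except the last one.
std::vector<uint8_t> encodeVarint(uint64_t value) {
    std::vector<uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
    return out;
}

// Encode one varint field: a key holding (fieldNumber << 3) | wireType,
// followed by the value itself. Wire type 0 means "varint".
std::vector<uint8_t> encodeVarintField(uint32_t fieldNumber, uint64_t value) {
    assert(fieldNumber > 0);  // field number 0 is not valid in protobuf
    constexpr uint64_t kWireTypeVarint = 0;
    std::vector<uint8_t> out =
        encodeVarint((static_cast<uint64_t>(fieldNumber) << 3) | kWireTypeVarint);
    std::vector<uint8_t> payload = encodeVarint(value);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

Small numbers cost a single byte, and the field is identified by a tiny number instead of a repeated text key, which is where most of the savings over JSON and XML come from.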

The second is simplicity and usability. You define your protocol in a syntax similar to Java or any other C-like language, so almost any developer can understand it. But protobuf also handles a lot of dirty work for you: (de)serialization, enum translation (from int to enum and back), conversion to strings and debug strings, writing serialized data to many destinations (streams, buffers) and reading it back from them. And that’s the main topic of our article. Do you remember what a pain in the ass it was to send data between applications written in different languages? For example, consider this mobile game structure: a Java server, a C++ game client and a Java wrapper client (Android). Additionally, there can be a C# Unity client or a Swift/Obj-C iOS client.

Are you already feeling it? You have to define your model classes, define serialization/deserialization (even if via JSON, which is not the right place for real-time gaming), translate serialized data into the model (filling it) and do the reverse. Even if you optimize some parts, you still have to do the same job three times or more. This is where protobuf starts to shine: it saves you from all these steps. Really. If you don’t trust me, trust Open Whisper Systems, which uses protobuf in the Signal messenger.

Of course, we’re not going to write this whole structure. But I want to show you the basics: a Java server, plus Java and C++ clients. This will give you a general idea of what’s going on. Our server will handle requests sequentially (no threads here, to keep it simple), and the clients are request-receive-die programs: each one sends a request (save, retrieve or delete) and finishes right after receiving the server’s response.

Our simple communication structure

Our client will send a “Note” to the server, the server will keep it in a list, and the client can also retrieve all notes from the server’s list. No security, no threads, no complexity, of course. Let’s start with the protocol definition: we have a “Note” model. We need to create a new file with the .proto extension, which holds the definitions in protobuf format.

As I said, you’ll find the code familiar:

  • message — represents a model on steroids: there you can find builders, getters and setters, hash functions, string generation and other goodies that you don’t have to write yourself.
  • int64 — a 64-bit integer type; int64 maps to long in Java and to int64 in C++.
  • string — just as you might have guessed, it’s a string.
  • enum — this is a real enum, as in Java. You can declare enums inside messages, but I preferred to put this one outside. Actually, I thought about deleting it, but let’s keep it for this “lesson”. As you can see, messages can contain fields of enum types.

I think everything here is understandable if you are familiar with one of the major languages. That’s it... well, not really. Notice that you can arrange the fields as you wish: every field gets its own number, and that number, not the field’s position in the file, is what is written on the wire. This can be helpful when you just want to read a few bytes from an arrived message (e.g. an id). Since this is a protocol, we also need something that will route our requests; let’s call it “Envelope”. The Envelope just contains an enum (or an int), and based on its value we handle the rest of the Envelope’s data differently.
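Because every field carries its number on the wire, a reader can pick out one field, say an id, without decoding the rest of the message. The following is a minimal C++17 sketch of that idea, written by hand against the protobuf wire format rather than with the protobuf library; it assumes the id is a varint-encoded field (as int64 is), and the helper names are hypothetical. In normal code the generated classes parse everything for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read a base-128 varint starting at pos and advance pos past it.
// Returns std::nullopt if the buffer ends mid-varint or the varint is too long.
std::optional<uint64_t> readVarint(const std::vector<uint8_t>& buf, size_t& pos) {
    uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) return std::nullopt;
        uint8_t byte = buf[pos++];
        result |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return result;
    }
    return std::nullopt;  // more than 10 bytes: malformed
}

// Scan the top-level fields of an encoded message and return the first
// varint field (wire type 0) with the given number, skipping other fields.
std::optional<uint64_t> findVarintField(const std::vector<uint8_t>& msg, uint32_t fieldNumber) {
    size_t pos = 0;
    while (pos < msg.size()) {
        auto key = readVarint(msg, pos);  // key = (field number << 3) | wire type
        if (!key) return std::nullopt;
        uint32_t number = static_cast<uint32_t>(*key >> 3);
        uint32_t wireType = static_cast<uint32_t>(*key & 0x7);
        switch (wireType) {
            case 0: {  // varint
                auto value = readVarint(msg, pos);
                if (!value) return std::nullopt;
                if (number == fieldNumber) return value;
                break;
            }
            case 1: pos += 8; break;  // fixed 64-bit
            case 2: {  // length-delimited: strings, bytes, embedded messages
                auto len = readVarint(msg, pos);
                if (!len || *len > msg.size() - pos) return std::nullopt;
                pos += static_cast<size_t>(*len);
                break;
            }
            case 5: pos += 4; break;  // fixed 32-bit
            default: return std::nullopt;  // deprecated groups (3, 4) not handled
        }
    }
    return std::nullopt;  // field not present
}
```

For example, for the bytes 12 02 'h' 'i' 08 96 01 (field 2 = "hi", field 1 = 150), asking for field 1 skips the string without looking at it and returns 150.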

Nothing super new:

  • enum — an enum declared inside the Envelope message. There’s a small difference in how nested enums are named in C++; you’ll see it later.
  • repeated — represents a list of data; in our case, a list of notes. This means that you can send not only one Note, but a lot of them.
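To make the routing idea concrete, here is a minimal C++17 sketch of a sequential server loop body that switches on the envelope type. It deliberately uses plain structs instead of protoc output so that it compiles on its own: Note, Envelope, EnvelopeType, its kSave/kRetrieve/kDelete values and NoteStore are all hypothetical stand-ins, not the names from the article’s .proto file. Real generated classes expose accessors instead of public fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for generated classes; protoc output would give you
// accessors such as id() and set_id() for a field named id.
struct Note {
    int64_t id = 0;
    std::string text;
};

// Hypothetical request types; the real names live in the .proto file.
enum class EnvelopeType { kSave, kRetrieve, kDelete };

struct Envelope {
    EnvelopeType type = EnvelopeType::kRetrieve;
    std::vector<Note> notes;  // plays the role of `repeated Note`
};

// Handles one request at a time: the envelope type decides
// how the notes carried by the envelope are interpreted.
class NoteStore {
public:
    Envelope handle(const Envelope& request) {
        Envelope response;
        response.type = request.type;
        switch (request.type) {
            case EnvelopeType::kSave:  // append every note from the request
                notes_.insert(notes_.end(), request.notes.begin(), request.notes.end());
                break;
            case EnvelopeType::kRetrieve:  // send back everything stored so far
                response.notes = notes_;
                break;
            case EnvelopeType::kDelete:  // remove stored notes whose id matches
                for (const Note& target : request.notes) {
                    notes_.erase(std::remove_if(notes_.begin(), notes_.end(),
                                                [&target](const Note& n) { return n.id == target.id; }),
                                 notes_.end());
                }
                break;
        }
        return response;
    }

private:
    std::vector<Note> notes_;
};
```

The single switch is the whole “protocol router”: adding a new request kind means adding an enum value and one more case.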

Now that’s it! Really. The last part is to add a few options at the beginning of our .proto file so that sources can be generated for our applications:

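syntax = "proto3";                              // use proto3 syntax

package protocol;                               // also becomes the C++ namespace

option java_package = "com.protobuf.example";   // Java package of the generated classes
option java_outer_classname = "NotesProtocol";  // Java class that wraps all messages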

You can see the whole .proto file on my GitHub.

It’s time to generate; let our monster come to life! Code generation is really simple, as simple as building your hello world program:

protoc -I=PROTO_FILE_FOLDER  --java_out=JAVA_SRC_FOLDER --cpp_out=CPP_SRC_FOLDER  INCLUDED_PROTO_FILES

PROTO_FILE_FOLDER is the folder where all your .proto files are located (-I tells protoc where to look for them). JAVA_SRC_FOLDER is the path to your Java /src folder, and CPP_SRC_FOLDER is the same for your C++ sources. INCLUDED_PROTO_FILES are the .proto files that you want to generate sources from. If you’re generating sources from my project, go to the protobuf folder containing generate_sources.sh and run this command:

sh ./generate_sources.sh ..

This will generate the sources and put them into the cpp_client and java_client folders.

I’m not going to deep dive into the project structure here; as always, you can see the whole project on GitHub. Instead, I’m going to show you how easy it is to prepare data for sending via protobuf. Let’s start with the client. Let’s assume that we are already connected to the socket and have its OutputStream in scope. Here’s how we create and send data.

Here we created two objects via the beautiful builder. That’s all: we get our bytes and can do whatever we want with them. Though that’s not the only way to convert an object; you can also convert it to a ByteString, but we’re not going to talk about that here. You can also just write the object to some destination, e.g. our OutputStream.
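One thing to keep in mind when writing a message straight into a socket: a serialized protobuf message does not record its own length, so on a TCP stream the receiver can’t tell where one message ends and the next begins. Java’s writeDelimitedTo() addresses this by writing the message size as a varint before the message bytes. Below is a minimal C++ sketch of that same framing, assuming you already have the serialized bytes (for example from SerializeToString()); frameMessage is a hypothetical helper, not part of the protobuf API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Prefix a serialized message with its length encoded as a base-128 varint,
// the same layout that Java's writeDelimitedTo() produces.
std::string frameMessage(const std::string& payload) {
    std::string frame;
    uint64_t size = payload.size();
    while (size >= 0x80) {  // 7 bits per byte, high bit = "more bytes follow"
        frame.push_back(static_cast<char>((size & 0x7F) | 0x80));
        size >>= 7;
    }
    frame.push_back(static_cast<char>(size));
    frame += payload;  // the message bytes follow the prefix unchanged
    return frame;
}
```

A 3-byte message gets a 1-byte prefix and a 300-byte message a 2-byte prefix (AC 02), so the framing overhead stays tiny.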

On the server’s side, we have to read from the InputStream. And it’s… easy!

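// Assumes the whole Envelope arrives in one read and fits in 1024 bytes.
ByteBuffer buf = ByteBuffer.allocate(1024);
int numBytesRead = client.read(buf);  // client is the connected SocketChannel (NIO)
if (numBytesRead > 0) {               // -1 means the client closed the connection
    buf.flip();                       // switch the buffer from writing to reading
    Envelope envelope = Envelope.parseFrom(buf); // Here we go.
}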

Because I’m using NIO, it’s convenient for me to parse from a ByteBuffer, but .parseFrom() can do a lot of dirty work for you with other sources too. More precisely, it accepts:

  • byte[] — plain old bytes; just pass them in and get your Envelope back.
  • ByteString — the same ByteString that we’re not diving into.
  • InputStream and CodedInputStream — just pass a stream from a source (a socket, for example) and get your object.

As you can see, the higher the level of abstraction you use, the less control you have. That’s why I use a ByteBuffer for this purpose: I can always do whatever I want with it, and I can control what I’m retrieving (e.g. skip the first 10 bytes).
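The single read() in the server code assumes that a whole Envelope arrives at once. TCP doesn’t guarantee that: one read can return half a message, or two messages glued together. If the sender prefixes each message with its varint length (the layout Java’s writeDelimitedTo() writes, and which Envelope.parseDelimitedFrom(InputStream) reads on a blocking stream), the receiver can buffer bytes until a full frame is there. Here is a minimal C++17 sketch of that buffering; FrameReader is a hypothetical class, not a protobuf or mongoose API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Accumulates bytes from successive socket reads and hands out complete
// varint-length-prefixed frames.
class FrameReader {
public:
    // Append whatever one socket read returned: part of a frame,
    // exactly one frame, or several frames.
    void append(const char* data, size_t len) { buffer_.append(data, len); }

    // Return the next complete payload, or std::nullopt if more bytes are needed.
    std::optional<std::string> next() {
        uint64_t size = 0;
        size_t pos = 0;
        bool prefixComplete = false;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buffer_.size()) return std::nullopt;  // prefix not fully received
            uint8_t byte = static_cast<uint8_t>(buffer_[pos++]);
            size |= static_cast<uint64_t>(byte & 0x7F) << shift;
            if ((byte & 0x80) == 0) {
                prefixComplete = true;
                break;
            }
        }
        if (!prefixComplete) throw std::runtime_error("malformed length prefix");
        if (buffer_.size() - pos < size) return std::nullopt;  // payload not fully received
        std::string payload = buffer_.substr(pos, static_cast<size_t>(size));
        buffer_.erase(0, pos + static_cast<size_t>(size));  // drop the consumed frame
        return payload;
    }

private:
    std::string buffer_;  // bytes received but not yet returned as a frame
};
```

Each payload it returns is a std::string holding exactly one serialized message, which can then be handed to the generated class’s parse method; the payload may contain zero bytes, which std::string keeps intact.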

That’s all! Really. In Java it’s that simple. But now we’re entering C++, the Shoot Yourself in the Foot kingdom. The first time, it took me 10 hours in total just to compile a damn protocol and include it into the project. No more words, let’s start. But first, let’s read some rules of this kingdom:

  1. Know what your “sword” is. When you try to compile, check which protoc version you’re using. For example, 2.5.0 and 2.6.0 are different versions, and some things won’t compile. 🤷🏼
  2. Use protoc --version to check your version. It has to match the version of the protobuf headers and library you compile and link the generated sources against.
  3. Put -lprotobuf -pthread at the end of the compile options, after your source files: the linker processes libraries in the order they appear. This was my mistake here.
  4. Use the g++ compiler.

Well, I guess we can finally start. I’m using mongoose.c for convenience; you won’t see significant differences from other socket code. Here’s how the object is created:

And here’s retrieving:

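Envelope receivedEnvelope;
// Pass the length explicitly: the mbuf isn't NUL-terminated and binary data can contain '\0'.
bool parsed = receivedEnvelope.ParseFromArray(conn->recv_mbuf.buf, static_cast<int>(conn->recv_mbuf.len));

As you can see, retrieving is very simple: you take the receive buffer and its length and pass them to the Envelope object via .ParseFromArray(), which returns false if the data is incomplete or malformed. That’s all. (Passing the raw char* to .ParseFromString() instead would turn it into a C string, which stops at the first zero byte.)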

And the final question: how do you compile this? It depends on the platform; I came up with this line for this project (you can find it in /cpp_client/build_sources.sh):

g++ -I /usr/local/Cellar/protobuf/3.3.0/include -L /usr/local/lib [source files with generated protocol sources] -D_GLIBCXX_USE_CXX11_ABI=0 -o [APP_FILE_NAME] -lprotobuf -pthread

In the first square brackets, put all the source files of your project (.c/.cpp) together with the generated .pb.cc files from the protocol; in the second, the name of your application. After that, you’ll find the application with that name in the same folder.

That’s all: you now have the basic knowledge to start writing simple protocols with protobuf.

Software Engineer, addicted to productivity and automatization