Build custom protocol on top of TCP with Node.js, part 1

The beginning

Nikolay Stoykov
Nov 10, 2017

Who is this intended for?

This is for people who are familiar with JavaScript and Node.js, and for newcomers to network programming who have tried to build a networking application (e.g. a chat room) but hit a wall along the way and want to better understand the what and why of networking.

What is inside?

Sounds like a box with candies, huh? Not really!

I have prepared 3 different approaches for message formats, and by message I do not mean a chat message, but rather a packet of bytes. Each part is accompanied by examples and code snippets.

I read a lot of articles myself, often on my phone, and I do not like jumping to GitHub to read code. That’s why the code in this article is delivered both as GitHub gists and as part of the content.

When I first started to experiment with Node.js, I wanted to get my hands dirty with TCP sockets. Along the way I ran into several problems that I want to share with you.

What you see is NOT what you get…

Let’s suppose you have the following server code:

// server.js
const net = require('net');

const server = net.createServer();
server.on('connection', socket => {
  console.log('new client arrived');
  socket.write(Buffer.from('Hello World'));
});
server.listen(8000); // the client below connects to port 8000

The code for the server should be pretty straightforward and self-explanatory.

Then the client code looks like this:

// client.js
const net = require('net');

const socket = net.createConnection({ port: 8000, host: 'localhost' });
socket.on('connect', () => {
  socket.on('data', data => {
    console.log(data.toString('utf8'));
  });
});

What you expect is that the “data” event will deliver the whole “Hello World” message, in other words 11 bytes, but sometimes the message arrives split across 2 “data” events. Don’t get me wrong: TCP makes sure that every piece of data is received in order. The fragmentation is due to the streaming nature of TCP, which delivers a sequence of bytes and does not preserve the boundaries of individual writes.

In other cases, several “socket.write” invocations might arrive as one “data” event. This is called TCP coalescing, and one common cause is Nagle’s algorithm. It is there to keep misbehaving applications from congesting the network with many small packets. Every TCP packet carries about 40 bytes of headers (20 for TCP and 20 for IPv4), so if the application sends only 1 byte of useful information every second, almost all of the traffic is overhead. I won’t go into more detail about Nagle’s algorithm. I find this article interesting on the subject.
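If small writes need to leave the machine right away, Node.js lets you turn Nagle’s algorithm off per socket with socket.setNoDelay(). Here is a minimal sketch; the helper name createLowLatencySocket is made up for illustration. Keep in mind that this only changes when data is sent. The receiver can still see writes merged or split, so it does not replace a proper message format.

```javascript
const net = require('net');

// Hypothetical helper (not from the article): returns a TCP socket with
// Nagle's algorithm disabled. The socket is not connected yet.
function createLowLatencySocket() {
  const socket = new net.Socket();
  // setNoDelay(true) asks for small writes to be sent immediately
  // instead of being held back and coalesced.
  socket.setNoDelay(true);
  return socket;
}

// Usage (assumes a server is listening on localhost:8000):
// const socket = createLowLatencySocket();
// socket.connect({ port: 8000, host: 'localhost' });
```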

1st — Simplest solution

If you are familiar with WebSockets and have written a client and server with them, you will know that they have the concept of messages. The WebSocket standard (RFC 6455) defines it in great detail. If you feel okay with reading RFCs (or want a good place to start), I strongly encourage you to read it.

So, let’s tackle our problem. First we’ll need to define our own protocol on top of TCP and define what a message is. Let’s assume that we want to transmit plain text over the network. We need something to denote the end of a message. For example, we can use the “\n” (newline) character as a message separator.

// client.js
const net = require('net');

let buffered = '';
const socket = net.createConnection({ port: 8000, host: 'localhost' });
socket.on('connect', () => {
  socket.on('data', data => {
    console.log(data.length); // number of bytes in this chunk
    buffered += data.toString('utf8');
    processReceived();
  });
});

// Print every complete message and keep the unfinished tail in `buffered`.
function processReceived() {
  const received = buffered.split('\n');
  while (received.length > 1) {
    console.log(received.shift());
  }
  buffered = received[0];
}

Because our messages contain only plain text, we append every newly received chunk to what we already have. The next step is to process what we have received so far, so we split on the “\n” character and display each complete message.
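The same buffering logic can be pulled out into a small function that is easy to try without a network. In this sketch, createLineParser is a hypothetical helper and not part of the client above; the chunks are fed in by hand to imitate TCP cutting the stream in awkward places.

```javascript
// Hypothetical helper: turns arbitrary chunks of text into complete
// '\n'-terminated messages, keeping any unfinished tail for later.
function createLineParser(onMessage) {
  let buffered = '';
  return function feed(chunk) {
    buffered += chunk;
    const parts = buffered.split('\n');
    // The last element is either '' or an incomplete message.
    buffered = parts.pop();
    for (const message of parts) {
      onMessage(message);
    }
  };
}

// Simulate two messages arriving in three oddly sized chunks.
const messages = [];
const feed = createLineParser(message => messages.push(message));
feed('Hello Wo');
feed('rld\nHow are');
feed(' you?\n');
console.log(messages); // [ 'Hello World', 'How are you?' ]
```

No matter where the chunks are cut, a message is only reported once its terminating newline has arrived.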

// server.js
const net = require('net');

const server = net.createServer();
server.listen(8000);
server.on('connection', socket => {
  console.log('new client arrived');

  socket.write('Hello World\n');
  socket.write('How are you Jack?\nI am fine, thanks a lot!\n');
  socket.write('And you are?\n');
  socket.write('Just perfect\n');
  // No trailing newline here, so this message is not finished yet.
  socket.write('------------------------');

  setTimeout(() => {
    socket.write('\n'); // ends the dashed-line message
    socket.write('Hello World\n');
    socket.write('How are you Jack\nI am fine, thanks a lot!\n');
    socket.write('And you are?\n');
    socket.write('Just perfect!\n');
  }, 5000);
});

The server’s only purpose is to send data to the client, so we can see how the client processes it.

The next figure shows the output of the client-side application. As you can see, the client received the data in 2 “data” events: one containing 112 bytes and the other 83 bytes. If you look closely at the server.js code, you will see that the long dashed line is sent before the setTimeout, but it is displayed only after the timeout has fired. This is because the client is waiting for the end of that message, which has not been sent yet.

Client side output

Real world test

Now if you stop for a moment and think, you’ll notice that the above solution will not work in case you want your messages to contain new lines.

2nd — Another Approach (possibly wrong)

Another way to overcome this is to use a longer sequence of unique characters as the separator. You may use some kind of random generator to create this sequence. This approach has several drawbacks:

  • You need to send the sequence to the other side (client or server).
  • The longer the sequence, the larger every message becomes, which means more bytes over the network.
  • If a payload accidentally contains the same sequence of bytes, the receiver will split it in the wrong place and produce malformed messages.
  • etc…
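The accidental-collision drawback is easy to reproduce. The sketch below uses a made-up 4-byte delimiter (both the value and the function name splitOnDelimiter are assumptions for illustration) and shows one intended message being cut in two because its text happens to contain the delimiter.

```javascript
// Made-up delimiter for illustration; a real one might be random bytes.
const DELIMITER = Buffer.from('#END');

// Hypothetical helper: splits a received buffer on every DELIMITER.
// Returns the complete messages; an unterminated tail is ignored here.
function splitOnDelimiter(received) {
  const messages = [];
  let start = 0;
  let index = received.indexOf(DELIMITER, start);
  while (index !== -1) {
    messages.push(received.subarray(start, index).toString('utf8'));
    start = index + DELIMITER.length;
    index = received.indexOf(DELIMITER, start);
  }
  return messages;
}

// The sender meant ONE message, but its text contains the delimiter.
const wire = Buffer.concat([Buffer.from('the marker is #END, ok?'), DELIMITER]);
console.log(splitOnDelimiter(wire)); // [ 'the marker is ', ', ok?' ]
```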

3rd — Acceptable Approach

We thought long and hard and we finally came up with an award-winning format to send messages over the network.

We’ll separate each message into 2 segments: a header and the actual payload. The header will always be 2 bytes in size and will contain 1 unsigned integer, which says how many bytes the payload contains. A 2-byte unsigned integer tops out at 65,535, so that is the largest payload a single message can carry. The quotes around Hello World in the table below are just to show that it is a string; you can send whatever bytes you want.

------------------------------------------
Name    | HEADER  | PAYLOAD         |
Size    | 2 Bytes | 12 Bytes        |
Content | 12      | "Hello World\n" |
------------------------------------------
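As a rough sketch of this layout, here is how such a message could be built. The helper name encodeFrame is hypothetical, and big-endian byte order for the header is an assumption; any order works as long as both sides agree on it.

```javascript
// Hypothetical helper: prefixes a payload with a 2-byte unsigned length.
// Big-endian order is an assumption made for this sketch.
function encodeFrame(payload) {
  const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload, 'utf8');
  if (body.length > 0xffff) {
    throw new RangeError('payload does not fit in a 2-byte length header');
  }
  const header = Buffer.alloc(2);
  header.writeUInt16BE(body.length, 0);
  return Buffer.concat([header, body]);
}

const frame = encodeFrame('Hello World\n');
console.log(frame.length);          // 14 (2-byte header + 12-byte payload)
console.log(frame.readUInt16BE(0)); // 12
```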

It is important to note how numbers are stored. As we are using Node.js for both the client and the server, the built-in Buffer class will come to help. We have to decide in which endianness (byte order) to write the integer: the order in which you write it has to be the order in which you read it. Buffer has a number of convenient methods for reading and writing integers in different byte orders and sizes.

An example from the Node.js documentation:

const buf = Buffer.allocUnsafe(4);

buf.writeUInt16BE(0xdead, 0);
buf.writeUInt16BE(0xbeef, 2);

// Prints: <Buffer de ad be ef>
console.log(buf);

buf.writeUInt16LE(0xdead, 0);
buf.writeUInt16LE(0xbeef, 2);

// Prints: <Buffer ad de ef be>
console.log(buf);

The following code contains the class Networker, which takes care of reading the header and the actual payload of the packets. For every chunk of received data it checks whether there are enough bytes to read. So, first it tries to read 2 bytes for the header and then proceeds to the payload.

I suggest you start with the _onData method on line 89, then go to the _readBytes method.
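For readers without the gist at hand, here is a much smaller sketch of the same idea. It is not the Networker class itself: the FrameDecoder name, the push method, the makeFrame helper, and the big-endian header are all assumptions made for illustration.

```javascript
// Hypothetical decoder for frames of the form:
//   [2-byte big-endian payload length][payload bytes]
class FrameDecoder {
  constructor(onMessage) {
    this.onMessage = onMessage;
    this.buffered = Buffer.alloc(0);
  }

  // Call this with every chunk received in a 'data' event.
  push(chunk) {
    this.buffered = Buffer.concat([this.buffered, chunk]);
    // Keep extracting frames while a full header and payload are available.
    while (this.buffered.length >= 2) {
      const payloadLength = this.buffered.readUInt16BE(0);
      const frameLength = 2 + payloadLength;
      if (this.buffered.length < frameLength) break; // wait for more bytes
      this.onMessage(this.buffered.subarray(2, frameLength));
      this.buffered = this.buffered.subarray(frameLength);
    }
  }
}

// Hypothetical helper that builds one frame for the demo below.
function makeFrame(text) {
  const body = Buffer.from(text, 'utf8');
  const header = Buffer.alloc(2);
  header.writeUInt16BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Deliver two frames in awkward chunks, as TCP might.
const received = [];
const decoder = new FrameDecoder(payload => received.push(payload.toString('utf8')));
const wire = Buffer.concat([makeFrame('Hello World\n'), makeFrame('multi\nline')]);
decoder.push(wire.subarray(0, 1)); // half of the first header
decoder.push(wire.subarray(1, 9)); // rest of the header + part of the payload
decoder.push(wire.subarray(9));    // everything else
console.log(received); // [ 'Hello World\n', 'multi\nline' ]
```

Because the length is known up front, the payload may contain newlines or any other bytes without confusing the receiver.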

Next is the server.js and client.js code:

They both take advantage of the previously shown networker.js to send and receive messages. In later articles we’ll extend client’s code to be able to send arbitrary messages from the console.

To go deeper into this challenge, you can try to define some structure for the payload. For example, if you want to send an object, how would you encode it in the payload?
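One simple option, shown here only as a sketch, is to serialize the object to JSON and use its UTF-8 bytes as the payload. The helper names encodeObject and decodeObject and the big-endian header are assumptions, and the decoder assumes it is handed exactly one complete message.

```javascript
// Hypothetical helper: object -> [2-byte length][JSON as UTF-8 bytes].
function encodeObject(value) {
  const body = Buffer.from(JSON.stringify(value), 'utf8');
  if (body.length > 0xffff) {
    throw new RangeError('encoded object does not fit in a 2-byte length header');
  }
  const header = Buffer.alloc(2);
  header.writeUInt16BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Hypothetical helper: the reverse, for one complete message.
function decodeObject(message) {
  const length = message.readUInt16BE(0);
  return JSON.parse(message.subarray(2, 2 + length).toString('utf8'));
}

const original = { from: 'Jack', text: 'I am fine,\nthanks a lot!' };
const decoded = decodeObject(encodeObject(original));
console.log(decoded.text === original.text); // true
```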

Follow up

In the following articles I will try to show and explain how to create a simple chat application using the above techniques. We’ll also go into the details of how to send files, not only plain text messages.

Nikolay Stoykov

Using JavaScript as a 2nd speaking language, fascinated by networking, micro-service architectures and algorithms.