How to build a network stack in Ruby

Learning about sockets, datagrams, bit-twiddling and more — all from the comfort of a high-level language

Like many young programmers, I began my career without a degree in Computer Science or Software Engineering. The recent influx of approachable resources has made it easier than ever to start making apps and getting paid. 💸

I think this is fantastic. However, it can foster a feeling of inferiority, or imposter syndrome, amongst self-taught developers. We "Software Writers" stand on the shoulders of giants, assembling LEGO® blocks into grand structures, while the "Real Programmers" solve the hard problems: creating operating systems and designing protocols, things we feel are just beyond our grasp.

My goal is to challenge this mindset and encourage you to be a little kinder to yourself. If you can grok Bundler and NPM, then I’m confident that low-level network programming will be easier than you might think, and actually a lot of fun!

Our challenge

Today we're going to implement the receiving side of UDP ourselves, using raw sockets. We're going to read raw ethernet frames, IP packets and UDP datagrams, and use bitwise operations to tease data out of them.

⚠️ I just used a whole lot of new terms! Don't worry if you haven't come across them before; all will become clear in due course.

This article was inspired by Gary Bernhardt’s tweet.

All code in this article has been tested on 64-bit Linux and is available on GitHub.

What is UDP?

UDP is one of the simplest protocols available for exchanging messages between computers on a network. The entire specification (RFC 768) can be read in 5–10 minutes, which makes it a great protocol to get started with.

Unlike more sophisticated protocols such as TCP, there is no long-lived connection between the client and server, no handshake, and no guarantee that your message will ever actually reach the remote computer — this is why UDP is colloquially known as the “Unreliable Datagram Protocol”.

The protocol's simplicity and apparent shortcomings suit certain types of application very well. Real-time software is one example: losing the occasional message is preferable to the overhead of waiting for delayed packets.

A good example of this would be a telephone app like Skype. It wouldn't make much sense to wait around for a delayed packet containing something said at the start of the call, because by then the conversation has probably moved on!

Some notable applications of UDP: DNS, DHCP, StatsD.

Getting started

Let’s start off by taking a look at a UDP server written using Ruby’s standard library implementation of the protocol.

Our server is fairly straightforward. It creates a UDP socket and binds it to an address (host and port), then loops forever waiting for messages. When a message arrives, it reads at most 1024 bytes of it. Finally, it calls String#upcase on the message and sends the result back to the client.
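Here is a minimal sketch of such a server using UDPSocket. The method name `serve_upcase` is our own. The loop is wrapped in a method so the socket can be created and bound separately.

```ruby
require 'socket'

# Reply to every datagram arriving on an already-bound UDPSocket with an
# upper-cased copy of its payload. Loops forever.
def serve_upcase(socket)
  loop do
    # Read at most 1024 bytes; sender looks like ["AF_INET", port, host, ip]
    message, sender = socket.recvfrom(1024)
    _family, client_port, _host, client_ip = sender
    socket.send(message.upcase, 0, client_ip, client_port)
  end
end
```

To run it on the address used in the Netcat example, something like `socket = UDPSocket.new`, then `socket.bind('192.168.33.10', 4321)`, then `serve_upcase(socket)` should work. Substitute your own machine's address.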

We can test our server with Netcat, a handy tool when working with UDP and TCP servers.

$ echo "Hello World" | nc -u 192.168.33.10 4321
HELLO WORLD

Under the hood

Ruby's UDPSocket class provides simple abstractions over a collection of functions in the Linux kernel (and other UNIX-like systems) that make up the Berkeley sockets API. For our purposes, we're concerned with the following four:

  • socket()
  • bind()
  • recvfrom()
  • sendto()

The Ruby standard library also exposes a lower-level Socket class, which is a closer match to the underlying C functions.

# Ruby
Socket.open(:INET, :DGRAM)
/* C */
socket(PF_INET, SOCK_DGRAM, 0);
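The sketch below rewrites the same upcase server against the lower-level Socket class, which makes the mapping onto the four C functions concrete. The helper names `open_udp_socket` and `serve_upcase_low_level` are hypothetical.

```ruby
require 'socket'

# socket() + bind(): create an IPv4 datagram (UDP) socket and bind it.
def open_udp_socket(host, port)
  socket = Socket.open(:INET, :DGRAM)         # socket(PF_INET, SOCK_DGRAM, 0)
  socket.bind(Socket.sockaddr_in(port, host)) # bind() takes a packed sockaddr_in
  socket
end

# recvfrom() + sendto(): echo each message back upper-cased. Loops forever.
def serve_upcase_low_level(socket)
  loop do
    message, sender = socket.recvfrom(1024)   # sender is an Addrinfo
    socket.send(message.upcase, 0, sender)    # send() with a destination acts like sendto()
  end
end
```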

Socket to me!

All of this talk of "sockets" and we haven't even stopped to explain what a "socket" actually is. Let's fix that, shall we?

To use an analogy: a network socket is similar to the telephone socket on your wall, which connects your telephone to the outside world. "Binding" a network socket to an address (host and port) is like your telephone company assigning your house a phone number. When people dial your number, the switchboard directs their call to the socket in your house.


The Linux kernel already has implementations of UDP and TCP, and you can use them by specifying the “type” of socket you’d like to create.

socket(PF_INET, SOCK_DGRAM, 0); /* "SOCK_DGRAM" tells the kernel we want UDP */

When interacting with sockets in C, calling the socket() function returns a "file descriptor", a small integer used to identify an open file. Unix-like systems are famous for treating lots of different things as if they were just files, and sockets are no exception.

Using our file descriptor, we can call file-like functions such as read(), and the kernel will take care of unpacking the UDP datagram and returning the message within.

For our experiment we’re going to take care of this part ourselves, by using a “raw” socket instead.

Socket.open(:INET, :RAW, Socket::IPPROTO_UDP)

Actually, we're going to go even deeper down the rabbit hole and ask for raw frames from the ethernet driver, which is as low-level as you can get without implementing the driver yourself too!

Socket.open(:PACKET, :RAW)

Binding the socket

⚠️ Heads up! Don't worry if you have a hard time digesting this section. It's more about the intricacies of C structure alignment than it is about network programming; it's just a means to an end. Hang in there!

When opting into the kernel's implementation of UDP and TCP, binding is simple and can be done with a host and port.

Ports are actually part of the UDP and TCP protocols, not the lower-level IP protocol, as we'll see when unpacking the UDP datagram. So to receive raw ethernet frames, we need to bind our socket to a "network interface" (e.g. the WiFi card in your laptop) instead.

This is where things start to get hairy. To bind a socket to a network interface, you need to know the interface's "index". We can find this out with a system call named ioctl().

The ioctl() function can be used to perform a bunch of different operations on IO "devices". Each operation is referred to by a numeric constant (e.g. the operation for ejecting a CD from a drive is CDROMEJECT). We'll be using an operation called SIOCGIFINDEX to query the kernel for our interface's index.

When performing the SIOCGIFINDEX operation, ioctl() expects its last argument to be a C structure called ifreq. This structure both contains the argument for the operation (the interface name) and is used to return the value to the caller.

To demonstrate this concept using a Hash in Ruby:

request = { name: 'eth1', index: nil }
socket.ioctl(SIOCGIFINDEX, request)
request[:index] # => 0x3

Unfortunately, we can’t create instances of C structures in Ruby, but luckily it’s fairly straightforward to fake them! We can create a string of bytes where the bytes containing our data line up with where the fields in the C structure would be.
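Here is a sketch of faking ifreq with String#pack. It assumes 64-bit Linux, where struct ifreq is 40 bytes: a 16-byte interface name followed by a 24-byte union whose first 4 bytes receive the index. Ruby's Socket module doesn't define SIOCGIFINDEX, so the value 0x8933 is copied from <linux/sockios.h>. `interface_index` is a hypothetical helper name.

```ruby
require 'socket'

SIOCGIFINDEX = 0x8933 # from <linux/sockios.h>; not defined by Ruby's Socket
IFNAMSIZ = 16         # length of ifreq's ifr_name field
IFREQ_SIZE = 40       # sizeof(struct ifreq) on 64-bit Linux (assumption)

# Ask the kernel for a network interface's index via ioctl(SIOCGIFINDEX).
def interface_index(socket, interface_name)
  # ifr_name: the name, NUL-padded to 16 bytes; the rest zeroed for the union
  ifreq = [interface_name].pack("a#{IFNAMSIZ}") + ("\0" * (IFREQ_SIZE - IFNAMSIZ))
  socket.ioctl(SIOCGIFINDEX, ifreq)          # the kernel writes into our string
  ifreq.byteslice(IFNAMSIZ, 4).unpack1('i')  # ifr_ifindex: a native int at offset 16
end
```

Any socket will do for this query. An ordinary `Socket.open(:INET, :DGRAM)` works and doesn't need root.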

Phew! That was tricky, but we’re on the home stretch and ready to bind the socket. We’re going to construct another C structure, called sockaddr_ll this time.
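Here is a sketch of the bind step. It is based on the struct sockaddr_ll layout documented in the Linux packet(7) man page: a 2-byte family, a 2-byte protocol in network byte order, a 4-byte interface index, then 12 more bytes that bind() doesn't use. ETH_P_ALL (0x0003, meaning "every protocol") comes from <linux/if_ether.h> and is defined by hand. The helper names are hypothetical. Opening a packet socket needs root (or CAP_NET_RAW), so only the packing can be tried without privileges.

```ruby
require 'socket'

ETH_P_ALL = 0x0003 # from <linux/if_ether.h>; not defined by Ruby's Socket

# Build a 20-byte struct sockaddr_ll for binding a packet socket.
def sockaddr_ll(ifindex, protocol = ETH_P_ALL)
  [
    Socket::AF_PACKET, # sll_family   (unsigned short, native byte order)
    protocol,          # sll_protocol (unsigned short, network byte order)
    ifindex            # sll_ifindex  (int, native byte order)
  ].pack('Sni') + ("\0" * 12) # sll_hatype, sll_pkttype, sll_halen, sll_addr[8]
end

# Hypothetical helper: Linux only, requires root or CAP_NET_RAW.
def open_packet_socket(ifindex)
  # socket() also wants the protocol as htons(ETH_P_ALL)
  protocol = [ETH_P_ALL].pack('n').unpack1('S')
  socket = Socket.open(:PACKET, :RAW, protocol)
  socket.bind(sockaddr_ll(ifindex))
  socket
end
```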

Receiving data

Our socket is bound, the sun is shining and we’re ready to start receiving data. Let’s see what kind of data we get when a UDP packet is sent our way.

We’ll be making use of the excellent hexdump gem to inspect the data we receive.
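If you'd rather not install a gem, the sketch below is a rough stand-in built on core String methods. It is not the hexdump gem's API, only an approximation of the layout shown below, and `hex_dump` is a hypothetical name.

```ruby
# Approximate a classic hex dump: 16 bytes per row, hex on the left,
# printable ASCII on the right (non-printable bytes shown as ".").
def hex_dump(data)
  data.bytes.each_slice(16).map do |row|
    hex = row.map { |byte| format('%02x', byte) }.join(' ')
    ascii = row.map { |byte| (32..126).cover?(byte) ? byte.chr : '.' }.join
    "#{hex.ljust(47)}  |#{ascii}|"
  end.join("\n")
end
```

Inside a loop on the bound packet socket, something like `puts hex_dump(socket.recv(65_535))` prints each frame as it arrives.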

$ echo hello | nc -u 192.168.33.10 4321

Quoth the terminal:

0a 00 27 00 00 00 08 00 27 d7 47 6c 08 00 45 c0  |..'.....'.Gl..E.|
00 3e e2 9e 00 00 40 01 d4 04 c0 a8 21 0a c0 a8  |.>....@.....!...|
21 01 03 03 c0 78 00 00 00 00 45 00 00 22 46 89  |!....x....E.."F.|
00 00 40 11 70 e6 c0 a8 21 01 c0 a8 21 0a ca 13  |..@.p...!...!...|
10 e1 00 0e 1d a5 68 65 6c 6c 6f 0a              |......hello.|

Huzzah! We can see our message, so something must be working; but what’s all this other stuff? If you’re not intimately familiar with hexadecimal numbers, this wall of text might be intimidating. Let’s take a moment to find out what it’s all about.

Bits, bytes, hex — “Oh My!”

Binary is your laptop’s favourite language. In fact it’s the only language your computer really understands! Everything from the number “42” to your favourite Kanye West song is represented as a stream of ones and zeros.

There's nothing magical about binary; it's just a different way of representing numbers. Usually we represent numbers using the digits 0–9, which is called the "Decimal" numeral system. Binary only has the digits 0 and 1.

The number of different digits a numeral system has is called its "base" (or radix).

Decimal is “base 10” because it has 10 digits and binary is “base 2”.

This is how we represent different numbers in decimal and binary:

These two are easy because both numeral systems can represent them with a single digit:
Decimal 0 = Binary 0
Decimal 1 = Binary 1

We've run out of unique digits we can use in binary, so we must add a digit. This is similar to going from 9 to 10 in decimal:
Decimal 2 = Binary 10
Decimal 3 = Binary 11
Decimal 4 = Binary 100
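Ruby can do these conversions for us. Integer#to_s and String#to_i both accept a base, and a `0b` prefix writes an integer literal in binary:

```ruby
# Decimal to binary: pass the base (radix) to Integer#to_s
(0..4).map { |n| n.to_s(2) } # => ["0", "1", "10", "11", "100"]
# Binary to decimal: pass the base to String#to_i
'100'.to_i(2)                # => 4
# Binary integer literal
0b100                        # => 4
```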

Individual digits in a binary number are called "bits". Bits aren't much use on their own as they can only represent 2 distinct values, so we group them together to represent larger values. Groups of 4 bits are called "nibbles" and groups of 8 bits are called "bytes". Larger groups of 16, 32 or 64 bits are called "words". Exactly how many bits make up a word (or a "double word") depends on the architecture of your CPU.

Writing long numbers out in binary is pretty tedious, and converting from decimal to binary is tricky. This is where "hexadecimal" numbers come in! Hexadecimal is base 16, and a 4-bit nibble conveniently holds exactly 16 different values (0 to 15). So each digit in a hexadecimal number represents exactly one nibble, which makes conversion between hex and binary simple!

Hexadecimal 0 = Binary 0
Hexadecimal F = Binary 1111
Hexadecimal 3 = Binary 0011
Hexadecimal F3 = Binary 1111 0011
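The same tricks work for hex in Ruby. A `0x` prefix writes a hexadecimal literal, and `unpack1('H*')` turns raw bytes into hex digits, much like the last six bytes of the dump above:

```ruby
# Hex to binary: each hex digit becomes 4 bits
0xF3.to_s(2)            # => "11110011"
# Hex string to decimal
'F3'.to_i(16)           # => 243
# A byte always fits in two hex digits (one per nibble)
format('%02x', 3)       # => "03"
# Raw bytes to hex digits
"hello\n".unpack1('H*') # => "68656c6c6f0a"
```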

Decoding an ethernet frame

That seemingly indecipherable hex dump from earlier is an "Ethernet Frame": the raw packet of data returned from your network interface's driver.

There are a few different kinds of ethernet frame, but the kind we'll be looking at today is "Ethernet II", also known as "DIX Ethernet", the most common form of frame.

Diagram: Ethernet Type II frame format (source: https://commons.wikimedia.org/wiki/File:Ethernet_Type_II_Frame_format.svg)

As the above diagram shows, the first 14 bytes contain the "MAC Header". It is made up of the destination and source MAC addresses and the "EtherType" field, which indicates the protocol carried in the frame's payload (in our case IPv4, EtherType 0x0800).

0a 00 27 00 00 00 08 00 27 d7 47 6c 08 00 45 c0  |..'.....'.Gl..E.|
00 3e e2 9e 00 00 40 01 d4 04 c0 a8 21 0a c0 a8  |.>....@.....!...|
21 01 03 03 c0 78 00 00 00 00 45 00 00 22 46 89  |!....x....E.."F.|
00 00 40 11 70 e6 c0 a8 21 01 c0 a8 21 0a ca 13  |..@.p...!...!...|
10 e1 00 0e 1d a5 68 65 6c 6c 6f 0a              |......hello.|
0a 00 27 00 00 00 - Destination MAC Address
08 00 27 d7 47 6c - Source MAC Address
08 00 - EtherType (IPv4)

We can start extracting useful data by picking out the MAC addresses and formatting them in the conventional way (0A:00:27:00:00:00).
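Here is a sketch using String#unpack. The directive 'a6a6n' reads two 6-byte strings and one big-endian ("network order") 16-bit integer. The method names and the returned Hash keys are our own.

```ruby
# Format 6 raw bytes as a MAC address, e.g. "0A:00:27:00:00:00".
def format_mac(bytes)
  bytes.unpack('C6').map { |byte| format('%02X', byte) }.join(':')
end

# Split a raw Ethernet II frame (a binary String) into its MAC header fields.
def parse_ethernet_header(frame)
  destination, source, ether_type = frame.unpack('a6a6n')
  {
    destination_mac: format_mac(destination),
    source_mac: format_mac(source),
    ether_type: ether_type,           # 0x0800 means the payload is IPv4
    payload: frame.byteslice(14..-1)  # everything after the 14-byte header
  }
end
```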

Next, let's extract the IPv4 packet.

IPv4 packet

The IPv4 packet starts with a header containing all kinds of interesting information. The header is usually 20 bytes long, and longer when options are present.

Diagram: IPv4 header format (source: https://en.wikipedia.org/wiki/IPv4)

Many fields take up less than a single byte, e.g. the "Version" field, which occupies a 4-bit nibble. High-level languages tend to operate on data structures of a byte or larger, so working with anything smaller can require a little extra work.

Ruby provides a number of "bitwise" operators for working with the underlying bits in a byte or integer. Extracting the "Version" field is a case of reading the first 4 bits of the first byte. We can do this by "shifting" all of the bits 4 places to the right.

Before:
[0100] 0101 (Decimal 69)
  ▲     ▲
  |     |
  |     +- IHL Field (want to discard)
  |
  +- Version (want to keep)
Shifting the bits 4 places to the right:
0100 (0101) <- These 4 bits get "dropped off" the end
▲▲▲▲
||||
|||+----+
||+----+|
|+----+||
+----+|||
     ||||
     ▼▼▼▼
0000 0100
After:
0000 [0100] (Decimal 4)

This is how it looks in Ruby:
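This is a minimal sketch that assumes `ip_packet` is a binary String holding the IPv4 packet, i.e. the frame with its 14-byte MAC header removed. Here it is stubbed with the first two bytes of the IPv4 header from our dump.

```ruby
# Stub: the first two bytes of the IPv4 header in our hex dump
ip_packet = [0x45, 0xc0].pack('C*')
first_byte = ip_packet.getbyte(0) # 69, i.e. 0b0100_0101
version = first_byte >> 4         # shift the IHL bits off the end, leaving 4
```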

For argument's sake, let's say that we also want the value in the "IHL" field (the second 4 bits). To accomplish this, we could use a binary AND to "mask over" the unwanted first 4 bits.

Binary AND takes 2 numbers, and makes another number which has 1s in positions where both of the original numbers had 1s, and 0s everywhere else.

    1111 1001
AND 0110 1111
    ---------
    0110 1001

Here’s how we can get to the IHL field using binary AND:

First byte:
0100 [0101]
      ^ IHL
Masking over the first 4 bits:
    0100 [0101]
AND 0000  1111  (Hexadecimal F)
    -----------
    0000  0101

Here’s how it would look in Ruby:
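In Ruby, `&` is binary AND, and 0x0F is the 0000 1111 mask. IHL also gives the header's length in 32-bit (4-byte) words, which is where the usual 20-byte figure comes from:

```ruby
first_byte = 0x45          # 0100 0101
ihl = first_byte & 0x0F    # mask over the Version bits, leaving 5 (0101)
# IHL counts 32-bit (4-byte) words, so this header is 5 * 4 = 20 bytes long
header_length = ihl * 4
```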

For our purposes, the only field we need from the IP header is the "Source IP Address", which we will use later to send our reply to the client.
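The sketch below reads the source address (bytes 12 to 15 of the header) along with the fields covered so far. It takes two small liberties beyond the text. It uses IHL to find the end of the header instead of assuming 20 bytes, and it reads the Protocol field at byte 9, where 17 means UDP. The method name is hypothetical.

```ruby
# Parse the parts of an IPv4 header we care about from a binary String.
def parse_ipv4_header(packet)
  first_byte = packet.getbyte(0)
  header_length = (first_byte & 0x0F) * 4 # IHL is measured in 4-byte words
  {
    version: first_byte >> 4,
    header_length: header_length,
    protocol: packet.getbyte(9),                               # 17 = UDP
    source_ip: packet.byteslice(12, 4).unpack('C4').join('.'), # dotted-quad form
    payload: packet.byteslice(header_length..-1)               # the UDP datagram
  }
end
```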

UDP datagram

Finally, we're getting to the good stuff! After skipping over the 20-byte IPv4 header, we arrive at the UDP datagram.

Our message starts with an 8-byte UDP header containing four fields: the source and destination ports, the datagram length, and a "checksum" used for detecting corruption.

Diagram: UDP header format (source: https://en.wikipedia.org/wiki/User_Datagram_Protocol)

Plucking out the fields in this header is a case of combining the two bytes of each 16-bit field into a single integer. We can do that with a bit-shift and a binary OR.
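Here is a sketch of that shift-and-OR, assuming the datagram is a binary String. The high byte is shifted 8 places left and ORed with the low byte. The helper names are our own.

```ruby
# Combine two bytes, most significant first, into one 16-bit integer.
def read_uint16(bytes, offset)
  (bytes.getbyte(offset) << 8) | bytes.getbyte(offset + 1)
end

# Parse the 8-byte UDP header at the start of a binary String.
def parse_udp_header(datagram)
  {
    source_port: read_uint16(datagram, 0),
    destination_port: read_uint16(datagram, 2),
    length: read_uint16(datagram, 4), # header plus body, in bytes
    checksum: read_uint16(datagram, 6)
  }
end
```

If you'd rather let Ruby do the shifting, `datagram.unpack('n4')` decodes the same four big-endian 16-bit fields in a single call.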

With the groundwork done, reading the body of our datagram is as simple as dropping the 8 header bytes and packing the rest into an ASCII string. This one-liner should do the trick:
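The one-liner might look like this if the datagram is held as an Array of byte values, as String#bytes returns. The sample bytes are the last 14 bytes of our hex dump.

```ruby
# The last 14 bytes of our hex dump: an 8-byte UDP header followed by "hello\n"
datagram = [0xca, 0x13, 0x10, 0xe1, 0x00, 0x0e, 0x1d, 0xa5,
            0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a]
body = datagram.drop(8).pack('C*') # drop the header, pack the rest into a String
```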

Et voilà! We can tie it all together to recreate our upcase server from earlier.
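Here is one way the pieces might fit together. It is a sketch, not the article's exact server, and it assumes 64-bit Linux and root privileges. It uses the SIOCGIFINDEX ioctl, an ifreq and a sockaddr_ll as described earlier. It also uses the Linux constant ETH_P_ALL (0x0003) to receive every protocol. Like the article, it cheats by sending the reply with UDPSocket. `parse_udp_frame` and `run_upcase_server` are hypothetical names.

```ruby
require 'socket'

SIOCGIFINDEX = 0x8933 # <linux/sockios.h>
ETH_P_ALL = 0x0003    # <linux/if_ether.h>: receive every protocol
ETHERTYPE_IPV4 = 0x0800
PROTOCOL_UDP = 17

# Returns [source_ip, source_port, destination_port, body] for a UDP/IPv4
# ethernet frame, or nil for anything else.
def parse_udp_frame(frame)
  return nil unless frame.byteslice(12, 2).unpack1('n') == ETHERTYPE_IPV4

  ip = frame.byteslice(14..-1)
  return nil unless ip.getbyte(9) == PROTOCOL_UDP

  header_length = (ip.getbyte(0) & 0x0F) * 4
  source_ip = ip.byteslice(12, 4).unpack('C4').join('.')
  udp = ip.byteslice(header_length..-1)
  source_port, destination_port, length = udp.unpack('n3')
  # Use the UDP length field so ethernet padding isn't mistaken for data
  [source_ip, source_port, destination_port, udp.byteslice(8, length - 8)]
end

# Requires root (or CAP_NET_RAW) on Linux. Loops forever.
def run_upcase_server(interface_name, port)
  raw = Socket.open(:PACKET, :RAW, [ETH_P_ALL].pack('n').unpack1('S'))

  # ioctl(SIOCGIFINDEX) with a hand-packed 40-byte struct ifreq
  ifreq = [interface_name].pack('a16') + ("\0" * 24)
  raw.ioctl(SIOCGIFINDEX, ifreq)
  ifindex = ifreq.byteslice(16, 4).unpack1('i')

  # bind() with a hand-packed 20-byte struct sockaddr_ll
  raw.bind([Socket::AF_PACKET, ETH_P_ALL, ifindex].pack('Sni') + ("\0" * 12))

  # The cheat: reply through the kernel's UDP, from the port the client contacted
  reply = UDPSocket.new
  reply.bind('0.0.0.0', port)

  loop do
    parsed = parse_udp_frame(raw.recv(65_535))
    next unless parsed

    source_ip, source_port, destination_port, body = parsed
    next unless destination_port == port

    reply.send(body.upcase, 0, source_ip, source_port)
  end
end
```

Calling `run_upcase_server('eth1', 4321)` as root should then answer the Netcat example from the start. Two design choices are worth noting:

- The body is cut to the UDP length field because short ethernet frames are padded to a 60-byte minimum.
- The reply socket is bound to the server's port so that replies come from the port the client contacted, since a connected netcat socket ignores datagrams from other ports. This binding also stops the kernel from answering each request with an ICMP "port unreachable".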

Discerning readers will notice I have cheated and used UDPSocket to implement the response part of our server. This article is already getting pretty long, so I will leave it as an exercise for the reader to implement this part by hand 😄

Further Reading

I hope you've enjoyed our whirlwind tour of low-level network programming in Ruby. If this has piqued your interest, I'd recommend checking out the following topics and books: