Unpacking a Raw Packet | Ethernet Frame | Part -1

Examining a Network Packet in its RAW Form

Harsha Koushik
Kernel Space
6 min read · May 13, 2021


Prerequisites:
A good understanding of how a network packet flows through the layers of the OSI Reference Model.

Be it a YouTube video, a WhatsApp message or a Snap you get every day, in its simplest form it travels across the network as network packets. And in its RAW form, it's all BINARY (0s and 1s). A random snap you get from your friend will, in its RAW form, look something like this:

01110111011101101000001111011011110111111110101011011111111111111111000101010000000000101010101000000000000011111110100000000000000000000011111111100001010...

The way these 1s and 0s are turned into a picture, video or audio is the real beauty of computers. It can seem like magic to anyone who doesn't know the logic behind it.

Now let us understand how exactly these raw bits are converted into meaningful pieces of data, using the OSI Reference Model as our guide; it gives us a really clean way to explain this. OSI stands for Open Systems Interconnection, and it is a reference model that helps us understand and troubleshoot how data flows across a network.

OSI Reference Model

As shown in this figure, the Physical Layer is the first entry point of data. The Physical Layer represents the transport medium, such as Ethernet or fiber-optic cables, and the electrical signals or light pulses that travel over it. Since we cannot directly represent those signals in our programs, we need not spend much time on the Physical Layer. Data first becomes something we can represent and inspect at the Data Link Layer, so our exploration begins at Layer 2 of the OSI model: the Data Link Layer.

Data that arrives at the Data Link Layer is treated as a frame. A frame is again a group of bits, but the sequence of those bits is what matters most to the kernel. To be specific, the networking subsystem in the kernel takes care of all this. How exactly the networking subsystem and its APIs, the NIC and the device drivers work at this level is out of scope for this article, so we'll keep it for another article. To keep things simple, I will use the word Kernel instead of pinpointing which specific API in the kernel's networking subsystem does which job when a packet arrives.

Back to the point: when some random binary data arrives, its initial bits tell the receiving side whether a frame is starting or not. Let us look at the structure of a frame:

Ethernet Header

Preamble: The preamble is a 7-byte field consisting only of alternating 1s and 0s, like 10101010101010101010…, 56 bits in sequence. It lets the receiver synchronize its clock and lock on to the data stream before the actual frame begins.

SFD (Start of Frame Delimiter): This 1-byte field continues the preamble's 1010… pattern, but its last bit is a '1' instead of a '0' (10101011). Those two consecutive 1s signal that the alternating sequence is over and that the actual frame begins right after that last '1'.

Destination MAC: The first field of the actual frame is the destination address. Since we are at the Data Link Layer, this is a MAC address. It is 6 bytes long.

Source MAC: Right after the destination address comes the source MAC address, which has the same length: 6 bytes.

Type: The 2-byte Type (EtherType) field tells us which upper-layer protocol the payload carries, for example whether it is an IPv4 or an IPv6 packet. 0x0800 is IPv4, 0x86DD is IPv6, 0x0806 is ARP, and 0x8100 marks an IEEE 802.1Q (VLAN-tagged) frame.

Data: The data is the actual payload being carried, in this case an IPv4 packet. Its size ranges from 46 to 1500 bytes; shorter payloads are padded up to 46 bytes.

FCS (Frame Check Sequence): The FCS is a 4-byte field in the trailer of the frame. It holds a CRC (Cyclic Redundancy Check) value, computed over everything from the destination MAC to the end of the data, which the receiver uses for error detection.

Inter Frame Gap: After each frame, transmitters leave the line idle for a minimum of 96 bit times (12 bytes) before transmitting the next frame.
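To make the field sizes above concrete, here is a small sketch that builds an Ethernet II frame by hand: it pads the payload to the 46-byte minimum and appends an FCS. It assumes the FCS is the standard CRC-32 that Python's zlib.crc32 computes, taken over the destination MAC through the padded data and stored least-significant byte first. The helper name build_frame and the MAC addresses are made up for illustration.

```python
import struct
import zlib

MIN_PAYLOAD = 46    # shorter payloads are padded up to this size
MAX_PAYLOAD = 1500  # largest standard Ethernet payload

def build_frame(dest_mac: bytes, src_mac: bytes, eth_type: int, payload: bytes) -> bytes:
    """Hypothetical helper: assemble an Ethernet II frame (no preamble/SFD), FCS included."""
    if len(dest_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload larger than 1500 bytes")
    # Pad short payloads with zero bytes up to the 46-byte minimum.
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")
    # 6 + 6 + 2 = 14-byte header; '!' means network (big-endian) byte order.
    header = struct.pack("!6s6sH", dest_mac, src_mac, eth_type)
    body = header + payload
    # Assumed FCS encoding: CRC-32 over header + data, least-significant byte first.
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs

frame = build_frame(b"\xff" * 6, bytes.fromhex("001122334455"), 0x0800, b"hi")
print(len(frame))              # 64: 14 + 46 + 4, the minimum frame size
print(len(frame) + 7 + 1 + 12)  # 84: adding preamble, SFD and inter-frame gap

biggest = build_frame(b"\xff" * 6, bytes(6), 0x86DD, bytes(MAX_PAYLOAD))
print(len(biggest))            # 1518: 14 + 1500 + 4, the maximum frame size
```

The same arithmetic explains the familiar 64-byte minimum and 1518-byte maximum frame sizes: only the payload length varies.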

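When the receiver sees those 7 bytes of alternating 1s and 0s, it understands what's happening and says: get ready boys, we have an incoming frame, get ready to parse it and process it. (Strictly speaking, it is the NIC hardware that detects the preamble and SFD; it strips them, and usually the FCS as well, before the frame reaches the kernel, which is why a frame captured in software starts at the destination MAC.)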
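The alternating pattern is easier to see in code. This sketch assumes the usual Ethernet convention of sending each byte least-significant bit first, which is why the preamble bytes are written as 0x55 and the SFD as 0xD5 when shown as ordinary byte values. wire_bits is a hypothetical helper, not a real API.

```python
PREAMBLE = bytes([0x55] * 7)  # 7 bytes of alternating bits
SFD = bytes([0xD5])           # start of frame delimiter

def wire_bits(data: bytes) -> str:
    """Hypothetical helper: bits in transmission order, LSB of each byte first."""
    # f"{b:08b}" is MSB-first; reversing each 8-character string gives LSB-first order.
    return "".join(f"{b:08b}"[::-1] for b in data)

bits = wire_bits(PREAMBLE + SFD)
print(bits[:16])               # 1010101010101010
print(bits[-8:])               # 10101011, the SFD
print(len(bits))               # 64 bits of preamble + SFD
# The first "11" in the stream marks the end of the SFD; the frame starts right after it.
print(bits.index("11") + 2)    # 64
```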

Note: This is the most commonly used Ethernet frame type, Ethernet II. There are other frame types as well, such as Novell raw IEEE 802.3, IEEE 802.2 LLC and IEEE 802.2 SNAP, whose fields differ slightly from this one.
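One way a parser can tell an Ethernet II frame from an IEEE 802.3 frame is the 2-byte field after the source MAC: values of 0x0600 (1536) and above are EtherTypes, while values up to 1500 are payload lengths. A minimal sketch using the EtherType values listed earlier; classify_type_field and ETHER_TYPES are illustrative names, not part of any library.

```python
import struct

# EtherType values mentioned in this article.
ETHER_TYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x8100: "802.1Q VLAN tag",
    0x86DD: "IPv6",
}

def classify_type_field(header: bytes) -> str:
    """Interpret bytes 12-13 of a frame (the field after the two MAC addresses)."""
    (value,) = struct.unpack("!H", header[12:14])
    if value >= 0x0600:
        # Ethernet II: the field is an EtherType naming the upper-layer protocol.
        return "Ethernet II, " + ETHER_TYPES.get(value, f"unknown EtherType 0x{value:04x}")
    if value <= 1500:
        # IEEE 802.3: the field is the payload length in bytes.
        return f"IEEE 802.3, length {value}"
    return f"invalid type/length 0x{value:04x}"  # 1501-1535 is undefined

print(classify_type_field(bytes(12) + b"\x86\xdd"))  # Ethernet II, IPv6
print(classify_type_field(bytes(12) + b"\x00\x2e"))  # IEEE 802.3, length 46
```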

Now let us unpack a RAW Frame in Linux using Python.

Unpacking Ethernet Frame

The two main modules here are socket and struct. In our case, the socket module is used to create a raw socket that listens for network data, and the struct module is used to handle binary data (from files, the network, etc.) in Python.

In the main function, we continuously listen for network data, and conn.recvfrom() returns a tuple: the raw bytes received plus information about where they came from. You can print it to see exactly what it contains. We then pass raw_data to the ethernet_unpack function, which we defined ourselves.
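A minimal sketch of such a sniffer, assuming Linux (AF_PACKET sockets exist only there) and root privileges or CAP_NET_RAW. The names main, ethernet_unpack and getmac follow the description in this article, but the bodies below are an illustrative sketch rather than the original code; the ETH_P_ALL value 0x0003 ("all protocols", from the Linux header linux/if_ether.h) and the 65535-byte receive buffer are assumptions.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # Linux: receive frames of every protocol (assumed constant)

def getmac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address, e.g. 00:1a:2b:3c:4d:5e."""
    return ":".join(map("{:02x}".format, raw))

def ethernet_unpack(data: bytes):
    """Split a frame into destination MAC, source MAC, EtherType and payload."""
    # First 14 bytes: 6-byte dest MAC, 6-byte source MAC, 2-byte type (big-endian).
    dest_mac, src_mac, proto = struct.unpack("! 6s 6s H", data[:14])
    return getmac(dest_mac), getmac(src_mac), proto, data[14:]

def main():
    # Raw packet socket; the protocol number must be passed in network byte order.
    conn = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    while True:
        raw_data, addr = conn.recvfrom(65535)  # (frame bytes, address info tuple)
        dest_mac, src_mac, eth_proto, payload = ethernet_unpack(raw_data)
        print(f"Destination: {dest_mac}, Source: {src_mac}, Type: 0x{eth_proto:04x}")
```

Calling main() as root (for example from an `if __name__ == "__main__":` block) prints one line per captured frame until interrupted. Keeping ethernet_unpack free of socket code means it can also be exercised on hand-built byte strings.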

We pass the raw data to struct.unpack(), which breaks it into meaningful pieces according to a format string made of the following characters:

! : Specifies network byte order. Network data is represented in big-endian, while host data is represented in little-endian format, given the host's CPU is Intel (x86).

6s : Specifies 6 consecutive bytes. The first 6 bytes become dest_mac.

6s : Specifies another 6 bytes. The next 6 bytes become src_mac.

H : Specifies a 2-byte unsigned short. These 2 bytes become the Type field.

The s and H here are format characters. To learn more about them, see https://docs.python.org/3/library/struct.html
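A quick way to see what '!' changes is to unpack the same two bytes with different byte-order prefixes. The bytes below are the IPv4 EtherType as it appears on the wire; '<' (little-endian) is shown only for comparison, and the MAC bytes are made up.

```python
import struct

type_bytes = b"\x08\x00"  # EtherType 0x0800 (IPv4) as sent on the wire

print(struct.unpack("!H", type_bytes))  # (2048,) big-endian: 0x0800, correct
print(struct.unpack("<H", type_bytes))  # (8,)    little-endian: 0x0008, wrong here

# '6s' returns the 6 bytes untouched, as a bytes object (not a number).
(mac,) = struct.unpack("!6s", bytes.fromhex("001a2b3c4d5e"))
print(type(mac), len(mac))  # <class 'bytes'> 6
```

Because '6s' leaves the MAC addresses as raw bytes, they still need a separate formatting step before they are readable.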

We pass only the first 14 bytes of the raw data to struct.unpack(), using data[:14]; these 14 bytes (6 + 6 + 2) are the Ethernet header. Everything from data[14:] onward is our actual payload, which we will deal with in the upper layers of the OSI model.
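The 14 in data[:14] is not a magic number: it is exactly the size of a '! 6s 6s H' format, which struct.calcsize() can confirm. A small sketch with a made-up frame (broadcast destination, arbitrary source, ARP type, dummy payload) showing the split into header and payload:

```python
import struct

ETH_HEADER = "! 6s 6s H"
HEADER_LEN = struct.calcsize(ETH_HEADER)
print(HEADER_LEN)  # 14 = 6 + 6 + 2

# Made-up frame: broadcast destination, arbitrary source, ARP type, dummy payload.
data = b"\xff" * 6 + bytes.fromhex("001a2b3c4d5e") + b"\x08\x06" + b"payload..."
header, payload = data[:HEADER_LEN], data[HEADER_LEN:]
dest_mac, src_mac, proto = struct.unpack(ETH_HEADER, header)
print(hex(proto))  # 0x806
print(payload)     # b'payload...'
```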

After unpacking the data, ethernet_unpack returns it to the caller. Before returning, we also format the fields into meaningful, human-readable values. For this we call one more function, getmac(), which we also defined ourselves.

Even after unpacking, dest_mac and src_mac are still raw bytes, so we need to convert them into a proper MAC address, written as colon-separated hex values. To do that we use map() with the format string '{:02x}', which renders each byte as at least two hex digits and adds a leading 0 if required. We then join these values with a colon (:) and return the result to the caller.
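Why the zero padding in '{:02x}' matters: without it, a byte such as 0x0a prints as a single digit and the address becomes ambiguous. The sketch below compares the two on a made-up MAC address; the bytes.hex(':') variant (available from Python 3.8) is shown only as an alternative to the map-and-join approach.

```python
raw_mac = bytes.fromhex("000a1b02ff3c")  # made-up MAC address bytes

padded = ":".join(map("{:02x}".format, raw_mac))  # 2 digits per byte, leading 0 added
unpadded = ":".join(map("{:x}".format, raw_mac))  # no padding: ambiguous
builtin = raw_mac.hex(":")                        # Python 3.8+ alternative

print(padded)    # 00:0a:1b:02:ff:3c
print(unpadded)  # 0:a:1b:2:ff:3c
print(builtin)   # 00:0a:1b:02:ff:3c
```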

Now that we are done formatting the data, we return it to the caller in the main function. We can then go ahead and print the unpacked data. The output looks something like this:

The actual payload contains data for the upper layers, such as the Network and Transport layers. We will unpack the payload in the coming articles.
