Packet Analysis

Published in

Walmart Global Tech Blog

14 min read · Aug 25, 2022

Reading Hex, using BPF and more

The ability to read and sift through packet captures is an essential skill that helps to troubleshoot and isolate issues quickly and efficiently. This writeup takes a quick look at Ethernet, IP, and TCP headers, introduces BPF, and finishes off by solving a problem statement by pairing what we have learnt with a few CLI tools that ship with Unix/Linux.

Before we begin discussing headers, here are some terms we need to know:

• 1 hex character = 4 bits

• Nibble = 4 bits

• Byte = 8 bits

• Word = 16 bits

• Double word = 32 bits = 4 bytes

Headers

Header fields are referenced using their offset number. Offset numbering always starts from 0 and each number represents 1 byte. For example, the Ethernet header is 14 bytes long, has an offset numbering from 0 through 13, and the 6-byte destination MAC address occupies offset 0 through 5.

Ethernet header

• The Ethernet header is 14-bytes long, with the destination MAC, source MAC and ether type fields.

• The Ether type field identifies the encapsulated protocol that follows Ethernet. Some of the common ether type values are ‘0x0800’ for IPv4, ‘0x8100’ for VLAN and ‘0x0806’ for ARP.

Following the Ethernet header is the payload (up to 1500 bytes) and a 4-byte CRC that’s used for detecting frame corruption. This makes the maximum Ethernet frame length 1518 bytes (14 + 1500 + 4).
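To make the offsets concrete, here is a minimal Python sketch that unpacks the 14-byte Ethernet header. The sample frame bytes and the function name `parse_ethernet` are made up for illustration; they are not taken from the capture used in this article.

```python
import struct

# Ether type values mentioned in the text above
ETHER_TYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8100: "VLAN"}

def parse_ethernet(frame: bytes) -> dict:
    """Split the first 14 bytes of a frame into its three header fields."""
    # "!" = network byte order; 6s 6s H = dst MAC, src MAC, 16-bit ether type
    dst, src, ether_type = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst_mac": dst.hex(":"),   # offsets 0-5
        "src_mac": src.hex(":"),   # offsets 6-11
        "ether_type": ether_type,  # offsets 12-13
        "protocol": ETHER_TYPES.get(ether_type, "unknown"),
    }

# Hypothetical header bytes: dst MAC, src MAC, ether type
sample = bytes.fromhex("aabbccddeeff" "112233445566" "0800")
header = parse_ethernet(sample)
print(header)
```

Slicing `frame[:14]` mirrors the offset numbering: the header ends at offset 13, and whatever follows is the encapsulated protocol.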

IPv4 header

IP is responsible for moving packets across the Internet. It is an unreliable protocol that makes no effort to guarantee delivery of the packet. Detecting and retransmitting lost packets is the job of the transport layer (or the application itself). IP header fields include:

• IP version: 4-bit field that identifies the IP version.

• Header length: Indicates the length of the IP header, in double words. For instance, if the number here is 5, it means 5 double words, which in turn is 5 x 4 bytes, for a total header length of 20 bytes. In fact, IP headers are at a minimum 20 bytes long and can go up to 60 bytes if options are used. Why 60 bytes? Because the header length field is 4 bits long, its maximum value in binary is 1111, which is 15 in decimal. So, 15 x 4 = 60 bytes.

• DSCP and ECN: The 6-bit differentiated services code point (DSCP) allows traffic to be classified so that certain types, such as voice, can be given priority on the network. The 2-bit explicit congestion notification (ECN) field allows routers to signal congestion on the network without dropping packets. The ECN bits work with the congestion window reduced (CWR) and ECN-Echo (ECE) flags in the TCP header.

• Total length: This field represents the total length of the IP packet, i.e., IP header, transport header and payload.

• IP ID: An identification value that the sender assigns to each packet; depending on the operating system, it may be random or incremented. The IP ID field typically comes into focus only when we are dealing with fragmented IP packets, where all fragments of one packet share the same ID.

• Fragmentation bits: The 3 higher order bits represent reserved (X), do not fragment (D) and more fragments to follow (M). The 13 lower order bits represent the position of a fragment within a fragmented IP packet, counted in units of 8 bytes.

• Time To Live: TTL ensures that packets do not circulate in the network indefinitely. For every hop a packet traverses, the TTL is reduced by 1. When a packet’s TTL reaches 0, it is discarded. One of the most widely used troubleshooting tools, traceroute, uses TTL to determine the routers (hops) along a network path.

• Protocol: This field identifies the encapsulated protocol, of which the most commonly found values are 0x06 (TCP), 0x01 (ICMP) and 0x11 (UDP).

• Checksum: Helps identify packets corrupted in transit. The receiving host should silently discard a packet with an invalid checksum.

• Source and Destination address: These are 32-bit fields found in offsets 12–15 and 16–19, respectively.

• Options: IP options were originally intended to help with troubleshooting and, if used, extend the IP header size beyond 20 bytes. Some IP options include strict source routing (SSR) and loose source routing (LSR).
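The field layout above can be sketched in Python as follows. This is an illustrative sketch, not a full parser: the function name `parse_ipv4` and the sample header bytes are assumptions, the checksum in the sample is left as zero rather than computed, and options are ignored.

```python
import socket
import struct

def parse_ipv4(header: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options not parsed)."""
    (ver_ihl, dscp_ecn, total_len, ip_id, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,                # higher order nibble of offset 0
        "header_len": (ver_ihl & 0x0F) * 4,     # double words -> bytes
        "dscp": dscp_ecn >> 2,                  # upper 6 bits of offset 1
        "ecn": dscp_ecn & 0x03,                 # lower 2 bits of offset 1
        "total_len": total_len,
        "id": ip_id,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,                      # offset 9
        "src": socket.inet_ntoa(src),           # offsets 12-15
        "dst": socket.inet_ntoa(dst),           # offsets 16-19
    }

# Hypothetical header: IPv4, IHL 5, total length 40, DF set, TTL 64, TCP
sample = bytes.fromhex("45000028" "1c464000" "40060000" "c0a80101" "0d6b8b0b")
ip = parse_ipv4(sample)
print(ip)
```

Note how the first byte, 0x45, yields both the version (4) and the header length (5 double words, i.e. 20 bytes).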

TCP header

TCP is responsible for providing reliable and guaranteed packet delivery, something which IP is not capable of. Let’s take a quick look at the TCP header.

• Source and destination port: 16-bit fields occupying offsets 0–3 of the TCP header. The source port is typically an ephemeral port above 1023, and the destination port refers to the service/application that the server is awaiting connections on, such as 443 for HTTPS or 80 for HTTP.

• Sequence number: The sequence number allows multiple segments within a single TCP stream to be ordered correctly at the destination. Its value increases by the number of bytes in the payload, so another way of thinking about it is that it lets us calculate how much data has flowed in a connection.

• Acknowledgement number: A destination acknowledges received data by sending a response to the sender with the ACK flag set and with a number whose value is the amount of data in the received segment + the sequence number of that segment. That is:

• Acknowledgement number = (Amount of data received) + Sequence number.

• Sequence and acknowledgement numbers work together to make TCP a reliable protocol, one that knows how much data has arrived at the destination and what data is lost and needs to be retransmitted.

• Header length: This represents the TCP header length in double words. Since a double word is 4 bytes, multiply the number found in this field by 4 to get the TCP header length in bytes.

• Flags: Occupying offset 13, the flag bits can be turned on or off to represent various signals. A quick way to remember the flags is the mnemonic “U(rg)nskilled A(ck)ttackers P(ush)ester R(st)eal S(yn)ecurity F(in)olk”. Here’s a quick explanation of these flags in the order of the mnemonic.

• Urgent: URG flag signals that the urgent pointer field is valid.

• Acknowledgement: ACK indicates that the value found in the Acknowledgement field is valid. During a normal traffic flow, after the initial SYN, the ACK flag is always set.

• Push: PSH flag indicates to the receiving host to flush data in its TCP buffer to the receiving process/application right away.

• Reset: RST flag is used to abruptly end a connection. For instance, when a host receives a connection request on a non-listening port, it does not know what to do with the request and sends a reset back to the requesting host.

• Synchronize: The SYN flag is used when a host wants to establish a new connection. Both the initiating and destination host use their own separate SYN to establish connections to each other. The SYN flag counts as one byte, i.e., it consumes one sequence number.

• Finish: FIN flag is used when a connection is gracefully terminated. Like SYN, both sides of the connection send individual FINs and, like SYN, FIN also counts as one byte in the sequence number space.

• Explicit Congestion Notification Echo (ECE): This bit is turned on to inform the sending host to reduce the rate at which data is sent.

• Congestion Window Reduced (CWR): When a sending host receives a segment with the ECE bit set from a recipient, the sending host reduces its congestion window (typically by half) and sets the CWR flag to indicate that corrective action has been taken.

• The ECN bits of the IP header and the ECE and CWR bits of the TCP header are used in conjunction: first to indicate support for congestion notification during the 3-way TCP handshake, and later to carry the congestion notifications themselves.

• Window size: Dynamic flow control mechanism that the receiving host uses to inform the sending host of its available buffer size. The window size becomes smaller when the receiving host’s TCP buffer starts to become full.

• Checksum: The TCP checksum covers the TCP header and payload plus a pseudo header that includes the source and destination IP addresses, as an additional check to ensure that a host hasn’t received a packet meant for a different host. The TCP checksum is validated only by the communication endpoints/hosts.

• Urgent pointer: Used in conjunction with the URG flag to indicate that data between the current sequence number and the value in this field should be processed immediately, before any other data in the host’s buffer. That is, the urgent pointer indicates the byte offset where the urgent data ends.

• Options: Unlike IP options, TCP options are widely used. Some of the common options are Maximum Segment Size (MSS), which announces the largest segment a host is willing to receive; Selective Acknowledgement (SACK), which allows for acknowledgment of non-contiguous bytes of data; and window scale, which allows a host to advertise a receive window larger than the 65,535 bytes that the 16-bit window field can express.
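As a rough illustration of the TCP layout described above, the sketch below unpacks the fixed 20-byte part of a TCP header. The function name `parse_tcp` and the sample bytes are hypothetical, the checksum is left as zero, and options are not parsed.

```python
import struct

def parse_tcp(header: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options not parsed)."""
    (src_port, dst_port, seq, ack, offset_byte,
     flags, window, checksum, urgent) = struct.unpack("!HHIIBBHHH", header[:20])
    return {
        "src_port": src_port,                  # offsets 0-1
        "dst_port": dst_port,                  # offsets 2-3
        "seq": seq,                            # offsets 4-7
        "ack": ack,                            # offsets 8-11
        "header_len": (offset_byte >> 4) * 4,  # double words -> bytes
        "flags": flags,                        # offset 13, one bit per flag
        "window": window,
        "urgent_pointer": urgent,
    }

# Hypothetical SYN segment: port 52262 -> 443, seq 1000, header length 5 words
sample = bytes.fromhex("cc26" "01bb" "000003e8" "00000000" "50" "02" "ffff" "0000" "0000")
tcp = parse_tcp(sample)
print(tcp)
```

The header length sits in the higher order nibble of offset 12, which is why it is shifted right by 4 bits before being multiplied by 4.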

Reading Hex

Here’s an example of what a (redacted and highlighted) packet in hex looks like. Let us identify a few parameters using the header/offset information presented and attempt some hex to decimal conversion.

The portion highlighted in yellow represents the Ethernet header.

The first 6 bytes of the packet represent the destination MAC address, followed by the 6-byte source MAC address.
The last 2 bytes (i.e., offsets 12 and 13) of the Ethernet header, ‘0x0800’, indicate that the encapsulated protocol is IPv4.

This is followed by the IP header highlighted in blue.

Offset 0 represents IP version and header length. In this packet, the IP version is 4 and the header length is 20 bytes (5 double words x 4 bytes). Therefore, no IP options are used in this packet.
The value ‘0x06’ of the IP-protocol field at offset 9 indicates TCP.

Following the 20-bytes of IP-header, the TCP header is highlighted in green.

Offsets 2 and 3 contain the destination port number in hex, ‘0x01bb’. Hex is a base-16 numbering system. To convert this to decimal, we raise 16 to the power of each hex character’s position (counting from 0 on the right), multiply by the value of that hex character, and sum the results.
This looks as follows:

(16³ x 0) + (16² x 1 ) + (16¹ x b) + (16⁰ x b)
= 0 + 256 + (16x11) + (1x11)
= 443
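The same positional arithmetic can be written as a few lines of Python. The helper name `hex_to_decimal` is made up for this sketch; in practice the built-in `int(value, 16)` does the same job.

```python
def hex_to_decimal(hex_string: str) -> int:
    """Convert a hex string to decimal by summing digit * 16**position."""
    digits = "0123456789abcdef"
    total = 0
    # Walk the characters right to left so the position starts at 0
    for position, char in enumerate(reversed(hex_string.lower())):
        total += digits.index(char) * 16 ** position
    return total

port = hex_to_decimal("01bb")
print(port)             # manual conversion
print(int("01bb", 16))  # built-in conversion, for comparison
```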

Let’s use a screenshot of the same packet from Wireshark to corroborate our packet walkthrough:

Berkeley Packet Filter (BPF)

BPF is an architecture and mechanism that was built to allow filtering of network packets on their way to an application and to discard unwanted packets as early as possible [1]. Tcpdump is an example of a tool that uses BPF to implement its filtering capabilities [2]. BPF has since been extended with additional capabilities that extend kernel functionality, allowing new applications to be created across networking, security, profiling, and monitoring [3]. On a related note, for a more detailed look at an exciting application of Extended BPF (eBPF), take a look at Walmart’s L3AF project.

With this quick introduction to BPF, let’s see how we put together expressions to assist with better analysing packet captures.

BPF format

The format that we are likely already familiar with, uses expressions to filter traffic based on keywords such as protocol, port, network, and a combination of these. For example, ‘icmp’, ‘udp port 53’, ‘net x.x.x.0/24’.

Syntax: tcpdump <keyword> <value>
tcpdump -r demo.pcap -ntc1 'dst port 443'
IP 192.168.1.1.52255 > 40.103.10.38.443: Flags [.], seq 2667572664:2667574014, ack 3107627624, win 4096, length 1350
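As an aside, the seq range in this output is a handy check of the sequence number arithmetic from the TCP header section: the second number is the first plus the payload length. A minimal sketch, where the helper name `next_sequence` is an assumption made for illustration:

```python
def next_sequence(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """Sequence number that follows a segment, i.e. the ACK the peer should send.

    SYN and FIN each consume one sequence number; values wrap at 2**32.
    """
    return (seq + payload_len + int(syn) + int(fin)) % 2**32

# Values from the tcpdump line above: seq 2667572664, length 1350
end_of_segment = next_sequence(2667572664, 1350)
print(end_of_segment)
```

The result matches the second number in the seq range printed by tcpdump, and it is also the acknowledgement number the receiver would send back for this segment.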

What if we want to dig deeper and find, for instance, only TCP packets with the SYN flag set? From the protocol header, we can use the offset value to look for the desired field. Consider the following example, where we ask tcpdump to look at offset 13 of the TCP header (i.e., the flags field) for the value ‘2’.

Syntax: tcpdump <protocol> <offset:length (length is optional)> <relation> <value>
tcpdump -r demo.pcap -ntc1 'tcp[13]=0x02'
IP 192.168.1.1.52262 > 13.107.139.11.443: Flags [S], seq 2517098632, win 65535, options [mss 1350,nop,wscale 6,nop,nop,TS val 35399207 ecr 0,sackOK,eol], length 0

How do we know the flag value needs to be ‘2’?

Let’s take a closer look at the TCP flags field. This is a 1-byte field that gets represented in hex as 2 characters. Each hex character is 4 bits long (i.e., 1 nibble). The value of each flag is 2 raised to the power of the flag’s bit position, counting from 0 at the lower order end: FIN is 2⁰ (1), SYN is 2¹ (2), RST is 2² (4), PSH is 2³ (8), ACK is 2⁴ (16), URG is 2⁵ (32), ECE is 2⁶ (64) and CWR is 2⁷ (128). A value of 2¹ (i.e., 2) therefore tells our filter to look for packets that have only the ‘Syn’ flag set.
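The flag values can be worked out programmatically. In this sketch the helper names `flag_value` and `decode_flags` are hypothetical; the bit order follows the layout of offset 13, with FIN as the lowest order bit and CWR as the highest.

```python
# Flags in offset 13 of the TCP header, from bit 0 (lowest) to bit 7 (highest)
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]

def flag_value(*names: str) -> int:
    """Value to compare tcp[13] against when exactly these flags are set."""
    return sum(2 ** TCP_FLAGS.index(name) for name in names)

def decode_flags(value: int) -> list:
    """Names of the flags that are set in a tcp[13] byte."""
    return [name for bit, name in enumerate(TCP_FLAGS) if value & (1 << bit)]

print(hex(flag_value("SYN")))         # value for a SYN-only packet
print(hex(flag_value("SYN", "ACK")))  # value for a SYN-ACK packet
print(decode_flags(0x18))             # flags set in the byte 0x18
```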

Bit Masking

Examining a field which is exactly 1-byte long is straightforward. What if you want to look for a value that is 4-bits (for example, the IP version) or look for a specific field regardless of other values — example, look for ‘Syn’ and ‘Ack’ packets regardless of whether ‘ECN’ flags are set or not?

We use something called bit masking, which uses logical AND operation to cancel out unwanted bits. Let’s see with an example what this looks like.

Consider that you want to search for IP packets with options used, i.e., packets whose IP header size is greater than the standard 20 bytes. The IP header length value is located in the lower order nibble of offset 0 of the IP header:

Since the ‘Header Length’ field represents the number of double words, a value of 5 in this field indicates an IP header of 20 bytes (i.e., 5 double words x 4 bytes). Since IP version and header length together occupy one offset (i.e., 1 byte), we must include both fields in our expression. For instance, ip[0] > 0x45 would match IPv4 packets whose header size is greater than 20 bytes. How about ip[0] > 0x75? This would not work since the value of 7 (incorrect IP version number) would exclude all IPv4 packets.

The solution is therefore to somehow eliminate the IP version value. Enter bit masking.

Consider what the IP version and header length fields would look like in a typical IPv4 packet.

Next, consider our desired value of version and header length.

Next, we must select a suitable bit mask, that when logically ANDed with the value in offset 0, gives us our desired output. A logical AND returns true (i.e., 1) when both values are true, else returns false (i.e., 0). Said another way, use 0 in the mask bit to eliminate a value and 1 to preserve a value.

How about a mask value of 0000 1111 (0x0f in hex)?

Let’s AND this mask value with the value found in a typical IPv4 packet’s offset 0 (0100 0101, i.e., 0x45). We find that we get 0000 0101, which is 0x05 in hex: the version nibble has been zeroed out and only the header length remains.

Finally, our BPF expression to filter only packets with IP-options used (i.e., with header size greater than 20-bytes), is ‘ip[0] & 0x0f > 0x05’.

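Similarly, a mask to filter packets with only the ‘Syn’ and ‘Ack’ flags set, regardless of whether the ‘ECN’ flags (ECE and CWR) are set, would be ‘0011 1111’ (0x3f in hex), and the corresponding BPF expression would be ‘tcp[13] & 0x3f = 0x12’. The mask zeroes out the two higher order ECN bits, and 0x12 is the sum of ‘Ack’ (0x10) and ‘Syn’ (0x02).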
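To see the masks at work, here is a small sketch that mirrors the two BPF expressions in Python. The function names are made up for illustration, and the byte values passed in are hypothetical examples of what offset 0 of the IP header and offset 13 of the TCP header might contain.

```python
def has_ip_options(first_byte: int) -> bool:
    """Mirror of the BPF expression 'ip[0] & 0x0f > 0x05'."""
    return (first_byte & 0x0F) > 0x05

def is_syn_ack(flags_byte: int) -> bool:
    """Mirror of the BPF expression 'tcp[13] & 0x3f = 0x12'."""
    return (flags_byte & 0x3F) == 0x12

print(format(0x45 & 0x0F, "08b"))  # version nibble masked out of 0x45
print(has_ip_options(0x45))        # standard 20-byte header
print(has_ip_options(0x46))        # 24-byte header, so options are present
print(is_syn_ack(0x12))            # SYN + ACK
print(is_syn_ack(0x52))            # SYN + ACK with ECE also set
print(is_syn_ack(0x1A))            # SYN + ACK + PSH, rejected by the filter
```

Because the mask keeps the six lower order bits, any additional flag other than ECE or CWR changes the masked value and the packet no longer matches.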

Putting it all together

Let’s say you are in possession of a large packet capture, and you want to find all the IPs and listening ports that are accepting connections. How would you identify if a server is accepting connections? Looking for IPs that have responded with a ‘Syn-Ack’ is a good starting point.

First, we can use ‘Editcap’ to narrow down the analysis to a specific time frame. For example:

editcap -A "2021-12-27 15:05:11" -B "2021-12-27 15:05:12" demo.pcap demo-extract.pcap

Capinfos is a tool that helps display metadata about a capture file. Using this tool to validate the file sizes, we observe that the original file is 12MB and the extracted file is 973kB in size.

capinfos demo.pcap | grep size
File size: 12MB
capinfos demo-extract.pcap | grep size
File size: 973kB

Let’s now use tcpdump and the BPF expression we built earlier, to filter packets with both ‘Syn’ and ‘Ack’ set. (Note, ‘Ack’ in tcpdump is represented as a period.)

tcpdump -r demo.pcap -nt 'tcp[13] & 0x3f = 0x12'

We get our output, though in a verbose and clunky format. Not to mention there are more than 200 lines. Let’s see if we can trim this output.

In Unix/Linux based systems, we can pipe the output from one tool as the input of another tool: you take the output from the tool on the left of the ‘|’ and pass it as input to the tool on the right. Here, we pipe the output from tcpdump as input to cut.

Using the cut tool, we can treat the output as columns. In this case, we treat ‘>’ as a delimiter and extract only the 1st column.

tcpdump -r demo.pcap -nt 'tcp[13] & 0x3f = 0x12' | cut -d '>' -f 1

Let’s also remove the letters ‘IP’ from the output. Now we use a single whitespace ‘ ‘ as the delimiter and match the 2nd column.

tcpdump -r demo.pcap -nt 'tcp[13] & 0x3f = 0x12' | cut -d '>' -f 1 | cut -d ' ' -f 2

That’s a much cleaner output, but we still have 200+ lines to contend with. We pass the output through sort and uniq to get an output that’s only 11 lines long.

tcpdump -r demo.pcap -nt 'tcp[13] & 0x3f = 0x12' | cut -d '>' -f 1 | cut -d ' ' -f 2 | sort -V | uniq

The final output is a manageable list of unique IPs and the ports on which they are actively listening and accepting connections. Do you have this many web servers in your environment? Is the service running on ephemeral port 59456 intentional and known to your organisation?
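For readers who prefer a script over a shell pipeline, the same cut, sort and uniq steps can be sketched in Python. The input lines below are invented sample data in tcpdump's output format (using documentation address ranges), not output from the capture discussed here, and the function name `listening_endpoints` is an assumption.

```python
# Hypothetical tcpdump output lines for SYN-ACK packets (not from demo.pcap)
sample_output = """\
IP 192.0.2.10.443 > 198.51.100.7.52262: Flags [S.], seq 1, ack 2, win 65535, length 0
IP 192.0.2.10.443 > 198.51.100.8.40112: Flags [S.], seq 5, ack 6, win 65535, length 0
IP 192.0.2.20.80 > 198.51.100.7.52263: Flags [S.], seq 9, ack 10, win 65535, length 0
"""

def listening_endpoints(text: str) -> list:
    """Return the unique 'ip.port' senders, like cut | cut | sort | uniq."""
    endpoints = set()                      # a set removes duplicates (uniq)
    for line in text.splitlines():
        if ">" not in line:
            continue                       # skip anything that is not a packet line
        left = line.split(">")[0]          # cut -d '>' -f 1
        endpoints.add(left.split(" ")[1])  # cut -d ' ' -f 2
    # Plain lexicographic sort; 'sort -V' in the shell version sorts by version order
    return sorted(endpoints)

servers = listening_endpoints(sample_output)
for endpoint in servers:
    ip, _, port = endpoint.rpartition(".")  # last dot separates IP and port
    print(ip, port)
```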

Conclusion

We have taken a quick look at some of the important protocol headers, introduced BPF and demonstrated how we can use inbuilt CLI tools to reduce verbose outputs into crisp, manageable ones. This article just scratches the surface of TCP/IP headers and packet analysis.

For folks that are already aware of the topics discussed, hope this was a good refresher. For others, hope you learnt something new that you can use in your analysis/troubleshooting tasks at work. Please let me know your thoughts on the content.