Network troubleshooting with tcpdump and strace

Here is a little example using tcpdump and strace to troubleshoot a network issue with an Oracle connection. It may not be the best approach for all cases, but just an example, as this is a copy/paste of my screen after I analyzed it. I just changed the server names.

At some point, the communication between two servers was hanging, with both endpoints waiting on read() — like ‘SQL*Net message from client’ wait event on the server. This issue occurred only on some circumstances: a case that always reproduced was RMAN correctly connected to the catalog, but hanging when we did a ‘show all’.

Client netstat and strace

I have RMAN started and connected to the catalog.

I identify my client process, in this case the RMAN executable. I display its TCP connection with netstat and its current system calls with strace:

[root@client tmp]# netstat -p | grep "/rman"
Proto Local Address Foreign Address State
tcp client:37051 server:1521 ESTABLISHED 192131/rman
[root@client tmp]# strace -yttp 192131
Process 192131 attached
10:41:29.168494 read(0</dev/pts/2>,

I can see the connection from client to server on port 1521 and the process is waiting on user input (stdin).

Server netstat and strace

On the remote side, I see the same TCP connection (same hosts and ports):

[root@server log]# netstat -p | grep ":37051"
Proto Local Address Foreign Address State
tcp server:1521 client:37051 ESTABLISHED 36761/oraclePDBR2

and the process is waiting on socket (‘SQL*Net message from client’):

[root@server log]# strace -yttp 36761
Process 36761 attached
10:41:43.745239 read(20<socket:[3785769780]>,

Client/server dialogue

Here is where I run ‘show all;’ which will get hung. I’ll show all the network dialogue from this point.

I run tcpdump in background to show the packets sent on client :

[root@client tmp]# tcpdump -i bond0 src client and dst server &

and on server:

[root@server tmp]# tcpdump -i bond0 src server and dst client &

In both I have strace attached to the processes identified by netstat -p (strace -y to show file name for descriptors and -tt to show timestamp)

Here are the traces just after I run ‘show all;’ on the client, which queries the server.

The client sends a statement. Here is the write (from strace) and TCP message (from tcpdump):

10:43:44.309812 write(15<socket:[2594150602]>, "\3+\0\0\6\0\0\0\0\0\21i{\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\5\0\0"..., 811) = 811
10:43:44.309853 IP client.37051 > server.1521: Flags [P.], seq 34484:35295, ack 17685, win 442, length 811

The server receives it and strace displays the end of the line with the read() return code (which is the size of the message):

"\3+\0\0\6\0\0\0\0\0\21i{\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\5\0\0"..., 8208) = 811

In the meanwhile, the client waits for the answer:

10:43:44.309892 read(15<socket:[2594150602]>,

Then the server answers:

10:43:44.356795 write(20<socket:[3785769780]>, "\0\355\0\0\6\0\0\0\0\0\v\1\5q\6\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 237
10:43:44.356830 IP server.1521> client.37051: Flags [P.], seq 17685:17922, ack 35296, win 442, length 237
) = 237

We see strace printing the call (write 237 bytes), then tcpdump showing the packet going out (same size: bytes 17685 to 17922 in this server->client stream), and then strace continues the line when the write() system call returns, displaying the number of bytes written.

Sending this packet is also the occasion to acknowledge what was received — up to byte 35296 of the client->server message.

The client gets it, strace showing the end of the read() line:

10:43:44.357178 IP client.37051 > server.1521: Flags [.], ack 17922, win 442, length 0
"\0\355\0\0\6\0\0\0\0\0\v\1\5q\6\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 237

Note that an empty message (length=0) was sent by the client to acknowledge that it has received up to byte 17922 in the server->client message.

After having sent its answer, the server waits for the next query (and acknowledges what was received recently with a length 0 packet):

10:43:44.356894 read(20<socket:[3785769780]>, 
10:43:44.357493 IP server.1521> client.37051: Flags [.], ack 36267, win 442, length 0

The client sends a new query:

10:43:44.357368 write(15<socket:[2594150602]>, "\3\313\0\0\6\0\0\0\0\0\21i}\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\4\0\0"..., 971
10:43:44.357416 IP client.37051 > server.1521: Flags [P.], seq 35295:36266, ack 17922, win 442, length 971
) = 971

and waits for the response:

10:43:44.357461 read(15<socket:[2594150602]>, "\0\356\0\0\6\0\0\0\0\0\v\1\5\n\7\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 238

From here, I start to show the full line with the return code (238 bytes) but remember that only the ‘10:43:44.357461 read(15<socket:[2594150602]>,’ part is displayed by strace before the message is actually received. The buffer content and size is written later.

Here is the server receiving the message from the client and sending the answer:

"\3\313\0\0\6\0\0\0\0\0\21i}\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\4\0\0"..., 8208) = 971
10:43:44.365296 write(20<socket:[3785769780]>, "\0\356\0\0\6\0\0\0\0\0\v\1\5\n\7\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 238
10:43:44.365326 IP server.1521> client.37051: Flags [P.], seq 17922:18160, ack 36267, win 442, length 238
) = 238

We have seen that the client got it (the 238 bytes). It sends a new query (365 bytes), waits for the answer which is 237 bytes:

10:43:44.365766 write(15<socket:[2594150602]>, "\1m\0\0\6\0\0\0\0\0\21i\177\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\2\0\0"..., 365) = 365
10:43:44.365817 IP client.37051 > server.1521: Flags [P.], seq 36266:36631, ack 18160, win 442, length 365
10:43:44.365870 read(15<socket:[2594150602]>, "\0\355\0\0\6\0\0\0\0\0\v\1\5\221\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 237

Here is the server part for this dialogue, receiving the 365 bytes query and writing the 237 bytes answer:

10:43:44.365408 read(20<socket:[3785769780]>, "\1m\0\0\6\0\0\0\0\0\21i\177\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\2\0\0"..., 8208) = 365
10:43:44.366793 write(20<socket:[3785769780]>, "\0\355\0\0\6\0\0\0\0\0\v\1\5\221\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 237
10:43:44.366829 IP server.1521> client.37051: Flags [P.], seq 18160:18397, ack 36632, win 442, length 237
) = 237

There are a few more exchanges where the client sends 381 bytes, the server answers 236 bytes, then the client sends 297 bytes and the server answers in 181 bytes. Here is the client strace and tcpddump for this:

10:43:44.367199 write(15<socket:[2594150602]>, "\1}\0\0\6\0\0\0\0\0\21i\201\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\4\0\0"..., 381) = 381
10:43:44.367229 IP client.37051 > server.1521: Flags [P.], seq 36631:37012, ack 18397, win 442, length 381
10:43:44.367272 read(15<socket:[2594150602]>, 
"\0\354\0\0\6\0\0\0\0\0\v\1\5L\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 236
10:43:44.368522 write(15<socket:[2594150602]>, "\1)\0\0\6\0\0\0\0\0\21i\203\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\5\0\0"..., 297) = 297
10:43:44.368555 IP client.37051 > server.1521: Flags [P.], seq 37012:37309, ack 18633, win 442, length 297
10:43:44.368598 read(15<socket:[2594150602]>, "\0\265\0\0\6\0\0\0\0\0\10\6\0\252I(\35\2\7\0\0\2\0\0\0\1\0\0\0\0\0\0"..., 8208) = 181

and the strace and tcpdump on the server side:

10:43:44.366884 read(20<socket:[3785769780]>, "\1}\0\0\6\0\0\0\0\0\21i\201\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\4\0\0"..., 8208) = 381
10:43:44.368031 write(20<socket:[3785769780]>, "\0\354\0\0\6\0\0\0\0\0\v\1\5L\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 236
10:43:44.368084 IP server.1521> client.37051: Flags [P.], seq 18397:18633, ack 37013, win 442, length 236
) = 236
10:43:44.368181 read(20<socket:[3785769780]>, "\1)\0\0\6\0\0\0\0\0\21i\203\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\5\0\0"..., 8208) = 297
10:43:44.370926 write(20<socket:[3785769780]>, "\0\265\0\0\6\0\0\0\0\0\10\6\0\252I(\35\2\7\0\0\2\0\0\0\1\0\0\0\0\0\0"..., 181) = 181
10:43:44.370949 IP server.1521 > client.37051: Flags [P.], seq 18633:18814, ack 37310, win 442, length 181
10:43:44.371006 read(20<socket:[3785769780]>, 

The client which got the 181 bytes answer from the server, now sends a new query:

10:43:44.413973 write(15<socket:[2594150602]>, "\10V\0\0\6\0\0\0\0\0\21i\205\376\377\377\377\377\377\377\377\1\0\0\0\0\0\0\0\2\0\0"..., 2134
10:43:44.414020 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134
) = 2134

and then waits for the answer:

10:43:44.414115 read(15<socket:[2594150602]>,

but this is where it hangs. Remember where we are:

  • the server has acknowledged up byte 37310 in messages from client and is waiting to read from the socket
  • the client is acknowledging up to byte 18814 in messages from server and sends more bytes (from 37309 to 39443)

But the server didn’t receive them and is still waiting:

10:43:44.371006 read(20<socket:[3785769780]>,

Both side waiting… this is where tcpdump will be helpful. strace shows only one write() call, which writes to the TCP buffer. But tcpdump shows many attempts to send this to the server:

10:43:44.617869 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134
10:43:45.025804 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134
10:43:45.841818 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134
10:43:47.473818 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134
10:43:50.737837 IP client.37051 > server.1521: Flags [P.], seq 37309:39443, ack 18814, win 442, length 2134

Same packet, same size, the 37309 to 39443 bytes of the client->server conversation. The last acknowledge from the server was 37310 — he never received this packet. And because it is TCP, it retries. The write() was done because it writes to the buffer, so the application is just waiting for the answer.

A packet has disappeared in the client->server conversation. This is a job for the network team, but there’s something to remark here. Many messages were ok, but not this one. What is special? The size is larger: 2134 bytes. The maximum we had before was 971.

Simple ping…

Those numbers may already ring a bell, but let’s try to PING with some different packets sizes. Up to 1472 goes through:

[root@client tmp]# ping -c 5 -s 1472 -W 5 server
PING client (10.30.13.62) 1472(1500) bytes of data.
1480 bytes from server(10.30.13.62): icmp_seq=1 ttl=58 time=0.607 ms
1480 bytes from server(10.30.13.62): icmp_seq=2 ttl=58 time=0.477 ms
1480 bytes from server(10.30.13.62): icmp_seq=3 ttl=58 time=0.494 ms
1480 bytes from server(10.30.13.62): icmp_seq=4 ttl=58 time=0.474 ms
1480 bytes from server(10.30.13.62): icmp_seq=5 ttl=58 time=0.533 ms

But larger fails:

[root@client tmp]# ping -c 5 -s 1473 -W 5 server
PING server (10.30.13.62) 1473(1501) bytes of data.
--- server ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 9000ms

All packets were lost.

The default Ethernet MTU (maximum transmission size) is 1500 bytes but we use Jumbo Frames here, as defined by the interface:

[root@client tmp]# ip a | grep mtu
...
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP

There is probably a router in the middle which does not allow Jumbo Frames… This is now a job for the network guys. However, if I need a quick workaround I can limit the Session Data Unit on Oracle side, like:

(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=myservice))(SDU=1430)(ADDRESS=(PROTOCOL=TCP)(HOST=server)(PORT=1521)))

With SDU at 1430 the length of the packets (as displayed by tcpdump) are at maximum 1460 bytes (NS and NT layers add 30 bytes) which is the maximum that can go through MTU 1500 (as IP and TCP headers add 20 bytes each).

If you wonder why ping was able to send packets up to 1472 bytes, that’s because the ICMP header is only 8 bytes instead of the TCP 20 bytes.