Case study: Network bottlenecks on a Linux server: Part 1 — The NIC

5 min readApr 19, 2018

Not long ago I had to track down why websocket connection attempts from our clients to one of our services always started being refused soon after our peak time window began. I’m not a system administrator not a networking expert, but I’ve been using Linux systems and administrating smaller Linux servers for a long time, so at least I knew what to investigate and which topics I had to read up on. The following is a case study of what was investigated, and which actions where taken and why.

The series is divided into (likely) these four parts:

Part 1: The NIC (this article)
Part 2: The Kernel
Part 3: Interrupts
Part 4: Going further

Observation

During peak time, we could see that the amount of open websockets reached a sort of “cliff,” where no new connections where being accepted, and the amount of connections rapidly dropped:

We noticed that we where getting a buildup of connections in the SYN-RECV state right before this cliff happened. The below image illustrates it, though the timestamps of the below graph is not from the same period as the graph above:

Case study: Network bottlenecks on a Linux server: Part 1 — The NIC

Observation

Written by Oscar Eriksson