Things you don’t expect from netlink
Netlink is a Linux-specific socket family used for network-level configuration. Normally it’s used by iproute2 (/usr/bin/ip) to configure network interfaces, and it’s expected to be low-volume.
Today I found that it’s not always the case.
recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000004}, msg_namelen=128->12, msg_iov=[{iov_base={{len=68, type=RTM_NEWNEIGH, flags=0, seq=0, pid=0}, {ndm_family=AF_BRIDGE, ndm_ifindex=if_nametoindex("vx_i_102"), ndm_state=NUD_REACHABLE
This is a snippet from strace of the systemd-networkd process, which was eating 100% of CPU. I’d created something like a loop between two VXLANs, and there were tons (well, … less than 100 Mbps) of ARP/IPv6 NDISC traffic. Whilst the flood in that loop was a bit unfortunate, I was mostly surprised to see the same flood show up in netlink at that moment.
Moreover, it affected systemd-networkd, which is also expected to be a low-volume service but suddenly found itself at the top of top due to an excessive number of AF_NETLINK messages.
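To make the strace line above a bit less opaque, here is a minimal sketch that decodes the two fixed-size headers it shows: the generic nlmsghdr and the neighbour-specific ndmsg that follows it. The numeric constants are taken from the kernel headers; the decode_neigh helper, the sample bytes and the ifindex value 42 are made up for illustration, and the attribute payload after ndmsg is ignored.

```python
import struct

# Constants from <linux/rtnetlink.h>, <linux/neighbour.h> and <linux/socket.h>.
RTM_NAMES = {28: "RTM_NEWNEIGH", 29: "RTM_DELNEIGH", 30: "RTM_GETNEIGH"}
AF_BRIDGE = 7
NUD_REACHABLE = 0x02

NLMSGHDR = struct.Struct("=IHHII")  # len, type, flags, seq, pid (16 bytes)
NDMSG = struct.Struct("=BBHiHBB")   # family, pad1, pad2, ifindex, state, flags, type (12 bytes)


def decode_neigh(buf):
    """Decode the first netlink message in buf if it is a neighbour message."""
    length, msg_type, flags, seq, pid = NLMSGHDR.unpack_from(buf, 0)
    if msg_type not in RTM_NAMES:
        return None  # not a neighbour-table message
    family, _, _, ifindex, state, _, _ = NDMSG.unpack_from(buf, NLMSGHDR.size)
    return {
        "len": length,
        "type": RTM_NAMES[msg_type],
        "family": family,
        "ifindex": ifindex,
        "reachable": bool(state & NUD_REACHABLE),
    }


# Hypothetical message mirroring the strace line; ifindex 42 is invented.
sample = NLMSGHDR.pack(68, 28, 0, 0, 0)
sample += NDMSG.pack(AF_BRIDGE, 0, 0, 42, NUD_REACHABLE, 0, 0)
sample += b"\x00" * (68 - len(sample))  # stand-in for the omitted attributes

print(decode_neigh(sample))
```

Every bridge FDB or neighbour change produces one such message, so a flood on the data plane becomes a flood of these for whoever is listening.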
Whilst the source of the loop is pending investigation, the systemd-networkd issue is a separate concern to me. It can be mitigated by disabling everything odd on the L2 interfaces (like that ‘vx_i_102’, which was expected to be a dumb pipe, not a host interface with a bunch of protocols). Nevertheless, I feel like my abstraction of the separation between control plane and data plane in Linux is a bit leaky. And I don’t like that.
It’s really odd. I think I’ll report this to the systemd repo. Maybe there is a way to disable acceptance of RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH (per man 7 rtnetlink) messages?
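At the socket level, at least, a netlink listener only gets notifications for the multicast groups it subscribes to, and the nl_groups=0x000004 in the strace line matches the RTMGRP_NEIGH bit. Below is a rough sketch of a listener that asks for link and address events only. It is not how systemd-networkd does it, and I don’t know whether networkd exposes any setting for this; group_mask and open_listener are hypothetical helper names.

```python
import socket

# Legacy multicast group bitmasks from <linux/rtnetlink.h>.
RTMGRP_LINK = 0x001
RTMGRP_NEIGH = 0x004
RTMGRP_IPV4_IFADDR = 0x010
RTMGRP_IPV6_IFADDR = 0x100


def group_mask(want_neigh=False):
    """Build the nl_groups bitmask; neighbour events are opt-in."""
    mask = RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR
    if want_neigh:
        mask |= RTMGRP_NEIGH
    return mask


def open_listener(mask):
    """Open an rtnetlink socket subscribed to the given groups (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, mask))  # pid 0 lets the kernel pick the port id
    return sock


print(hex(group_mask()))      # mask without neighbour events
print(hex(group_mask(True)))  # mask with neighbour events
```

A process would call open_listener(group_mask()) and then loop on recv(); with the neighbour bit cleared, the kernel should not deliver RTM_NEWNEIGH/RTM_DELNEIGH notifications to that socket at all.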
My second source of surprise is the existence of those messages, as I thought that netlink was an interface-specific protocol, not an ‘ARP-including’ one. Someone has added it to netlink…
Upd: I’ve sent a bug report. I doubt it will gain traction (given the 2-year-old distro), but, nevertheless, I should at least report it:
UPD #2: I checked the Linux history, and RTM_NEWNEIGH was already there at a refactoring in 2012, so it’s not a new addition. Maybe it was there from the very beginning? In that case I’d say it’s a big WTF for any netlink application.