Things you don’t expect from netlink

George Shuklin
Published in OpsOps
Oct 18, 2021 · 2 min read

Netlink is a special socket protocol in Linux, used for network-level configuration. Normally it’s used by iproute2 (/usr/bin/ip) to configure network interfaces, and it’s expected to be low-volume.

Today I found that it’s not always the case.

recvmsg(7, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000004}, msg_namelen=128->12, msg_iov=[{iov_base={{len=68, type=RTM_NEWNEIGH, flags=0, seq=0, pid=0}, {ndm_family=AF_BRIDGE, ndm_ifindex=if_nametoindex("vx_i_102"), ndm_state=NUD_REACHABLE

This is a snippet from strace of the systemd-networkd process, which was eating 100% of CPU. I’d created something like a loop between two vxlans, and there were tons (well, … less than 100Mbps) of arp/ipv6 ndisc traffic. Whilst the flood in that loop was a bit unfortunate, I was mostly surprised by the presence of the flood in netlink at that moment.
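To make the strace line a bit more concrete, here is a minimal Python sketch of how such a buffer could be decoded by hand. The struct layouts follow the kernel's nlmsghdr and ndmsg definitions; the function name parse_neigh_messages, the sample buffer and the interface index 5 are made up for illustration, and attribute parsing (the MAC address and so on) is skipped entirely.

```python
import struct

# Message types and constants from linux/rtnetlink.h and linux/neighbour.h
RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH = 28, 29, 30
AF_BRIDGE = 7
NUD_REACHABLE = 0x02

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
NLMSGHDR = struct.Struct("=IHHII")
# struct ndmsg: u8 family, u8 pad1, u16 pad2, s32 ifindex, u16 state, u8 flags, u8 type
NDMSG = struct.Struct("=BBHiHBB")


def parse_neigh_messages(buf):
    """Yield (type, family, ifindex, state) for each neighbour message in buf."""
    offset = 0
    while offset + NLMSGHDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(buf, offset)
        if length < NLMSGHDR.size or offset + length > len(buf):
            break  # truncated or malformed message, stop parsing
        is_neigh = msg_type in (RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH)
        if is_neigh and length >= NLMSGHDR.size + NDMSG.size:
            family, _, _, ifindex, state, _, _ = NDMSG.unpack_from(
                buf, offset + NLMSGHDR.size
            )
            yield msg_type, family, ifindex, state
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned


# Hypothetical 68-byte message shaped like the one in the strace output;
# the 40 trailing bytes stand in for attributes that are not decoded here.
sample = (
    NLMSGHDR.pack(68, RTM_NEWNEIGH, 0, 0, 0)
    + NDMSG.pack(AF_BRIDGE, 0, 0, 5, NUD_REACHABLE, 0, 0)
    + bytes(40)
)
print(list(parse_neigh_messages(sample)))  # [(28, 7, 5, 2)]
```

Every learned or refreshed neighbour entry produces one such message for each subscribed listener, which is how data-plane traffic ends up as control-plane load.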

Moreover, it affected systemd-networkd, which is also expected to be a low-volume service, but suddenly found itself at the top of top due to an excessive number of AF_NETLINK messages.

Whilst the source of the loop is pending investigation, the systemd-networkd issue is a separate concern to me. It can be mitigated by disabling everything odd on the L2 interfaces (like that ‘vx_i_102’, which was expected to be a dumb pipe, not a host interface with a bunch of protocols). Nevertheless, I feel like my abstraction of the separation of control plane and data plane in Linux is a bit leaky. And I don’t like that.

It’s really odd. I think I’ll report this to the systemd repo. Maybe there is a way to disable acceptance of RTM_NEWNEIGH, RTM_DELNEIGH, RTM_GETNEIGH (per man netlink) messages?
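For context, an rtnetlink listener only receives notifications for the multicast groups it subscribes to, and the nl_groups=0x000004 in the strace output matches the neighbour group (RTMGRP_NEIGH). The sketch below shows the legacy bitmask way of subscribing from Python; the helper names group_mask and open_listener are hypothetical, and whether systemd-networkd offers any knob to leave the neighbour group out is not something I know, so treat this as an illustration of the mechanism rather than a fix.

```python
import socket

# Legacy rtnetlink multicast group bits from linux/rtnetlink.h
RTMGRP = {
    "link": 0x1,
    "neigh": 0x4,
    "ipv4_ifaddr": 0x10,
    "ipv4_route": 0x40,
    "ipv6_ifaddr": 0x100,
    "ipv6_route": 0x400,
}


def group_mask(names):
    """Combine group names into the bitmask passed to bind()."""
    mask = 0
    for name in names:
        mask |= RTMGRP[name]
    return mask


def open_listener(names):
    """Open a NETLINK_ROUTE socket subscribed only to the given groups (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, group_mask(names)))  # (pid=0 lets the kernel pick, groups bitmask)
    return sock


# A listener that cares about links and addresses, but not neighbour entries:
quiet = group_mask(["link", "ipv4_ifaddr", "ipv6_ifaddr"])
print(hex(quiet), bool(quiet & RTMGRP["neigh"]))
```

A socket opened with open_listener(["link", "ipv4_ifaddr", "ipv6_ifaddr"]) would not be woken up by a flood of RTM_NEWNEIGH notifications, since it never joined that group.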

My second source of surprise is the existence of those messages, as I thought that netlink was an interface-specific protocol, not an ‘arp-including’ one. Someone has added it to netlink…

Upd: I’ve sent a bug report. I doubt it will gain traction (given the 2-year-old distro), but, nevertheless, I should at least report it.

UPD #2: I checked the Linux history, and RTM_NEWNEIGH was already there at a refactoring in 2012, so it’s not a new addition. Maybe it was there from the very beginning? In that case I’d say it’s a big WTF for any netlink application.

George Shuklin
OpsOps

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux. My hobby is Rust.