Under the Hood of the TCP Socket Implementation in Golang

Golang is surely my go-to language for writing web applications; it hides many details but still gives a lot of flexibility. Recently I ran strace on an HTTP service app, for no particular reason, and I found something interesting.

Here’s what strace -c gave me:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.24    0.397615         336      1185        29 futex
  4.13    0.018009           3      7115           clock_gettime
  2.92    0.012735          19       654           epoll_wait
  1.31    0.005701           6       911           write
  0.20    0.000878           3       335           epoll_ctl
  0.12    0.000525           1       915       457 read
  0.02    0.000106           2        59           select
  0.01    0.000059           0       170           close
  0.01    0.000053           0       791           setsockopt
  0.01    0.000035           0       158           getpeername
  0.01    0.000034           0       170           socket
  0.01    0.000029           0       160           getsockname
  0.01    0.000026           0       159           getsockopt
  0.00    0.000000           0         7           sched_yield
  0.00    0.000000           0       166       166 connect
  0.00    0.000000           0         3         1 accept4
------ ----------- ----------- --------- --------- ----------------
100.00    0.435805                 12958       653 total

There are a lot of interesting things in that profiling result, but the highlights of this article are the error count of read and the number of futex calls.

At first sight I didn’t think much about the futex calls, most of which are wake calls. This app serves hundreds of requests per second, so it should have many goroutines; it also leverages channels, which rely on blocking mechanisms under the hood. So a lot of futex calls seemed normal. Later I found out that this number also came from something else (we’ll get back to it later).

Why you no read

Who likes errors? Even worse when they happen hundreds of times in less than a minute; that’s what I thought when looking at that profiling result. What the heck is up with those read calls?

read(36, "GET /xxx/v3?q=xx%20ch&d"..., 4096) = 520
...
read(36, 0xc422aa4291, 1) = -1 EAGAIN (Resource temporarily unavailable)

After every successful read call there is (almost) always one more read on the same file descriptor that fails with EAGAIN. Hey, I remember this error: it comes up when the file descriptor is not ready for a certain operation, in this case read. But why does Go do it this way?

At first I thought there was a bug where epoll_wait delivered a wrong ready event for every fd. But it looks more like a double ready notification than a wrong event; why double?

Honestly, my experience with epoll is shallow; a basic event-loop socket handler [0] is the only thing I’ve written. No multithreading, no synchronization, very plain. After some googling I found Marek (clever guy) criticizing epoll [1], and it was like boom!!!

Important TL;DR from that post: unnecessary wake-ups are ‘generally’ unavoidable when using epoll with multi-threaded workers, because we need to notify each worker about waiting ready events.

Hey, that also explains our futex wake numbers. But let’s take a look at a simplified version of how epoll is used in an event-loop socket implementation to understand it further:

  1. Bind the socket listener to a file descriptor, let’s call it s_fd
  2. Create an epoll file descriptor using epoll_create, let’s call it e_fd
  3. Register s_fd with e_fd using epoll_ctl for particular events (usually both EPOLLIN|EPOLLOUT)
  4. Create a forever loop (the event loop) which calls epoll_wait in every iteration to get the fds of ready connections for the registered events
  5. Handle the ready connections; in a multi-worker implementation this means notifying each worker

Using strace I found that Go uses edge-triggered epoll:

epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2490298448, u64=140490870550608}}) = 0

Which means this sequence is also valid for Go’s socket implementation:

  1. Kernel: Receives a new connection.
  2. Kernel: Notifies two waiting threads, A and B. Due to the “thundering herd” behavior of level-triggered notifications the kernel must wake up both.
  3. Thread A: Finishes epoll_wait().
  4. Thread B: Finishes epoll_wait().
  5. Thread A: Performs accept(); this succeeds.
  6. Thread B: Performs accept(); this fails with EAGAIN.

At this point I was 80% confident this was the case, but let’s do some analysis using this simple web app [2]. Here is what it does on a single request.

epoll_wait(4, [{EPOLLIN|EPOLLOUT, {u32=2186919600, u64=140542106779312}}], 128, -1) = 1
futex(0x7c1bd8, FUTEX_WAKE, 1) = 1
futex(0x7c1b10, FUTEX_WAKE, 1) = 1
read(5, "GET / HTTP/1.1\r\nHost: localhost:"..., 4096) = 348
futex(0xc420060110, FUTEX_WAKE, 1) = 1
write(5, "HTTP/1.1 200 OK\r\nDate: Sat, 03 J"..., 116) = 116
futex(0xc420060110, FUTEX_WAKE, 1) = 1
read(5, 0xc4200f6000, 4096) = -1 EAGAIN (Resource temporarily unavailable)

Look at how, after epoll_wait, there are two futex wake calls, which I think are waking the workers, and then there is the failing read.

What if I set GOMAXPROCS to 1, enforcing single-worker behavior?

epoll_wait(4,[{EPOLLIN, {u32=1969377136, u64=140245536493424}}], 128, -1) = 1
futex(0x7c1bd8, FUTEX_WAKE, 1) = 1
accept4(3, {sa_family=AF_INET6, sin6_port=htons(54400), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28], SOCK_CLOEXEC|SOCK_NONBLOCK) = 6
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1969376752, u64=140245536493040}}) = 0
getsockname(6, {sa_family=AF_INET6, sin6_port=htons(8080), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(6, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPINTVL, [180], 4) = 0
setsockopt(6, SOL_TCP, TCP_KEEPIDLE, [180], 4) = 0
accept4(3, 0xc42004db78, 0xc42004db6c, SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(6, "GET /test?kjhkjhkjh HTTP/1.1\r\nHo"..., 4096) = 92
write(6, "HTTP/1.1 200 OK\r\nDate: Sat, 03 J"..., 139) = 139
read(6, "", 4096) = 0

When using one worker there is only one futex wake call after epoll_wait and no read error. However, I found this behavior is not consistent; in some trials I still got the read error and two futex wake calls.

And then what to do?

In the same post [1] Marek talks about EPOLLEXCLUSIVE, which is available starting from Linux 4.5. I’m using 4.8, so why is it still happening on my machine? It turns out that Go is not using that flag yet; I hope a future version will use it when available.

I learned a lot from this journey; I hope you did too.

[0] https://banu.com/blog/2/how-to-use-epoll-a-complete-example-in-c/
[1] https://idea.popcount.org/2017-02-20-epoll-is-fundamentally-broken-12/
[2] https://gist.github.com/wejick/2cef1f8799361318a62a59f6801eade8