Diving into Golang: How does it effectively wrap the functionality of epoll?

Garry
12 min read · Aug 13, 2023


In the days before coroutines became mainstream, synchronous blocking was the villain of the piece in traditional network programming, synonymous with poor performance. The heroes of the story were the various asynchronous non-blocking models built on epoll, which succeeded in boosting performance. However, they had a fatal flaw: their reliance on callback functions was at odds with the linear way humans naturally think. As a result, the code produced was often harder for people to understand.

Then along came Golang, pushing the coroutine programming model into the spotlight. This innovative approach managed to combine the best of both worlds: the simplicity and ease of use of synchronous programming and, thanks to the smart combination of coroutines and epoll, freedom from the heavy performance cost of thread switching. In short, Golang offered a solution that was both easy to use and performant: a true game-changer in the field.

Today, we’re going to dive deep into the net package provided by Golang itself, to understand how it manages to deliver on its promises.

Golang net usage

Given that not everyone reading this may have experience with Golang, let’s start with a simple example of a Golang service using the official net package. To keep things clear, I’ll only be showing the core code.

package main

import (
"log"
"net"
)

func main() {
listener, err := net.Listen("tcp", "127.0.0.1:9008")
if err != nil {
log.Fatal(err)
}
for {
conn, err := listener.Accept()
if err != nil {
continue
}
go process(conn)
}
}

func process(conn net.Conn) {
defer conn.Close()
var buf [1024]byte
if _, err := conn.Read(buf[:]); err != nil {
return
}
if _, err := conn.Write([]byte("I am server!")); err != nil {
return
}
}

// Run the command `go run main.go` in your terminal to start the server
// Open another terminal window and run the command `telnet 127.0.0.1 9008` to connect to the server
// Type any message and press enter. The server will respond with "I am server!"

In this sample service program, we first use net.Listen to listen to a local port, 9008 in this case. Then we call Accept to receive and handle the connection. If a connection request is received, we use go process to launch a coroutine for handling it. During the connection handling, I demonstrate both read and write operations (Read and Write).

At first glance, this service program looks like a traditional synchronous model. Operations like Accept, Read, and Write all seem to “block” the current coroutine. For instance, with the Read function, if the server calls it before the client data arrives, Read will not return, effectively parking the current coroutine. Only once data arrives does Read return, allowing the processing coroutine to resume.
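To see this "blocking on the surface" model in action, here is a minimal, self-contained sketch (the `echoOnce` helper is my own name, not part of the net package): a listener on an ephemeral localhost port serves one connection in its own goroutine, while Accept and Read appear to park that goroutine until a client shows up and sends data.

```go
package main

import (
	"fmt"
	"net"
)

// echoOnce starts a listener on an ephemeral localhost port, serves a single
// connection in its own goroutine, and returns the reply a client receives.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept() // parks this goroutine until a client arrives
		if err != nil {
			return
		}
		defer conn.Close()
		buf := make([]byte, 1024)
		n, err := conn.Read(buf) // parks until data arrives
		if err != nil {
			return
		}
		conn.Write(buf[:n]) // echo the message back
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, 1024)
	n, err := conn.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	reply, err := echoOnce("ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(reply) // ping
}
```

Despite the synchronous look of both goroutines, no OS thread sits blocked in the kernel while they wait, as the rest of this article will show.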

If you were to write similar server code in other languages, such as C or Java, you’d be in a world of hurt. That’s because each synchronous Accept, Read, and Write operation would block your current thread, leading to a lot of wasted CPU cycles due to thread context switching.

Yet, in Golang, this kind of code performs impressively well. Why is that? Well, let’s keep reading to find out.

The Underlying Process of Listen

In more traditional languages like C or Java, the listen function is typically implemented as a direct call to the kernel’s listen system call. If you’re thinking that’s what the Listen function does in Golang’s net package, think again.

Unlike its counterparts in other languages, Golang’s net Listen does several things:

  • Creates a socket and sets it to non-blocking
  • Binds and listens to a local port
  • Calls listen to start listening
  • Uses epoll_create to create an epoll object
  • Uses epoll_ctl to add the listening socket to epoll, waiting for a connection

One call to Golang’s Listen is equivalent to multiple calls to functions like socket, bind, listen, epoll_create, epoll_ctl, etc. in C. It’s a highly abstracted operation, hiding a lot of the underlying implementation details from the programmer.

But don’t just take my word for it; let’s take a peek under the hood.

The Listen function’s entry point can be found in the net/dial.go file in Golang’s source code. Let’s delve deeper into the details.

2.1 Unraveling the Code: The Execution Path of the Listen Function

Unlike the detailed dissections often associated with source code reviews, our journey here requires an understanding of the broader picture rather than a line-by-line analysis.

// src/net/dial.go
func Listen(network, address string) (Listener, error) {
var lc ListenConfig
return lc.Listen(context.Background(), network, address)
}

The Listen function serves merely as an entry point; execution proceeds to the Listen method of ListenConfig. That method determines whether the connection is a TCP type and, if so, moves on to the listenTCP function under sysListener (src/net/tcpsock_posix.go). A few function jumps later, we arrive at the socket function in net/sock_posix.go. Let’s delve right into that.

// net/sock_posix.go
func socket(ctx context.Context, net string, family, sotype, proto int, ipv6only bool, laddr, raddr sockaddr, ctrlCtxFn func(context.Context, string, string, syscall.RawConn) error) (fd *netFD, err error) {
s, err := sysSocket(family, sotype, proto)
...
if laddr != nil && raddr == nil {
switch sotype {
case syscall.SOCK_STREAM, syscall.SOCK_SEQPACKET:
if err := fd.listenStream(ctx, laddr, listenerBacklog(), ctrlCtxFn); err != nil {
fd.Close()
return nil, err
}
return fd, nil
...
}

2.2 Spinning Up a socket

Go’s sysSocket function goes a step beyond the plain socket call found in many other languages: within a single function it creates the socket, marks it close-on-exec, and switches it into non-blocking mode. (The binding and listening come later, in listenStream.) Let’s delve deeper into the specifics of sysSocket.

// src/net/sys_cloexec.go
func sysSocket(family, sotype, proto int) (int, error) {
// See ../syscall/exec_unix.go for description of ForkLock.
syscall.ForkLock.RLock()
s, err := socketFunc(family, sotype, proto)
if err == nil {
syscall.CloseOnExec(s)
}
syscall.ForkLock.RUnlock()
if err != nil {
return -1, os.NewSyscallError("socket", err)
}
if err = syscall.SetNonblock(s, true); err != nil {
poll.CloseFunc(s)
return -1, os.NewSyscallError("setnonblock", err)
}
return s, nil
}

Within sysSocket, the invoked socketFunc is essentially a socket system call. We’ll explore this in the code snippet that follows.

// net/hook_unix.go
var (
// Placeholders for socket system calls.
socketFunc func(int, int, int) (int, error) = syscall.Socket
connectFunc func(int, syscall.Sockaddr) error = syscall.Connect
listenFunc func(int, int) error = syscall.Listen
getsockoptIntFunc func(int, int, int) (int, error) = syscall.GetsockoptInt
)

Once the socket is created, syscall.SetNonblock is employed to switch the socket into non-blocking mode.

// src/syscall/exec_unix.go
func SetNonblock(fd int, nonblocking bool) (err error) {
flag, err := fcntl(fd, F_GETFL, 0)
if err != nil {
return err
}
if nonblocking {
flag |= O_NONBLOCK
} else {
flag &^= O_NONBLOCK
}
_, err = fcntl(fd, F_SETFL, flag)
return err
}

2.3 Binding and Listening

Next on our journey, we turn our attention to the listenStream function. Right off the bat, this function utilizes the bind and listen system calls to accomplish its binding and listening tasks.

// net/sock_posix.go
func (fd *netFD) listenStream(ctx context.Context, laddr sockaddr, backlog int, ctrlCtxFn func(context.Context, string, string, syscall.RawConn) error) error {
...
if err = syscall.Bind(fd.pfd.Sysfd, lsa); err != nil {
return os.NewSyscallError("bind", err)
}
if err = listenFunc(fd.pfd.Sysfd, backlog); err != nil {
return os.NewSyscallError("listen", err)
}
if err = fd.init(); err != nil {
return err
}
lsa, _ = syscall.Getsockname(fd.pfd.Sysfd)
fd.setAddr(fd.addrFunc()(lsa), nil)
return nil
}

Interestingly, listenFunc is not a function definition at all but a package-level function variable, pointing directly to the syscall.Listen system call.

// net/hook_unix.go
var (
socketFunc func(int, int, int) (int, error) = syscall.Socket
connectFunc func(int, syscall.Sockaddr) error = syscall.Connect
listenFunc func(int, int) error = syscall.Listen
getsockoptIntFunc func(int, int, int) (int, error) = syscall.GetsockoptInt
)
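This function-variable hook pattern is worth a moment of its own: because the package calls through the variable rather than the syscall directly, a test can swap in a stub. Here is a minimal, self-contained illustration of the same idea; the names (`listenFunc`, `realListen`, `startListening`) are mine, not the net package's.

```go
package main

import "fmt"

// A package-level function variable, defaulting to the real implementation,
// mirroring the hooks in net/hook_unix.go.
var listenFunc func(fd, backlog int) error = realListen

// realListen stands in for syscall.Listen in this sketch.
func realListen(fd, backlog int) error {
	return fmt.Errorf("refusing to really listen on fd %d in a demo", fd)
}

// startListening always goes through the hook, never the syscall directly,
// so tests can intercept it.
func startListening(fd int) error {
	return listenFunc(fd, 128)
}

func main() {
	// A test swaps in a fake, exercises the caller, then restores the hook.
	listenFunc = func(fd, backlog int) error { return nil }
	fmt.Println(startListening(3) == nil) // true: the stub was called
	listenFunc = realListen
}
```

It's a lightweight seam for dependency injection that costs nothing at the call site.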

2.4 Epoll: Creation and Initialization

Moving on, we arrive at the fd.init line of code. After several function call expansions, the creation of the epoll object is executed. This step also adds the socket handle (currently in a listening state) to the epoll object for network event management.

Let’s delve into how this magic happens.

// src/internal/poll/fd_poll_runtime.go
func (pd *pollDesc) init(fd *FD) error {
serverInit.Do(runtime_pollServerInit)
ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd))
if errno != 0 {
return errnoErr(syscall.Errno(errno))
}
pd.runtimeCtx = ctx
return nil
}

The serverInit.Do function is deployed to ensure that the function passed as a parameter only executes once. We won’t go into too much detail about this. The parameter, runtime_pollServerInit, is a call to the poll_runtime_pollServerInit function belonging to the runtime package. You can find its source code within runtime/netpoll.go.

// src/runtime/netpoll.go
//go:linkname poll_runtime_pollServerInit internal/poll.runtime_pollServerInit
func poll_runtime_pollServerInit() {
netpollGenericInit()
}

This function eventually triggers netpollGenericInit, which in turn calls the platform-specific netpollinit, where the epoll object is created.

// src/runtime/netpoll_epoll.go
func netpollinit() {
var errno uintptr
epfd, errno = syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
...
}

Next, we turn our attention to runtime_pollOpen. It accepts the file descriptor of the pre-listened socket as its parameter. Within this function, the descriptor is added to the epoll object.

// runtime/netpoll.go
//go:linkname poll_runtime_pollOpen internal/poll.runtime_pollOpen
func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
...
errno := netpollopen(fd, pd)
if errno != 0 {
pollcache.free(pd)
return nil, int(errno)
}
return pd, 0
}

// runtime/netpoll_epoll.go
func netpollopen(fd uintptr, pd *pollDesc) uintptr {
var ev syscall.EpollEvent
ev.Events = syscall.EPOLLIN | syscall.EPOLLOUT | syscall.EPOLLRDHUP | syscall.EPOLLET
*(**pollDesc)(unsafe.Pointer(&ev.Data)) = pd
return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, int32(fd), &ev)
}
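The epoll_create1/epoll_ctl/epoll_wait trio that netpoll is built on can be exercised end to end in a few lines. This Linux-only sketch (the `waitForReadable` helper is mine) registers a pipe's read end with epoll, makes it readable, and waits for the readiness notification, much as the runtime's poller learns that a descriptor has data.

```go
package main

import (
	"fmt"
	"syscall"
)

// waitForReadable registers a pipe's read end with epoll, writes one byte to
// the write end, and waits for the readiness notification. It returns the
// number of ready events reported by epoll_wait.
func waitForReadable() (int, error) {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		return 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err != nil {
		return 0, err
	}
	defer syscall.Close(epfd)

	// Interest in readability only; the runtime also asks for
	// EPOLLOUT | EPOLLRDHUP and edge-triggered EPOLLET.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, err
	}

	syscall.Write(p[1], []byte("x")) // make the read end readable

	events := make([]syscall.EpollEvent, 8)
	n, err := syscall.EpollWait(epfd, events, 1000) // returns once the event fires
	return n, err
}

func main() {
	n, err := waitForReadable()
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 1: exactly one descriptor became ready
}
```

Note how netpollopen smuggles the *pollDesc pointer into ev.Data; when epoll_wait later returns the event, the runtime gets back a direct pointer to the goroutine bookkeeping for that descriptor, with no lookup table needed.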

The Journey through Accept

Once the server has wrapped up the Listen process, the baton is passed to the Accept function. This function shoulders three main tasks:

  • It calls upon the accept system call to accept a connection
  • If no connection has arrived yet, the current coroutine is blocked
  • When a new connection makes its appearance, it is added to epoll for management before the function returns.

With single-step debugging in Golang, we can observe that execution enters the Accept method of TCPListener.

// net/tcpsock.go
func (l *TCPListener) Accept() (Conn, error) {
if !l.ok() {
return nil, syscall.EINVAL
}
c, err := l.accept()
if err != nil {
return nil, &OpError{Op: "accept", Net: l.fd.net, Source: nil, Addr: l.fd.laddr, Err: err}
}
return c, nil
}

// net/tcpsock_posix.go
func (ln *TCPListener) accept() (*TCPConn, error) {
fd, err := ln.fd.accept()
if err != nil {
return nil, err
}
return newTCPConn(fd, ln.lc.KeepAlive, nil), nil
}

The three steps we mentioned above are all carried out in the accept method of netFD.

// net/fd_unix.go
func (fd *netFD) accept() (netfd *netFD, err error) {
d, rsa, errcall, err := fd.pfd.Accept()
if err != nil {
if errcall != "" {
err = wrapSyscallError(errcall, err)
}
return nil, err
}

if netfd, err = newFD(d, fd.family, fd.sotype, fd.net); err != nil {
poll.CloseFunc(d)
return nil, err
}
if err = netfd.init(); err != nil {
netfd.Close()
return nil, err
}
...
}

Next up, let’s dissect each of these steps in detail.

3.1 Welcoming a Connection

Through single-step tracking, we find that Accept finds its way into the Accept method under the FD object. Here, the accept system call of the operating system is invoked.

// internal/poll/fd_unix.go
func (fd *FD) Accept() (int, syscall.Sockaddr, string, error) {
for {
s, rsa, errcall, err := accept(fd.Sysfd)
if err == nil {
return s, rsa, "", err
}
switch err {
case syscall.EINTR:
continue
case syscall.EAGAIN:
if fd.pd.pollable() {
if err = fd.pd.waitRead(fd.isFile); err == nil {
continue
}
}
case syscall.ECONNABORTED:
// This means that a socket on the listen
// queue was closed before we Accept()ed it;
// it's a silly error, so try again.
continue
}
return -1, nil, errcall, err
}
}

The internal workings of the accept method trigger the accept system call of the Linux operating system. We won’t delve too deeply into this process. The intent behind calling accept is to obtain a connection from the client. If successful, it’s returned.

3.2 Parking the Current Coroutine

Now, let’s consider a scenario where no connection requests from the client have arrived at the time of the accept call.
In such an instance, the accept system call would return syscall.EAGAIN. Golang’s response to this state is to block the current coroutine. The crucial piece of code lies here.

// internal/poll/fd_poll_runtime.go
func (pd *pollDesc) waitRead(isFile bool) error {
return pd.wait('r', isFile)
}

func (pd *pollDesc) wait(mode int, isFile bool) error {
if pd.runtimeCtx == 0 {
return errors.New("waiting for unsupported file type")
}
res := runtime_pollWait(pd.runtimeCtx, mode)
return convertErr(res, isFile)
}

The source code of runtime_pollWait is tucked away under runtime/netpoll.go, where gopark (responsible for coroutine blocking) is executed.

// runtime/netpoll.go
//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
func poll_runtime_pollWait(pd *pollDesc, mode int) int {
...
for !netpollblock(pd, int32(mode), false) {
errcode = netpollcheckerr(pd, int32(mode))
if errcode != pollNoError {
return errcode
}
// Can happen if timeout has fired and unblocked us,
// but before we had a chance to run, timeout has been reset.
// Pretend it has not happened and retry.
}
return pollNoError
}

func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
...
// need to recheck error states after setting gpp to pdWait
// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
// do the opposite: store to closing/rd/wd, publishInfo, load of rg/wg
if waitio || netpollcheckerr(pd, mode) == pollNoError {
gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
}
}

The gopark function serves as Golang’s internal gateway to coroutine blocking.
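gopark itself is runtime-internal, but its park-then-wake choreography can be mimicked in user space with a channel. In this rough analogy (all names are mine), a reader goroutine that finds no data "parks" on a channel, and a "poller" wakes it once data is delivered; no OS thread is blocked while it waits.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter is a toy stand-in for a pollDesc: a parked reader plus the data
// that will eventually arrive for it.
type waiter struct {
	ready chan struct{}
	mu    sync.Mutex
	data  string
}

// read parks the calling goroutine until deliver runs, loosely like
// gopark in netpollblock.
func (w *waiter) read() string {
	<-w.ready // park: cheap goroutine sleep, no OS thread pinned
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.data
}

// deliver stores the data and wakes the parked goroutine, loosely like
// goready after epoll reports readiness.
func (w *waiter) deliver(s string) {
	w.mu.Lock()
	w.data = s
	w.mu.Unlock()
	close(w.ready)
}

func main() {
	w := &waiter{ready: make(chan struct{})}
	done := make(chan string)
	go func() { done <- w.read() }() // this goroutine parks inside read
	w.deliver("payload")             // the "poller" wakes it
	fmt.Println(<-done)              // payload
}
```

The real mechanism is far cheaper still, since gopark hands the G straight back to the scheduler without any channel machinery, but the shape of the hand-off is the same.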

3.3 Adding the New Connection to epoll

Now, let’s consider a situation where the client connection has already arrived. In such a case, fd.pfd.Accept returns the freshly minted connection. This new connection is promptly added to epoll for efficient event management.

// net/fd_unix.go
func (fd *netFD) accept() (netfd *netFD, err error) {
d, rsa, errcall, err := fd.pfd.Accept()
if err != nil {
if errcall != "" {
err = wrapSyscallError(errcall, err)
}
return nil, err
}
if netfd, err = newFD(d, fd.family, fd.sotype, fd.net); err != nil {
poll.CloseFunc(d)
return nil, err
}
if err = netfd.init(); err != nil {
netfd.Close()
return nil, err
}
...
}

Let’s turn our attention to netfd.init.

// internal/poll/fd_poll_runtime.go
func (pd *pollDesc) init(fd *FD) error {
ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd))
...
}

As we’ve covered in section 2.4, the runtime function runtime_pollOpen is tasked with adding the file handle to the epoll object.

// runtime/netpoll.go
func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
...

errno := netpollopen(fd, pd)
if errno != 0 {
pollcache.free(pd)
return nil, int(errno)
}
return pd, 0
}

// runtime/netpoll_epoll.go
func netpollopen(fd uintptr, pd *pollDesc) uintptr {
var ev syscall.EpollEvent
ev.Events = syscall.EPOLLIN | syscall.EPOLLOUT | syscall.EPOLLRDHUP | syscall.EPOLLET
*(**pollDesc)(unsafe.Pointer(&ev.Data)) = pd
return syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, int32(fd), &ev)
}

The Inner Workings of Read and Write

Once the connection is successfully accepted, all that remains is reading from and writing to this connection.

4.1 The Underpinning of Read

We’ll first delve into the general process of the Read function.
Taking a closer look at the code.

// net/net.go
func (c *conn) Read(b []byte) (int, error) {
...
n, err := c.fd.Read(b)
}

We find that the Read function enters the Read method within the FD object. Internally, this method makes use of the Read system call to fetch data. If the data hasn’t arrived yet, it puts itself on hold.

// internal/poll/fd_unix.go
func (fd *FD) Read(p []byte) (int, error) {
...
for {
n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p)
if err != nil {
n = 0
if err == syscall.EAGAIN && fd.pd.pollable() {
if err = fd.pd.waitRead(fd.isFile); err == nil {
continue
}
}
}
err = fd.eofError(n, err)
return n, err
}
}

The mechanism through which waitRead blocks the current coroutine mirrors what we’ve previously discussed in section 3.2, so we won’t go into more detail here.

4.2 The Inner Machinations of Write

The Write function follows a similar trajectory to Read. Initially, it employs the Write system call to dispatch data. If the kernel’s send buffer has no room left, it puts itself on hold, resuming the send operation once a writable event takes place. The entry point of its source code is situated at net/net.go.
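The loop inside FD.Write has a characteristic shape: keep writing from the first unsent byte until the whole buffer has been handed over. This sketch reproduces that shape against an io.Writer that accepts only a few bytes per call (imitating a send buffer that keeps filling up); the names `chunkWriter` and `writeAll` are mine.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// chunkWriter accepts at most max bytes per Write call, imitating a socket
// whose kernel send buffer repeatedly runs out of room.
type chunkWriter struct {
	buf bytes.Buffer
	max int
}

func (c *chunkWriter) Write(p []byte) (int, error) {
	if len(p) > c.max {
		p = p[:c.max] // pretend the buffer filled up: short write
	}
	return c.buf.Write(p)
}

// writeAll mirrors the shape of FD.Write's loop: advance nn past every
// short write until the entire buffer has been sent.
func writeAll(w io.Writer, p []byte) (int, error) {
	nn := 0
	for nn < len(p) {
		n, err := w.Write(p[nn:])
		if n > 0 {
			nn += n
		}
		if err != nil {
			return nn, err
		}
		// In the real FD.Write, an EAGAIN here parks the goroutine in
		// waitWrite until epoll reports the socket writable again.
	}
	return nn, nil
}

func main() {
	cw := &chunkWriter{max: 4}
	n, err := writeAll(cw, []byte("hello epoll"))
	if err != nil {
		panic(err)
	}
	fmt.Println(n, cw.buf.String()) // 11 hello epoll
}
```

The only difference in the real code is what happens between iterations: rather than looping hot, the goroutine sleeps until the netpoller says the descriptor is writable.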

// net/net.go
func (c *conn) Write(b []byte) (int, error) {
...
n, err := c.fd.Write(b)
...
}
// internal/poll/fd_unix.go
func (fd *FD) Write(p []byte) (int, error) {
...
for {
max := len(p)
if fd.IsStream && max-nn > maxRW {
max = nn + maxRW
}
n, err := ignoringEINTRIO(syscall.Write, fd.Sysfd, p[nn:max])
if n > 0 {
nn += n
}
if nn == len(p) {
return nn, err
}
if err == syscall.EAGAIN && fd.pd.pollable() {
if err = fd.pd.waitWrite(fd.isFile); err == nil {
continue
}
}
if err != nil {
return nn, err
}
}
}

// internal/poll/fd_poll_runtime.go
func (pd *pollDesc) waitWrite(isFile bool) error {
return pd.wait('w', isFile)
}

The actions following pd.wait parallel the process we’ve outlined in section 3.2: runtime_pollWait is called to block the current coroutine. On the wake-up side, when epoll reports that a descriptor is ready, netpollready unblocks the waiting coroutines and queues them to run.

// runtime/netpoll.go
func netpollready(toRun *gList, pd *pollDesc, mode int32) {
var rg, wg *g
if mode == 'r' || mode == 'r'+'w' {
rg = netpollunblock(pd, 'r', true)
}
if mode == 'w' || mode == 'r'+'w' {
wg = netpollunblock(pd, 'w', true)
}
if rg != nil {
toRun.push(rg)
}
if wg != nil {
toRun.push(wg)
}
}

In Conclusion: The Golang Network Paradigm

The allure of synchronous coding lies in its alignment with our linear thought processes. It is simple to write and easy to understand. However, this model’s Achilles’ heel is its abysmal performance, primarily stemming from the incessant thread context switches.

Enter epoll, Linux’s mainstay for network event handling. Presently, the popular network framework models in various languages are based on epoll. The differentiation among them arises from their distinct epoll utilization methods. Mainstream asynchronous non-blocking models, though significantly boosting performance, are marred by their callback-centric programming. This design diverges from our intuitive linear thought process, rendering the resultant code somewhat cryptic.

Golang, in its ingenuity, heralds a refreshing network programming paradigm. On the surface, it might still wear the synchronous cloak. However, delve deeper, and you’d see its mastery in fusing coroutines and epoll. This synergy sidesteps the heavy performance toll exacted by thread switching. Instead of stalling user threads, it gracefully parks and resumes lightweight coroutines.
