Netty data model, threading, and gotchas

Ammar Khaku
9 min read · Aug 15, 2021



Before we begin, note that this article incorporates lessons learned from Netty in Action. The book is a fairly useful read, although it could benefit from a second edition to correct some errata. You’ll also find that this article is broken up into two sections; the first sets the context and introduces the data model, while the second talks about how Netty is actually used and potential mistakes to avoid. Depending on what you’re looking for you may choose to skip one of the sections.

Introduction

In the early days of Java, the only way to do network programming was using the Socket API and doing blocking I/O. This meant that you needed to spin up a thread per connection to handle concurrent connections since I/O could block at any time. That works well for small numbers of connections/threads, but eventually context switching and the overhead of creating all those threads catches up to you, and so a blocking model isn’t particularly performant for large numbers of connections. Java eventually added non-blocking I/O (NIO) which allows you to create a thread pool and handle a large number of connections. The API is however a lot more complicated than the old, blocking I/O API (OIO) and it can be fairly difficult to implement performant network operations. This is where Netty comes in.

At its core, Netty is a Java library that facilitates network operations. It supports both blocking and non-blocking I/O, connection-oriented protocols such as TCP as well as connectionless protocols such as UDP, and data transfer on both the client and the server. In fact, it abstracts away all those lower-level details and allows users to focus on their business logic. The API is event-driven; since it’s a network programming library, the core “events” have to do with bytes being sent and received, but Netty also supports user-defined events to execute custom logic. The API is also fully asynchronous, even when it is configured to use OIO under the hood. Netty makes liberal use of what it calls a ChannelFuture, which is an extension of Java’s Future that allows registering callbacks. It also uses Promise objects, which are Futures that can be written to, very similar to Java 8’s CompletableFuture.
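
To make that concrete, here is a minimal sketch of registering a callback on a ChannelFuture. The channel is assumed to already be connected, and the String message assumes a suitable encoder is in the pipeline; both are just placeholders for illustration.

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;

public final class WriteExample {
    // `channel` is assumed to be an already-connected Channel whose pipeline
    // knows how to encode String messages.
    static void writeWithCallback(Channel channel) {
        ChannelFuture future = channel.writeAndFlush("hello");
        future.addListener((ChannelFutureListener) f -> {
            if (f.isSuccess()) {
                System.out.println("write succeeded");
            } else {
                f.cause().printStackTrace(); // inspect the failure
            }
        });
        // writeAndFlush returns immediately; the listener runs later on the
        // Channel's EventLoop once the I/O operation completes.
    }
}
```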

In this article we will be going over the data model, an example pipeline, the threading model and its implications.

Data model

Netty’s data model is fairly straightforward: the core type is a Channel, which has its own ChannelPipeline and is associated with a single EventLoop from an EventLoopGroup.

Channel
A Netty Channel is a vehicle for inbound/outbound data: the same concept as a Java NIO Channel. For example, for TCP it represents a connection to a remote host, and for UDP either an address for outbound datagrams or a listener for inbound datagrams on a local port. Every Channel has its own ChannelPipeline: more on that later. Events are fired on the channel, for example when a channel is registered/deregistered, when it is active/inactive, or when bytes are received or sent.

Channels can use different transports, for example OIO, NIO[1], EPoll (Linux), KQueue (BSD/macOS), Local (within the same JVM), or Embedded (often used for integration testing). The implementation of Channel you use is determined by the transport and socket type, for example NioServerSocketChannel (NIO transport, server socket) vs NioSocketChannel (NIO transport, client socket) vs EpollDatagramChannel (EPoll transport, UDP socket). Not all channel implementations have the same feature set; for instance, NIO and EPoll/KQueue support what Netty calls zero-byte copy[2] while others do not.

ChannelPipeline
Every Channel contains its own single ChannelPipeline. A ChannelPipeline is a list of ChannelHandlers, each of which can be a ChannelInboundHandler (handles incoming events), a ChannelOutboundHandler (handles outgoing events), or both.

These ChannelHandlers are the workhorse of a Netty-based server or driver: they contain the business logic of the application or client. They function very much like Unix pipes: events enter one end of the pipeline, are processed by a series of handlers, and then exit through the other end of the pipeline unless they were dropped. A handler must explicitly fire the next handler with the relevant payload; otherwise the event stops propagating and processing ends there. The very first inbound handler takes in a ByteBuf (see below) from the socket, and the very last outbound handler produces a ByteBuf to write to the socket. A handler that decodes bytes into a message often extends the Netty-provided ByteToMessageDecoder, and a handler that encodes a message into bytes often extends MessageToByteEncoder. Netty also provides MessageToMessageDecoder/Encoder/Codec to simplify writing these common types of handlers.
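
As a rough sketch of what that looks like in code (the class names and the toy four-byte integer “protocol” are made up for illustration), a decoder plus a business-logic handler might be written like this:

```java
import java.util.List;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.ByteToMessageDecoder;

// Decodes the raw byte stream into 4-byte integers. Anything added to `out`
// is passed along to the next inbound handler in the pipeline.
class IntDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        if (in.readableBytes() >= 4) {
            out.add(in.readInt());
        }
    }
}

// Business-logic handler that receives the decoded integers.
// SimpleChannelInboundHandler releases the message for you after channelRead0 runs.
class IntHandler extends SimpleChannelInboundHandler<Integer> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Integer msg) {
        System.out.println("received " + msg);
    }
}
```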

Note that ChannelPipelines may be modified by the handlers themselves. For example, an application implementing STARTTLS may remove the handler that negotiated the upgrade once the connection is using TLS. We’ll go over another example of a dynamically-modified ChannelPipeline later in this article.
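
A handler can modify the pipeline it lives in through its ChannelHandlerContext. As a hedged sketch (the handler name and the “negotiation” step are hypothetical), a one-shot handler that removes itself after the first message might look like this:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Hypothetical one-shot handler: it handles the very first inbound message
// (say, an upgrade or negotiation request) and then removes itself so the
// rest of the connection skips this step entirely.
public class OneShotNegotiationHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // ... inspect `msg`, perhaps add or replace other handlers here ...
        ctx.pipeline().remove(this);   // this handler is no longer needed
        ctx.fireChannelRead(msg);      // pass the message along to the next handler
    }
}
```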

ByteBuf
ByteBuf is a Netty-specific version of an NIO ByteBuffer. Netty implements its own buffers in order to simplify the API (e.g. no need to flip between “read” and “write” modes) and to add functionality (e.g. “zero-copy”). A ByteBuf can represent JVM heap memory, native memory, or even a composition of the two. Pooling and reference-counting are available for performance, and understanding how they work is key to writing performant applications with Netty. In this article we’re focused on debugging and extending code using Netty, so we won’t look at them too closely.
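
For a quick feel of the API, here is a small sketch using an unpooled heap buffer; note the separate reader and writer indices (no flip() needed) and the explicit release of the reference-counted buffer:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public final class ByteBufExample {
    public static void main(String[] args) {
        // Separate reader and writer indices mean there is no flip() step,
        // unlike java.nio.ByteBuffer.
        ByteBuf buf = Unpooled.buffer(8);   // heap buffer from the unpooled allocator
        buf.writeInt(42);
        buf.writeInt(7);

        while (buf.readableBytes() >= 4) {
            System.out.println(buf.readInt());
        }

        buf.release();   // ByteBufs are reference-counted; release when done
    }
}
```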

EventLoopGroup
This is a container for EventLoop objects. Each EventLoop object is exclusively associated with a single Thread. Each Channel is associated with a single EventLoop throughout its lifetime.

While each Channel is associated with a single EventLoop, an EventLoop may end up being associated with multiple Channels. This is an important point, and we’ll get into the implications later on.

There are several implementations of EventLoopGroup, and the particular implementation used must be matched up with the transport: for example, NioEventLoopGroup must be used with the NIO transports, OioEventLoopGroup with OIO transports, etc. The number of EventLoop objects created in a single group depends on the implementation of the EventLoopGroup: OioEventLoopGroup creates a new EventLoop for every new Channel, while the NIO/EPoll/KQueue groups create a pool of 2 * (number of processors) EventLoops by default and distribute them evenly across Channels.

All events and handlers for a Channel are executed on its single EventLoop. These EventLoop objects can be thought of as I/O threads since they handle all the I/O in a Netty application or driver, including any processing that happens in the ChannelHandlers in the ChannelPipeline.

Using Netty

Netty is most often used in TCP-based client-server contexts, and the discussion below assumes as much.

Servers vs Clients

The server code uses a ServerBootstrap object to initialize Netty. It manages a single server Channel, which spawns child channels to handle each client. When setting up the ServerBootstrap, you provide two EventLoopGroup objects: one for the main server Channel, and another that is used for the child channels. After fully specifying the ServerBootstrap, you bind it to a port and wait for connections.

Note that while you can technically use the same EventLoopGroup for both the server and child channels, that’s probably a bad idea: you will likely end up sharing a single EventLoop between the server Channel and one of the client Channels, and that client Channel may tie up the EventLoop and prevent the server from accepting new connections.
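
Putting that together, here is a minimal sketch of a server setup with separate boss and worker groups; the port and the empty ChannelInitializer body are placeholders:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class SketchServer {
    public static void main(String[] args) throws Exception {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);   // accepts connections
        EventLoopGroup workerGroup = new NioEventLoopGroup();  // services child channels
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            // Add application ChannelHandlers for each new child channel here
                        }
                    });
            ChannelFuture bindFuture = bootstrap.bind(8080).sync();
            bindFuture.channel().closeFuture().sync();   // block until the server channel closes
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}
```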

Similarly to servers, client Channels are set up using a Bootstrap object. This is configured with a single EventLoopGroup, all ChannelHandlers, etc. You then connect the Channel to a remote endpoint, after which you can write to the Channel and receive data using the attached ChannelHandlers.
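
A client-side sketch looks similar, but with a plain Bootstrap and a single EventLoopGroup; again the host, port, and handlers are placeholders:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

public final class SketchClient {
    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup();
        try {
            Bootstrap bootstrap = new Bootstrap()
                    .group(group)
                    .channel(NioSocketChannel.class)
                    .handler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            // Add client-side ChannelHandlers here
                        }
                    });
            Channel channel = bootstrap.connect("localhost", 8080).sync().channel();
            channel.writeAndFlush("hello");   // assumes a String encoder is in the pipeline
            channel.closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```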

ChannelPipeline example

As mentioned earlier, the ChannelPipeline and its associated ChannelHandler chain are the guts of a Netty-based application or driver. Netty provides ChannelHandler implementations to simplify development, for example to handle TLS, to encode/decode HTTP, and to implement the WebSocket protocol. For example, the diagram below illustrates a simplified version of a Netty pipeline to implement the WebSocket protocol on a server.

The top row of boxes shows the ChannelInboundHandlers that handle requests coming in from a client, and the bottom row shows a ChannelOutboundHandler that dispatches responses to the client. All the ChannelHandlers depicted are provided by Netty, so all a developer needs to do is string them together to implement the WebSocket protocol. This batteries-included approach is a big part of how Netty makes building performant and highly-concurrent server components accessible.
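
A rough approximation of such a pipeline, built from Netty’s HTTP and WebSocket handlers plus one toy application handler that echoes text frames (the path and aggregator size are arbitrary), could look like this:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.websocketx.TextWebSocketFrame;
import io.netty.handler.codec.http.websocketx.WebSocketServerProtocolHandler;

// Sketch of a WebSocket server pipeline assembled from Netty-provided handlers.
public class WebSocketInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline().addLast(
                new HttpServerCodec(),                      // HTTP request decoding / response encoding
                new HttpObjectAggregator(64 * 1024),        // aggregate the upgrade request into one message
                new WebSocketServerProtocolHandler("/ws"),  // performs the WebSocket handshake/upgrade
                new SimpleChannelInboundHandler<TextWebSocketFrame>() {
                    @Override
                    protected void channelRead0(ChannelHandlerContext ctx, TextWebSocketFrame frame) {
                        ctx.writeAndFlush(new TextWebSocketFrame("echo: " + frame.text()));
                    }
                });
    }
}
```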

One more interesting bit about the WebSocket protocol is that it includes an “upgrade” from HTTP to WebSockets. At that point some of the HTTP-specific components are no longer needed. Netty’s WebSocket ChannelHandlers respond to the upgrade by modifying the ChannelPipeline on the fly:

The simplified diagram above shows how the ChannelHandlers cleaned up their own ChannelPipeline by removing components that are no longer needed. Rewriting your own pipeline is a pretty powerful feature although it can make things a little difficult to follow.

For an in-depth example setting up a ChannelPipeline, see how the Datastax Java Driver for Apache Cassandra® sets up their ChannelPipeline.

Threading concerns

As we learned earlier, each Channel is assigned an EventLoop, which corresponds to a single I/O Thread. With the exception of OIO (which creates a new EventLoop for every Channel), EventLoops are assigned evenly across Channels from the general EventLoopGroup pool.

This simple threading model means that you do not need to worry about concurrency issues in the execution of your ChannelHandlers. You are always guaranteed sequential execution on the same thread for a single run through your pipeline. Furthermore, since you don’t create large numbers of threads (by default 2 * number of processors) your CPU isn’t overburdened by context switching.

On the other hand, you need to be careful to not create multiple EventLoopGroups since each of those creates its own thread pool. The exception here is when specifying your server event loop group vs child channel event loop group — you don’t want the thread used to accept connections to also be used to process them, since if it gets tied up your server will bottleneck at accepting connections.

It’s also very important to note that since EventLoops (and so Threads) are shared across multiple Channels, a single slow Channel can slow down multiple requests, especially as the number of connections/Channels grows much larger than the size of the EventLoopGroup thread pool. This is probably the single biggest threading-related gotcha, so it bears repeating: any intensive processing should be offloaded to a separate thread, and you should never sleep in your ChannelHandlers. If you lock up one of your EventLoops, any other requests/Channels that happen to be assigned the same EventLoop will be stuck waiting for it to free up. For an example of where this can cause bottlenecks, see CASSANDRA-15013 and the associated blog post.
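
One common way to offload that work is to pass a separate EventExecutorGroup when adding the expensive handler, so only that handler runs off the I/O thread. A minimal sketch, where the pool size and the handler body are placeholders:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class OffloadingInitializer extends ChannelInitializer<SocketChannel> {
    // Shared across all channels; sized independently of the EventLoopGroup.
    private static final EventExecutorGroup BLOCKING_GROUP = new DefaultEventExecutorGroup(16);

    @Override
    protected void initChannel(SocketChannel ch) {
        // Handlers added with an executor group run on that group's threads
        // instead of the Channel's EventLoop.
        ch.pipeline().addLast(BLOCKING_GROUP, new ChannelInboundHandlerAdapter() {
            @Override
            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                // Pretend this is a blocking call (database lookup, file I/O, ...).
                // Because of the executor group above, it does not tie up the EventLoop.
                ctx.fireChannelRead(msg);
            }
        });
    }
}
```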

Takeaways

If you skipped to the end or if you just need a refresher, here are the most important takeaways:

  • Channel is the main container: it contains a ChannelPipeline and is associated with an EventLoop (a container for a Thread) from an EventLoopGroup.
  • ChannelPipeline contains a chain of ChannelInboundHandler (handles inbound messages) and ChannelOutboundHandler (handles outbound messages) which contain the business logic.
  • EventLoop is essentially an I/O thread and may be shared by multiple Channels. ChannelHandlers are executed on these EventLoop threads.
  • Server and Client initialization is similar, except that a ServerChannel handles accepting connections and creates child channels to service the requests.
  • Don’t use the same EventLoopGroup for your server Channel and for your child Channels.
  • Netty comes with a ton of pre-built handlers (e.g. for TLS) that you can use in your application.
  • If ChannelHandlers block or are slow they will hinder processing of requests on Channels that happen to use the same EventLoop.

Footnotes
[1] The JDK’s NIO implementation on Linux actually uses EPoll under the hood, while the JDK on macOS uses KQueue. However, Netty’s EPoll implementation is a little more performant than the JDK’s NIO implementation since it uses edge-triggered mode vs the JDK’s level-triggered mode.
[2] This allows data to be copied from a file to a network socket without buffering it in user-space memory; see also this Stack Overflow question.
