Aeron — low latency transport protocol

I attended a workshop with Martin Thompson on Aeron messaging, where I got a chance to clarify a few questions about the protocol. Aeron has good documentation (the wiki), but if you are interested in implementation details, only the source code will help. In this article I would like to give a brief introduction to Aeron.

Aeron is an OSI layer 4 transport for message-oriented streams. It works over UDP or IPC and supports both unicast and multicast. The main goal is to provide an efficient and reliable connection with low and predictable latency. Aeron has Java, C++ and .NET clients.

When to use?

High and predictable performance is the main advantage of Aeron. It is most useful in applications that require low latency, high throughput (e.g. sending large files), or both (Akka remoting uses Aeron).

If it can work over UDP, why not just use UDP?

The main goal of Aeron is high performance, which is why it runs over UDP and does not support TCP. But what features does Aeron provide on top of UDP?

Aeron provides OSI layer 4 (transport) services. It supports the following features:
1. Connection Oriented Communication
2. Reliability
3. Flow Control
4. Congestion Avoidance/Control
5. Multiplexing

Architecture

Aeron uses unidirectional connections. If you need to send requests and receive responses, you should use two connections.

A Publisher and a Media Driver (see below) are used to send a message; a Subscriber and a Media Driver are used to receive it. A client talks to its Media Driver via shared memory.
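To illustrate the point about unidirectional connections, here is a minimal sketch of a request/response exchange built from two one-way streams. It does not use the Aeron API: each `BlockingQueue` is only a stand-in for one unidirectional stream, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: each queue stands in for one unidirectional stream.
// This is NOT the Aeron API; all names here are hypothetical.
class RequestResponseSketch {

    // Sends every request on one stream and reads its reply from the other.
    static String[] exchange(String[] requests) {
        BlockingQueue<String> requestStream = new ArrayBlockingQueue<>(16);  // client -> server only
        BlockingQueue<String> responseStream = new ArrayBlockingQueue<>(16); // server -> client only

        // The "server" only reads from requestStream and only writes to responseStream.
        Thread server = new Thread(() -> {
            try {
                for (int i = 0; i < requests.length; i++) {
                    String request = requestStream.take();
                    responseStream.put("reply:" + request);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();

        String[] replies = new String[requests.length];
        try {
            // The "client" only writes to requestStream and only reads from responseStream.
            for (int i = 0; i < requests.length; i++) {
                requestStream.put(requests[i]);
                replies[i] = responseStream.take();
            }
            server.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during exchange", e);
        }
        return replies;
    }

    public static void main(String[] args) {
        for (String reply : exchange(new String[] {"ping-1", "ping-2"})) {
            System.out.println(reply);
        }
    }
}
```

Neither side ever reads from the stream it writes to, which is the property that forces two connections for a two-way conversation.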

Media Driver

The Media Driver handles all media-specific logic (UDP, IPC, InfiniBand, etc.), so a client does not need to worry about which protocol is used between clients.

The Media Driver can run as a separate process or as a thread inside a parent process. The difference is easy to illustrate on the JVM. If a client process is paused by GC, a Media Driver running as a separate process is not affected and continues to handle messages. However, when the Media Driver is embedded in the client process, it experiences the same GC pauses as its parent process.

The Media Driver is implemented in Java, and a C version was released recently. It has a set of performance-related parameters. One of them is the “IdleStrategy”, which determines the behaviour of the Media Driver when there are no messages. If the driver goes idle, the OS may schedule other processes on the CPU core it was using. In that case the CPU cache fills with the data of another process, and when a new message arrives the Media Driver cannot find its data in the cache, which adds latency. To prevent this, the Media Driver can use a busy spin (BusySpinIdleStrategy). This strategy keeps the CPU core busy when there are no messages, so that the OS does not schedule another process on that core.
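The trade-off between spinning and yielding the core can be sketched with the standard library alone. The code below is not Aeron's implementation: the `Idler` interface and both strategy classes are hypothetical names, and `Thread.onSpinWait()` assumes Java 9 or newer.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.IntSupplier;

// Simplified illustration of the idea behind an idle strategy.
// Hypothetical names; these are not Aeron's classes.
class IdleSketch {

    interface Idler {
        void idle(); // called whenever a poll found no work
    }

    // Busy spin: the thread stays runnable, so the OS has no reason to
    // hand the core to another process. Costs a fully loaded core.
    static final class BusySpin implements Idler {
        public void idle() {
            Thread.onSpinWait(); // CPU hint only (Java 9+); does not block
        }
    }

    // Parking: cheap on CPU, but the thread may be descheduled and
    // return to a core whose caches hold another process's data.
    static final class Park implements Idler {
        private final long nanos;

        Park(long nanos) {
            this.nanos = nanos;
        }

        public void idle() {
            LockSupport.parkNanos(nanos);
        }
    }

    // Polls `work` until `target` items were processed and returns
    // how many polls came back empty (each one triggers the idler).
    static int runUntil(IntSupplier work, int target, Idler idler) {
        int done = 0;
        int emptyPolls = 0;
        while (done < target) {
            int processed = work.getAsInt();
            if (processed == 0) {
                emptyPolls++;
                idler.idle();
            } else {
                done += processed;
            }
        }
        return emptyPolls;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        IntSupplier work = () -> (calls[0]++ % 2 == 0) ? 0 : 1; // every other poll is empty
        System.out.println("empty polls: " + runUntil(work, 3, new BusySpin()));
    }
}
```

The duty-cycle loop is the same for both strategies; only what happens on an empty poll changes, which is why such a setting can be exposed as a single configuration parameter.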

If you want to test Aeron with the lowest latency, you can start with LowLatencyMediaDriver.

Design principles

  • Garbage-free in steady-state running
  • Applies Smart Batching in the message path
  • Lock-free algorithms in the message path
  • Non-blocking IO in the message path
  • No exceptional cases in the message path
  • Applies the Single Writer Principle
  • Prefers unshared state
  • Avoids unnecessary data copies
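Several of these principles (lock-free, non-blocking, single writer, batching) can be seen together in a small single-producer/single-consumer ring buffer. This is a generic textbook-style sketch, not Aeron's log buffer; the class name and methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Generic SPSC ring buffer: exactly one producer thread and one consumer thread.
// Illustration only; this is not how Aeron's buffers are implemented.
class SpscRingSketch<T> {
    private final Object[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // written only by the consumer
    private final AtomicLong tail = new AtomicLong(); // written only by the producer

    SpscRingSketch(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Producer thread only. Non-blocking: returns false when the buffer is full.
    boolean offer(T item) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = item;
        tail.lazySet(t + 1); // ordered store publishes the slot to the consumer
        return true;
    }

    // Consumer thread only. Handles up to `limit` items in one pass (a batch)
    // and returns how many were handled.
    @SuppressWarnings("unchecked")
    int drain(Consumer<T> handler, int limit) {
        long h = head.get();
        int count = (int) Math.min(tail.get() - h, limit);
        for (int i = 0; i < count; i++) {
            int index = (int) ((h + i) & mask);
            handler.accept((T) slots[index]);
            slots[index] = null; // release the reference
        }
        head.lazySet(h + count); // ordered store frees the slots for the producer
        return count;
    }

    public static void main(String[] args) {
        SpscRingSketch<String> ring = new SpscRingSketch<>(4);
        ring.offer("a");
        ring.offer("b");
        ring.offer("c");
        int drained = ring.drain(System.out::println, 10);
        System.out.println("drained " + drained);
    }
}
```

Because each counter has exactly one writer, no lock or compare-and-swap is needed, and a consumer that falls behind catches up by draining everything available in one pass.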

Latency-test (IPC)

git clone https://github.com/real-logic/aeron.git
cd aeron
./gradlew
cd aeron-samples/scripts
./embedded-ping-pong

Output (microseconds):

...
#[Mean = 9.027, StdDeviation = 1.251]
#[Max = 110.847, Total count = 1000000]
#[Buckets = 24, SubBuckets = 2048]

The folder aeron-samples/scripts contains various tests that measure latency and throughput in different modes. It also contains useful utilities for debugging. Here are some of them:

  • aeron-stat: metrics monitor;
  • error-stat: errors monitor (for performance reasons Aeron does not use common logging libraries)

Latency test inside Docker

In the previous section, we looked at an example where Aeron clients exchange messages within the same OS (IPC). Let’s review an example with Docker.

docker-compose.yml:

version: '3'
services:
  aeron_ping:
    image: "centos_aeron_base:latest"
    # The folded scalar (>) joins the lines below into one command line.
    command: >
      bash -c
      "java -cp samples.jar io.aeron.samples.LowLatencyMediaDriver
      & java -cp samples.jar
      -XX:+UnlockDiagnosticVMOptions
      -XX:GuaranteedSafepointInterval=300000
      -Dagrona.disable.bounds.checks=true
      -Daeron.sample.ping.channel=aeron:udp?endpoint=aeron_pong:40124
      -Daeron.sample.pong.channel=aeron:udp?endpoint=aeron_ping:40123
      io.aeron.samples.Ping"
    shm_size: 512M
  aeron_pong:
    image: "centos_aeron_base:latest"
    command: >
      bash -c
      "java -cp samples.jar io.aeron.samples.LowLatencyMediaDriver
      & java -cp samples.jar
      -XX:+UnlockDiagnosticVMOptions
      -XX:GuaranteedSafepointInterval=300000
      -Dagrona.disable.bounds.checks=true
      -Daeron.sample.ping.channel=aeron:udp?endpoint=aeron_pong:40124
      -Daeron.sample.pong.channel=aeron:udp?endpoint=aeron_ping:40123
      io.aeron.samples.Pong"
    shm_size: 512M

Since the containers connect only to each other and do not expose any ports to the host, we do not need to configure ports or networks. The shm_size parameter sets the size of shared memory for the container, which is quite small by default in Docker. Since Aeron uses shared memory to exchange information between the client and the Media Driver, we should increase this parameter.

Output (microseconds):

aeron_ping_1 | #[Mean  =      15.690, StdDeviation   =        2.004]
aeron_ping_1 | #[Max = 1290.239, Total count = 1000000]
aeron_ping_1 | #[Buckets = 24, SubBuckets = 2048]

Aeron vs other alternatives

A big advantage of Aeron is its support for both UDP and IPC, which gives flexibility when planning a deployment. We can split services across different nodes and communicate over UDP, and if we need faster communication, we can run the services on the same box and talk via IPC.

It is interesting to see how much overhead Aeron adds compared to Java's built-in tools. I compared ping-pong round-trip latency between two threads using Java locks (ConditionVariablesPingPong.java) and using Aeron with an embedded Media Driver (EmbeddedPingPong.java), and found that the performance was similar.

//microseconds
ConditionVariablesPingPong.java:
#[Mean = 7.148, StdDeviation = 2.733]
#[Max = 1842.175, Total count = 1000000]
#[Buckets = 13, SubBuckets = 2048]
EmbeddedPingPong.java:
#[Mean = 7.978, StdDeviation = 0.993]
#[Max = 147.839, Total count = 1000000]
#[Buckets = 24, SubBuckets = 2048]
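For readers unfamiliar with the lock-based variant, the following is a minimal sketch of how a round trip between two threads can be timed with a lock and a condition variable. It is not the ConditionVariablesPingPong.java used for the numbers above: the class and method names are hypothetical, and there is no warm-up or histogram, so its output is not comparable with those results.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified lock-based ping-pong between two threads.
// Hypothetical sketch; not the benchmark referenced in the article.
class LockPingPongSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition turnChanged = lock.newCondition();
    private boolean pingTurn = true; // guarded by lock

    // Hands the turn to the given side and wakes the waiting thread.
    private void passTurn(boolean toPing) {
        lock.lock();
        try {
            pingTurn = toPing;
            turnChanged.signal();
        } finally {
            lock.unlock();
        }
    }

    // Blocks until it is the given side's turn.
    private void awaitTurn(boolean ping) {
        lock.lock();
        try {
            while (pingTurn != ping) {
                turnChanged.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }

    // Runs `roundTrips` ping-pong exchanges and returns the mean round-trip time in nanoseconds.
    static double meanRoundTripNanos(int roundTrips) {
        LockPingPongSketch state = new LockPingPongSketch();
        Thread pong = new Thread(() -> {
            for (int i = 0; i < roundTrips; i++) {
                state.awaitTurn(false); // wait for the ping
                state.passTurn(true);   // answer with the pong
            }
        });
        pong.start();

        long totalNanos = 0;
        for (int i = 0; i < roundTrips; i++) {
            long start = System.nanoTime();
            state.passTurn(false); // send the ping
            state.awaitTurn(true); // wait for the pong
            totalNanos += System.nanoTime() - start;
        }
        try {
            pong.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return totalNanos / (double) roundTrips;
    }

    public static void main(String[] args) {
        System.out.printf("mean round trip: %.1f ns%n", meanRoundTripNanos(100_000));
    }
}
```

Every hand-over here goes through a lock and a thread wake-up by the OS scheduler, which is the kind of cost a lock-free message path is designed to avoid.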

Other interesting Aeron benchmarks can be found at https://github.com/benalexau/rpc-bench/tree/master/results/20161024

Where is Aeron going?

  • Clustering (2017);
  • Security;

All tests were done with: Aeron 1.4.1-SNAPSHOT, Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01), Fedora 26 4.12.13-300.fc26.x86_64, CPU Ryzen 1600.
