Scaling Erlang Distribution
One technology used by many partners in the LightKone project is Erlang and its standard library OTP, because of its excellent properties for developing distributed systems all the way down to the light edge. Even though Erlang/OTP has built-in distribution mechanisms, these do not always scale to the number of nodes required to build large-scale edge computing solutions. In the LightKone project we therefore enhance and experiment with the Erlang/OTP distribution functionality in order to meet our requirements. The enhancements we are proposing are transparent to Erlang systems and will thus be useful to the Erlang/OTP ecosystem as a whole, not only to edge computing projects.
Stritzinger GmbH has been researching Erlang distribution and proposing several different changes that will make it more useful for edge computing. We are looking at changes in three different categories: transport protocols, internal routing and developer efficiency.
In the transport protocol area we initially created a prototype that solved the problem of head-of-line blocking. Previously in Erlang/OTP, when large messages were sent, they occupied the single transport connection used between two nodes. A large message could therefore block small but important messages such as the node keep-alive messages. In the most extreme case, a node could be reported as down while a large message was actively being sent, simply because that message blocked the much smaller keep-alive message. Our prototype for solving head-of-line blocking showed significant improvements in message latency. The solution was to introduce message fragmentation and sequence numbers and to interleave fragments from small and large messages. This change was later introduced into Erlang/OTP 22.0 as a proper native implementation.
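To give a rough idea of the fragmentation approach, here is a minimal sketch, not the actual OTP 22 implementation; the module, function names and fragment format are our own illustration. A large payload is split into fixed-size fragments, each tagged with the message's sequence identifier and its own fragment number, so that fragments from different messages can share one connection and still be reassembled in order:

    -module(frag_sketch).
    -export([fragment/3]).

    %% Split a binary payload into FragSize-byte fragments, each tagged with the
    %% message's sequence identifier (SeqId) and a fragment number. The
    %% 'more'/'last' marker tells the receiver when a message is complete.
    fragment(SeqId, Bin, FragSize) ->
        fragment(SeqId, Bin, FragSize, 0).

    fragment(SeqId, Bin, FragSize, N) when byte_size(Bin) =< FragSize ->
        [{SeqId, N, last, Bin}];
    fragment(SeqId, Bin, FragSize, N) ->
        <<Frag:FragSize/binary, Rest/binary>> = Bin,
        [{SeqId, N, more, Frag} | fragment(SeqId, Rest, FragSize, N + 1)].

A sender can then interleave the fragment lists of several messages on the wire, and the receiver reassembles each message using its sequence identifier, so a small keep-alive message is never stuck behind a long run of fragments belonging to a large one.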
Another experiment was to replace TCP, the default transport for Erlang/OTP distribution, with UDP. This showed that, with the message fragmentation and sequence numbering described above, it is possible to distribute Erlang/OTP over UDP, which can have latency and throughput benefits. It can also be useful in industrial networking where, in combination with Time-Sensitive Networking (TSN), it can enable distributed real-time applications built with Erlang/OTP.
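On the transport side, the sketch below only illustrates the kind of calls involved; the peer address, port and packet layout are made-up examples and not the actual distribution wire format. Each fragment is sent as an individual UDP datagram, and the sequence and fragment numbers let the receiver reorder and reassemble messages even when datagrams arrive out of order:

    -module(udp_sketch).
    -export([send_fragment/5]).

    %% Open a UDP socket on an ephemeral port and send a single fragment as one
    %% datagram. The {SeqId, FragNo, Frag} packet layout is an illustrative
    %% assumption, not the real distribution protocol.
    send_fragment(PeerIp, PeerPort, SeqId, FragNo, Frag) ->
        {ok, Socket} = gen_udp:open(0, [binary, {active, false}]),
        ok = gen_udp:send(Socket, PeerIp, PeerPort,
                          term_to_binary({SeqId, FragNo, Frag})),
        gen_udp:close(Socket).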
In order to scale Erlang/OTP distribution it is important to move beyond the fully connected mesh established by the default distribution implementation. Every new node that joins a cluster connects to every existing node, which leads to a quadratically growing number of connections and a corresponding increase in keep-alive messages throughout the cluster. In practice this limits clusters to roughly 40–60 nodes. Traditionally this is worked around by creating so-called hidden Erlang nodes, which do not propagate node membership gossip. This, however, is an application-level solution that the developer must actively be aware of when designing a distributed architecture. We propose separating node membership from the connection layer by enabling routing of messages between nodes in a larger cluster. Combined with existing routing algorithms such as IS-IS, this could enable scaling clusters well beyond the established limits.
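For reference, this is what the hidden-node workaround looks like today (the node name below is only an example): a node started with the -hidden flag is not gossiped to the rest of the cluster, so applications must connect to it and address it explicitly.

    %% A hidden node is started with the -hidden flag, for example:
    %%   erl -name gateway@edge1.example -hidden
    %% Hidden connections are not propagated to other nodes:
    net_kernel:connect_node('gateway@edge1.example').
    nodes().        %% visible nodes only; the hidden node is not listed
    nodes(hidden).  %% hidden connections are listed separately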
Finally, we have also created a common behavior to use when developing new alternative distribution implementations. This behavior reduces some of the boilerplate code needed to start a new implementation by encapsulating it in a reusable module.
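In Erlang terms, a behavior is a module that declares a set of callbacks and keeps the shared plumbing around them in one place. The fragment below only illustrates that mechanism; the callback names are hypothetical and do not reproduce the actual API of our module or the paper:

    -module(dist_sketch).

    %% Purely illustrative callback declarations: a concrete transport module
    %% (TCP, UDP, TSN, ...) would implement these, while the behavior module
    %% encapsulates the common set-up boilerplate.
    -callback listen(Name :: atom()) -> {ok, term()} | {error, term()}.
    -callback accept(Listen :: term()) -> {ok, term()} | {error, term()}.
    -callback connect(Node :: node()) -> {ok, term()} | {error, term()}.
    -callback send(Conn :: term(), Packet :: iodata()) -> ok | {error, term()}.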
As part of the LightKone project we at Stritzinger GmbH continue to find new ways of enhancing Erlang/OTP distribution. We want to make these changes as generic and open as possible so that they can be used within the wider Erlang/OTP ecosystem.
To read more about this topic, please see our full paper published at the ICFP 2019 Erlang Workshop.
By Adam Lindberg, Peer Stritzinger GmbH