BEAM/OTP on OCP, a Partnership for Reliable and Resilient Services.

Francis Lee
AI+ Enterprise Engineering
7 min read · Apr 30, 2021

This is part one of a multi-part tutorial/article where I develop a distributed system solution using Erlang/Elixir on BEAM/OTP running on OpenShift Container Platform (OCP). In this first article, I explain my rationale for choosing BEAM languages, specifically Erlang, and for marrying BEAM/OTP with OCP to provide a consistent, scalable, and reliable system. Tests reported by British Telecom indicated that systems built on the Erlang/OTP platform exhibited 99.9999999% (“nine nines”) reliability, but YMMV. This is the kind of reliability I want to build into my systems.

Who uses Erlang/OTP?

Erlang/OTP is usually chosen as a platform when you want high connectivity, massive concurrent processing and reliability/resiliency built in. WhatsApp, WeChat, Whisper, the UK’s National Health Service (NHS), AdRoll and the Nintendo Switch messaging system are but a few of the companies and products that rely on Erlang’s massive concurrency abilities. WhatsApp manages 900M users with just 50 engineers because of Erlang/OTP’s reliability and resiliency. Goldman Sachs uses Erlang in its hedge-fund trading platform for its low-latency (microseconds) event-driven order-submission engine. League of Legends, an online game, uses Erlang to manage 7.5 million concurrent users, and Grindr, an online dating application, handles 2000 messages/sec for its 3.2 million active users.

What is Erlang and BEAM/OTP?

Erlang is a programming platform created in Ericsson’s Computer Science Labs in the mid-’80s. Its inventors faced the uphill task of providing reliable software to power their telco communication systems. They recognized that as long as humans created software, there were bound to be bugs in it. By accepting that bugs were inevitable, they changed their way of thinking about the solution: they created a platform that manages errors so as to minimize service outages.

The key design principles for such a system need to include:

  • Support a large number of concurrent activities (e.g. several hundred thousand activities per compute node). Each activity is isolated from others such that a failure in one activity does not affect other activities. No shared state among activities. Isolated concurrency means reliability.
  • Having no shared state among concurrent activities means that there is no need for the mutexes, locks and condition variables usually found in other concurrent platforms, which makes the programming model much easier for developers.
  • Support distribution of activities across multiple compute nodes out-of-the-box, for both resiliency and scalability.
  • Support supervision and monitoring of all activities, with self-mitigating actions when faults occur. This includes the ability to orchestrate, monitor and manage activities running on local and remote compute nodes, and to provide a central management facility for all activities.
  • Tolerance for both hardware and software faults. The activities will still continue to run even when parts of the platform malfunction.
  • Continuous operation for years: software patching and maintenance must not require bringing the system down, so live updates and hot-patching are supported in place while the system is still running.
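To make the isolation and self-mitigation principles a little more concrete, here is a minimal sketch in plain Erlang. The module name `restart_demo`, the `worker/0` loop and the `simulated_fault` reason are hypothetical names I made up for illustration; only `spawn_monitor/1`, `exit/1` and `erlang:demonitor/2` are standard built-ins. A worker is started under a monitor, told to crash, and then replaced, while the calling process carries on untouched.

```erlang
-module(restart_demo).
-export([run/0]).

%% Start a worker under a monitor, make it crash, then start a replacement.
%% All names here are illustrative; this is a sketch, not how OTP supervisors
%% are implemented.
run() ->
    {Pid1, Ref1} = spawn_monitor(fun worker/0),
    Pid1 ! crash,
    receive
        {'DOWN', Ref1, process, Pid1, Reason} ->
            %% The failure arrives as an ordinary message; this process is
            %% not affected by the worker's crash.
            {Pid2, Ref2} = spawn_monitor(fun worker/0),
            Pid2 ! {ping, self()},
            receive
                pong ->
                    %% Stop watching the replacement before shutting it down.
                    erlang:demonitor(Ref2, [flush]),
                    Pid2 ! stop,
                    {restarted, Reason}
            after 1000 ->
                timeout
            end
    after 1000 ->
        timeout
    end.

%% A trivial worker: crashes on request, answers pings, stops on request.
worker() ->
    receive
        crash -> exit(simulated_fault);
        {ping, From} -> From ! pong, worker();
        stop -> ok
    end.
```

Calling `restart_demo:run()` should return `{restarted, simulated_fault}`: the first worker died, the caller learned why, and a fresh worker answered the ping.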

Based on these requirements, the engineers created the Erlang programming language, a dynamically typed functional programming language with immutable variables whose syntax was heavily influenced by Prolog. To support the massive number of activities per compute node, a lightweight Erlang process was used as the abstraction for each concurrent activity. Note that an Erlang process is not the same as an OS process; Erlang processes are scheduled by the virtual machine in user space and are very quick to spawn, switch and terminate. They do not share memory and are isolated from each other. Under the hood, an Erlang process simply runs a function: it is started with a function call and terminates when that function returns.
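As a rough illustration of how cheap these processes are, the sketch below spawns one process per work item and collects the results. The module name `spawn_demo` and the squaring task are my own assumptions for the example; `spawn/1`, `self/0` and `lists:seq/2` are standard.

```erlang
-module(spawn_demo).
-export([run/1]).

%% Spawn N lightweight processes; each one computes a square and sends the
%% result back to the parent. Hypothetical example, not a benchmark.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), I * I} end)
            || I <- lists:seq(1, N)],
    %% Collect exactly one reply per process, in spawn order, by matching on
    %% each process identifier in turn.
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```

For example, `spawn_demo:run(3)` returns `[1,4,9]`, and `spawn_demo:run(10000)` starts ten thousand processes, which stays comfortably within the VM’s default process limit.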

Programming in Erlang is in effect programming with concurrent processes, better known in development circles as concurrency-oriented programming. Erlang’s scheduling is pre-emptive, and each process is isolated (memory-wise) and has its own garbage collection (unlike Java), so collecting garbage in one process does not freeze the entire system.

Since Erlang processes are isolated from one another and share no state, the Erlang engineers devised a mechanism for sending messages between processes. Every Erlang process has its own mailbox. Processes can send messages to other processes on the same compute node and to processes distributed across multiple Erlang compute nodes. Erlang’s inventors later found out that what they had developed was known as the Actor Model. This distributed communication across clustered compute nodes comes out-of-the-box with Erlang, without the need for the service meshes of today. The Erlang platform is a distributed computing platform.
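The mailbox idea can be sketched with a small counter process. The module name `counter_demo` and its message shapes (`increment`, `{value, From, Ref}`, `stop`) are assumptions chosen for this example; the `!` send operator, `receive` and `make_ref/0` are part of the language. The state lives only inside the process and can be changed only by sending it messages, so no locks are involved.

```erlang
-module(counter_demo).
-export([start/0, increment/1, value/1, stop/1]).

%% Start a process whose only state is an integer held in its loop argument.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous: drop a message in the counter's mailbox and return.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous: ask for the value and wait for the tagged reply.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive
        {Ref, Count} -> Count
    after 1000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The process body: handle one mailbox message at a time, then recurse
%% with the new state.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From, Ref} ->
            From ! {Ref, Count},
            loop(Count);
        stop ->
            ok
    end.
```

A short session would be `Pid = counter_demo:start()`, two calls to `counter_demo:increment(Pid)`, and then `counter_demo:value(Pid)`, which returns `2` because messages from one process to another are delivered in the order they were sent.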

To implement hot-patching and to host massive numbers of concurrent processes, the Ericsson engineers built a virtual machine, similar to the Java Virtual Machine. The Erlang VM is called BEAM and its responsibilities include managing and scheduling process execution. It can be configured to manage up to roughly 134 million concurrent processes on a single compute node. The development process in Erlang is very similar to other programming platforms; source code is written in text files and then compiled to BEAM byte-code, which the BEAM executes (again, very similar to how Java source code is compiled to JVM byte-code).

The Erlang developers soon discovered that there were coding patterns that occurred very often when developing software for communications systems. Similar to how game developers discover the common patterns in their code and package them into a game engine, the Erlang developers packaged their common Erlang patterns and libraries together as the Open Telecom Platform (OTP) framework. While the name implies that OTP has something to do with telecommunications, the packaged patterns and libraries are actually geared towards the general creation of service-oriented systems.
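One of the best-known OTP patterns is the generic server, `gen_server`, which packages the receive loop, the request/reply protocol and the error handling that you would otherwise write by hand. Below is a minimal sketch of a key-value store built on it. The module name `kv_server` and the `store/3` and `fetch/2` functions are hypothetical names for this example; the `gen_server` calls and callbacks are the standard OTP ones.

```erlang
-module(kv_server).
-behaviour(gen_server).

%% Client API (hypothetical names for this sketch)
-export([start_link/0, store/3, fetch/2]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Asynchronous write: returns ok immediately.
store(Pid, Key, Value) ->
    gen_server:cast(Pid, {store, Key, Value}).

%% Synchronous read: waits for the server's reply.
fetch(Pid, Key) ->
    gen_server:call(Pid, {fetch, Key}).

%% The server state is a map that starts out empty.
init([]) ->
    {ok, #{}}.

handle_call({fetch, Key}, _From, State) ->
    {reply, maps:get(Key, State, undefined), State}.

handle_cast({store, Key, Value}, State) ->
    {noreply, State#{Key => Value}}.
```

After `{ok, Pid} = kv_server:start_link()`, calling `kv_server:store(Pid, region, <<"eu">>)` followed by `kv_server:fetch(Pid, region)` returns `<<"eu">>`, while fetching an unknown key returns `undefined`.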

BEAM and OTP are bundled together to form the Erlang Run-Time System (ERTS). This is the software distribution that is deployed on compute nodes. A networked group of compute nodes with the ERTS installed forms the basis of an Erlang cluster. BEAM/OTP includes all the necessary software libraries for distributed computing, including local/remote process management, message passing among processes, and supervision and mitigation of process failures.

While developers love what BEAM/OTP provides out-of-the-box, some find the Prolog-flavoured Erlang language too esoteric for their development style. Prolog, being a declarative language, doesn’t suit most people. One particular developer, who was on the Ruby on Rails core team, decided to create a more imperative-styled (but still function-oriented) language that compiles to BEAM byte-code and leverages OTP, and named it Elixir. This parallels the JVM world, where we have Scala, Clojure and Kotlin, and not just Java. The languages that compile to run on BEAM are known as BEAM languages.

Why Erlang/BEAM/OTP for modern backend services?

While the Erlang platform was originally designed for telco communications equipment, it turns out that the platform’s key design principles (as listed above) are essentially what today’s modern backend servers require too. Web servers, messaging systems, microservices, queues, etc. all need to deliver high-concurrency, low-latency transactions and to manage software and hardware faults in order to achieve high reliability. The challenges that faced the Ericsson engineers in the mid-’80s are similar to the challenges we face today in the cloud era of back-end servers that enable services for a multitude of users. Erlang/Elixir isn’t merely a programming language, but a programming platform for creating very reliable services. The Erlang language together with its libraries and OTP is almost like an operating system: it has its own schedulers and concurrency model, supports millions of processes per physical machine, distributes across physical machines out-of-the-box, and provides development support for concurrent, distributed systems that eases troubleshooting across remote systems. All of this makes developing in Erlang/Elixir/OTP a boon to the distributed systems developer who wants reliable and resilient services.

However, while Erlang’s BEAM orchestrates and manages processes in a distributed cluster of compute nodes, it doesn’t handle the provisioning and management of the compute nodes that form the cluster. This is where the OpenShift Container Platform (OCP) comes into the picture. By leveraging OCP’s consistent interface for provisioning and managing the containers/compute nodes that host an Erlang/BEAM/OTP cluster, we get a seamless and efficient means to provision and manage the compute nodes across hybrid/multi-cloud services. Let OCP manage the infrastructure architecture while Erlang/BEAM/OTP manages the software architecture for reliable services.

What is Erlang/BEAM/OTP not designed for?

Erlang on BEAM/OTP was not designed for number-crunching endeavours, like mining cryptocurrency or calculating the nth digit of Pi. While it supports pre-emptive concurrency and can run embarrassingly parallel or reduction (map/reduce) algorithms, the BEAM virtual machine wasn’t tuned for such number-crunching activities. You are better off using Python with its associated NumPy or Anaconda for data analytics.

In situations where you need a back-end service to do some data analytics, you might want to consider a hybrid approach. Use Erlang on BEAM/OTP for fronting the services and orchestrating the workloads, and have Erlang/Elixir integrate with Python/NumPy/Anaconda for the specific number-crunching parts. This would be using the right tool for the right job.

Summary

In this article, I’ve put forth the rationale for using Erlang/Elixir on a BEAM/OTP software architecture, and why companies and products like WhatsApp, RabbitMQ, Riak, payment gateways, NoSQL databases, etc. use it for its inherent concurrency and reliability. Erlang is a functional language with concurrency as its core programming paradigm, and the immutability of variables promotes a safer development language and environment. Erlang’s remote shell monitoring and debugging make it easier to troubleshoot distributed systems (especially in our microservices era). Implementing the Erlang Run-Time System (ERTS) on OCP marries the Erlang platform with a container PaaS that is consistent across on-prem and the public CSPs.

In the next part of this tutorial, I will delve into installing the ERTS on OCP, and show how the Erlang cluster can be easily created.
