Intro to Distributed Elixir
How Elixir and the BEAM can make distributed systems easier
Some time ago, your overly excited co-worker came around talking about Elixir and the BEAM, a platform that made hard things easy. You signed up for all the things. You’re using pattern matching to make your code more readable. You’re using pipelines and tokens to make composable functions a snap. You’re using Supervisor trees to make your app more fault-tolerant and self-healing. You’re even doing hot code upgrades! You’re living the dream.
Something is nagging you though. This Distributed Erlang thing they kept mentioning… It sounds cool, but it felt… unnecessary. In this post, we’ll look how using distributed Elixir can supercharge your applications for FREE.
What is distribution?
Said simply, distribution is running an application across more than one server. Why would you do this? Why deal with the added complexity? Splitting your application makes it easier to scale a specific part of your offering, makes the code easier to reason about since you have smaller code bases, and lets you handle situations of total server obliteration. Failure in a distributed system looks like a hard drive failure, or an entire region being unavailable.
Distribution in Elixir
Elixir and the BEAM are built for scenarios where failure is normal. Supervisors and their restart strategies are first class citizens. Distribution is also a first class citizen and can be used without additional plugins, and very little configuration. In addition, by connecting additional Elixir nodes, you get a ton of feature upgrades for free.
In this great talk by Sophie Debenedetto, she covers how to set up Presence. If you are running multiple connected Phoenix servers though, you get automatic syncing of Presence data for free! No external dependencies, no nothing.
OTP gives you a relational database out of the box called Mnesia. If you have multiple connected applications, your Mnesia database will be automatically distributed and replicated across your cluster! AMAZING.
Plain old Elixir Apps
Connecting Elixir applications lets you send messages to pids across applications as if they were locally created. This means you can start writing an app to run on one server, then add distribution without having to change much code! SO GREAT.
How does it work?
Named nodes can communicate with each other with a shared secret. The simple example is to open 2 terminals like so:
# terminal 1
iex --name email@example.com --cookie 'super_secret'# terminal 2
iex --name firstname.lastname@example.org --cookie 'super_secret'# terminal 1’s iex Session
iex(email@example.com)1> Node.connect(:"firstname.lastname@example.org")trueiex(email@example.com)2> Node.list[:"firstname.lastname@example.org"]iex(email@example.com)3> Node.spawn(:"firstname.lastname@example.org", fn ->...(email@example.com)3> IO.inspect Node.self()...(firstname.lastname@example.org)3> end):"email@example.com"
So what’s going on here? We’re starting two sessions with that shared cookie of
super_secret. We’re then connecting the nodes with each other by using
Node.connect/1. We need to use that funky syntax since the input has to be an atom. After they’re connected, you can run functions on any connected node. Here we’re calling
IO.inspect Node.self() on my other terminal. It could very well be another machine!
One thing to note is that once a node is connected, any new nodes it learns about will be propagated to the cluster. Say you have terminal 1, 2 and 3. If 1 connects to 2, and 2 connects to 3, 1 will know about 3 and vice-versa. Just another one of those things the BEAM makes easy.
Libraries for clustering
Setting up clustering is pretty easy, but we can do better with
libcluster gives you different strategies for discovering and clustering your applications. It’s very useful if you’re deploying to Kubernetes.
Another library to look at is
partisan is there if you need better performance out of your distributed system. It’s an alternative to the built in clustering support built into the BEAM.
What can go wrong?
The BEAM runs a process (
epmd) that runs on port 4369 that is the first stop for your node when it’s trying to connect to a sibling. If this port is closed, you’re not getting distribution going. Make sure that port is accessible from your other servers. Some platforms make this a bit harder, but ask your local networking guru for help.
This is a tough one. Networks are inherently unreliable. You may not notice it but network communication is riddled with dropped packets, timeouts, and malformed responses. Now that we have these clustered apps, what happens when they lose sight of each other? Worse yet, what happens when multiple servers take in input they thing is “true”? I put a bid in on server 1, but you put a bid in on server 2? In a perfect world, our distributed setup would handle this, but what if server 1 and 2 are disconnected? Who should server 3 believe?!
Distribute all the things
Elixir and the BEAM make hard things easy. Next time you’re building an app, ask yourself, “How can distribution make this better?”
Thanks for reading! Want to work on a mission-driven team that loves Elixir and making hard things easy? We’re hiring!