A Complete Guide to Deploying Elixir & Phoenix Applications on Kubernetes — Part 5: Clustering Elixir & Phoenix Channels

At Polyscribe, we use Elixir and Phoenix for our real-time collaboration and GraphQL API backends and Kubernetes for our deployment infrastructure. In this series, I walk through the setup we used from start to finish to create a system that supports the following:

  • Automatic clustering for Elixir and Phoenix channels
  • Auto-scaling to respond to spikes in demand
  • Service discovery for microservices, including those in other frameworks like Node.js
  • Identical environments for staging and production, with easy promotion from staging to production
  • Relatively easy setup and management

Other posts in this series — Part 1: Setting up Distillery, Part 2: Docker and Minikube, Part 3: Deploying to Kubernetes, Part 4: Secret Management


One of the draws of Elixir and the underlying Erlang VM (BEAM) is the powerful set of tools it provides for communicating between nodes on a network (often called Distributed Erlang or Distributed Elixir). Once Elixir nodes know about each other, it’s dead simple to send a message from a process on one node to a process on a different node. Phoenix builds on this infrastructure to provide “channels”: a set of tools, such as pubsub and broadcasting, that let us easily build soft real-time features into our apps.
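
To make that concrete, here’s a sketch of what cross-node messaging looks like once two nodes are connected. The node names and the :greeter registration are hypothetical, and each half would run in a console on a different node:

```elixir
# On node :"a@10.0.0.1": register a process under a well-known name.
pid =
  spawn(fn ->
    receive do
      {:hello, from} -> IO.puts("got a hello from #{inspect(from)}")
    end
  end)

Process.register(pid, :greeter)

# On node :"b@10.0.0.2": address the process as {name, node} and send to it,
# exactly as if it were local.
send({:greeter, :"a@10.0.0.1"}, {:hello, Node.self()})
```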

Before we can take advantage of these tools, we need to enable our nodes to discover each other and form a cluster. In a traditional deployment, we would hard-code the addresses of the other nodes into each node’s configuration and the VM would take care of the rest. In our Kubernetes setup, however, pods can be destroyed and replaced at any time, so their IP addresses change. Kubernetes also lets us easily scale our deployment up or down to handle changes in traffic, so the number of nodes may change as well. As Kubernetes adds or removes nodes, we’d like them to be automatically added to or removed from our cluster.

To do this, we’re going to use DNS-based discovery:

  1. We’ll use Kubernetes DNS to assign a single name to all of our Elixir application’s pods.
  2. Using the peerage package, on each node we’ll resolve the DNS name to enumerate the IP addresses of all the pods and connect to each of the other pods.
  3. Peerage will regularly perform this DNS lookup and update the cluster membership as new nodes are added or deleted.

And that’s it — we’ll have automatic clustering of our Elixir application that also gracefully handles changes in cluster size.

We start by creating a vm.args file. Distillery uses this file to pass additional run-time configuration to the VM; in our case, we want to prepare the VM for clustered/distributed operation. In the root of your application, create a file called vm.args with the following contents:

-name myapp@${MY_POD_IP}
-setcookie mymagiccookie

Every node that wants to connect to other nodes must have a name. The first line sets the name of the node to myapp@<OUR POD'S IP>. We’re using Distillery’s template-variable replacement to substitute the MY_POD_IP template string with the value of the environment variable (which we’ll be creating shortly), just as we did in the config files in Part 1. The next line sets a “magic cookie”: the VM only allows nodes with the same magic cookie to connect to each other. Since any node that knows the cookie can join the cluster, treat it like a secret (see Part 4) rather than committing a real value to source control.
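
Once the release boots, you can confirm both settings from an attached console (the IP below is just an example; yours will be your pod’s IP):

```elixir
iex> Node.self()
:"myapp@10.244.1.7"
iex> Node.get_cookie()
:mymagiccookie
```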

We need to tell Distillery to use our custom vm.args file, so modify your rel/config.exs file as below:

[...clipped...]
release :myapp do
  [...clipped...]
  set vm_args: "./vm.args"
end

Next, let’s have Kubernetes set the MY_POD_IP environment variable for us. Edit the k8s/myapp-deployment.yaml file and add the MY_POD_IP entry under env as shown below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        [...clipped...]
        env:
        - name: HOST
          value: "example.com"
        [...clipped...]
        - name: DB_HOSTNAME
          value: "10.0.2.2"
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP

We’re using the Kubernetes Downward API to expose the IP that Kubernetes has given this pod as the environment variable we used in our vm.args file.

Now that the VM is set up for Distributed Elixir, we need to ask Kubernetes for a DNS name we can use to discover the IP addresses of all our Elixir application’s pods, and then configure peerage to use that name. Giving our application a DNS name works much like creating the load balancer in Part 3: we create a service. However, our previous service was a load balancer, so it resolves to a single IP address, whereas here we want the IP addresses of all the pods. In Kubernetes, this is called a “headless” service.

Create a file called k8s/myapp-service-headless.yaml with the following contents:

apiVersion: v1
kind: Service
metadata:
  name: myapp-service-headless
spec:
  ports:
  - port: 8000
  selector:
    app: myapp
  clusterIP: None

This is very similar to our load balancing service, except that we’ve set clusterIP: None, which tells Kubernetes this is a headless service (i.e., we don’t want a single cluster IP in front of these pods; we want the pod IPs themselves). One oddity is that we’ve still specified a port: this works around a Kubernetes issue where a headless service won’t resolve in DNS unless a port is defined. Tell Kubernetes to create the service by executing kubectl create -f k8s/myapp-service-headless.yaml.

Finally, let’s install and configure peerage. Follow the instructions in the peerage readme to install the peerage package, and then add the following configuration to config/prod.exs:

config :peerage, via: Peerage.Via.Dns,
  dns_name: "myapp-service-headless.default.svc.cluster.local",
  app_name: "myapp",
  interval: 5

This is fairly straightforward: we tell peerage to cluster via DNS. The dns_name is Kubernetes’ fully-qualified version of the name we gave our service (including the “default” namespace). The app_name is the node-name prefix we set in the vm.args file. Finally, we tell peerage to re-check cluster membership every 5 seconds.
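
Under the hood, the DNS strategy amounts to something like the sketch below: resolve the headless service’s name to a set of A records, build a node name from each IP, and try to connect. This is an illustration of the mechanism, not peerage’s actual source:

```elixir
# Resolve the headless service name to the set of pod IPs (A records).
{:ok, {:hostent, _name, _aliases, _type, _len, ips}} =
  :inet.gethostbyname(~c"myapp-service-headless.default.svc.cluster.local")

# Build one node name per pod IP and attempt to connect to each.
for ip <- ips do
  node_name = :"myapp@#{:inet.ntoa(ip)}"
  Node.connect(node_name)
end
```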

Let’s test to make sure everything is working as expected:

  1. Create a new release of your application and deploy it to your cluster as described in Part 3.
  2. Get the list of pods by executing kubectl get pods. Select one of the pods and copy the name. We can access a shell on the pod by executing kubectl exec -it <pod name> -c myapp -- /bin/bash.
  3. Let’s connect to our running Elixir application on this pod. From within the shell on the pod, run ./bin/myapp remote_console. This gives us an iex prompt that’s connected directly to our live Elixir application.
  4. To see the list of other nodes that we’re connected to, type Node.list(). You should see a list with one element since we specified 2 replicas in our deployment file. This indicates we’re clustering successfully and our nodes can communicate with each other using the standard Elixir functions.
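
From that same remote console you can go one step further and execute code on the peer node, a quick smoke test for the cluster (the node name shown is just an example):

```elixir
iex> [peer | _] = Node.list()
[:"myapp@10.244.2.8"]
iex> :rpc.call(peer, Node, :self, [])
:"myapp@10.244.2.8"
```

If the :rpc.call returns the peer’s own name, round-trip messaging across the cluster is working.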

A final note: by default, Phoenix uses Distributed Erlang (i.e., the clustering we just set up) as its channels back-end. If this is a new project, you don’t need to do anything else to use channels; they’ll just work. However, if you were previously deploying to an environment such as Heroku, you may have been using the Redis back-end for channels. Now that your nodes are clustered, you can reconfigure your Endpoint to use Distributed Erlang (the PG2 back-end) as described in the Phoenix documentation.
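
For reference, pointing the Endpoint at the PG2 adapter looks roughly like this in config/prod.exs. The module names are placeholders for your app’s own, and the pubsub option shown matches Phoenix 1.2/1.3-era releases, so check the docs for your version:

```elixir
config :myapp, MyApp.Endpoint,
  pubsub: [name: MyApp.PubSub, adapter: Phoenix.PubSub.PG2]
```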