Why Brex Chose Elixir
Almost every engineer who starts at Brex asks, “Why are we using Elixir?” It’s a question with an interesting answer, and one that we thought a lot about before we even began building Brex.
A brief history before Brex
The decision to use Elixir traces back to long before we started Brex. In 2013, my co-founder Henrique and I built a payments company in Brazil called Pagar.me. After three years, our business became the third largest payment processor in the country.
From a technical standpoint, scaling Pagar.me was challenging. We adopted Node.js in its infancy (version 0.10), well before it took over the world, which meant that tooling, libraries and best practices were still immature. Not only were we using a nascent platform, we also didn’t understand the payments space well enough to start with a definitive architecture from day one.
The circumstances left us no choice but to start with a monolithic system. Even though it enabled us to move very fast in the beginning, a year in we realized we clearly needed to refactor our system into microservices. However, because the ecosystem around Node.js was still nascent, that was easier said than done. We had to build a lot of the basic distributed systems tooling from scratch: RPC layers, eventual consistency infrastructure, service discovery, deployment tools, etc. There was a point where we were spending 30% of our engineering bandwidth building distributed systems infrastructure rather than features.
A few years later, when we started Brex, we wanted to use a platform that would provide most of these tools out of the box. That said, at the time I decided on this approach, I could not have imagined that the solution would trace back to a platform I had been acquainted with long before starting Pagar.me: Erlang.
Elixir / Erlang ecosystem
In 2011, I was working as a software engineer at a telecom company that got acquired by the biggest payments company in Brazil. During my tenure, I had my first contact with payment systems but also with the telephony world, where Erlang is ubiquitous.
Even though I didn’t use Erlang extensively back then, I was very impressed with its feature set and ecosystem. Erlang was built in the ’80s by Ericsson and was designed with reliability in mind; after all, telephony systems cannot go down. Ericsson was able to develop a solid VM and standard library (OTP) that provided node discovery / transparency, RPC, hot code reloading (i.e. zero downtime deployments) and concurrency primitives for free, and it did so 30 years ago! However, due to its unusual syntax and steep learning curve, I didn’t spend much time at that point considering Erlang as a suitable option for modern development.
Around the same time, I was working with Ruby/Rails and was invited to speak at RubyConf Brazil. There, I met José Valim, a Brazilian developer who was a Rails core committer back then. I started following José’s work when he left the Rails core team in 2011 and got closer to the Erlang ecosystem. A year later, his move resulted in the inception of Elixir: a new functional language that looks like Ruby (and is significantly friendlier than Erlang) but runs on top of the Erlang VM. That combination, backed by a solid ecosystem optimized for distributed systems, sounded like a good fit for the problem I was trying to solve. I started using Elixir in 2016 and quickly decided to use it to build Brex.
At first glance, Elixir closely resembles Ruby: the language looks familiar, and libraries like Ecto (ActiveRecord for Elixir) and Phoenix (Rails for Elixir) make Ruby developers feel at home. The Elixir core team also did a good job providing abstractions that simplify the intricacies of Erlang and greatly flatten the learning curve of primitives like GenServers (used to keep state), Processes (concurrency) and Supervisors (responsible for keeping child Processes running), which makes Elixir an interesting choice for building microservices.
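To make those three primitives concrete, here is a minimal sketch of a GenServer that keeps state in its own process and is started under a Supervisor. The `Counter` module and its functions are hypothetical names chosen for illustration, not code from Brex; only `GenServer` and `Supervisor` come from the Elixir standard library.

```elixir
defmodule Counter do
  # Hypothetical example module: a process that holds an integer as its state.
  use GenServer

  # Client API: these functions run in the caller's process.
  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  end

  def increment, do: GenServer.call(__MODULE__, :increment)
  def value, do: GenServer.call(__MODULE__, :value)

  # Server callbacks: these run inside the Counter process.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

# The Supervisor starts Counter as a child and restarts it if it crashes
# (:one_for_one restarts only the child that failed).
{:ok, _supervisor} = Supervisor.start_link([{Counter, 0}], strategy: :one_for_one)

Counter.increment()
Counter.increment()
IO.inspect(Counter.value())
# prints 2
```

If the Counter process crashed, the Supervisor would start a fresh one with the initial state of 0, which is the "let it crash" recovery model that OTP encourages.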
However, some of the nicest characteristics of Elixir are its functional nature and its macro system. Because immutability is enforced, complex and concurrent code becomes much easier to reason about, and the macro system makes it easier to extend the language and build DSLs, which comes in handy if used carefully. These features also have an interesting effect from a team building perspective: there’s a strong selection bias among the candidates who are attracted to the language, and they tend to be a fit with the rest of our team (after all, we’re engineers interested in functional languages and distributed systems).
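As a rough illustration of both points, the sketch below first shows that "updating" a map returns a new value and leaves the original untouched, and then uses a small macro to generate boilerplate functions. `StatusHelpers`, `defstatus` and `Payment` are hypothetical names invented for this example; they are not part of Elixir or of any Brex codebase.

```elixir
# Immutability: Map.put/3 returns a new map; `account` itself never changes.
account = %{balance: 100}
updated = Map.put(account, :balance, 150)
IO.inspect({account.balance, updated.balance})
# prints {100, 150}

defmodule StatusHelpers do
  # Hypothetical macro: `defstatus :pending` defines a `pending?/1` predicate
  # in the calling module at compile time.
  defmacro defstatus(name) do
    predicate = String.to_atom("#{name}?")

    quote do
      def unquote(predicate)(%{status: status}), do: status == unquote(name)
    end
  end
end

defmodule Payment do
  import StatusHelpers

  # Each line expands into a full function definition.
  defstatus :pending
  defstatus :settled
end

IO.inspect(Payment.pending?(%{status: :pending}))
# prints true
IO.inspect(Payment.settled?(%{status: :pending}))
# prints false
```

The macro saves two nearly identical function definitions here; the "used carefully" caveat applies because generated functions like these do not appear literally in the source, which can make code harder to search and read.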
Our assessment of Elixir, one year later
After using Elixir for a year, we have a strong grasp of how the platform behaves in a real-world setting. One observation is that even though Elixir is a relatively niche language, new hires who have never had contact with Elixir before are productive within three weeks. There is a decent amount of books and documentation available on the language that accelerates the ramp-up process.
The second main observation is that the “Erlang way of doing things” doesn’t play well with modern deployment technologies like Docker and Kubernetes. Features like hot code reloading require mutating the state of deployed containers, which goes against concepts like immutable infrastructure that Kubernetes is built around. Also, the clustering / RPC infrastructure from Erlang assumes a full mesh cluster topology, which is not ideal for isolated microservices. As a result, we had to build considerably more tooling than we initially expected.
The Elixir language itself is friendly, but still in its early days. As our codebase grew rapidly, we learned that we needed macros to reduce boilerplate and make it easier for new developers to use the correct language patterns, especially since those patterns are not obvious in such a young language. Also, the lack of a static type system makes large-scale refactoring harder, so one would be a great addition to the Elixir ecosystem.
As with any language or framework, Elixir has its pros and cons. For us, it was important to understand the trade-offs of the platform and how they overlap with our business needs. It’s still too early to know if Elixir will pay the same magnitude of dividends to Brex as Node.js did for Pagar.me, but we believe that the ecosystem is very promising and there is good reason to believe it will become a major piece of technology.