From zero to Production Elixir in 1 month (1/2)

One of the reasons I joined Onfido was to be able to tinker with Elixir. Back in June (while still at my previous job), I heard Paulo’s talk at the Landing.jobs festival about Onfido and some of their technology stack. I got hooked, and soon enough I was reading more and more about Elixir, until I decided to grab Dave Thomas’ thorough Programming Elixir.

Back in college I’d had a good experience with Scheme and Lisp, and I really enjoyed using the more functional-ish parts of .NET (LINQ, closures, lambdas, etc.), so I was really looking forward to using Elixir. It promised to tackle scalability and concurrency problems with a familiar Ruby-esque syntax and an elegant concurrency model, while requiring only a couple of servers.

When I became an Onfidoer at the end of last year, I continued learning about Elixir’s pattern matching, its module system, and how Elixir’s processes differ from traditional OS processes (hint: a lot!). In the meantime, I got involved in an initiative to analyse how we could extract the document upload/download functionality from our current system into its own microservice.

Problems with Paperclip

We were using the Paperclip Ruby gem to help with encryption, upload and download tasks. This gem provided a convenient `attachment` object that abstracted all the work needed behind the scenes. However, the out-of-the-box upload provided by Paperclip was synchronous, had no caching mechanism, and offered no direct way of getting a document from S3 without going through the Rails application that had originally uploaded it.

The following code snippets show how to configure Paperclip to use S3 and incorporate it in a model through Rails concerns:

Since we are constantly handling a significant number of documents of a considerable size (2–3 MB), the new solution should allow uploads to be made asynchronously to avoid blocking our workers. The envisioned application should also provide a simpler way to access the documents stored in S3.

Imago, iterations and improvements

The status quo looked like this:

This new application was named Imago (“image” in Latin, or the last stage an insect attains during its metamorphosis, according to Wikipedia 🤔🐛) and the first iteration of our plan was to create two HTTP endpoints, an upload and a download one, allowing two possibilities:

1. Having an alternative way of uploading documents would allow us to transparently replace the usage of Paperclip with a call to Imago’s upload endpoint (Imago persists exactly the same information as if the document had been uploaded by Paperclip). Since Imago mimics the way Paperclip stores all the document information, the two are completely interchangeable: documents uploaded by Imago may be retrieved by Paperclip and vice versa.

2. The HTTP endpoint to download images by document UUID would not only enable more direct access to our stored documents, but would also allow us to implement a caching mechanism, something that Paperclip doesn’t support directly (each access to a Paperclip S3 attachment implies a round-trip to S3 and the corresponding file decryption).
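To make the caching idea from the second point more concrete, here is a minimal sketch of a read-through cache keyed by document UUID. It is not Imago’s actual code: the `ImagoSketch.Cache` module, its `fetch/2` function and the `loader` callback are all hypothetical names, and the sketch ignores eviction, expiry and memory limits, which a real cache for 2–3 MB documents would need.

```elixir
defmodule ImagoSketch.Cache do
  # Hypothetical read-through cache; an Agent holds a map of uuid => content.
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Returns the cached content for `uuid`. On a miss, calls `loader`
  # (standing in for "fetch from S3 and decrypt") and stores the result.
  # Two concurrent misses for the same uuid may both call `loader`.
  def fetch(uuid, loader) do
    case Agent.get(__MODULE__, &Map.fetch(&1, uuid)) do
      {:ok, content} ->
        content

      :error ->
        content = loader.()
        Agent.update(__MODULE__, &Map.put(&1, uuid, content))
        content
    end
  end
end

{:ok, _pid} = ImagoSketch.Cache.start_link()

# The loader notifies the calling process each time it runs, so we can
# observe how many simulated S3 round-trips happened.
loader = fn ->
  send(self(), :s3_round_trip)
  "decrypted bytes"
end

first = ImagoSketch.Cache.fetch("doc-1", loader)
second = ImagoSketch.Cache.fetch("doc-1", loader)
```

The second call is served from the Agent’s state, so the loader (and with it the simulated S3 round-trip and decryption) only runs once.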

Here’s Imago settled into its place:

Since one of Imago’s objectives was to provide easier access to documents, and given we’re paving our way to a microservice-based architecture, the most natural choice was to create Imago as an independent HTTP application. This requirement led us to one of Elixir’s most popular web frameworks: Phoenix.

Imago would receive a request with a document UUID, fetch its metadata from the legacy database (namely, its S3 path and encryption keys), retrieve the encrypted document from S3, decrypt it locally and then return it to the client.
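The download flow described above maps nicely onto Elixir’s `with` construct. The following is only a sketch under assumptions: the `ImagoSketch.Download` module, the `deps` map and the metadata fields `s3_path` and `key` are invented for illustration, and the "decryption" is a string reversal standing in for the real thing so that the example runs without a database or S3.

```elixir
defmodule ImagoSketch.Download do
  # Hypothetical pipeline: metadata lookup -> S3 fetch -> local decryption.
  # Each step is injected as a function returning {:ok, value} or
  # {:error, reason}; `with` stops at the first step that does not match.
  def fetch(uuid, deps) do
    with {:ok, meta} <- deps.fetch_metadata.(uuid),
         {:ok, encrypted} <- deps.fetch_from_s3.(meta.s3_path),
         {:ok, plain} <- deps.decrypt.(encrypted, meta.key) do
      {:ok, plain}
    end
  end
end

# In-memory stand-ins for the legacy database, S3 and the decryption step.
deps = %{
  fetch_metadata: fn
    "doc-1" -> {:ok, %{s3_path: "documents/doc-1", key: "not-a-real-key"}}
    _other -> {:error, :not_found}
  end,
  fetch_from_s3: fn "documents/doc-1" -> {:ok, "olleh"} end,
  decrypt: fn encrypted, _key -> {:ok, String.reverse(encrypted)} end
}

found = ImagoSketch.Download.fetch("doc-1", deps)
missing = ImagoSketch.Download.fetch("missing", deps)
```

Here `found` is `{:ok, "hello"}`, while `missing` is `{:error, :not_found}`: the failed metadata lookup short-circuits the pipeline, so S3 is never contacted for an unknown UUID.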

Using umbrella apps

Imago’s first spike was based on a single Phoenix application that implemented all of these responsibilities. However, these well-defined steps meant well-defined responsibilities. After some iterations, we decided that these distinct features would fit really well into the Elixir way of organizing different applications under a so-called umbrella app.

As such, the following apps were created under the main Imago umbrella:

- db app: responsible for accessing our legacy database, persisting the document metadata when a document is uploaded and fetching it when a document is requested for download;

- s3 app: responsible for the interaction with Amazon S3, be it uploads or downloads, and also for the encryption tasks;

- backend app: relies on both the db and s3 apps and provides the creation of a new document, the retrieval of a document’s metadata and the retrieval of the document’s content itself. This is also where our DocumentWorker lives, an Elixir process spawned for every download/upload operation (its role will be detailed in the next blog post);

- api app: a Phoenix application that uses the backend app and exposes three endpoints, one to upload a document and two to retrieve a document’s metadata and content;

- GraphQL and Dynamo endpoints were also planned, so we also created graphql and dynamo umbrella apps.
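For readers who haven’t used umbrella projects before, this is roughly how one child app declares its siblings as dependencies. The file below is a hypothetical `apps/backend/mix.exs` modelled on what Mix generates for an umbrella child, not a copy of Imago’s; only the app names come from the list above, and the version number is a placeholder.

```elixir
# apps/backend/mix.exs (hypothetical)
defmodule Backend.MixProject do
  use Mix.Project

  def project do
    [
      app: :backend,
      version: "0.1.0",
      # Umbrella children share build output, config, deps and lockfile
      # with the umbrella root two directories up.
      build_path: "../../_build",
      config_path: "../../config/config.exs",
      deps_path: "../../deps",
      lockfile: "../../mix.lock",
      deps: deps()
    ]
  end

  defp deps do
    [
      # Sibling apps are referenced by name with `in_umbrella: true`.
      {:db, in_umbrella: true},
      {:s3, in_umbrella: true}
    ]
  end
end
```

Under this layout the api app would in turn list `{:backend, in_umbrella: true}`, which keeps the dependency direction explicit: api depends on backend, and backend depends on db and s3.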

You can see here how the umbrella apps interacted and how the project was structured:

In the next blog post, we’ll analyse how Imago accomplishes its objectives, by stepping through each of its umbrella apps and seeing how apparently simple code yields some really interesting results.

P.S.: You can find the second part of this story here
