From zero to Production Elixir in 1 month (2/2)

In our previous post, we introduced Imago. Its aspirations were to provide an easier and faster way to upload and retrieve uploaded documents, while still delivering the same Paperclip features we were already using.

Currently with Paperclip, even if we need the document right after it has been uploaded, we have to retrieve it again from Amazon S3. That results in a round trip to S3 and the subsequent document decryption and decoding. To overcome this limitation, Imago implements a caching mechanism that leverages Elixir processes. Each Imago operation on a document (i.e., an upload or a download) spawns a DocumentWorker Elixir process identified by the document UUID. Each spawned process lives for a while in the Erlang VM after performing its operation, and during its lifetime it holds the unencrypted document contents in its state. As such, a download operation for a document that has just been uploaded can still find the process responsible for the previous upload (identified by the document UUID) and retrieve the unencrypted document from that process’s state, instead of fetching it from S3 and incurring the expensive overhead that entails.

Interaction between umbrella apps

Document upload

In the sequence diagram above we can see how a document upload is processed:

  1. The API (a Phoenix app) receives an HTTP POST with the document metadata and its content and spawns a DocumentWorker process to handle the upload operation;
  2. The document metadata is stored in the database, the document is encrypted and we upload it to S3;
  3. The DocumentWorker process returns an :ok tuple to the API and the API returns an HTTP 200 OK to the caller;
  4. As we’ll later see, while the process lives on the Erlang VM, the document is “cached” in the process’s internal state.

Document download (cache miss)

This sequence diagram is similar to what we previously saw:

  1. A fresh download has to retrieve the metadata from the database (including the encryption keys and the document S3 path), get the encrypted content from S3, decrypt it and then finally return it to the client;
  2. Once again, during the lifespan of the DocumentWorker process, the document is cached on Imago.

Document download (cache hit)

This final sequence diagram illustrates how simple a download operation is when a DocumentWorker already exists for a given document UUID:

  1. A new DocumentWorker isn’t spawned since a process already exists for that document;
  2. Therefore, a download operation is short-circuited, ending with the process immediately returning the document contents, without hitting the database or Amazon S3.

Welcome to the API

We will now dive into the code, starting with the API controller actions which allow a client to obtain a document’s metadata or to download the document itself:

Both actions try to fetch the document metadata by ID from the Backend:

  1. The metadata action ends by rendering the document metadata on the corresponding view;
  2. The download action, on the other hand, calls the backend afterwards to handle the download operation.
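
As a rough illustration of the two actions described above, here is a minimal, framework-free sketch. It is not the actual Imago controller: the module names, the in-memory backend and the `{status, body}` return tuples are all assumptions that stand in for Phoenix’s conn and view rendering, so that the snippet runs on its own.

```elixir
defmodule ControllerSketch.Backend do
  # Hypothetical in-memory stand-in for the real backend app.
  @documents %{"42" => %{id: "42", file_name: "invoice.pdf"}}

  def get_document(id) do
    case Map.fetch(@documents, id) do
      {:ok, document} -> {:ok, document}
      :error -> {:error, :not_found}
    end
  end

  # The real backend would delegate to a DocumentWorker here.
  def download(%{id: id}), do: {:ok, "contents of document #{id}"}
end

defmodule ControllerSketch.DocumentController do
  alias ControllerSketch.Backend

  # Ends by "rendering" the metadata (here: returning it with a status).
  def metadata(%{"id" => id}) do
    case Backend.get_document(id) do
      {:ok, document} -> {200, document}
      {:error, :not_found} -> {404, "not found"}
    end
  end

  # Fetches the metadata first, then asks the backend for the contents.
  def download(%{"id" => id}) do
    with {:ok, document} <- Backend.get_document(id),
         {:ok, content} <- Backend.download(document) do
      {200, content}
    else
      {:error, :not_found} -> {404, "not found"}
    end
  end
end
```

Both functions share the same first step (the metadata lookup); only the download action goes on to call the backend a second time.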

Here we can see how the upload action is defined on the API:

This action is more complex because it has to handle the form submission for the metadata along with the document to upload:

  1. First, we fetch the content from disk (since Plug.Upload uses temporary files to store uploads);
  2. Then we just call save_and_upload_document, which stores the document metadata and uploads its contents to S3 through the backend (as a result, a DocumentWorker process will be spawned by the backend), as one can see in the following code snippet:
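
The two steps above can be sketched roughly as follows. This is only an assumption of how such an action might look: the upload is a plain map mimicking the path and filename fields of a Plug.Upload struct, the parameter names are hypothetical, and the backend is passed in as a function so the example has no dependencies.

```elixir
defmodule UploadActionSketch do
  # `params` mimics a form submission: a Plug.Upload-like map plus metadata.
  # `save_and_upload` is a hypothetical stand-in for the backend call.
  def upload(%{"document" => %{path: path, filename: filename}, "metadata" => metadata}, save_and_upload) do
    # Step 1: the uploaded bytes live in a temporary file, so read them from disk.
    with {:ok, content} <- File.read(path) do
      # Step 2: hand metadata and content over to the backend.
      save_and_upload.(Map.put(metadata, "file_name", filename), content)
    end
  end
end

# Usage with a temporary file and a fake backend that just echoes its input.
path = Path.join(System.tmp_dir!(), "imago_upload_sketch.txt")
File.write!(path, "hello")

params = %{
  "document" => %{path: path, filename: "hello.txt"},
  "metadata" => %{"owner" => "alice"}
}

fake_backend = fn metadata, content -> {:ok, {metadata, byte_size(content)}} end
IO.inspect(UploadActionSketch.upload(params, fake_backend))
File.rm(path)
```

If the temporary file cannot be read, the `with` expression returns the `{:error, reason}` tuple from File.read unchanged, and the backend is never called.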

Now, remember that the download API action on the controller called the download function on the backend:

  1. This download function on the backend tries to find a DocumentWorker for the document UUID it received; if a process with this UUID doesn’t exist yet, it spawns a new DocumentWorker to handle the download operation;
  2. Notice how we’re sending a :download_document message to the worker (this is a synchronous call, so the calling process will block until it receives the reply from the worker).

For the upload, we have a simpler behaviour, because we always spawn a new DocumentWorker process (it is an upload, so the document and its respective process don’t exist yet). As such, we just spawn the new process and ask it to handle the save_and_upload_document for us (again, sending it a synchronous message):
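
A minimal sketch of what these two backend functions could look like is shown below. The worker is a stub that never touches a database or S3, and the global naming scheme, the message shapes and the module names are assumptions made for illustration; only the :download_document message and the synchronous call come from the description above.

```elixir
defmodule BackendSketch.DocumentWorker do
  use GenServer

  # Assumed naming scheme: one globally registered process per document UUID.
  def name(uuid), do: {:global, {__MODULE__, uuid}}

  def start(uuid), do: GenServer.start(__MODULE__, %{uuid: uuid}, name: name(uuid))

  @impl true
  def init(state), do: {:ok, state}

  # Stubbed replies: the real worker would talk to the database and S3.
  @impl true
  def handle_call(:download_document, _from, state) do
    {:reply, {:ok, Map.get(state, :content, "fetched from S3")}, state}
  end

  def handle_call({:save_and_upload_document, _metadata, content}, _from, state) do
    {:reply, {:ok, state.uuid}, Map.put(state, :content, content)}
  end
end

defmodule BackendSketch do
  alias BackendSketch.DocumentWorker

  # Download: reuse the worker for this UUID if it is alive, else start one.
  def download(uuid) do
    worker =
      case GenServer.whereis(DocumentWorker.name(uuid)) do
        nil ->
          {:ok, pid} = DocumentWorker.start(uuid)
          pid

        pid ->
          pid
      end

    # GenServer.call/2 blocks the caller until the worker replies.
    GenServer.call(worker, :download_document)
  end

  # Upload: the document is new, so a fresh worker is always started.
  def save_and_upload_document(uuid, metadata, content) do
    {:ok, worker} = DocumentWorker.start(uuid)
    GenServer.call(worker, {:save_and_upload_document, metadata, content})
  end
end
```

In this sketch a download that follows an upload of the same UUID is answered from the worker’s state, while a download for an unknown UUID starts a new worker first.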

To find or to spawn a process, that is the question

Breaking from the code for a moment, let’s look at how we find an existing DocumentWorker process, or spawn a new one if none exists. Whereas for a download operation we try to find a process with that UUID, for an upload we simply spawn a new one.

This is how we find or spawn a DocumentWorker process to handle the request, be it a download or an upload.

Here we can see how we are getting the document worker on the previously analysed download function:

  1. We start by asking (using the GenServer.whereis function) if there is a worker with the document UUID as its name, and pass along:
    a) An :ok tuple with the worker if the process exists;
    b) Or an :error tuple if a process with that UUID doesn’t exist.
  2. Then, the find_or_spawn function pattern-matches the :ok or :error tuples to, respectively, return the existing worker or spawn a new worker.
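
The lookup described above might be sketched like this. GenServer.whereis/1 itself returns a pid or nil, so the sketch wraps that result into the :ok and :error tuples before pattern matching on them. The global registration under the UUID and all module names are assumptions; the original code may register its workers differently.

```elixir
defmodule LookupSketch.DocumentWorker do
  use GenServer

  # Assumed naming scheme: register each worker globally under its UUID.
  def name(uuid), do: {:global, {__MODULE__, uuid}}

  def start(uuid), do: GenServer.start(__MODULE__, %{uuid: uuid}, name: name(uuid))

  @impl true
  def init(state), do: {:ok, state}
end

defmodule LookupSketch.Backend do
  alias LookupSketch.DocumentWorker

  def get_document_worker(uuid) do
    uuid
    |> whereis()
    |> find_or_spawn(uuid)
  end

  # Turn the pid-or-nil result into a tagged tuple.
  defp whereis(uuid) do
    case GenServer.whereis(DocumentWorker.name(uuid)) do
      nil -> {:error, :not_found}
      pid -> {:ok, pid}
    end
  end

  # An existing worker is returned as-is.
  defp find_or_spawn({:ok, worker}, _uuid), do: worker

  # No worker yet: start one for this UUID.
  defp find_or_spawn({:error, :not_found}, uuid) do
    case DocumentWorker.start(uuid) do
      {:ok, worker} -> worker
      # Another caller registered the same UUID first: reuse its worker.
      {:error, {:already_started, worker}} -> worker
    end
  end
end
```

Two lookups for the same UUID return the same pid, while a different UUID gets its own process.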

DocumentWorker handling downloads

Let’s now analyse how the DocumentWorker actually handles a :download_document call:

  1. The process first sends itself a delayed :timeout message (so that the process “ends” after the specified timeout) and then passes its internal state to a reply_with_content function;
  2. This function checks if the state it receives already has the document content. If so, it simply replies with its current state (in this case, the state map has a content entry). If not, it gets the document content using the private get_document_content function and saves it in the process’s internal state. In the end, it replies with a state filled with content;
  3. The get_document_content function will download the encrypted document from S3, decrypt and decode it, and finally return its contents to the caller.
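
The three steps above can be sketched as a small GenServer. The function names reply_with_content and get_document_content follow the text; everything else is an assumption: the timeout value is made up, and the S3 download, decryption and decoding are replaced by a function injected at start-up so the example runs without external services.

```elixir
defmodule DownloadSketch.DocumentWorker do
  use GenServer

  # Hypothetical lifetime; the real timeout value is not stated in the text.
  @timeout_ms 60_000

  # `fetch_fun` stands in for "download from S3, decrypt and decode".
  def start(uuid, fetch_fun) do
    GenServer.start(__MODULE__, %{uuid: uuid, fetch: fetch_fun})
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:download_document, _from, state) do
    # Schedule the delayed :timeout message that will end this process.
    Process.send_after(self(), :timeout, @timeout_ms)
    reply_with_content(state)
  end

  @impl true
  def handle_info(:timeout, state), do: {:stop, :normal, state}

  # Cache hit: the state already has a content entry.
  defp reply_with_content(%{content: content} = state) do
    {:reply, {:ok, content}, state}
  end

  # Cache miss: fetch the content and keep it in the state.
  defp reply_with_content(state) do
    content = get_document_content(state)
    {:reply, {:ok, content}, Map.put(state, :content, content)}
  end

  defp get_document_content(%{uuid: uuid, fetch: fetch}), do: fetch.(uuid)
end

# Usage: the second call is served from the process state.
{:ok, worker} = DownloadSketch.DocumentWorker.start("u-1", fn uuid -> "plain text for " <> uuid end)
GenServer.call(worker, :download_document)
GenServer.call(worker, :download_document)
```

Note that in this sketch every call schedules its own :timeout message, so the earliest one ends the process; keeping and resetting a single timer would be an alternative design.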

Here, be uploads

Now, let’s see how the DocumentWorker handles a :save_and_upload_document call:

  1. We first try to store the document metadata in the database and upload its content to S3 (with the create_new_document_and_upload_to_s3 function call):
    a) If successful (that is, we got an :ok tuple with the document and the S3 response), the new_state will be a new map with content and the result we just got;
    b) If it wasn’t successful (that is, we got an :error tuple), the new_state will just be a map with a result entry.
  2. Then, the process sends a :timeout message to itself, similar to what we’ve already seen when a process handles a :download_document call;
  3. In the end, we merge this new_state with the current process state and return the result to the caller.
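
A minimal sketch of such a handler follows. The shape of the call message, the three-element :ok tuple and the timeout value are assumptions, and the database and S3 work is replaced by a function injected when the worker starts.

```elixir
defmodule UploadSketch.DocumentWorker do
  use GenServer

  # Hypothetical lifetime; the real timeout value is not stated in the text.
  @timeout_ms 60_000

  # `store_fun` stands in for "store the metadata and upload to S3".
  def start(store_fun), do: GenServer.start(__MODULE__, %{store: store_fun})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:save_and_upload_document, metadata, content}, _from, state) do
    new_state =
      case state.store.(metadata, content) do
        # Success: remember both the content and the result.
        {:ok, _document, _s3_response} = result -> %{content: content, result: result}
        # Failure: only the result is kept, nothing is cached.
        {:error, _reason} = result -> %{result: result}
      end

    # Schedule the delayed :timeout message that will end this process.
    Process.send_after(self(), :timeout, @timeout_ms)

    # Merge into the current state and return the result to the caller.
    {:reply, new_state.result, Map.merge(state, new_state)}
  end

  @impl true
  def handle_info(:timeout, state), do: {:stop, :normal, state}
end
```

Because a failed upload never puts a content entry into the state, a later download handled by the same process would not mistake it for a cached document.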

Now let’s take the deeper trail here and look at what happens when the DocumentWorker calls the create_document_and_upload_to_s3 function:

This function receives the content and the document metadata (represented by the NewDocument structure). We then:

  1. Generate the encryption key, merge it into the document metadata and then we try to save it in the database;
  2. If we were able to correctly store the metadata, we retrieve it, and finally call the encrypt_and_store_s3 function;
  3. If all steps went well, we return an :ok tuple with the inserted document (that is, the document metadata) and the S3 response;
  4. If any step of this “pipeline” goes south (i.e., we can’t match the right-hand side with the left-hand side), the with construct provided by Elixir jumps to the else block, and we just call our error handler.
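
The four steps above map naturally onto a `with` expression. The sketch below is an assumption of how that could look: the database and S3 helpers are passed in as a map of functions (the names insert, get and encrypt_and_store_s3 inside it are hypothetical stand-ins), the key size is arbitrary, and the error handler is a guess at its simplest possible form.

```elixir
defmodule WithSketch do
  def create_document_and_upload_to_s3(content, new_document, deps) do
    # Step 1: generate a key and merge it into the metadata.
    key = :crypto.strong_rand_bytes(32)
    attrs = Map.put(new_document, :encryption_key, key)

    # Each clause only continues if its left-hand pattern matches.
    with {:ok, inserted} <- deps.insert.(attrs),
         {:ok, document} <- deps.get.(inserted.id),
         {:ok, s3_response} <- deps.encrypt_and_store_s3.(document, content) do
      {:ok, document, s3_response}
    else
      # The first non-matching value lands here.
      error -> handle_error(error)
    end
  end

  defp handle_error({:error, reason}), do: {:error, reason}
  defp handle_error(other), do: {:error, {:unexpected, other}}
end

# Usage with in-memory stand-ins for the database and S3.
deps = %{
  insert: fn attrs -> {:ok, Map.put(attrs, :id, 1)} end,
  get: fn 1 -> {:ok, %{id: 1, file_name: "a.pdf"}} end,
  encrypt_and_store_s3: fn _document, _content -> {:ok, :uploaded} end
}

IO.inspect(WithSketch.create_document_and_upload_to_s3("bytes", %{file_name: "a.pdf"}, deps))
```

If, say, the insert function returned an :error tuple, neither of the later steps would run and the caller would get the error straight from the handler.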

Final thoughts

The previous “walk” through the Imago codebase shows many of Elixir’s features, such as pattern matching, guard clauses and the use of Erlang’s OTP to get more out of Elixir processes. It also highlights how simple the code can be when we really embrace the Elixir way of doing things.

My first experience of developing an application in Elixir showed me firsthand how its creators designed the language and its ecosystem to really welcome newcomers, with incredible and easy-to-access documentation, “standard library” functions with familiar names, a well-thought-out REPL console and even “pry” debugging for all of those coming from the Ruby world.

While Elixir doesn’t solve global warming, it’s definitely liberating to code only thinking about the happy path. If it fails, just “let it crash”. Despite that, and especially in the beginning, I’ve caught myself more than once trying to code too defensively.

Fully using pattern matching and guard clauses is difficult at first, but their power becomes evident once we start to use more complex data structures and logic. One even starts to see the beauty of many one-line function definitions 😊.

The ecosystem and the tooling that comes along, namely Hex and Mix, work really well and are stable from the get-go, enabling a smooth development experience.

I’ve also felt that creating specs (unit testing) is a more time-consuming activity. Since mocks are discouraged, one should instead create test modules that abide by the same contract and devise a way of injecting them at compile time, depending on the environment.
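
One common way to do this, sketched below under assumptions, is to describe the contract as a behaviour and pick the implementation from the application configuration when the calling module is compiled. The :imago_sketch application name, the Storage contract and the in-memory implementation are all hypothetical; Application.compile_env/3 requires Elixir 1.10 or later (older code typically read the configuration with Application.get_env/3 in a module attribute).

```elixir
defmodule InjectionSketch.Storage do
  # The contract that both the real and the test implementation follow.
  @callback put(path :: String.t(), content :: binary()) ::
              {:ok, String.t()} | {:error, term()}
end

defmodule InjectionSketch.Storage.InMemory do
  # Test double: same contract, no network access.
  @behaviour InjectionSketch.Storage

  @impl true
  def put(path, _content), do: {:ok, "memory://" <> path}
end

defmodule InjectionSketch.Uploader do
  # Resolved once, when this module is compiled. A per-environment config
  # file (e.g. config/test.exs) would set :storage to the module to use;
  # here the in-memory module is simply the default.
  @storage Application.compile_env(:imago_sketch, :storage, InjectionSketch.Storage.InMemory)

  def upload(path, content), do: @storage.put(path, content)
end

IO.inspect(InjectionSketch.Uploader.upload("docs/a.pdf", "bytes"))
```

The calling code never mentions a concrete storage module, so switching between the real client and the test module is purely a matter of configuration.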

Coming from the Ruby world, debugging also feels harder: due to the purity of Elixir modules, one simply cannot access module variables or invoke private functions inside a breakpoint. Stepping through the code is also impossible; you can only continue the execution afterwards.

n.b.: Elixir 1.5 introduces additional debugging features, like setting breakpoints from inside an IEx session, which definitely smooth the development workflow.

A final word goes to the incredible response times that Phoenix delivers. It is really refreshing to see response times from a Web application measured in microseconds 🚀. During May, our Kubernetes-deployed Imago app received more than 3 million requests on its health endpoint and replied to all in less than 1 millisecond, with an average response time of half a millisecond:

If all this piques your curiosity about Elixir, please give it a go: create a personal project with it or even pick a smaller project at your workplace, and maybe you’ll find the perfect problem waiting to be solved with this really cool language and paradigm. As the graph below shows, interest in Elixir keeps growing!

You can find the actual query here