Elixir Open Source @ Frame.io

Michael Guarino
Frame.io Engineering
Nov 5, 2018 · 5 min read

Frame.io recently made a major move to Elixir, implementing nearly all of our API on a pure Elixir stack. This has been an amazing success for us so far, allowing us to leverage a very powerful stack using Ecto, Phoenix, Ueberauth and more to build a clean, efficient API codebase for our entire platform. But even with such a rich array of open source solutions, there were some use cases of ours that weren't properly solved. So now that the dust has settled post-migration, we felt it's time to give back to the Elixir community that has served us so well.

ExlasticSearch

Elixir already has some pretty good Elasticsearch libraries (we liked elastix for our use case). But what we really wanted was a good way to map our Ecto models to ES indices. And if we couldn't get that, a nice DSL for writing ES queries (which get very ugly quickly when you're just slapping together a large, complicated bool query) would have been a close second. We felt bold, so we implemented both. Our solution was to imitate Ecto as much as seemed appropriate, while recognizing the differences between ES and a standard relational datastore. We eventually named the final product exlasticsearch. It might be easiest to go to some code to explain.

An ExlasticSearch.Model is defined alongside an Ecto model:
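Here's a minimal sketch (the MyApp.User schema and its fields are invented for illustration, and the exact DSL may differ slightly from the library's current API):

defmodule MyApp.User do
  use Ecto.Schema
  use ExlasticSearch.Model

  schema "users" do
    field :name, :string
    field :email, :string
    field :age, :integer
  end

  indexes :users do
    # index-level settings are passed straight through to ES
    settings %{number_of_shards: 1}

    mapping :name
    mapping :email, type: :keyword # override the inferred type
    mapping :age
  end
end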

The mapping macro uses the extant Ecto type information to infer the appropriate ES type for each mapping, while it's still possible to override that behavior where needed (for instance, for strings that should be interpreted as keywords, for adding analyzers, or for handling custom Ecto types).

To actually index the object, you need to implement ExlasticSearch.Indexable for it, like:
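A sketch for the model above (the callback names and arities here are from memory, so treat the library docs as authoritative):

defimpl ExlasticSearch.Indexable, for: MyApp.User do
  # the ES document id; we just reuse the primary key
  def id(%{id: id}), do: id

  # nothing to denormalize here, so preload is a no-op
  def preload(user), do: user

  # select only the columns we want to keep in the index
  def document(user, _index) do
    Map.take(user, [:name, :email, :age])
  end
end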

Since our model is relatively trivial, this amounts to just selecting the columns we want to keep, but you can also define preloads (in case you want to denormalize prior to indexing), and of course rewrite or infer some mappings in the document.

Once that boilerplate is done, we can start indexing and searching:
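Something along these lines (a sketch; the exact Repo function signatures are assumptions on my part):

alias ExlasticSearch.{Query, Repo}

# one-time setup: create the index and push the mapping
Repo.create_index(MyApp.User)
Repo.create_mapping(MyApp.User)

# index a single record
user = %MyApp.User{id: 1, name: "michael", email: "michael@frame.io", age: 30}
Repo.index(user)

# build a bool query with the pipe-friendly helpers and execute it
# (search/1 as shown is assumed; it may also take the model explicitly)
MyApp.User.search_query()
|> Query.must(Query.match(:name, "michael"))
|> Query.should(Query.match(:email, "frame.io"))
|> Repo.search()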

(OK, that query might just be for illustration purposes.)

ExlasticSearch follows Ecto's lead in utilizing the data mapper pattern, as seen above. The ExlasticSearch.Repo module is the executor of any query or index action, while an ExlasticSearch.Query can be built with the search_query/0 function of your model and the helper functions in ExlasticSearch.Query (which were designed to be as pipe-friendly as possible). And of course, the actual in-memory representations of your data remain your Ecto schemas. Bulk actions are also supported naturally using {:index, object} | {:delete, object} tuples, as shown below.
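For instance, a bulk call might look like this (a sketch; I'm assuming a bulk/1 function on the Repo):

# new_user and departed_user are previously loaded structs
ExlasticSearch.Repo.bulk([
  {:index, new_user},
  {:delete, departed_user}
])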

I've personally loved this library: it has eliminated a ton of boilerplate for managing ES updates and keeps our code clean and self-documenting.

There are more goodies than shown above. For instance, you can implement your own type inference using the ExlasticSearch.TypeInference behaviour (sketched below), and there's a Flow-based streaming indexer defined in ExlasticSearch.Repo that's also quite useful; it's what we use to (re)index entire tables. Nevertheless, there's still some more work to come, notably a better story around zero-downtime mapping updates.
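A custom inference module might look roughly like this (the infer/1 callback name is an assumption on my part; check the behaviour's docs for the real contract):

defmodule MyApp.TypeInference do
  @behaviour ExlasticSearch.TypeInference

  # treat UUID primary keys as keywords
  def infer(:binary_id), do: :keyword
  # plain strings become full-text fields
  def infer(:string), do: :text
  # pass everything else through untouched
  def infer(type), do: type
end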

Herd

Frame.io uses memcache and Redis heavily, hosted on ElastiCache, both as a performance optimization and as a way to isolate unnecessary load from our databases. There are already decent connection libraries, but there aren't good ways to manage connections to an entire memcache or Redis cluster (without using something like twemproxy, which is good software in itself). We had been using a library called memcachir, which is nice, but decided to enhance and generalize it after thinking about its architecture. The result was herd. Probably the best way to illustrate it is by showing it in action managing memcache:
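A sketch of a memcache herd (the module layout follows the description below; the exact macro options and callback signatures are assumptions):

defmodule MyApp.Memcache.Cluster do
  # polls service discovery and feeds nodes into the hash ring router
  use Herd.Cluster, otp_app: :my_app, herd: :memcache
end

defmodule MyApp.Memcache.Pool do
  # dynamically supervises one connection pool per live node
  use Herd.Pool, otp_app: :my_app, herd: :memcache

  # overridable hook: tweak the per-node connection worker options
  def worker_config(node) do
    super(node) ++ [ttl: 0]
  end

  # overridable hook: tweak the poolboy-style pool options
  def pool_config(node) do
    Keyword.put(super(node), :size, 10)
  end
end

defmodule MyApp.Memcache.Supervisor do
  # generated supervisor that ties the cluster and pool together
  use Herd.Supervisor, otp_app: :my_app, herd: :memcache
end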

A herd has two basic components: a cluster and a pool. The cluster polls a service discovery implementation (for memcachir, that's the ElastiCache discovery endpoint), inserts nodes into the router (a hash ring by default), and notifies the pool of ups/downs. The pool is a dynamic supervisor that starts/stops connection pools for each node in response to cluster events and adds them to a registry for name resolution. Finally, we can generate a supervisor for you to make sure everything is fault tolerant. Each component is pluggable: you can define your own router or your own service discovery, and most functional methods are overridable (as seen above with worker_config and pool_config). Dynamic supervision is used to make the pools as reliable as possible (we don't want to destroy connections if only a portion of the cluster is replaced).
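With memcachir, for example, pointing the cluster at service discovery is just configuration (the endpoint below is a placeholder):

# in config/config.exs
config :memcachir,
  # the ElastiCache autodiscovery config endpoint for your cluster
  elasticache: "mycluster.abc123.cfg.use1.cache.amazonaws.com:11211"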

It has already come in handy in one incident at Frame.io where we needed to scale our memcache cluster out in response to load, then scale it back in, with no need for a configuration deployment on our API. It also benefits any memcache deployment that utilizes autoscaling. In that vein, Kubernetes- and DNS-based service discovery are a clear TODO so Docker-based deployments can be supported, although we haven't had a need to implement them as of yet.

ExLimiter

Elixir already has a number of good rate limiting libraries, but few of them implement the leaky bucket algorithm, which tends to perform best in response to traffic spikes (which we actually see a lot of), and those that do aren't distributed (a problem, since our architecture uses autoscaling heavily, making it difficult to pin down the exact limit to use). So we made one ourselves. I'll let the code do the talking again:
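A sketch of the consume API and a custom limiter (the option names and storage module are assumptions, not the library's exact API):

# consume 1 unit from a bucket, allowing 100 units per minute
case ExLimiter.consume("user:1234", 1, scale: 60_000, limit: 100) do
  {:ok, _bucket} -> :allowed
  {:error, :rate_limited} -> :denied
end

# or define your own limiter backed by a different storage
defmodule MyApp.Limiter do
  use ExLimiter.Base, storage: MyApp.RedisStorage # hypothetical storage module
end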

As you can see, it's designed to support arbitrary storage solutions, with a pre-built memcache implementation (I'm thinking about adding a Redis implementation too, but I'll need to figure out the Lua scripting for it). You can use the default limiter just by calling the ExLimiter module itself, or you can define your own implementation with a different storage if needed. Lastly, it implements a nice plug for applying rate limits to an action (based on an {IP, Phoenix controller, action} triple by default, though bucket inference is also configurable). The plug will call your configured error handler and will also pass along standard rate limiting headers to guide well-behaved clients.
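Wiring it into a Phoenix controller looks something like this (option names are approximations):

# inside a Phoenix controller
plug ExLimiter.Plug, [scale: 60_000, limit: 100] when action in [:create]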

Wrapping Up

This post highlights only what I consider our most useful libraries; we've implemented a few more (cloudfront_signed_ex for signing CloudFront URLs, ex_geo for IP geolocation with live update support, and cereal for JSON API response typing). I'll spare everyone the details of each, but needless to say, we're committed to doing our part in making the Elixir ecosystem as dev-friendly as possible. And of course, contributions are welcome if anyone finds the code useful.

Like what you’ve read? We’re hiring!

At Frame.io, we’re powering the future of creative collaboration. Over 500,000 video professionals use Frame.io to seamlessly share media and gather timestamped feedback from team members and clients. Simply put, we help companies create better video, together.

Across the stack, we're big users of AWS Lambda, Elixir, Swift, Go, and React. We're a small, polyglot team that thinks big and works collaboratively to solve the biggest challenges for our customers, who include Vice, BuzzFeed, Turner, and NASA.
