Scalable and serverless media processing using BuckleScript/OCaml and AWS Lambda/API Gateway

As a Music Licensing company, we at Audiosocket have to process a lot of media files. From a tech perspective, deploying, scaling and maintaining such applications raises issues that are quite different from those of typical Web technologies. Today, I would like to talk about a stack that we’ve recently been pushing in production and which involves a couple of technologies that, combined together, form a pretty neat solution to a lot of our problems.

The Plot

Contrary to web requests, media processing does not require fast responses. Requests are processed asynchronously and the caller is notified whenever processing has finished. Such requests require much more CPU and memory than a typical web response, and those requirements can vary greatly depending on the processing pipeline, for instance when applying audio compression, signal-processing filters, etc. Finally, this type of computation is blocking but, unlike in evented frameworks such as Node.js, the blocking is CPU-bound, a consequence of the heavy computations required to process media files. Therefore, an event loop, which works great when blocking is caused by network delays, will not be of much help in our case.

Because of this, scaling a media processing pipeline requires understanding and anticipating how resources such as CPU power, memory footprint and disk space are consumed. Previously, we used worker frameworks like sidekiq and they worked great. But this requires maintaining server instances, with their usual burden of software crashes, full hard drives, etc. Furthermore, deciding when and how many new servers to spin up when request load increases can be tricky. On top of that, the granularity of allocated resources is coarse and there are delays waiting for new servers to be ready.

For these reasons, we decided to move to the AWS Lambda service, which brings the granularity down to the function level and delegates scaling issues entirely to the service operator. In front of those Lambda functions, the AWS API Gateway receives API requests and interfaces with the Lambda processes. Finally, in order to make that transition sustainable in the long-term, we have been using BuckleScript, an OCaml interface to Javascript. Combined together, these three technologies result in a media processing architecture that is very robust, totally serverless and immediately scalable at the request level.

The chart below illustrates how the stack operates. If you are curious, there is fully functional demonstration code available here. It is packed with all the tricks explained below plus some extra ones. For the sake of simplicity, though, the DynamoDB part is not implemented.

The Media Processing Pipeline

Buckling Down

When working in a startup environment, time to production is quite crucial. Whether putting together some code to prove an idea is feasible, or releasing a Minimum Viable Product for demonstration purposes, developers often have to choose the quickest path to working code. Node.js is a great tool for such fast-paced work and we’ve used it a lot. Coincidentally, it is also one of the languages available as a runtime for the Lambda platform.

However, as flexible as Javascript is, it is also quite difficult to maintain in the long run. There is an inherent lack of structure in it which makes it very easy to quickly patch together a small demonstration project but, later on, creates potential hassle when updating or refactoring parts of the code, or when asking new developers to work on code they’ve never seen before.

In order to make Javascript more suitable for large-scale industrial applications, the BuckleScript framework has recently emerged, originally supported by Bloomberg. BuckleScript interfaces Javascript native code with OCaml. You start by defining bindings from Javascript to OCaml, then write OCaml code using those bindings. Finally, the BuckleScript compiler generates Javascript code from your OCaml code that you can deploy as native Javascript. The benefit of this loop, known as transpiling, is that you get to use all the fabulous tools that the OCaml language provides in your Javascript application.

OCaml’s Toolbox

Developed initially at France’s INRIA research center, OCaml is a functional language, with a statically inferred type system, an efficient Garbage Collector and a pragmatic approach to the ML language paradigm. Similar to Haskell, it provides the usual tools of functional languages but is also amenable to some imperative primitives whenever appropriate, for instance for I/O programming. Recently, it has enjoyed an interesting surge in various industrial applications, from Financial Trading to Facebook and recently Uber. Its ecosystem of development tools has greatly improved over the past few years, in particular with the addition of a solid package manager, OPAM. All in all, even though still a niche language, this constitutes a pretty successful transfer from academia to the software industry.

One of the most important features of OCaml is its static type system. In OCaml, every variable has a type that is determined at compile-time and, if your code isn’t too tricky, most of the time you do not have to explicitly write which type those variables should have. This means that, if your code compiles, then you are guaranteed that your functions will always be called with arguments of a specific, fixed type.

In the Javascript world, this means that you no longer have to deal with variables being null or receiving a string when you expected an int. Furthermore, when refactoring your code, any change in a function’s expected type trickles down via the compiler into each and every call to that function. The code will not compile until you have fixed all the calls to that function, making refactoring a safe and truly enjoyable experience — something that is very important when dealing with large scale applications and developer teams.
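As a small, self-contained illustration (plain OCaml, independent of BuckleScript, with a function name made up for the example), note how no type annotation is written anywhere, yet everything is fully typed:

```ocaml
(* The compiler infers describe_duration : int -> string. Calling it
   with a string, or concatenating an un-converted int, would be a
   compile-time error rather than a runtime surprise. *)
let describe_duration seconds =
  if seconds < 60 then string_of_int seconds ^ "s"
  else
    string_of_int (seconds / 60) ^ "m" ^ string_of_int (seconds mod 60) ^ "s"

let () = print_endline (describe_duration 125)  (* prints: 2m5s *)
```

If you later change `describe_duration` to take a float, every call site stops compiling until it is updated, which is exactly the refactoring safety net described above.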

In the following, I’ll be showing some OCaml features that can be very useful when writing code for BuckleScript/Javascript. If you are not familiar with the language, you might want to start by taking a peek at Real World OCaml, the reference book for OCaml programming.

Lambda handler’s type

In Lambda, a function is essentially a callback handler. It receives three arguments, a JSON object describing the function’s parameters, a context object, and a callback to be executed when the function has finished its execution. In OCaml/BuckleScript, this can be described with the following type:

type error = exn Js.Nullable.t
type 'a callback = error -> 'a -> unit
type context
type ('a, 'b) lambda_function = 'a -> context -> 'b callback -> unit

A callback is a function that takes two arguments, an error, which can be either null or an exception, and a result of polymorphic type 'a, and returns nothing. This is the usual callback paradigm in asynchronous Javascript code where, if the error argument is null, the function is assumed to have executed successfully and possibly returned a value as its second argument.

Finally, the type for the Lambda function takes an input parameter of polymorphic type 'a, a context value and a callback expecting a result of polymorphic type 'b. This function returns nothing or, if you prefer, its return value is ignored.

Now, anticipating what we’ll see later, it turns out that, when interfaced with AWS’s API Gateway through serverless, the value returned by an API handler should be a Javascript object of this form:

{statusCode: <int>, body: <string>}

In this case, we can amend our type declaration above and add:

type api_response = <
  statusCode : int;
  body : string
> Js.t
type 'a api_handler = ('a, api_response) lambda_function

Now, each time we declare a Lambda function to be of type api_handler, we are guaranteed that this function returns the appropriate type of Javascript object.
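Putting it together, a handler typed this way might look like the following sketch. This assumes the types defined above are in scope; the event type, the function name and the response body are illustrative, not the actual Audiosocket code:

```ocaml
(* Sketch only: event parsing is elided, only the shape matters here. *)
type event  (* abstract type for the raw API Gateway event object *)

let queue_encoding : event api_handler =
  fun _event _context callback ->
    let response =
      [%bs.obj { statusCode = 200; body = {|{"status":"queued"}|} }]
    in
    callback Js.Nullable.null response
```

If `response` were missing a field, or `statusCode` were a string, the handler would simply not compile against the api_handler type.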

The Asynchronous Monad

Another typical hassle in the Javascript/Node.js world is the fact that one constantly has to deal with callback-based computations. Usually, it looks like this:

function asynchronousProcessing(callback) {
  doSomeAsynchronousWork(function (err, result) {
    if (err) { return callback(err); }
    doMoreAsynchronousWork(result, function (err, newResult) {
      if (err) { return callback(err); }
      // etc, etc..
    });
  });
}

Having to repeat that pattern constantly is error-prone. In order to make this less painful, APIs such as Promise have been introduced. However, none of them really solves the problem of having to repeat the same pattern over and over again. Enter the notion of the Monad!

Monad is quite a hip term with deep connections to Category Theory but, really, it is just a fancy way to describe a computing abstraction. Essentially, a monad describes a certain type of computation and how to combine such computations together. Let’s look at an example applied to our case:

(* An asynchronous computation takes a callback and executes it when it
   has finished, either with an error or with a result of type 'a. *)
type error = exn Js.Nullable.t
type 'a callback = error -> 'a -> unit
type 'a t = 'a callback -> unit

(* A computation that returns a result of type 'a. *)
val return : 'a -> 'a t

(* A computation that fails with an error. *)
val fail : exn -> 'a t

(* Combine two computations. *)
val (>>) : 'a t -> ('a -> 'b t) -> 'b t

Now, let’s look at the implementation of this monad:

let return result = fun callback ->
  callback Js.Nullable.null result

let fail error = fun callback ->
  callback (Js.Nullable.return error) (Obj.magic Js.Nullable.null)

let (>>) current next = fun callback ->
  current (fun err result ->
    match Js.toOption err with
    | Some exn ->
      fail exn callback
    | None ->
      next result callback)

The return function takes a callback and executes it with null for its error parameter and the given result. The fail function takes a callback and executes it with the given error and a null result. We use a little trickery with Obj.magic here to pass the null result despite its polymorphic type.

Finally, (>>) implements the usual pattern for combining asynchronous computations: given a callback, a current computation and the next one, it executes the current computation with a new callback which, if passed a non-null error, returns immediately by executing the original callback with that error or, else, passes the result to the next computation, along with the original callback.

Now let’s look at how the code above rewrites using this asynchronous monad:

(* Define the asynchronous processing pipeline. *)
let asynchronous_computation =
  do_some_asynchronous_work >> fun result ->
  do_more_asynchronous_work result >> fun (..) ->
  (etc etc..)

(* Execute it! *)
let () = asynchronous_computation callback

No more repeated patterns, no more potential errors!
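The same monad can be exercised outside of BuckleScript. Here is a minimal plain-OCaml version where, as an assumption made purely so the snippet runs without the Js module, the nullable error is represented as an exn option, and the two toy "asynchronous" steps call back immediately:

```ocaml
(* Plain-OCaml sketch of the asynchronous monad: exn option stands in
   for exn Js.Nullable.t. *)
type error = exn option
type 'a callback = error -> 'a -> unit
type 'a t = 'a callback -> unit

let return result = fun callback -> callback None result
let fail error = fun callback -> callback (Some error) (Obj.magic None)

let ( >> ) current next = fun callback ->
  current (fun err result ->
    match err with
    | Some exn -> fail exn callback
    | None -> next result callback)

(* Two toy steps standing in for real asynchronous work. *)
let fetch_length s : int t = return (String.length s)
let double n : int t = return (n * 2)

let () =
  (fetch_length "media" >> double)
    (fun _err result -> Printf.printf "%d\n" result)  (* prints: 10 *)
```

The pipeline reads top to bottom, and the error-propagation boilerplate lives in one place, inside (>>).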

Polymorphic variants and phantom types

Finally, let’s look at a more tricky and fancier application of the OCaml type system using phantom types and polymorphic variants.

Phantom types are parametric types that are used to constrain sub-classes of a generic type for various API-restriction purposes. This is a very powerful tool. For instance, it makes it possible to force an API user to follow a pre-defined flow in their calls, such as calling init followed by config and finally run. Here, we are going to use them to annotate the different events that the user can listen to on Node.js’s readable and writable streams.
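The init/config/run flow mentioned above can be sketched in a few lines of plain OCaml; the module and function names here are illustrative:

```ocaml
(* The phantom parameter 'state never appears in the representation;
   it only constrains which calls the signature accepts. *)
module Api : sig
  type 'state t
  val init : unit -> [ `Inited ] t
  val config : [ `Inited ] t -> [ `Configured ] t
  val run : [ `Configured ] t -> string
end = struct
  type 'state t = string list  (* 'state is erased here: a phantom *)
  let init () = [ "init" ]
  let config t = "config" :: t
  let run t = String.concat " -> " (List.rev ("run" :: t))
end

let () = print_endline Api.(init () |> config |> run)
(* prints: init -> config -> run *)
```

Calling Api.run directly on the result of Api.init is rejected at compile time, since [ `Inited ] t does not unify with [ `Configured ] t.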

In order to do so, we use polymorphic variants, another powerful feature of the language which can be quite handy but sometimes requires a bit of work to understand and use properly. Polymorphic variants are collections of different labels that can be sub- or sur-classed. Let’s look at our code to make things concrete. You might want to also check out the official Node.js Stream API here.

type 'a t

type write_events = [
  | `Close of (unit -> unit)
  | `Drain of (unit -> unit)
  | `Finish of (unit -> unit)
  | `Error of (exn -> unit)
  | `Pipe of (readable -> unit)
  | `Unpipe of (readable -> unit)
]
and read_events = [
  | `Close of (unit -> unit)
  | `End of (unit -> unit)
  | `Readable of (unit -> unit)
  | `Data of (string -> unit)
  | `Error of (exn -> unit)
]
and writable = write_events t
and readable = read_events t

type events = [ write_events | read_events ]

val on : ([< events ] as 'a) t -> 'a -> unit

In this API, we define the list of events that readable and writable streams can receive. Each of these events is associated with a handler to process the type of data that will be received when the event occurs.

Then we sub-class the generic stream type 'a t by annotating it with the events associated with each type of stream, readable and writable. Finally, the on function is typed so it will only accept events that are annotated in the stream type, making sure, for instance, that only a readable stream can be used to listen to the `Data event. Otherwise, the compiler will complain with a message such as:

Stream.on writable (`Data (fun s -> ...))
> Error: This expression has type [> `Data of string -> unit ]
> but an expression was expected of type Stream.write_events
> The second variant type does not allow tag(s) `Data

Now, let’s peek under the hood to see how this is implemented:

external on : 'a t -> string -> ('b -> unit) -> unit = "" [@@bs.send]

let on stream = function
  | `Close fn -> on stream "close" fn
  | `Drain fn -> on stream "drain" fn
  | `Finish fn -> on stream "finish" fn
  | `Error fn -> on stream "error" fn
  | `Pipe fn -> on stream "pipe" fn
  | `Unpipe fn -> on stream "unpipe" fn
  | `End fn -> on stream "end" fn
  | `Readable fn -> on stream "readable" fn
  | `Data fn -> on stream "data" fn

Pretty simple after all! First, we declare an external on method on stream objects. It can receive any type of stream, a string-typed event name and a callback receiving a value of a yet unknown type. This is the low-level binding to Node.js’s EventEmitter API.

Then, we decorate this function by re-defining it as a function that receives a polymorphic event type and handler and explicitly unwraps it down to the expected parameters on the Node.js side. Voila!¹

Let’s just finish here by noticing that the explicit type restriction on the on function is defined in the type interface. Without it, there is nothing that tells the compiler to only accept event handlers that are part of the stream’s type annotation.
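With these annotations in place, listening to stream events reads naturally, and mistakes are caught at compile time. A quick sketch, assuming the module above is named Stream and that the readable stream is obtained elsewhere, for instance from an fs.createReadStream binding not shown here:

```ocaml
(* Sketch only: [stream] stands for any value of type Stream.readable. *)
let watch (stream : Stream.readable) =
  Stream.on stream (`Data (fun chunk -> Js.log chunk));
  Stream.on stream (`End (fun () -> Js.log "stream finished"))
(* Stream.on stream (`Drain ...) would be rejected by the compiler:
   `Drain is a write-only event. *)
```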


Serverless

The last piece of our stack, which puts everything together, is serverless. Its motto, “Focus on your application, not your infrastructure”, is actually quite a true statement.

The toolkit allows you to configure the whole combination of AWS services using a simple YAML file. Here’s the gist of it from our demo code:

service: media-pipeline-demo

provider:
  name: aws
  runtime: nodejs6.10
  region: us-east-1
  stage: production
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
      Resource: "*"
    - Effect: Allow
      Action:
        - lambda:InvokeFunction
      Resource: "arn:aws:lambda:us-east-1:478746233480:function:media-pipeline-encode-file"

functions:
  encode-file:
    name: media-pipeline-encode-file
    handler: lib/js/src/index.encode_file
    memorySize: 1536
    timeout: 300
  queue-encoding:
    name: media-pipeline-queue-encoding
    handler: lib/js/src/index.queue_encoding
    events:
      - http:
          path: encodings
          method: post
          cors: true
          integration: lambda-proxy

This file contains all the configuration you should need, including AWS access policies, environment variables and endpoint definitions.

The toolkit also has an offline plugin which allows you to test your endpoint locally:

% serverless offline
Serverless: Starting Offline: production/us-east-1.
Serverless: Routes for encode-file:
Serverless: (none)
Serverless: Routes for queue-encoding:
Serverless: POST /encodings
Serverless: Offline listening on http://localhost:3000

This makes it very convenient to test locally before deploying your code. However, calls that execute AWS Lambda functions, here encode-file, will still go to AWS.

Finally, when you are satisfied with your code, you can deploy:

% serverless deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (23.95 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
Serverless: Stack update finished...
Service Information
service: media-pipeline-demo
stage: production
region: us-east-1
stack: media-pipeline-demo-production
api keys:
encode-file: media-pipeline-demo-production-encode-file
queue-encoding: media-pipeline-demo-production-queue-encoding
Serverless: Removing old service versions..

Voila! You’re ready to go.


[1] BuckleScript has a specific annotation to achieve the same purpose but it is the author’s opinion that it is not really usable. In particular, it cannot be used with a proper .ml/.mli interface declaration, which should always be considered good practice when writing OCaml code.