CloudFile

This article discusses a library we built at Urbint to simplify our file storage needs.

Problem

There are dozens of available solutions for remote file storage. Each of these choices has its own API. We want to be able to easily switch between local storage and remote storage. Also, if we decide later on to switch our cloud storage service, we want the refactor to be simple.

Solution

Pluggable storage drivers.

In this article we’re going to create a basic client for working with various storage backends in a familiar way. Before doing that, it’s important to define what is meant by a “unified API”.

A set of functions complies with a unified API when three things line up. Functions across different modules should have matching:

  1. function names
  2. parameters, including the order in which they are applied
  3. return values

With all three criteria satisfied, you have the opportunity to take advantage of a powerful design pattern: adapters. We’ll illustrate this by looking at a file system client that we use internally at Urbint.
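One way to pin down all three criteria in Elixir is a behaviour. The sketch below is illustrative only: the module names StorageDriver and StorageDriver.Memory are hypothetical and are not taken from CloudFile. The in-memory adapter exists purely so the contract can be exercised without a server.

```elixir
defmodule StorageDriver do
  @moduledoc "Hypothetical contract that every adapter must satisfy."

  # Each callback fixes a function name, a parameter order, and a return shape.
  @callback read(conn :: term(), path :: String.t()) :: {:ok, binary()} | {:error, term()}
  @callback write(conn :: term(), content :: iodata(), path :: String.t()) ::
              :ok | {:error, term()}
  @callback rm(conn :: term(), path :: String.t()) :: :ok | {:error, term()}
end

defmodule StorageDriver.Memory do
  @moduledoc "Toy adapter that keeps files in an Agent, for illustration only."
  @behaviour StorageDriver

  # Starts the Agent that plays the role of the remote server.
  def start_link, do: Agent.start_link(fn -> %{} end)

  @impl true
  def read(conn, path) do
    case Agent.get(conn, &Map.fetch(&1, path)) do
      {:ok, content} -> {:ok, content}
      :error -> {:error, :enoent}
    end
  end

  @impl true
  def write(conn, content, path) do
    Agent.update(conn, &Map.put(&1, path, IO.iodata_to_binary(content)))
  end

  @impl true
  def rm(conn, path) do
    Agent.get_and_update(conn, fn files ->
      if Map.has_key?(files, path) do
        {:ok, Map.delete(files, path)}
      else
        {{:error, :enoent}, files}
      end
    end)
  end
end
```

Any module that implements the same three callbacks can stand in for this one, which is what makes the adapter pattern possible.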

Let’s say we wanted to create a client for working with an FTP server. Erlang’s standard library (OTP) contains two modules for such work: ftp for FTP and ssh_sftp for SFTP. Both modules have fairly large APIs, but let’s look at three common operations that we can abstract:

  • read: loads a file from the server and returns its contents as binary
  • write: saves binary to a file on the server
  • remove: deletes a file from the server

1. Function Names

The FTP library refers to the “read” operation as recv_bin while the SFTP library refers to it as read_file.

When it comes to writing, the FTP library uses the function send_bin while the SFTP library uses write_file.

Lastly, for removing files from a server, the FTP library uses delete. Conveniently, the SFTP library also uses delete, so the two modules already share a name for this action.

2. Function Parameters

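For read and remove, both modules take the same parameters in the same order, so we can forward the arguments downstream unchanged. Write needs a closer look: in OTP, ftp’s send_bin takes the binary before the remote path, while ssh_sftp’s write_file takes the path before the data, so an adapter has to reorder those two arguments.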

To manage the divergent function names, we’ll create a shim around both modules to align their function names. We will model our “unified API” off of Elixir’s existing File module.

+---------+----------+------------+
| NEW API | FTP      | SFTP       |
+---------+----------+------------+
| read    | recv_bin | read_file  |
| write   | send_bin | write_file |
| rm      | delete   | delete     |
+---------+----------+------------+

We can abstract both FTP and SFTP into adapters and introduce the concept of a pluggable FTP driver resulting in code that looks like this:

ftp_driver.read(pid, path)
ftp_driver.write(pid, content, path)
ftp_driver.rm(pid, path)

In this example, ftp_driver can be configured to refer to either an FTP adapter or an SFTP adapter.
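A minimal sketch of what such adapters could look like follows. The module names FtpAdapter, SftpAdapter, and Storage, and the :my_app / :ftp_driver configuration keys, are assumptions made for illustration; only the calls into OTP’s :ftp and :ssh_sftp modules are real. Both OTP modules expect paths as charlists, hence the to_charlist calls.

```elixir
defmodule FtpAdapter do
  # Thin shim over OTP's :ftp module (hypothetical adapter name).
  def read(pid, path), do: :ftp.recv_bin(pid, to_charlist(path))
  def write(pid, content, path), do: :ftp.send_bin(pid, content, to_charlist(path))
  def rm(pid, path), do: :ftp.delete(pid, to_charlist(path))
end

defmodule SftpAdapter do
  # Thin shim over OTP's :ssh_sftp module (hypothetical adapter name).
  def read(pid, path), do: :ssh_sftp.read_file(pid, to_charlist(path))

  # Per the OTP docs, write_file/3 takes the path before the data,
  # so the shim swaps the last two arguments.
  def write(pid, content, path), do: :ssh_sftp.write_file(pid, to_charlist(path), content)

  def rm(pid, path), do: :ssh_sftp.delete(pid, to_charlist(path))
end

defmodule Storage do
  # Looks up the configured adapter; :my_app and :ftp_driver are
  # placeholder keys, with FtpAdapter as the assumed default.
  def driver, do: Application.get_env(:my_app, :ftp_driver, FtpAdapter)
end
```

Calling code would then use Storage.driver().read(pid, path) and stay the same whichever adapter is configured.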

CloudFile

As mentioned above, we use a library internally that lets us choose a storage driver at configuration time. Function calls, expectations, and assertions stay the same, which reduces the number of breaking changes needed when the storage layer changes.

CloudFile.write("ftp://path/to/file.txt", "testing")
:ok
CloudFile.read!("ftp://path/to/file.txt")
"testing"
CloudFile.rm("ftp://path/to/file.txt")
:ok
CloudFile.read("ftp://path/to/file.txt")
{:error, :enoent}
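The calls above select a backend from the URL scheme. How CloudFile does this internally is not shown here, so the following is only a guess at the general shape: SchemeRouter and every driver module name in the table are hypothetical placeholders.

```elixir
defmodule SchemeRouter do
  # Hypothetical scheme-to-driver table. A nil scheme means a plain
  # local path such as "/tmp/file.txt".
  @drivers %{
    nil => LocalDriver,
    "ftp" => FtpDriver,
    "s3" => S3Driver,
    "gs" => GcsDriver
  }

  # Returns {:ok, driver_module} for a known scheme, or an error tuple
  # naming the scheme that has no driver.
  def driver_for(url) do
    %URI{scheme: scheme} = URI.parse(url)

    case Map.fetch(@drivers, scheme) do
      {:ok, driver} -> {:ok, driver}
      :error -> {:error, {:unsupported_scheme, scheme}}
    end
  end
end
```

With a lookup like this, supporting a new backend means adding one table entry and one driver module, and no call site changes.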

The return value from the last call to CloudFile.read is interesting because it illustrates another complication in creating unified APIs.

3. Function Return Values

Elixir’s File module documents these typical error reasons for File.read:

  • :enoent - the file does not exist
  • :eacces - missing permission for reading the file, or for searching one of the parent directories
  • :eisdir - the named file is a directory
  • :enotdir - a component of the file name is not a directory
  • :enomem - there is not enough memory for the contents of the file

Some backends have technical limitations that make certain error conditions impossible or impractical to replicate. In those cases it’s important not to force a square peg into a round hole.

The goal is to map the error codes from each library onto POSIX-compliant error codes wherever possible. Codes that cannot be mapped without information loss are simply forwarded to the consumer unaltered.
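As a rough illustration of that rule, the sketch below normalizes SFTP errors. PosixErrors is a hypothetical module name, and the reason atoms :no_such_file and :permission_denied are assumed from the status codes OTP’s ssh_sftp reports, so verify them against your OTP version. Any reason without an entry in the table passes through untouched.

```elixir
defmodule PosixErrors do
  # Assumed SFTP reason atoms that have a clear POSIX equivalent.
  @sftp_to_posix %{
    no_such_file: :enoent,
    permission_denied: :eacces
  }

  # Translate known SFTP reasons; forward unknown ones unaltered.
  def normalize(:sftp, {:error, reason}) do
    {:error, Map.get(@sftp_to_posix, reason, reason)}
  end

  # Success values, and results from drivers without a mapping table,
  # are returned as they came in.
  def normalize(_driver, result), do: result
end
```

An adapter would pipe each downstream result through normalize before returning it, so consumers can match on {:error, :enoent} regardless of the backend.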

State of Affairs

Internally we currently have drivers for:

  • Local storage
  • Simple HTTP storage
  • Amazon S3 Storage
  • Google Cloud Storage
  • FTP

We plan on open-sourcing CloudFile and its drivers later this year, and we hope the Elixir community finds it simple to implement drivers for their own use cases. Ideas that come to mind are:

  • Box
  • Microsoft Cloud
  • Dropbox

Stay tuned for more Urbint Engineering updates.