Multiple languages in a single compute capsule

Code Ocean is intended to be easy to use for scientists with domain expertise, whatever their level of technical skill. As such, we put a lot of work into customizing base images for common use cases and into providing a menu of preconfigured options that meet most users’ needs. (See our help articles on Configuring your computational environment for more nuts and bolts.) Most authors, for instance, submit compute capsules that use just one programming language. But this is not always the case. I know this well because my own capsule, the Contact Hypothesis Revisited, reflects the work of a team of three, and while I worked primarily in R, one co-author wrote in Stata.

To prepare for this, Code Ocean’s tech team put in some work under the hood to make a base image that supports both languages natively, as we thought this would be a relatively common use case in the social sciences. Just as importantly, the base image needed to provide access to package managers for both languages*.

The result is a substantial reduction in the effort it takes to reproduce the paper’s results. The alternative would have been to assemble the R files and run them sequentially, then open up the Stata GUI and run the remainder of the code. For a user without a Stata license, this would have been a total non-starter.
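To make the idea of a single entry point concrete, here is a minimal sketch of a driver that runs scripts written in different languages in one fixed order. It is not how Code Ocean implements its run step. The extension-to-command table, the `build_command` and `run_pipeline` helpers, and the file names are all hypothetical, and the interpreter invocations assume that `Rscript`, a batch-mode `stata`, and `python3` are available on the path.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from file extension to the command that runs it.
# These invocations are assumptions about the environment, not the
# platform's actual configuration.
RUNNERS = {
    ".R": ["Rscript"],
    ".do": ["stata", "-b", "do"],
    ".py": ["python3"],
}


def build_command(script: str) -> list[str]:
    """Return the command line for one script, chosen by its extension."""
    suffix = Path(script).suffix
    if suffix not in RUNNERS:
        raise ValueError(f"no runner configured for {script!r}")
    return RUNNERS[suffix] + [script]


def run_pipeline(scripts: list[str], dry_run: bool = True) -> list[list[str]]:
    """Build the commands in order; execute them unless dry_run is set."""
    commands = [build_command(s) for s in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in run_pipeline(["01_clean.R", "02_models.do"]):
        print(" ".join(cmd))
```

With `dry_run` left at its default, the driver only prints the commands it would run, which is a cheap way to check the ordering before anything executes.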

Another multi-language capsule that was fun to help prepare was Kefei Liu’s ELMSeq: An Extended Linear Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq data. Kefei’s capsule has code in both MATLAB and R. MATLAB’s dependency management tends to be simple, mostly because its built-in functionality is fairly encompassing and because our MATLAB images ship with a number of toolboxes. R’s package management, by contrast, can be daunting for newcomers, especially when a user needs packages not available through CRAN (such as those hosted on Bioconductor).

This is where the setup script proved essential. With this feature, an author has access to as much functionality as they would on a regular Linux machine. Anything run in the setup script, like anything installed via the built-in package managers, is baked into the underlying image and does not need to be run anew by each user. For this particular capsule, the built-in package managers (which the platform dynamically detects and adds when they are installed as dependencies) were sufficient for most of the necessary packages; the setup script was then used to reach Bioconductor.

MATLAB and Python also go together easily, as you will see in Multiresolution Alignment for Multiple Unsynchronized Audio Sequences using Sequential Monte Carlo Samplers.

If you have particular combinations of languages you’d like us to support, please do not hesitate to reach out and ask!

*If you view this capsule’s environment configuration screen, in particular its setup script, you will see that a little more work was required to make everything exactly right — installing Stata packages only available on GitHub, or that are very old, can be a bit trickier.


Seth Green is the Developer Advocate for Code Ocean. He helps authors publish their code on the platform and tries to represent researchers’ points of view within Code Ocean. He spent a few years in a political science PhD program before joining Code Ocean. Find him on Twitter @setgree.

Originally published at