It’s Reboot Time for “Operating Systems”

Thanks for this! I’ve been intrigued by all of your posts and the CodeBuilder concepts you’ve been hinting at, and it’s cool to see them laid out in a prototype.

I’ve been thinking a lot about better ways to architect data platforms for enterprise bioinformatics, a field that is similar in many ways to fintech. Both have huge global reference datasets (genome sequences vs. market prices) alongside proprietary internal data, and both require lots of custom scripting and data analysis by small teams of domain experts (who are not necessarily strong software engineers) to answer day-to-day questions. There is a strong preference for Python and R for custom code, but lots of work is also done in the Linux shell, since most of the important data processing tools are written for the command line.

I think a browser-based IDE-like environment that allowed bash-style piping among databases, command-line tools, R/Python functions, files, and S3 objects would catch on very quickly in the bioinformatics community. It would also be transformative if it could naturally handle big data processing steps by adding them to a queue, dumping them on a cluster (I prefer Dockerized tasks on AWS Batch), and pulling the output back into the environment. As in fintech, the upstream dependencies and lineage of data are important, so viewing and manipulating the DAG is popular (and something that’s already been implemented in many existing platforms).
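To make the piping idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration, not how CodeBuilder works: `Pipeline`, `py`, and `sh` are names I made up, the gene IDs are toy data, and the shell stage assumes a POSIX `sort` is on the PATH.

```python
import subprocess


class Pipeline:
    """A chain of named stages; each stage maps bytes -> bytes."""

    def __init__(self, stages):
        self.stages = list(stages)  # [(name, func), ...]

    def __or__(self, other):
        # `a | b` concatenates the two chains, like a shell pipe.
        return Pipeline(self.stages + other.stages)

    def run(self, data=b""):
        # Feed each stage's output into the next stage.
        for _, func in self.stages:
            data = func(data)
        return data

    def lineage(self):
        # The ordered stage names double as a simple lineage record.
        return [name for name, _ in self.stages]


def py(func):
    """Wrap a Python function (bytes -> bytes) as a one-stage pipeline."""
    return Pipeline([(func.__name__, func)])


def sh(*argv):
    """Wrap a command-line tool that reads stdin and writes stdout."""
    def call(data):
        done = subprocess.run(argv, input=data, capture_output=True, check=True)
        return done.stdout
    return Pipeline([(" ".join(argv), call)])


def fetch_ids(_):
    # Toy stand-in for a database query or an S3 download.
    return b"BRCA2\nTP53\nBRCA1\nTP53\n"


def dedupe(data):
    # Drop repeated lines while keeping first-seen order.
    unique = dict.fromkeys(data.splitlines())
    return b"\n".join(unique) + b"\n"


if __name__ == "__main__":
    pipeline = py(fetch_ids) | sh("sort") | py(dedupe)
    print(pipeline.lineage())
    print(pipeline.run().decode())
```

A real version would need to stream rather than buffer everything in memory, and a stage could just as well submit a job to a queue and wait for the result, but the point is that databases, shell tools, and Python functions all compose with the same operator and the lineage comes along for free.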

Most people in my field are already very familiar with Jupyter/JupyterLab, RStudio, or Spyder, in which you can directly write code and inspect variables in a running REPL. My interpretation is that a CodeBuilder is something similar, except with deeper integration across resources and languages, and with all functions, variables, and data shared globally across multiple developers via a shared kernel/datastore. If I’m on the right track, then I could see this being an incredibly useful platform.

How do I try it out or learn more? Or, alternatively, is there a way to get some of the functionality immediately using wrappers and connectors (e.g., Python decorators, DSLs) rather than having to wait for a ground-up rewrite of the operating system?
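On that second option, this is roughly the kind of decorator-based wrapper I am imagining. Again, it is just a sketch under my own assumptions: `node`, `build`, and `REGISTRY` are hypothetical names, the in-memory dict stands in for whatever shared kernel/datastore a real system would use, and the sequences are toy data. It relies on `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# name -> (function, names of upstream nodes).
# A plain dict here; a real system would back this with a shared datastore.
REGISTRY = {}


def node(*depends_on):
    """Register a function as a DAG node with the given upstream nodes."""
    def decorator(func):
        REGISTRY[func.__name__] = (func, depends_on)
        return func
    return decorator


def build(target):
    """Compute `target` and everything upstream of it, in dependency order."""
    # Collect the subgraph reachable from the target.
    graph = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in graph:
            deps = REGISTRY[name][1]
            graph[name] = set(deps)
            stack.extend(deps)

    # static_order() yields each node only after all of its dependencies.
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = REGISTRY[name]
        results[name] = func(*(results[d] for d in deps))
    return results[target]


@node()
def reference_genome():
    return "ACGTACGTTT"


@node()
def sample_reads():
    return ["ACGT", "TTT", "GGG"]


@node("reference_genome", "sample_reads")
def matched_reads(genome, reads):
    # Keep only reads that occur somewhere in the reference.
    return [read for read in reads if read in genome]


@node("matched_reads")
def match_count(matched):
    return len(matched)


if __name__ == "__main__":
    print(build("match_count"))
```

Because every function declares its upstream inputs, the registry is already the lineage DAG, so viewing it or recomputing only what changed would be natural next steps.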