It’s Reboot Time for “Operating Systems”

Return of the Categorical Abstract Machine ?

reinman
reinman
Oct 16, 2017 · 8 min read

Once you move beyond Functional Programming and treat Categories as first class objects, you need a proper way to manage them. As such, our traditional notion of ‘operating system’ starts to look a little outdated. Categories are supposed to be managed in something called a Categorical Abstract Machine (CAM). But what might a CAM look like?

Image for post
Image for post

In recent years, the software industry has made a dramatic shift away from static/compiled programming languages to dynamic languages— at least in situations where flexibility is valued over raw performance. However, these languages introduce special challenges (such as validation) and additional infrastructure and tooling is needed as the codebase scales.

Image for post
Image for post
IT legacy nightmare

The debate of static versus dynamic languages is not new — in fact goes back to 1958 and the creation of LISP. However, many CIOs would agree the recent proliferation of languages is leaving behind an unsustainable pile of technical debt. Each new language provides one or two marginally beneficial features that are all increasingly looking like TypeScript — which raises serious questions about existing stacks. Worse, current dynamic language tooling is often adapted from compiler-driven workflow tempo and offers almost no support at enterprise scale.

Every sufficiently complex application/language/tool will either have to use Lisp or reinvent it the hard way

— Greenspun’s Tenth Rule of Programming

Bottom line, we need a major breakthrough in how we build code.

The basic widget of most code is the “function” — thanks to its roots in lambda calculus. AWS Lambda has already figured this out. But we need more than just functions. Historically, programmers assemble (or ‘compose’) functions directly via ‘calls’ but this leads to brittle code graphs that must be broken manually when trying to split functions across hardware.

Category theory is a branch of mathematics that attempts to ‘relate’ and unify concepts. However, it suffers from a bit of a paradox in that one cannot use the same terminology to describe something ‘outside’ of itself. This leads to a soup of confusing algebra even when describing fairly simple things. In the end, the reader must “go behind the matrix” in order to visualize what is really going on.

Category Theory argues that functions should not be willy-nilly chained together directly but live in special data structures called “categories” or “cats”. Categories are essentially just ‘computation graphs’ like TensorFlow but they are normally buried behind the scenes in programming languages. The benefit of categories is that the computation graph isn’t tied so closely to any particular language implementation without having to resort to third party solutions.

In Functional Programming, you aren’t allowed to create your own category from scratch — but they do provide a dedicated cat called the ‘Kleisli’. Your code gets packaged into ‘monads’ that are then attached to the Kleisli computation graph for execution (but honestly the PyTorch approach is probably much easier). We aren’t fans of Functional Programming for that reason.

Image for post
Image for post
Your function graph is a category

Monads do play an important role. They are basically pre-packaged ways to assemble categories without having to do it from scratch — and therefore establish the backbone of many familiar computer concepts from virtual machines to programming languages to the operating system itself.

After all, the operating system software is just a bunch of “functions”! Windows Powershell was originally named “monad”.

But what if you wanted to bypass monads and somehow create Categories directly? This requires a new type of virtual machine called a Categorical Abstract Machine (CAM). The OCAML language tried to go this direction but we need something more flexible. A modern CAM is more like a special database that lets us assemble and maintain categories.

Led by Goldman Sachs’ “billion dollar secret” after the 2008 crash, NYC embarked technology buildout in Eastern Europe to address the lack of functional infrastructure for risk analytics:

Image for post
Image for post

Tiny Estonia has been somewhat unable to participate in this frenzy but we looked ahead at the larger picture and see NYC essentially constructing point solutions.

Image for post
Image for post
Turn back while you still can

Below we see a session that reminds us of a UNIX shell. Indeed, we can view the browser as a modernized ‘terminal’. When we create the object foo, it appears much like an empty folder and after we create a member called x we can “cd” into it (like object path navigation) and “list” the contents.

Image for post
Image for post
Context

With a nod to the venerable “vi” command, we allow direct manipulation of the function test or variable str1. This illustrates the larger ambition behind 1960s Multics versus the bare-bones dev support we have in UNIX/Linux (for those interested, the NixOS project is also taking a functional/monad approach).

Note that in the above example, our “directory” is not a real filesystem but rather sitting in transient memory. Interesting.

As storage RAM and transient RAM converge, the line between database and traditional programming gets fuzzy. You begin to wonder if something about our tech stack is redundant.

True “persistent memory” makes little sense to the average programmer coming from a background in static languages. Low-level programs are full of brittle memory “references” to heaven-knows-what and trying to preserve them in situ is just asking for a hot (loading) mess. Moreover, complex runtime objects such as an HTTP server are usually assembled as a one-off side effect of running a von Neumann machine over a list of build instructions— and memory addresses were never intended to be primary keys. On the other hand, dynamic languages employ stable late-binding name references which (by design) tend to be more robust.

Below is a simple example of persistent memory. The lv shortcut (list ‘save’ status) shows two variables as green/new (y shows up first because I created it more recently for the example) until x is saved. Later, x shows up red to warn us of unsaved changes.

Image for post
Image for post
Persistent memory

The ll command here confirms that x has actually been saved. Note how context performs the vital role as ‘root’ anchor for persistent memory pathing. Python programmers may be familiar with the idea of “pickling” or “application checkpointing” in long-running programs, but the difference here is that state is restored automatically if the computer goes down. functional programmer might view persistent memory as anathema until you remind them that in Category terms, a program is really a data structure.

So far the ideas presented are comparable to a commercial DBMS with advanced support for various language extensions, but now we are going to apply some category theory to bring a number of concepts together. Single-level memory or SLM (also called single-level store) is another innovative Multics concept later advanced by IBM in the 1970s that attempts to extend a single programming model to various devices and operating systems capabilities.

We extend the classic notion of Single-Level Memory to mean:

  • Language/DSL neutrality
  • System-supported functional programming (FP)
  • Automatic memory mapping

In the spirit of Categories (without diving into monads, Kan operations and other exotica) no function or DSL is “subordinate” to another e.g. reduced to embedded strings. The below example shows both SQL and JavaScript treated as peers:

Image for post
Image for post
Single-level memory

Combining statements across languages is where Categories fits naturally — we need to map SQL datatypes to JavaScript. However the code was not overtly imperative in that it did not specify which database we are talking to, nor how the FP should be handled (sets vs scalar, sync vs async etc.). Nor did the function foo have to loop explicitly over the set results. These are configuration settings best handled by the context.

Note that foo may be a proxy to another underlying language implementation (e.g. either for performance or legacy bridge). Placing a “virtual” dynamic layer atop static code has been popular in NYC trading systems for a while and part of a larger enterprise architecture of functional / configuration separation. We believe this will become more mainstream with VMs like Rust or WebAssembly (which ironically takes us back to LISP).

In the above example, single-level memory handles mapping of the function foo and result1 to a virtual filesystem format, allowing a conventional editor to manipulate them. Although this looks like a normal filesystem, it is really a memory mapping. As we see below, the context cloud simply appears to Atom as a traditional folder and the entities automagically map to files:

Image for post
Image for post

The key thing here is that edits can be bi-directional e.g. a “save” in Atom will be immediately hot-loaded into the runtime, allowing for a more immersive REPL-driven development experience. On the same theme, it is interesting to note that Google, Facebook and others are looking at filesystem in userspace (FUSE) technology to improve their code development systems.

For completeness, we show the result1 variable is automatically rendered as a CSV file:

Image for post
Image for post

Things get more interesting with mapping of SLM to JSON, which can be useful for managing various configuration files.

Exploring FP a bit further, a context can optionally bind variable names to column names in the input set as shown below:

Image for post
Image for post

All these settings are akin to how environment variables are handled in the UNIX shell but the motivation is (1) removing noise from code to make it easier to follow what the developer is trying to do and (2) trying to be more declarative even in a conventional imperative language.

I hope this visual walkthrough shows you what a next-gen operating system e.g. Categorical Abstract Machine might look like. But we want to go much further.


Estonia has established a world class category theory team at TalTech and resurrecting the original 1960s “moonshot” called Multics.

Image for post
Image for post

Image for post
Image for post
h/t NYC

👋🏾 Get to know the people and ideas shaping the products we use every day. Subscribe to Noteworthy — the product & design newsletter written by the Journal team.

Image for post
Image for post

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store