It’s Reboot Time for “Operating Systems”
Return of the Categorical Abstract Machine?
Once you move beyond Functional Programming and treat Categories as first-class objects, you need a proper way to manage them. As such, our traditional notion of ‘operating system’ starts to look a little outdated. Categories are supposed to be managed in something called a Categorical Abstract Machine (CAM). But what might a CAM look like?
In recent years, the software industry has made a dramatic shift away from static/compiled programming languages to dynamic languages — at least in situations where flexibility is valued over raw performance. However, these languages introduce special challenges (such as validation), and additional infrastructure and tooling are needed as the codebase scales.
The debate over static versus dynamic languages is not new — in fact, it goes back to 1958 and the creation of LISP. However, many CIOs would agree the recent proliferation of languages is leaving behind an unsustainable pile of technical debt. Each new language provides one or two marginally beneficial features, and they are all increasingly looking like TypeScript — which raises serious questions about existing stacks. Worse, current dynamic-language tooling is often adapted from compiler-driven workflows and offers almost no support at enterprise scale.
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
— Greenspun’s Tenth Rule of Programming
Bottom line, we need a major breakthrough in how we build code.
The basic widget of most code is the “function” — thanks to its roots in lambda calculus. AWS Lambda has already figured this out. But we need more than just functions. Historically, programmers assemble (or ‘compose’) functions directly via ‘calls’, but this leads to brittle code graphs that must be broken apart manually when trying to split functions across hardware.
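As a rough sketch (plain Python, not any particular system’s API), the difference between hard-wired calls and composition-as-data looks like this:

```python
from functools import reduce

def compose(*fns):
    """Compose functions left-to-right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)

# Direct calls hard-wire the graph inside a function body:
def pipeline(x):
    return str(abs(x))

# Explicit composition keeps the graph as data that could later be
# split, inspected, or rerouted across hardware:
pipeline2 = compose(abs, str)

assert pipeline(-3) == pipeline2(-3) == "3"
```

The point is not the helper itself but that `pipeline2`’s structure exists as a value, not as frozen bytecode.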
Category theory is a branch of mathematics that attempts to ‘relate’ and unify concepts. However, it suffers from a bit of a paradox in that one cannot use the same terminology to describe something ‘outside’ of itself. This leads to a soup of confusing algebra even when describing fairly simple things. In the end, the reader must “go behind the matrix” in order to visualize what is really going on.
Category Theory argues that functions should not be willy-nilly chained together directly but should live in special data structures called “categories” or “cats”. Categories are essentially just ‘computation graphs’ like TensorFlow, but they are normally buried behind the scenes in programming languages. The benefit of categories is that the computation graph isn’t tied so closely to any particular language implementation, so you don’t have to resort to third-party solutions to decouple it.
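To make the “category as explicit computation graph” idea concrete, here is a toy sketch in plain Python (the `Category` class and its method names are our own invention, purely illustrative):

```python
# A minimal, hypothetical category-as-data-structure: objects are node
# names, morphisms are named edges carrying ordinary Python functions.
class Category:
    def __init__(self):
        self.objects = set()
        self.morphisms = {}  # name -> (source, target, fn)

    def add_morphism(self, name, source, target, fn):
        self.objects |= {source, target}
        self.morphisms[name] = (source, target, fn)

    def compose(self, first, then):
        """Compose two named morphisms; checks they line up."""
        s1, t1, f = self.morphisms[first]
        s2, t2, g = self.morphisms[then]
        assert t1 == s2, "morphisms must be composable"
        return (s1, t2, lambda x: g(f(x)))

cat = Category()
cat.add_morphism("parse", "str", "int", int)
cat.add_morphism("double", "int", "int", lambda n: n * 2)
src, dst, fn = cat.compose("parse", "double")
assert (src, dst, fn("21")) == ("str", "int", 42)
```

Because the graph is first-class data, it can be stored, queried, or handed to a different runtime — which is exactly the decoupling claimed above.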
In Functional Programming, you aren’t allowed to create your own category from scratch — but they do provide a dedicated cat called the Kleisli category. Your code gets packaged into ‘monads’ that are then attached to the Kleisli computation graph for execution (but honestly the PyTorch approach is probably much easier). We aren’t fans of Functional Programming for that reason.
Monads do play an important role. They are basically pre-packaged ways to assemble categories without having to do it from scratch — and therefore establish the backbone of many familiar computer concepts from virtual machines to programming languages to the operating system itself.
After all, operating system software is just a bunch of “functions”! Windows PowerShell was originally code-named “Monad”.
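A minimal sketch of the Kleisli idea, in plain Python with `None` standing in for a “Nothing” value (illustrative only — not any library’s API):

```python
# Kleisli composition for a Maybe/Option-style monad: functions return
# either a value or None, and composition threads failure automatically.
def bind(value, fn):
    return None if value is None else fn(value)

def kleisli(f, g):
    """Compose two monadic functions a -> M b and b -> M c."""
    return lambda x: bind(f(x), g)

def safe_div(n):
    return None if n == 0 else 10 / n

def safe_sqrt(x):
    return None if x < 0 else x ** 0.5

pipeline = kleisli(safe_div, safe_sqrt)
assert pipeline(0) is None    # failure short-circuits through the graph
assert pipeline(10) == 1.0    # 10/10 = 1, sqrt(1) = 1.0
```

This is the “pre-packaged assembly” monads provide: the plumbing for error flow lives in `bind`, not in every call site.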
CAMs are Back
But what if you wanted to bypass monads and somehow create categories directly? This requires a new type of virtual machine called a Categorical Abstract Machine (CAM). The OCaml language (whose ancestor Caml originally stood for “Categorical Abstract Machine Language”) tried to go this direction, but we need something more flexible. A modern CAM is more like a special database that lets us assemble and maintain categories.
Led by Goldman Sachs’ “billion dollar secret” after the 2008 crash, NYC embarked on a technology buildout in Eastern Europe to address the lack of functional infrastructure for risk analytics:
Tiny Estonia has been somewhat unable to participate in this frenzy, but looking at the larger picture, we see NYC essentially constructing point solutions.
Down the Rabbit Hole
What Might a Modernized CAM Look Like?
Below we see a session that reminds us of a UNIX shell. Indeed, we can view the browser as a modernized ‘terminal’. When we create the object foo, it appears much like an empty folder, and after we create a member called x we can “cd” into it (like object path navigation) and “list” the contents.
With a nod to the venerable “vi” command, we allow direct manipulation of the function test or the variable str1. This illustrates the larger ambition behind 1960s Multics versus the bare-bones dev support we have in UNIX/Linux (for those interested, the NixOS project is also taking a functional/monad approach).
Note that in the above example, our “directory” is not a real filesystem but rather sitting in transient memory. Interesting.
As persistent storage and transient RAM converge, the line between database and traditional programming gets fuzzy. You begin to wonder if something about our tech stack is redundant.
True “persistent memory” makes little sense to the average programmer coming from a background in static languages. Low-level programs are full of brittle memory “references” to heaven-knows-what, and trying to preserve them in situ is just asking for a hot (loading) mess. Moreover, complex runtime objects such as an HTTP server are usually assembled as a one-off side effect of running a von Neumann machine over a list of build instructions — and memory addresses were never intended to be primary keys. On the other hand, dynamic languages employ stable late-binding name references which (by design) tend to be more robust.
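The late-binding point can be sketched in a few lines of Python (the `registry` and `call` helper are hypothetical, just to make the contrast visible):

```python
# Dynamic languages resolve names at call time, so a "reference" by name
# survives the referent being replaced - unlike a raw memory address.
registry = {}

def call(name, *args):
    return registry[name](*args)   # late-bound: looked up on every call

registry["greet"] = lambda who: f"hello {who}"
assert call("greet", "world") == "hello world"

# Hot-swap the implementation; existing "references" (the name) stay valid.
registry["greet"] = lambda who: f"hi {who}"
assert call("greet", "world") == "hi world"
```

A saved memory address would have dangled after the swap; the name did not.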
Below is a simple example of persistent memory. The lv shortcut (list ‘save’ status) shows two variables as green/new (y shows up first because I created it more recently for the example) until x is saved. Later, x shows up red to warn us of unsaved changes.

The ll command here confirms that x has actually been saved. Note how the context performs the vital role of ‘root’ anchor for persistent memory pathing. Python programmers may be familiar with the idea of “pickling” or “application checkpointing” in long-running programs, but the difference here is that state is restored automatically if the computer goes down. A functional programmer might view persistent memory as anathema until you remind them that, in Category terms, a program is really a data structure.
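For readers who want a hands-on analogue, Python’s standard `shelve` module gives a rough feel for name-rooted persistence (the `/tmp/context_demo` path and the idea of treating the shelf as a “context” are our own framing, not the system described above):

```python
# Hedged sketch of "persistent memory": variables live under stable names
# rooted at a context, and survive process restarts.
import shelve

with shelve.open("/tmp/context_demo") as ctx:  # the context is the root anchor
    ctx["x"] = [1, 2, 3]       # "save" x
    ctx["y"] = "draft"         # y saved too

# A later run (or a restart after a crash) sees the same state by name:
with shelve.open("/tmp/context_demo") as ctx:
    assert ctx["x"] == [1, 2, 3]
```

The real difference claimed above is that this save/restore is automatic and continuous rather than an explicit checkpoint.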
So far the ideas presented are comparable to a commercial DBMS with advanced support for various language extensions, but now we are going to apply some category theory to bring a number of concepts together. Single-level memory or SLM (also called single-level store) is another innovative Multics concept, later advanced by IBM in the 1970s, that attempts to extend a single programming model across various devices and operating-system capabilities.
We extend the classic notion of Single-Level Memory to mean:
- Language/DSL neutrality
- System-supported functional programming (FP)
- Automatic memory mapping
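Under those assumptions, a toy sketch of single-level memory is a single path-addressed namespace where code never cares which “device” holds an object (the `SingleLevelStore` class and the `/cloud/result1` path are hypothetical names for illustration):

```python
# Hypothetical sketch of single-level memory: one naming scheme spans
# in-memory objects and "files", so code addresses both the same way.
class SingleLevelStore:
    def __init__(self):
        self._space = {}          # path -> object, regardless of "device"

    def put(self, path, obj):
        self._space[path] = obj

    def get(self, path):
        return self._space[path]

slm = SingleLevelStore()
slm.put("/cloud/result1", [("a", 1), ("b", 2)])  # looks like a file path...
assert slm.get("/cloud/result1")[0] == ("a", 1)  # ...but it's just memory
```

A real implementation would back some paths with disk or network storage behind the same `get`/`put` interface.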
Notice that foo does not have to loop explicitly over the set results; these are configuration settings best handled by the context.
foo may be a proxy to another underlying language implementation (e.g. for performance or as a legacy bridge). Placing a “virtual” dynamic layer atop static code has been popular in NYC trading systems for a while, and is part of a larger enterprise architecture of functional/configuration separation. We believe this will become more mainstream with runtimes like WebAssembly (often targeted from Rust), which ironically takes us back to LISP.
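As a toy illustration of the “virtual dynamic layer” idea (the `Proxy` class and the use of `math` as a stand-in legacy module are our own, not part of the system described here):

```python
# A thin dynamic proxy in front of a "static" implementation: the dynamic
# layer owns naming and configuration, the heavy lifting stays compiled.
import math   # stands in for a compiled/legacy implementation

class Proxy:
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        return getattr(self._target, name)   # late-bound dispatch by name

foo = Proxy(math)             # hypothetical: foo fronts a legacy module
assert foo.sqrt(9) == 3.0
```

Swapping the target (say, to a faster backend) changes nothing for callers of `foo`.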
In the above example, single-level memory handles mapping of the function and result1 to a virtual filesystem format, allowing a conventional editor to manipulate them. Although this looks like a normal filesystem, it is really a memory mapping. As we see below, the context cloud simply appears to Atom as a traditional folder and the entities automagically map to files:
The key thing here is that edits can be bi-directional e.g. a “save” in Atom will be immediately hot-loaded into the runtime, allowing for a more immersive REPL-driven development experience. On the same theme, it is interesting to note that Google, Facebook and others are looking at filesystem in userspace (FUSE) technology to improve their code development systems.
For completeness, we show the result1 variable is automatically rendered as a CSV file:
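Rendering a variable to CSV text can be sketched with the standard library (the contents of `result1` here are made up for illustration):

```python
# Rendering an in-memory variable to CSV text, as a mapping layer might do
# when exposing it to a conventional editor. Standard library only.
import csv, io

result1 = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "value"])
writer.writeheader()
writer.writerows(result1)

assert buf.getvalue().splitlines() == ["name,value", "a,1", "b,2"]
```

The reverse direction (parsing edited CSV back into the variable) is what makes the mapping bi-directional.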
Things get more interesting with mapping of SLM to JSON, which can be useful for managing various configuration files.
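A comparable JSON round-trip looks like this (the `config` keys are hypothetical):

```python
# The same mapping idea for JSON, handy for configuration entities:
# the serialized text is what an editor would see as a "file", and
# parsing it back is the hot-load direction.
import json

config = {"context": "cloud", "bind_columns": True}
text = json.dumps(config, indent=2)      # what the editor sees
assert json.loads(text) == config        # edits round-trip into memory
```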
Exploring FP a bit further, a context can optionally bind variable names to column names in the input set as shown below:
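One way to sketch column binding in plain Python (`apply_bound` and the column names are our invention, not the system’s actual mechanism):

```python
# A context binds variable names to input-set columns, so the function
# body reads like math instead of row-plumbing.
rows = [{"price": 10, "qty": 2}, {"price": 5, "qty": 4}]

def apply_bound(fn, input_set):
    """Call fn once per row, binding column names to keyword arguments."""
    return [fn(**row) for row in input_set]

def total(price, qty):     # names bound by the context, not by the caller
    return price * qty

assert apply_bound(total, rows) == [20, 20]
```

Note that `total` contains no loop and no mention of the input set; both are supplied by the binding layer.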
All these settings are akin to how environment variables are handled in the UNIX shell but the motivation is (1) removing noise from code to make it easier to follow what the developer is trying to do and (2) trying to be more declarative even in a conventional imperative language.
I hope this visual walkthrough shows you what a next-gen operating system, i.e. a Categorical Abstract Machine, might look like. But we want to go much further.
Estonia has established a world-class category theory team at TalTech and is resurrecting the original 1960s “moonshot” called Multics.