Making FP and Type Theory Practical

FOOL’s errand: let’s build a real FP language from scratch (in Haskell)

Part 1: Introduction — The Why; early compiler architecture; Nat, Bool, summation, equality; “Core List Machine” introduction

Anton Antich
Superstring Theory

--

This is the first article in a series describing the creation of FOOL3: “functional object-oriented low-level language” — a functional programming language with a powerful type system (as close to Calculus of Constructions as we can get without making our heads spin) that compiles to .Net and JavaScript as initial targets. The series assumes some familiarity with Haskell, lambda calculus and Type Theory exposure helps as well. The complete source code for this step is available here: https://github.com/superstringsoftware/fool3/tree/tutorial01

Why another language?

Because we love Haskell, through this love we came to love Type Theory, and we do believe that the industry may benefit from the (possibly foolish) “design from the first principles” approach to build something robust and beautiful that all the modern FP and Type Theory research and experience of using and developing Haskell can teach us. At the same time, we do need something with a hopefully less steep learning curve, that is built on the foundation of industry-standard technologies (hence .Net and JavaScript as compilation targets) and that takes some great principles such as “write once run anywhere” and DRY to the next level.

In the end, we want to be able to write something like:

// COMMON:
type Post = { title, body : String }
type User = { name:String, age:Int, role:String}
authorship:Relation = One2Many (User,Post)

// SERVER:
db = buildDatabase([Post,User], [authorship], DBTypes.Mongo)

currentUser : Entity(User) = loggedIn(db.User)
newPost : Post = Post {"Hi", "Hello world"}

tr = [
enp:Entity(Post) = db.Post.add(newPost),
db.setRelation(authorship, currentUser, enp}]

db.executeTransaction(tr)

// CLIENT:

posts = GVM.subscribe(Post, srvData)

newPostForm = buildForm(Post,
Button("Create Post", onClick = posts.add(fromForm(newPostForm))),
visualOptions)

… and have the Type System and the compiler take care of the proof of correctness, of the meta-programming of the relations between entities, CRUD operations with various databases, of automatic creation of the UI forms both in .Net and on the web (as well as other compilation targets eventually), of type-safe data synchronization between computational nodes etc — to get a fully-functional and mathematically correct (:)) blogging app in under 100 lines of code. To achieve this we need a type system that is much more powerful than even what Haskell provides, but we shall definitely reuse lots of ideas that have already been tried elsewhere.

Sounds crazy and too good to be true? Well, maybe, maybe not — let’s start and see where it gets us, shall we? :) If anything, by following this series you may learn some new things about Haskell and experiment together with us.

1. First iteration language

In the first iteration, code for which you can download here: https://github.com/superstringsoftware/fool3/tree/tutorial01, our language is represented by the following “library” (see base.fool file) that defines two types — Bool and Nat — and three functions that work on them (in fact you are free to define new types and functions using the example self-explanatory syntaxis given below — how about defining your own multiplication function on the Nat type? Or adding Maybe and List types and functions on them? They should all work now out of box!)

type Nat = {
Z,
Succ (n:Nat)
};

type Bool = { True, False };

function eq(x:Nat,y:Nat):Bool = {
{Z,Z} -> True,
{Z,n} -> False,
{n,Z} -> False,
{Succ(m),Succ(n)} -> eq(m,n)
};

function plus (x:Nat,y:Nat):Nat = {
{Z, n} -> n,
{Succ(n), m} -> Succ (plus(n, m))
};

function not (b:Bool) : Bool = {
{True} -> False,
{False} -> True
};

This minimal definition, deceptively simple, requires us to build a full “MVP” for the compiler and the interpreter environment, starting with the parser, internal representation of the surface language, core language, transformation passes, etc. The only thing that is missing from this initial iteration is type checking —as it is the most complex part we will postpone its introduction till a later point. This MVP will serve as a foundation for our compiler/interpreter that we will refine, improve, and optimize over the course of the development of the series. We follow the “release early-release often” philosophy, or “innovate and iterate”, as we used to say at Veeam.com — which means the code is by no means neat, beautiful and optimized as you may have come to expect from the well-maintained Haskell libraries. Here, we will experiment, test hypothesis, try different approaches as we build FOOL up in our foolish endeavour, so the code quality will vary widely and will leave lots of room for optimization and refactoring as the architecture solidifies.

As you can guess from the code above, our FOOL3 core library currently supports the usual Product and Sum types, e.g. definition of Nat is more or less equal to the following Haskell code: data Nat = Z | Succ Nat . Function definitions support pattern matching, which was the most difficult part to implement in the current version.

Once you download the code, build it with stack buildand run the environment, you can do things like:

λfool3. one = Succ(Z)
Received expressions:
Binding (Var {name = "one", typ = UNDEFINED, val = App (Id "Succ") [Id "Z"]})
CLM:
CLMBIND "one" (CLMCON (ConsTag "Succ" 1) [CLMCON (ConsTag "Z" 0) []])
Added binding one = Succ(Z)
λfool3. three = Succ(Succ(Succ(Z)))
...
λfool3. plus(one,three)
trying interactive expression
CLM:
plus (one, three)
Final result:
Succ {Succ {Succ {Succ {Z {}}}}}
λfool3. eq(one,three)
trying interactive expression
CLM:
eq (one, three)
Final result:
False {}

Optionally, you can turn the tracing on by running :s trace on command and you will receive much more feedback on all the steps the interpreter currently performs. Of course, if you add your own types and functions to the base.fool file you can play with those as well.

2. Compiler architecture

On the high level, our architecture is built as follows:

  • We are using an excellent Haskeline terminal library in the Main.hs as the current interface
  • The Parser is defined in Lexer.hs and Parser.hs using Parsec. We won’t spend much time diving into the parser since there are some excellent reading materials available. E.g., an absolutely outstanding Stephen Diehl’s “What I wish I knew learning Haskell”, whose Parser we took as a foundation for ours; “Realworld Haskell” book has a chapter on Parsec.
  • The Parser parses our language to the Surface language definitions given in the Core.hs file (the name is a historical error and will be corrected accordingly in the future) in the Expr data type. It is quite extensive, somewhat ugly (e.g., GADTs could add some additional type safety, which we may do in the future), but allows us to represent everything we need with a foundation to capture type classes/type families, primitive types, Pi-types etc in the next iterations. The module also contains some helper functions, Expr tree traversals, pretty printing etc.
  • The compiler pipeline is defined in the Pipeline.hs and includes several passes: 1) initial clean up of the parsing and build up of the monadic environment (more on it below), where we store all the bindings for Types, Constructors and Functions 2) analysis and optimization of the case-statements inside function definitions — huge and ugly function with several levels of recursion, which will need quite a bit of refactoring work in the future, but it works for now :) 3) conversion to the CLM — “core list machine” — language, which is inspired by the Haskell’s GHC STG (described in some detail in our previous post on compiling Haskell to .Net) and has its’ own section below in this post 4) eventually, type-checking, a bunch of optimization passes, and finally compilation to the different targets. This last point is left for future posts — in the next one we will get to compilation and some basic type-checking, then will improve gradually as we are adding language features.
  • State.hs defines our top-level monad in which we operate that holds the state of our loaded module, interpreter, logs, etc:
-- structure keeping our current environment

data Environment = Environment {
-- Map that keeps all our TypeReps in the current environment
types :: NameMap Expr,
constructors :: NameMap (Lambda, Int),
-- constructors are stored with their number inside the sum type
topLambdas :: NameMap Lambda,
topBindings :: NameMap Var,
clmLambdas :: NameMap CLMLam,
clmBindings :: NameMap CLMVar,
outProgram :: NameMap String
} deriving Show

data InterpreterState = InterpreterState {
currentFlags :: CurrentFlags,
-- this is being filled by the parser as we go, so last line in the file will be first here!
parsedModule :: LTProgram,
currentSource :: Text,
currentEnvironment :: Environment
} deriving Show

type IntState = StateT InterpreterState LambdaLoggerMonad
  • ^^^ it’s a State transformer on top of another utility monad — LambdaLoggerMonad — that sits on top of IO and provides additional logging etc capabilities, you can dig into it in the Util folder, but it’s not essential to what we are trying to do here. If you don’t understand Monads — I recommend you read my “Magical Haskell” book, which gently builds up the Haskell typeclass hierarchy from the first principles in easy terms, but this series is probably too early for you :)
  • Interpreter.hs is a very basic quick and dirty interpreter for the core CLM language that provides some interactivity inside the environment with optional tracing of the interpretation steps. We won’t spend much time with it either as our end goal is compilation, which we will get to in the next iteration for both .Net and javascript platforms.
  • CLM.hs tentatively stands for “Core List Machine” and is a definition of our Core language, where everything gets compiled eventually for type-checking and optimizations, and from which everything gets compiled to various targets, .Net and JavaScript initially. It was heavily influenced by GHC Spineless-Tagless-G-machine and our work on compiling Haskell to .Net for proprietary needs, but uses slightly different approach from the mainstream lambda calculus. Below is a short intro on CLM:

3. Core List Machine introduction

The name may change down the road, but “List” captures the main difference with the mainstream lambda-calculus currying approach: we are working with Tuples explicitly across the board, both for strongly typed values, for types definitions, for type classes and families eventually, for function calls, for proper (!) records, etc. Here is the current definition of the CLMExpr data type that captures CLM:

data CLMExpr = 
CLMEMPTY
| CLMERR String
| CLMID Name
| CLMLAM CLMLam
| CLMBIND Name CLMExpr
| CLMAPP CLMExpr [CLMExpr] -- saturated application first expr to the tuple of exprs
| CLMPAP CLMExpr [CLMExpr] -- partial application (When we know the types!)
| CLMCON ConsTag [CLMExpr] -- saturated constructor application, value in a sense
| CLMFieldAccess (Name, Int) CLMExpr -- accessing a field of an expr by name or number
| CLMCASE [CLMConsTagCheck] CLMExpr -- list of constructor checks that must all fold to True bound to an expr
| CLMPROG [CLMExpr] -- list of expressions, for now used for Actions but needs to change
| CLMTYPED CLMExpr CLMExpr -- in case we want to give a type to an expression
| CLMPRIMCALL -- body of the function that is a primitive call
deriving (Show, Eq)

If you are familiar with STG you will immediately notice some similarities as well as the key difference: where STG defines application ex ex1 ex2as a sort of a “tree” — App ex (App ex1 ex2)) — that is a classic lambda-calculus based currying approach, we are using a tuple explicitly — CLMAPP ex [ex1,ex2,...,exn] — which means an expression ex is applied to a tuple [ex1,...,exn] . We can still just as easily express partial application, currying, functions as first class citizens, etc, using this approach, while using List manipulation instead of traversing trees for lots of common operations and optimizations. Arguably, list manipulation comes more intuitively to people than trees and makes for easier readability and comprehension.

Different constructors represent different expressions in our language, which are explained in the comments above. Notable things that are missing now are primitive types and literal values, but they are very easy to add and we will do so in the next iteration.

Couple of points to highlight in both CLM as well as our surface language design, which are quite different from Haskell:

  • Interaction with .Net / JavaScript libraries and “real world” in general. Knowing first-hand how much of a roadblock a “dreadful” Monad concept is for imperative programmers and newcomers in general, we want to make working with mutability and “real world” as painless as possible. Which might mean taking some shortcuts, for now captured by a very simple (even though not strictly mathematically defined yet) concept of Action as a separate Function type, where Function *must* be pure, while Action can do mutability and interact with “real world”. The only rule for now is: you can call both Actions and Functions from inside Actions, but you can only call Functions from inside other Functions. That sort of captures the whole “you can’t escape a monad” point, while scaring people much less. We will still have monads though, mind you! :)
  • Since we plan to be using more-or-less Calculus of Constructions as our Type system, there is no difference in terms of expressions between “Type-level” and “value-level”, which makes things much more concise in terms of data types definition for the compiler, but of course much more complicated in terms of type inference / checking, so some trade-offs need to be made
  • Powerful type system allows us to write type-safe definitions for lots of the concepts we really wish to be able to express in Haskell, but can’t. As well as potentially provide some very elegant meta-programming solutions to quite practical issues (such as database abstraction mentioned above), not just “boring” n-sized dependent vectors. That will be our main focus going forward — practical problems solved elegantly using a powerful type system. Just as a very quick example of the first — a type-safe tuple field access function is very easy with dependent types and can be represented something like:
proj : PI (i:Nat, tup:(t1, ..., tn)) -> ti
proj (i, (a1:t1, ..., an:tn)) = ai:ti
  • ^^^ this means tuple manipulation can become first-class citizen in our language, which solves soooo much repetition issues we see in some very good Haskell libraries now, but that’s just one simple example. Databases (and persistence in general) as well as GUI — is another, much bigger, deeper, more practical and more exciting, and which we will attempt to solve going forward.

Conclusion

This is more than enough for the first introduction post — please download the code, play with it, read the text, share your feedback and ideas, which we always like to get.

In the future posts we will spend much more time explaining and developing the surface language concepts having practical issues in mind and will follow with the compiler development along the way.

Thank you for your time, your Superstring Solutions team! :)

--

--

Anton Antich
Superstring Theory

How to scale startups and do AI and functional programming. Building Integrail.ai: pragmatic AGI platform. Built Veeam from 0 to 1B in revenue in under 10 years