Yet another language…(1)

Andrew
4 min readApr 5, 2024

--

I’m a programmer with a degree in maths, and a deep interest in programming languages and mathematical proof assistants. My real interest is in answering the question “what’s the bare minimum a language needs to do interesting things?”. The obvious answer is something like Lisp (or Scheme) which confer great expressiveness with a fairly small footprint. Yet their implementations are still non-trivial — the rules for procedural evaluation (lambda application, lexical or dynamic scoping?) are quite arcane and vague, the only real reference for them being the underlying code which implements them — making great use of recursive calls which can hide understanding.

But then the interesting features of a language are hidden from the users of that language. One can mitigate these things to a degree by having careful testing scripts (I’m very test-infected) but the fact remains that a lot of “magic” persists even in these small languages.

One solution is to try and implement a list-type language in something like Python and argue that the Python code is a good reference for that implementation. The problem here is that Python performs a lot of magic in the background due to it’s loose typing.

We want the reference implementation to be as short as possible in a strongly-typed language (such as Rust), and as much of the implementation to be in the new language itself — so it bootstraps itself. Is that possible ? It seems like asking a lot.

What would this new language look like ? If we have a lot of design upfront, we might add in features we don’t really need, over complicating the reference implementation. So maybe we start from something as simple as possible, adding in new features only when we absolutely have to.

One of things I’m keen to maintain are the rules for procedural application. How does the variable substitution work ? It would be very nice to have control over the scoping of variables within the new language itself.

It would also seem that we would need at the very least support for boolean logic and integer arithmetic, but starting off with these would demand that we start with some sort of type system and rules for evaluation.

So what does all this have to do with mathematical proof ? It would be nice to (ultimately) have the ability to manipulate mathematical objects and their associated proofs. The Metamath and Coq languages are good examples of languages which enable such things to be done, but they remain complex. Metamath has a hidden evaluation algorithm which the users need take on trust that it works, whilst Coq starts off with a wide array of tools and structures (for instance the Galina sub-language) which just “work” and the user must place a lot of trust in.

So we come back to my original point — can we create a new language starting from an incredibly simple starting point in which we can do interesting things in elegant ways ? This means starting from the absolute basics of language lexing and parsing, so our implementation becomes a minimal reference and we are completely sure of what we are doing.

Another solution to this problem is to design a sequence of languages, each written in the former language, with gradually increasing of complexity in each. The problem here is that there’s a lot of impedance mismatch between the languages in sequence — it would be better just to have a really simple reference language which bootstraps itself.

So let’s get started ! Let’s build a language based on simple ASCII strings and nothing else. To keep things simple, we will eschew normal procedural definitions (and their complex substitution rules) and implement a stack-based language : Commands will simply push strings onto a stack, and the result of running a program will be the state of the stack when it is over.

We will use Rust for its strict formalism for the implementation and write tests to assert that the implementation does what it says on the tin. We will also use the methodology of TDD (Test-Driven-Development) whereby our tests dictate future functionality we seek in our code.

So where to start ? Well we might begin by specifying what represents a valid program, and what represents the result of running such a program. What could be simpler than the empty string ? We would expect such a program to do nothing and return an empty stack (of strings). In the next post we still start from this very basic point.

--

--