So, I Created a Programming Language
Deep down, you know you want to make one too
It still feels weird (and awesome) to say that I actually created a programming language. Here’s a bit of my journey, and an overview of the language design for Ethereal.
I’ll be the first to admit that I’m probably not particularly good at systems programming. I’m not an expert at language design, and I didn’t have a clue about how to create a language. What I did have was a really strong interest in making one. Compilers, interpreters, and languages have fascinated me for a long time, and I always wanted to know how it would feel to write code and realize that it is written in my own language. (Spoiler alert: It feels GOOOD!)
The compilers and interpreters like GCC, LLVM, CPython, etc. are truly exquisite pieces of code which — even to me — are totally mind boggling. But building your own language is not impossible.
I have been writing small and esoteric languages (Alacrity Lang, ESIL) for about a year and a half now, but they have all been very limited in their capabilities, possibilities, and architecture. But this time, I wanted a (mostly) general purpose interpreted language. And so Ethereal was born.
I started working on this language sometime in May 2019 and, over time, built a Lexer, Parser, Bytecode Generator, and a stack-based Virtual Machine for running the generated bytecode. I will keep working on these components as needed, as well as building various modules for the language. It’s been nearly 4 months since I started implementing it and, to be honest, it’s been an amazing journey.
Ethereal is an interpreted language which translates the given source code to bytecode and runs it on a custom stack-based Virtual Machine. Its syntax is inspired by C and Python: it aims to be simple like C while having aesthetically pleasing syntax like Python. Personally, I’ve always preferred using braces to define blocks of code instead of indentation, and that’s what I’ve used here. Internally, Ethereal doesn’t care about lines or indentation when reading the source code (similar to C).
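To make the "doesn't care about lines or indentation" point concrete, here is a minimal C++ sketch of a tokenizer that skips all whitespace the same way. It is not Ethereal's actual lexer: the function name `tokenize` and the tiny token rules (alphanumeric runs versus single-character symbols) are assumptions made purely for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tokenizer, not Ethereal's real lexer.
// Spaces, tabs, and newlines are all skipped identically, so the layout of
// the source text has no effect on the resulting token stream.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // layout is irrelevant: just move on
            continue;
        }
        if (std::isalnum(c) || c == '_') {
            // Identifier or number: consume the whole alphanumeric run.
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            // Anything else is treated as a one-character symbol.
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, a statement written on one line and the same statement spread over several indented lines produce identical tokens, which is all a later parser would ever see.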
As of writing this post, I would honestly not even consider Ethereal to be in beta stage right now (it’s probably somewhere in early alpha stage). Coming to beta stage will probably take a while. But for now, it supports the following features:
- Variables, Functions, Conditionals, Loops, Enums, Structures — the usual
- Member functions (in a way) for structures — functions that work for specific structures/types and are invoked using the dot operator (like in OOP languages)
- Functions and types can be defined in C++ (or C) through modules, or in Ethereal itself using structs and imports
- Function overloading — based on argument count and types for internal (module) functions; and based on argument count for functions created in Ethereal
- Dynamic typing
- Standard Library consisting of basic console and file I/O and crucial data structures — Vectors, Maps, Sets, Optionals (similar to pointers)
It supports Linux, macOS, BSD, and Android operating systems out of the box (see Git repo for more info) and there are plans for possible Windows support.
The question that I get asked most often is why I chose to build an interpreted language instead of a compiled one. I originally thought about this quite a bit. I realized that I could afford to sacrifice some of the performance that compiled languages give to make an interpreted language. If I created a compiled language, I would have most likely used LLVM APIs to create the backend of the compiler (the optimization and code generation). But for this project, I wanted to create the entirety of the implementation myself (and I didn’t want to indulge in creating assembly code just yet).
Finally, building a Bytecode Generator and Virtual Machine seemed like more fun!
Language for the interpreter
I enjoyed using C when I started programming, and when I switched to C++, there was no going back. So, for me, language choice was more or less nonexistent. I love C++ and chose it as the language for building the reference interpreter in a heartbeat. Since it inherently provides high performance (provided one writes decent code), I wouldn’t need to worry too much about my code running slow.
Also, it is far better to write the interpreter for a language in a compiled language. Interpreted languages are generally slower than compiled ones, so an interpreter written in another interpreted language (like Python or Ruby) would pay that cost twice and would likely be staggeringly slow.
Components of the language interpreter
Let’s take a small sample program:
The language interpreter consists of 4 fundamental components:
1. Lexer: Converts the provided source code into individual tokens recognizable by the language
2. Parser: Converts the sequence of tokens into a parse tree, which gives the written code a structure that is meaningful in terms of Ethereal
3. Bytecode Generator: Converts the parse tree to a sequence of custom bytecode instructions
4. Virtual Machine: Executes the generated bytecode sequence to produce the final result/output (here, 15)
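For readers who have not seen a stack-based Virtual Machine before, the last step can be sketched in a few lines of C++. Everything here is an assumption for illustration: the instruction names (`Push`, `Add`, `Mul`, `Halt`), the `Instr` struct, and the `run` function are invented and do not reflect Ethereal's real bytecode or VM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction set; Ethereal's real bytecode is different.
enum class Op { Push, Add, Mul, Halt };

struct Instr {
    Op op;
    std::int64_t operand;  // only meaningful for Push
};

// Executes the program on an operand stack and returns the value on top.
std::int64_t run(const std::vector<Instr>& program) {
    std::vector<std::int64_t> stack;
    for (const Instr& in : program) {
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.operand);
            break;
        case Op::Add:
        case Op::Mul: {
            assert(stack.size() >= 2 && "binary operator needs two operands");
            std::int64_t rhs = stack.back();
            stack.pop_back();
            std::int64_t lhs = stack.back();
            stack.pop_back();
            stack.push_back(in.op == Op::Add ? lhs + rhs : lhs * rhs);
            break;
        }
        case Op::Halt:
            assert(!stack.empty() && "nothing to return");
            return stack.back();
        }
    }
    assert(!stack.empty() && "program left no result");
    return stack.back();
}
```

Feeding it the sequence Push 2, Push 3, Add, Push 4, Mul, Halt evaluates (2 + 3) * 4. A real VM also needs variables, scopes, and function calls, and that is where most of the work goes.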
For me, in ascending order of difficulty, building these components went as follows:
- Lexer — Rather easy
- Bytecode Generator — Mostly okay, had a hard time generating bytecode for expressions
- Parser — A bit more difficult, especially in parsing expressions, but also in general
- Virtual Machine — Took the longest, mainly due to all the internal components like memory management, variable (and scope) management, and so on.
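Generating bytecode for expressions is easier to picture with a small example. One common approach for stack machines (not necessarily the one Ethereal uses) is a post-order walk of the expression tree: emit both operands first, then the operator. The `Expr` node, the helper functions, and the `PUSH`/`CALL` mnemonics below are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse-tree node: a literal (no children) or a binary operator.
struct Expr {
    std::string text;                // "5" for a literal, "+" for an operator
    std::unique_ptr<Expr> lhs, rhs;  // both null for a literal
};

std::unique_ptr<Expr> literal(std::string value) {
    std::unique_ptr<Expr> e(new Expr());
    e->text = std::move(value);
    return e;
}

std::unique_ptr<Expr> binary(std::string op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->text = std::move(op);
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Post-order traversal: operands are emitted before their operator, so a
// stack machine finds both values on the stack when it reaches the operator.
void emit(const Expr& e, std::vector<std::string>& out) {
    if (!e.lhs) {
        out.push_back("PUSH " + e.text);
        return;
    }
    assert(e.rhs && "binary node needs both operands");
    emit(*e.lhs, out);
    emit(*e.rhs, out);
    out.push_back("CALL " + e.text);
}
```

For the tree of 1 + 2 * 3 this emits PUSH 1, PUSH 2, PUSH 3, CALL *, CALL +. Operator precedence is already encoded in the shape of the tree, so the generator itself never has to think about it; the hard part is building that tree correctly in the parser.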
Dynamic library modules
In this language, I also added support for dynamic modules: functions and variable types written in C++ itself. For instance, functions like println and operators (-, and so on) are implemented in C++ as library functions which are then loaded by the Virtual Machine (dynamic library loading). This is how (almost) all of the standard library for the language is built.
This feature lets one write performance-critical components in C++, so they do not bottleneck the program. Another benefit is that one can easily create interfaces to third-party C/C++ libraries (say audio, network, graphics) for Ethereal, thereby greatly extending the language.
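The sketch below shows only the registration side of such a module: native C++ functions are stored under a name so an interpreter can look them up at run time. The actual loading of a shared library (for example with `dlopen`/`dlsym` on POSIX systems) is deliberately left out. `NativeRegistry`, `init_core_module`, and the use of `double` as the value type are assumptions for illustration, not Ethereal's real module API.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Value = double;  // stand-in for the interpreter's real value type
using NativeFn = std::function<Value(const std::vector<Value>&)>;

// Hypothetical table of native functions, keyed by the name scripts use.
class NativeRegistry {
public:
    void add(const std::string& name, NativeFn fn) {
        assert(fn && "native function must be callable");
        fns_[name] = std::move(fn);
    }

    Value call(const std::string& name, const std::vector<Value>& args) const {
        auto it = fns_.find(name);
        if (it == fns_.end()) {
            throw std::runtime_error("unknown native function: " + name);
        }
        return it->second(args);
    }

private:
    std::unordered_map<std::string, NativeFn> fns_;
};

// What a module's entry point might look like: it hands its functions to
// the interpreter when the module is loaded.
void init_core_module(NativeRegistry& reg) {
    reg.add("-", [](const std::vector<Value>& a) { return a.at(0) - a.at(1); });
    reg.add("abs", [](const std::vector<Value>& a) { return std::fabs(a.at(0)); });
}
```

Wrapping a third-party library then comes down to writing one such entry point whose lambdas forward to that library's functions.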
Ethereal does not use a garbage collector. It simply doesn’t need one because all variables are reference counted: the counter is increased when something starts using the variable, and decreased when that use ends. Once the counter for a variable hits zero, the variable object is deallocated.
This is a very simple architecture both in design and implementation, and it provides quite consistent performance. I did implement a basic garbage collector originally, but it blocked execution (for some milliseconds, depending on the program) whenever it had to free unused objects. To improve performance, I trashed it completely and switched to reference counting.
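A minimal C++ sketch of the idea follows. The names `Object`, `retain`, and `release`, as well as the `live_objects` counter (present only so the behaviour can be checked), are assumptions; Ethereal's actual variable objects are not shown in this post.

```cpp
#include <cassert>

// Number of Object instances currently alive; exists only for checking.
static int live_objects = 0;

// Hypothetical interpreter object with an embedded reference counter.
struct Object {
    int refs = 1;  // the creator holds the first reference
    int value;

    explicit Object(int v) : value(v) { ++live_objects; }
    ~Object() { --live_objects; }
};

// Called when another variable or stack slot starts using the object.
inline Object* retain(Object* obj) {
    ++obj->refs;
    return obj;
}

// Called when a user is done; frees the object once nobody uses it.
inline void release(Object* obj) {
    assert(obj->refs > 0 && "release without a matching retain");
    if (--obj->refs == 0) {
        delete obj;
    }
}
```

The cost of freeing is spread over every `release` call instead of arriving in one pause, which matches the consistent performance described above. One general caveat of plain reference counting is that objects referring to each other in a cycle never reach zero; this post does not say how Ethereal deals with that case.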
By design, most of Ethereal’s operators (assignment = being the notable exception) can be overloaded for any particular type (only in dynamic library modules for now). If, for example, one is creating a module for vectors, they can overload the += operator for appending data to a vector, the + operator for joining two vectors, the == operator for comparing two vectors, and so on.
The entire numeric and boolean arithmetic core library is built using this feature. When we write something like a + b, it actually calls a function by the name of + for a and b (see the Bytecode Generator output above, ID 3 to 6).
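Treating an operator as an ordinary named function that is chosen by operand type can be sketched as below. The `Value` struct, the type tags "num" and "vec", and the `OperatorTable` class are hypothetical stand-ins; the real lookup inside Ethereal's VM may work quite differently.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical dynamically typed value: a number or a vector of numbers.
struct Value {
    std::string type;  // "num" or "vec"
    double num = 0;
    std::vector<double> vec;
};

Value number(double n) { Value v; v.type = "num"; v.num = n; return v; }
Value vector_of(std::vector<double> items) {
    Value v; v.type = "vec"; v.vec = std::move(items); return v;
}

using BinaryFn = std::function<Value(const Value&, const Value&)>;

class OperatorTable {
public:
    // Registers an implementation of `op` for one pair of operand types.
    void overload(const std::string& op, const std::string& lhs,
                  const std::string& rhs, BinaryFn fn) {
        assert(fn && "overload must be callable");
        table_[key(op, lhs, rhs)] = std::move(fn);
    }

    // What a VM could do for `a + b`: find the function named "+" that
    // matches the operand types, then call it like any other function.
    Value call(const std::string& op, const Value& a, const Value& b) const {
        auto it = table_.find(key(op, a.type, b.type));
        if (it == table_.end()) {
            throw std::runtime_error("no overload of " + op + " for " +
                                     a.type + ", " + b.type);
        }
        return it->second(a, b);
    }

private:
    static std::string key(const std::string& op, const std::string& lhs,
                           const std::string& rhs) {
        return op + "(" + lhs + "," + rhs + ")";
    }
    std::unordered_map<std::string, BinaryFn> table_;
};

OperatorTable make_core_operators() {
    OperatorTable ops;
    ops.overload("+", "num", "num", [](const Value& a, const Value& b) {
        return number(a.num + b.num);
    });
    ops.overload("+", "vec", "vec", [](const Value& a, const Value& b) {
        std::vector<double> joined = a.vec;
        joined.insert(joined.end(), b.vec.begin(), b.vec.end());
        return vector_of(joined);
    });
    return ops;
}
```

Here the same "+" adds two numbers or joins two vectors depending on what it is given, and a new type gains operator support simply by registering more entries.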
Aside from these features, there are a multitude more which can be explored by diving deeper into the language, but those are for another time.
Well, this was a bird’s eye view of my journey, as well as of the Ethereal language. I have a lot more to tell, but I’ll write a new post for that in the near future. I’ll also be creating a series going deeper into the technicalities of the language and how it all fell into place.
I hope it was an interesting read for you all. Language design is considered a very niche, nebulous, and complex topic, and — don’t get me wrong — it definitely is. The compilers and interpreters like GCC, LLVM, CPython, etc. are truly marvelous, genius, and exquisite pieces of code which, even to me, are totally mind boggling. But it is not impossible to do, and I love understanding all of it.
So if you are interested in language design and development, definitely give it a try. It is an incredible learning experience and an absolute joy to work on. And don’t even get me started on the blissful feeling of writing code in your own language ❤️.
Below are the links to Ethereal and the other languages I worked on. If you’re interested, go ahead and get your hands dirty!
Thank you for reading!
Alacrity Lang: https://github.com/Electrux/Alacrity-Lang
Next Article: Architecting the Ethereal Language