Creating a Programming Language

Architecting the Ethereal Language

The decisions I made when I designed my own programming language

Chirag Khandelwal
Young Coder
Sep 10, 2019


Adapted from Pixabay

Recently, I wrote an article with an overview of creating a programming language from my perspective. My language is Ethereal — an experimental work-in-progress that you can check out on GitHub.

What is it for? Well, my idea is to use it for writing scripts. But since Ethereal is a general-purpose language, it can be used for whatever you want. And even if I never write anything more than a test script, I’ve learned far more about the internal details of how languages work on this project than I would have by playing around with someone else’s mature platform.

How did I make the key design decisions about how Ethereal should work? Let’s dive deeper.

Introduction

In this article, I’ll be writing about the features of Ethereal, their design, the choices I made, and how I went about implementing them. I will talk more about individual components in later articles, but this one will revolve around the overall language design and architecture.

I will say that I am by no means an incredible language designer/architect. But I do hope that this article provides some insight to people who want to create their own programming languages, or are just curious about them. (If that’s you, great! If that’s not you — why not reconsider?)

Language design

When I envisioned creating a programming language, I definitely did not expect it to be the new cutting-edge, ground-breaking, sky-encompassing language that would shake the very foundations of the world. Far from it. I originally started designing and developing the language simply as a hobby. Most importantly, I wanted to use it to learn how programming languages work, how everything falls into place, and what it takes to actually write one.

It turns out that developing a programming language is not that hard, as long as you are not creating your own version of C++ or Haskell (in which case, tread very carefully).

Fundamentally, I wanted my language to have some core features:

  1. Extensibility. I didn’t want my language to be limited just to some selected instructions. I wanted it to be easily extended depending on the task.
  2. Decent Performance. Let’s not kid ourselves, an interpreted language (like Ethereal) will not have nearly as good performance as a compiled language. But still, I didn’t want my language to be sluggish. I wanted it to be fast enough for the tasks I had in mind. Also, it would be great to have a way to delegate some performance-critical tasks to a compiled language and integrate that code alongside the Ethereal code.
  3. Turing Completeness. For those of you who don’t know, this means that the language can compute anything a Turing machine can, i.e. any computable function. Yes, it may be tremendously hard to do so in practice, but it should be possible nonetheless. This requires the language to have the basic constructs for flow control: conditionals, plus loops or recursion.
  4. Ease of Use. I wanted a language that is easy and hassle-free to use. I didn’t want to invent some exotic syntax just for this language, nor did I expect others to learn it if I did.
  5. Modularity. I wanted the language to be modular. I shouldn’t need to write all the code in a single source file. That’s inefficient and promotes needless code duplication.
  6. Basic Stuff. All the basic constructs a language usually has — structs, functions, vectors, and maps — should be present. Vectors and Maps should also allow the use of the [] operator.

With these requirements, I started writing small sample programs with a variety of hypothetical syntax to decide which syntax would be good. I chose the ones that I found pleasing to the eye and amalgamated them.

Sample — simple program for displaying time since epoch, and time difference in microseconds

Then, I started writing out more sample programs in that amalgamated syntax and saw the limitations and annoyances in it. So I refined it. When I was satisfied with the syntax, I started working on the technical architecture of the language.

The host language

Now I had the requirements and the syntax of my planned language. It was time to fill in the implementation that would actually make the syntax work on a computer.

For writing the language, I chose C++ for two reasons:

  1. I wanted a fast compiled language. I didn’t choose C, because I didn’t want to lose some abstractions that C++ provides (vectors, strings, and maps, to name a few). In other words, I wanted the C++ Standard Template Library (STL). It makes my life so much easier and lets me focus on the work I’m doing instead of implementing these constructs from the ground up. And since the STL ships with every standard C++ implementation, it adds no external library dependencies, which I wanted to keep to a minimum.
  2. I love C++, and I’m quite comfortable with it. I didn’t want to learn a whole new programming language while writing one.

If you decide to write a language (please do), you can write it in any language of your choosing. There is no hard and fast rule, as long as the language you pick is Turing complete, or at least provides all the constructs needed to implement your language design. Heck, you can even write a programming language in Bash (please don’t).

The C++ version question

C++ has a vast number of compilers, and the standards committee regularly releases new versions of the C++ standard. But there’s a catch. For the sake of stability (Ubuntu comes to mind), many operating systems ship older compiler versions that don’t fully support the latest C++ standard. Therefore, if I relied on one of the newer C++ versions, my language would be much harder to build on systems that still use older compilers. Considering that, I went with the C++11 standard, as it is supported by the compilers on most systems and provides a good set of features.

Feature satisfaction

After finalizing the language I was going to use, it was time to think about how I was going to implement the features I set when designing it.

This is where my previous understanding of C++ really helped. I was able to figure out quickly how the features I had thought of could be implemented:

  1. Extensibility. Using C++, I could create dynamically loaded libraries (shared objects on Linux, DLLs on Windows) that contain specific extensions for the language (plugins) and are loaded on demand when the source code requires them.
  2. Performance. Well, unless we are talking about C or Assembly, I don’t know any other language which is as fast as C++ (and available on so many platforms). Also, with the use of C++ plugins (DLLs), I could easily write high performance parts in C++ and use them directly in Ethereal.
  3. Turing Completeness. There is no going around it. I would have to implement all the required control structures — which I wanted to do anyway — to make the language general purpose.
  4. Ease of Use. Overall, I chose a syntax similar to that of C and Python. I dislike Python’s indentation-based blocks and how hard they make it to write one-liners. I wanted one-liners to be possible in my language, so I used brace-delimited blocks like C, semicolons for statement termination, and dropped any dependence on indentation entirely (although I implore people to write properly indented code for readability).
  5. Modularity. I made the language modular by enabling other source files to be imported into each other. You can simply create a feature in one source file and import it in your main source.
  6. Basic stuff. All the required constructs are implemented. I also created a special subscript function, [], so that types other than vectors and maps can support the subscript operator without it being hard-coded into the language.

I also thought about an important question…

To OOP or not to OOP

I’ll be honest. I have a love/hate relationship with object-oriented programming. On one hand, it gives you a helpful way (classes) to put data and the related functions in a single unit. Inheritance and polymorphism have their uses too; I’ve even used them quite a bit in the Ethereal codebase.

On the other hand, OOP makes the code look much more complicated and can be overkill for smaller things. I don’t want to write a full-blown class-object program for a simple script!

To that end, I thought about the OOP feature I like most: member functions, of course! I decided to implement them, but discard all the other OOP features. (Alan Kay, don’t @ me.)

Even the design of member functions was easy. I just bound ordinary functions to specific types and allowed calling those functions on variables of the respective type. Inside such a function, the self keyword gives a reference to the variable it was called on. This also made them very efficient, since there is no class hierarchy or inheritance chain to resolve behind the scenes.

That’s literally it. This enabled me to use the elegant syntax of member functions, while not having to implement OOP.

Fun stuff: Some Ethereal code

At this point, you may be wondering about how I use the language and eager to see a tiny bit of what it can do. Well, here is a piece of code:

perform_tests.et — Execute all tests and see if they pass or fail

This code (located in the main repo directory) is the test driver for executing all the tests in the tests/ directory in the source code tree. It produces a count of passed and failed tests, along with how much time (in milliseconds) it took to execute each of those tests, and then exits using the number of failed tests as the exit code.

This is helpful because if the exit code is not 0 (i.e. failed count is not 0), continuous integration systems (like Travis and Cirrus) will mark the build as failed.

In essence, this code first finds the list of files in the tests directory with the .et extension, initializes counters for passed and failed tests, and then loops through the list of files and executes each one, showing how long each execution took. Finally, it shows the OS, the number of passed/failed tests, and the total execution time.

Here is a screenshot of the execution.

Executing perform_tests.et

Conclusion

That’s it for this article. I honestly had a hard time deciding what to put in and what not to include because of the vast amount of possible content. I hope it’s informative and not too overwhelming.

In the upcoming articles, I’ll be writing about the individual components and features in more detail. I still have a lot to discuss about the language.

I hope you liked the article. I love to get feedback regarding the articles and the language itself, so be sure to send them my way. And if you’re planning to write a language of your own (or have already done it), drop a comment to let me know!

Thanks a bunch for reading, and until next time! ❤️

