How to Design Your Own Programming Language | Sea #0

David MacDonald
CodeX
Published in
15 min readJul 24, 2021
Photo by Danial Igdery on Unsplash

Welcome! This is the first article in a series I plan to write involving my programming language — Sea. In these articles, I plan on documenting my process for writing the language to serve a handful of main goals:

  • To teach you the process of creating a programming language
  • To help me learn the process better myself and develop Sea
  • To create documentation for the design choices made for Sea
  • To take the time to properly design Sea

Note: My experience with writing programming languages comes from personal experience while working on Sea and CodePulse’s Programming Language Tutorial. There is a lot of really great material in that tutorial series, however the code is atrocious; there are a handful of files hundreds of lines long. So, I still believe my series of articles will be useful. I will be saving them to my Programming Language List.

This series is also an extension of my article A Basic History of Programming Languages and how to Create Your Own. In that article, I go through the historical development of programming languages and explain the general process of creating a programming language conceptually. In this series, I will go through each step in practical detail. You can copy what I do exactly, or use it to create your own language. Note that I will be adding a fourth step to my previously described three steps: preprocessing. This is because C has a pre-processor, and it offers a handful of benefits. For now, though, I will be ignoring the pre-processor until an article at a later date.

Photo by Edho Pratama on Unsplash

Purpose and Design Philosophy

The idea for Sea is a simple one. I’ve spent a lot of time programming in numerous languages, but my personal favourite language is Python. I spent a while almost elitist thinking that “Python is a beginners’ language. Brackets aren’t scary; I’m fine with Java and C++”. However, once I started really getting into Python, I fell in love. Python is simply a pleasure to write in due to the simple syntax. It is much easier to debug, and it is overall easy to read the code. I think pushing computer languages towards conversational language is ideal.

Python is a great language, but it’s a high-level programming language. You can use lower level features with Cython, but that’s more of a hack than a feature, as far as I’m concerned. There simply aren’t convenient and obvious ways to use something like pointers in Python as efficiently as in C. I’m also heavily aware that not every language should be used for everything. A good developer does not simply choose their favourite language and use it for web development, game development, and app development; a good web developer uses the tool that is best for the job at hand. So wouldn’t it be nice if there were a language that were able to do everything; it would be a modular language that can function very lightly for low-level applications such as operating systems, it would have high-level features that allow it to be used for web development and general app development, and lastly, it would need to have simple and readable syntax.

That is my dream language. However, I thought that would be too large a task to tackle as my first language. So, I began to think smaller and came up with Sea — it’s C, just written differently. The goal is to create a version of C with syntax similar to Python, without adding a significant runtime cost. I prefer indent-syntax over bracket-syntax in most scenarios. Good code should be indented and spaced anyway, so making it required is not a horrible idea. The primary focus should be on readability. The way I will achieve this with Sea is by pythonifying the general syntax of C and then creating syntax that is sugar; syntax that is solely a nicer way of doing something you can already do.

The purpose of Sea is to be a beginner-friendly version of C. It should be as powerful as C, without a significant runtime cost. As such, there should be syntactic sugar used to mimic certain higher level functionality. Doing things this way will add a compilation cost to Sea, but not a significant runtime cost. Sea will need to strike a balance between Python and C. With these considerations in mind, we can begin designing the language.

Photo by Ryan Wallace on Unsplash

Syntax and General Grammar

Creating the grammar for a language is, in my opinion, the hardest part of the process. There are simply so many parts involved in a language that it can hard to identify them all. I think our best option here is to start with a language you know, and modify it as you go along. Since Sea really is just a redesign of C, the grammar can be mostly the same. The Syntax will mostly be derived from Python, however. So, I am going to reference Microsoft’s C Language Syntax Summary. Within this, there is an in-depth grammatical model for C. I will be mimicking their Definitions and Conventions to an extent. I recommend reading the page (it’s really short). Since Medium doesn’t have subscripts, I will be using the regex “?” for optional components.

Please note that as I write all of this, I haven’t written Sea completely yet. Meaning, I might not Pythonify all of the C grammar. So, I will likely have to come back and modify this article as I develop Sea.

Tokens

Tokens are the basis of our language. Every syntax element will be expressed as a token. Our tokens will be one of the following types: keyword, identifier, constant, string-literal, and punctuator.

keyword: All the reserved keywords in Sea such as “if”, “else”, “struct”, etc.

identifier: All user-entered names for things such as variables, functions, and structs. These are made up of underscores, English letters, and digits. However, an identifier must not begin with a digit.

constant: These are all constant data that cannot be modified and are known at compile time. They are further divided into integer-constant, floating-constant, enumeration-constant, character-constant.

  • integer-constant: For our purposes, all integers will be decimal and composed of decimal digits. I will implement different number bases at a later date.
  • floating-constant: These will be decimal for the time being and composed of decimal digits and, at most, one decimal place. I will implement different number bases as well as scientific notation at a later date.
  • enumeration-constant: These will be identifiers that represent enums in Sea.
  • character-constant: Any character in Sea. They will be a single character within single quotes, or the character may be prefixed with a backslash.

string-literal: All strings or implicit arrays of characters. I will add different encoding types and fancy string types such as Python-style f-strings.

punctuator: All the general syntax symbols. These include the operators and general “punctuation”. In Sea, indents and newlines will be punctuators.

These are our tokens for Sea! Your language may have different tokens or may end up representing these tokens in different ways, but that’s alright. Now, let’s see how these tokens will be used to construct the grammar of our language.

Photo by Brett Jordan on Unsplash

Grammar

Recall that the grammar of a programming language defines how tokens should be used together to create order or an understanding. An important thing to note is that, since Sea uses Python-style indentation, we will not be ignoring all whitespace like in many languages; however, we will be ignoring individual whitespace characters, if they are not newlines, after the first non-whitespace character. We will also ignore empty newlines that serve only to make the code more readable. Our grammar will be composed of expressions, declarations, statements, and external definitions.

Expressions

Expressions are lines of code that resolve to a value. Prepare yourself because the following grammar might be overwhelming. These expressions will often be recursive as well. The order of operations is from top to bottom between types of expressions. However, each element of an expression has the same priority as the others of that same element. This is just how in PEMDAS (or BODMAS), parentheses have the highest order followed by exponents followed by multiplication and division. Many people get this wrong, but multiplication and division have the same priority and must be performed from left to right. Lastly, addition and subtraction are done with the least priority, but they have the same priority as one another. This has to be done cautiously in a programming language, however. With recursive definitions of grammar, it is important to search for the more complex definition first. Otherwise, it may only ever detect the simplest definition.

primary-expression: The absolute basis for all expressions. Can take the following forms:

  • identifier
  • constant
  • string-literal
  • ( expression )
  • generic-selection

generic-selection: A mapping for a function that allows for generic types.

  • generic ( assignment-expression , generic-assoc-list )

generic-assoc-list: A list of associations to define a generic function.

  • generic-association
  • generic-assoc-list , generic-asociation

generic-association: An association between type and code for a function.

  • type-name : assignment-expression
  • default : assignment-expression

postfix-expression: An expression involving a postfix operator.

  • primary-expression
  • cast-expression
  • postfix-expression [ expression ]
  • postfix-expression ( argument-expression-list? )
  • postfix-expression . identifier
  • postfix-expression -> identifier
  • postfix-expression ++
  • postfix-expression --
  • ( type-name ) { initializer-list }
  • ( type-name ) { initializer-list , }

cast-expression: An expression that converts from one type to another. Note that my casting is in the style of Python’s function-call casting. However, it will behave more like C’s casting. Nonetheless, I am keeping it at the same priority as a function call.

  • postfix-expression
  • type-name ( cast-expression )

argument-expression-list: A list of arguments.

  • assignment-expression
  • argument-expression-list , assignment-expression

I’d like to pause for a moment to let everyone catch their breath and to address a few things. For the most part, all of this is straight from C. Since I haven’t written Sea yet, it’s likely some of this grammar is too C-like and I will have to modify it to make it more Python-like. When I do this, I will try to come back and update this article. Keep in mind once again that your language might not need all of this. I think starting from scratch, which is what I did, isn’t that ideal of a strategy when it comes to language grammar. I recommend you continue to follow this Sea grammar, figure out what each part means, and determine whether your language needs it or not. Shrink down the list and change symbols to fit your language, then attempt to add new elements when required. Also, don’t feel bad if you need to read this over many times to understand it. As I’m writing this, even I have to do that. Stick with it and you will endure.

exponential-expression: An expression that performs exponentiation.

  • postfix-expression
  • exponential-expression ** postfix-expression

unary-expression: An expression containing a unary operator.

  • exponential-expression
  • ++ unary-expression
  • -- unary-expression
  • unary-operator exponential-expression
  • size of unary-expression
  • size of ( type-name ) align of ( type-name )

unary-operator: An operator with only one operand. One of &, *, +, -, or ~.

multiplicative-expression: An expression that performs multiplicative operations.

  • unary-expression
  • multiplicative-expression * unary-expression
  • multiplicative-expression / unary-expression
  • multiplicative-expression % unary-expression

additive-expression: An expression that performs additive operations.

  • multiplicative-expression
  • additive-expression + multiplicative-expression
  • additive-expression - multiplicative-expression

shift-expression: An expression that performs bitwise shift operations.

  • additive-expression
  • shift-expression << additive-expression
  • shift-expression >> additive-expression

bitwise-and-expression: An expression that performs bitwise and.

  • shift-expression
  • bitwise-and-expression & shift-expression

bitwise-xor-expression: An expression that performs bitwise xor.

  • bitwise-and-expression
  • bitwise-xor-expression ^ bitwise-and-expression

bitwise-or-expression: An expression that performs bitwise or.

  • bitwise-xor-expression
  • bitwise-or-expression | bitwise-xor-expression

relational-expression: An expression that performs a relational comparison. Note that these are Python-style relational comparisons.

  • bitwise-or-expression
  • relational-expression < bitwise-or-expression
  • relational-expression > bitwise-or-expression
  • relational-expression <= bitwise-or-expression
  • relational-expression >= bitwise-or-expression
  • relational-expression == bitwise-or-expression
  • relational-expression != bitwise-or-expression

logical-not-expression: An expression that performs logical not.

  • relational-expression
  • not logical-not-expression

logical-and-expression: An expression that performs logical and.

  • logical-not-expression
  • logical-and-expression and logical-not-expression

logical-or-expression: An expression that performs logical or.

  • logical-and-expression
  • logical-or-expression or logical-and-expression

conditional-expression: An expression that performs the condition operation.

  • logical-or-expression
  • expression if logical-or-expression else conditional-expression

assignment-expression: An expression that performs an assignment operation.

  • conditional-expression
  • unary-expression assignment-operator assignment-expression

assignment-operator: An operator that assigns a value. One of =, **=, *=, /=, %=, +=, -=, <<=, >>=, &=, ^=, or |=.

expression: Code that resolves to a value.

  • assignment-expression
  • expression , assignment-expression

constant-expression: An expression that has a value calculable at compile time.

  • conditional-expression
Photo by Nathan Dumlao on Unsplash

Alright! Congratulations on getting this far! We’ve managed to lay out all of our expressions. You may notice that I am primarily following C’s grammar, but making modifications that follow Python’s grammar. I think the second half of grammar is much easier to follow, since they are all of the familiar operators. Don’t panic, but we’re only one third of the way through Sea’s grammar.

Declarations

A declaration is code that specifies the nature of an identifier. These allow for creation of variables, structs, enums, etc. Note that I’m finding it hard to label all of these in a way that adds to understanding, so I will only label when I have something to add.

declaration:

  • declaration-specifiers init-declarator-list? \n
  • static_assert-declaration

declaration-specifiers:

  • storage-class-specifier declaration-specifiers?
  • using declaration-specifiers as declaration-specifiers
  • type-specifier declaration-specifiers?
  • type-qualifier declaration-specifiers?
  • function-specifier declaration-specifiers?
  • alignment-specifier declaration-specifiers?
  • align declaration-specifiers as declaration-specifiers

init-declarator-list:

  • init-declarator
  • init-declarator-list , init-declarator

init-declarator:

  • declarator
  • declarator = initializer

storage-class-specifier:

  • auto
  • external
  • register
  • static

type-specifier:

  • void
  • bool
  • char
  • short
  • int
  • long
  • float
  • double
  • signed
  • unsigned
  • complex
  • atomic-type-specifier
  • struct-or-union-specifier
  • enum-specifier
  • typedef-name

struct-or-union-specifier:

  • struct-or-union identifier? : struct-declaration-block \n
  • struct-or-union identifier

struct-or-union:

  • struct
  • union

struct-declaration-block:

  • struct-declaration
  • struct-declaration-block? \n \t indent-level struct-declaration

struct-declaration:

  • specifier-qualifier-list struct-declarator-list?
  • static_assert-declaration

indent-level: A collection of tabs that represents how deeply nested the current code is.

struct-qualifier-list:

  • type-specifier specifier-qualifier-list?
  • type-qualifier specifier-qualifier-list?
  • alignment-specifier specifier-qualifier-list?

struct-declarator-list:

  • struct-declarator
  • struct-declarator-list , struct-declarator

struct-declarator:

  • declarator
  • declarator? : constant-expression

enum-specifier:

  • enum identifier? : enumerator-block \n
  • enum identifier

enumerator-block:

  • enumerator
  • enumerator-block? \n \t indent-level enumerator

enumerator:

  • enumeration-constant
  • enumeration-constant = constant-expression

atomic-type-specifier:

  • atomic ( type-name )

type-qualifier:

  • const
  • restrict
  • volatile
  • atomic

function-specifier:

  • inline
  • returnless

declarator:

  • pointer? direct-declarator

direct-declarator:

  • identifier
  • ( declarator )
  • direct-declarator [ type-qualifier-list? assignment-expression? ]
  • direct-declarator [ static type-qualifier-list? assignment-expression ]
  • direct-declarator [ type-qualifier-list static assignment-expression ]
  • direct-declarator [ type-qualifier-list? * ]
  • direct-declarator ( parameter-type-list )

pointer:

  • * type-qualifier-list?
  • * type-qualifier-list? pointer

type-qualifier-list:

  • type-qualifier
  • type-qualifier-list type-qualifier

parameter-type-list:

  • parameter-list
  • parameter-list , ...

parameter-list:

  • parameter-declaration
  • parameter-list , parameter-declaration

parameter-declaration:

  • declaration-specifiers declarator
  • declaration-specifiers abstract-declarator?

type-name:

  • specifier-qualifier-list abstract-declarator?

abstract-declarator:

  • pointer
  • pointer? direct-abstract-declarator

direct-abstract-declarator:

  • ( abstract-declarator )
  • direct-abstract-declarator [ type-qualifier-list? assignment-expression? ]
  • direct-abstract-declarator [ static type-qualifier-list? assignment-expression ]
  • direct-abstract-declarator [ type-qualifier-list static assignment-expression ]
  • direct-abstract-declarator [ type-qualifier-list? * ]
  • direct-abstract-declarator ( parameter-type-list? )

typedef-name:

  • identifier

initializer:

  • assignment-expression
  • { initializer-list }
  • { initializer-list , }

initializer-list:

  • designation? initializer
  • initializer , designation? initializer

designation:

  • designator-list =

designator-list:

  • designator
  • designator-list designator

designator:

  • [ constant-expression ]
  • . identifier

static_assert-declaration:

  • static assert ( constant-expression , string-literal ) \n
Photo by Lukasz Szmigiel on Unsplash

Phew we’re through that! Let’s take a moment to enjoy this beautiful image.

Alright. I must say, that was much harder for me than the expressions. I found it to be much less familiar and much longer. I’m certain I made spelling mistakes, (if you find those please point them out for me) and I’m sure I failed to properly Pythonify a lot of C’s grammar. I will do my best to continue to come back and update it as I find mistakes. Now, all we have left are statements and external definitions. Luckily, these are the shortest sections.

Statements

A statement is the unit for code performing work. Each line of code is often a single statement.

statement:

  • indent-level labeled-statement
  • indent-level compound-statement
  • indent-level expression-statement
  • indent-level selection-statement
  • indent-level iteration-statement
  • indent-level jump-statement
  • indent-level pass

jump-statement:

  • goto-statement
  • continue-or-break-statement
  • return-statement

goto-statement:

  • go to identifier \n
  • go to identifier if expression \n

continue-or-break-statement:

  • continue-or-break \n
  • continue-or-break if expression \n

continue-or-break:

  • continue
  • break

return-statement:

  • return expression \n
  • return expression if expression \n

compound-statement:

  • \n? declaration-list? statement-list?

declaration-list:

  • declaration
  • declaration-list declaration

statement-list:

  • statement
  • statement-list statement

expression-statement:

  • expression \n

iteration-statement:

  • while expression : statement else-clause?
  • do : statement while expression \n? else-clause?
  • for expression? ; expression? ; expression? : statement else-clause?
  • for each declaration in expression : statement else-clause?

else-clause:

  • else statement

selection-statement:

  • if expression : statement else-clause?
  • falling? switch expression : statement

labeled-statement:

  • identifier : statement
  • case constant-expression : statement
  • default : statement

External Definitions

translation-unit:

  • external-declaration
  • translation-unit external-declaration

external-declartion: Allowed only at external (file) scope

  • function-definition
  • declaration

function-definition: Declarator here is the function declarator

  • declaration-specifiers? declarator declaration-list? compount-statement

I told you there wasn’t much left! Let me just quickly say, I’ve been working on this article for a few days so please forgive me if it isn’t that easy to follow. I will gladly take your suggestions for improving it. I especially would love if you point out errors I’ve made. The good news is, now that I’ve created the grammar for Sea, I can get to work writing the next article where we begin to program! As I program more and more of Sea, I can find out where I made mistakes in the grammar. In particular, I wasn’t too rigorous with the indentation or newline requirements. Anyways, this has been long enough so let me finish up here.

Photo by Ian Stauffer on Unsplash

Finishing Up

If you are creating your own programming language, starting with the grammar can be a daunting task. It is one of the hardest parts of creating a language; however, that’s the reason I suggest starting with it. Look at languages you know and search for their grammar. Here is Python’s official grammar for example. You aren’t creating the world’s first programming language; you are creating a variation based on all of the existing programming languages. So, use those languages! People have put in work over decades to optimize these languages, so start from there and make the modifications you want. For example, other than general Pythonifying of C’s syntax, I’ve added a few things for Sea:

  • For-each loops. These will allow for easy iteration through a list without using any more resources than if you had done it manually with a traditional for loop.
  • The falling keyword for switch statements. This modifier will make switch statements behave as they traditionally do; if you do not break, the execution will fall through the switch statement. I find that most of the time, fall-through behavior is not that useful. So by default, switch statements will behave as if every case has a break. However, you can use continue to fall through.
  • Conditional goto, break, continue, return. Often, you want to jump only if a condition is true. So, I’m providing a readable one-line solution to this. I personally don’t like the look of one-line if-statements in Python, so I’m creating this syntax. It won’t be confused with the conditional statement because this will never have an else clause while the conditional always will.

I’ll also point you at the else clause for loops from Python (the else statement is performed if you did not break from the loop) as well as the do-while loop from C. I would prefer that Sea have redundant notation if it means creating more readability in different situations. I’ve also cleaned up some of C’s strange underscore-heavy keywords, as well as split up multiword keywords into two keywords. This makes Sea read slightly more like a natural language and, thus, is optimal even if it means there are slightly fewer possible identifiers.

Look forward to the next article where I will begin to program Sea in Python and we can start to see how all of this grammar works. If you don’t fully understand the grammatical structure now, it’s alright; once we start writing code, it’ll become clearer. Check out my Programming Language List for the previous and following articles involved!

--

--

David MacDonald
CodeX
Writer for

Developer, Mathematician, and Introspective Thinker. Learn to become smart, but live to become wise.