Designing a programming language

Totally_Not_A_Haxxer
Published in Bootcamp
13 min read, Jan 16, 2023

[Image: SkyLine logo, "A Modern Programming Language"]

Introduction

We have all heard of programming languages, whether or not we are into tech. However, we rarely stop to think about what goes on behind those languages, let alone about creating one. Many people say they want to create a programming language but never find the time, the skill, or the knowledge, or simply do not know how to approach the design. In today's article I will go over basic programming language design, covering syntax and how you can make your language meaningful, using my most recent programming language, SkyLine, as a reference.

Understanding the basics

Before you design your own language, it is a good idea to look at how programming languages are laid out: their differences, their types, and the ideas behind how they are structured. Below is a list of common concepts that will get you familiar with how a programming language can be built.

  • Interpreted: Interpreted languages are programming languages whose programs are executed by a runtime rather than compiled ahead of time into a standalone executable. These languages take an input file, such as filename.pl in Perl's case, and run it through what is known as an interpreter. The interpreter goes through that file, parsing and executing the code as it reaches it.
  • Interpreter: An interpreter is the part of a language's back end that takes an input file, reads through it, and executes it. Unlike with a compiler, the program is executed by the interpreter every time it is run, instead of being translated ahead of time into raw machine code. (Many interpreters, including those for Lua, Python, and Ruby, first compile the source into an internal bytecode, but that bytecode is then executed by the interpreter's own virtual machine.) Interpreters are typically found in common languages like Lua, Ruby, Perl, and Python, among others.
  • Compiler: A compiler is a program that takes an input file such as main.c or main.cpp and translates that code into a lower level form, usually machine code (often by way of assembly). Unlike interpreted code, compiled code is not run as it is read; it runs later, when you execute the output file the compiler produced, such as an .exe on Windows or an ELF (Executable and Linkable Format) binary on Linux. Compilers are typically used for languages like C, C++, Fortran, Go, and Rust, which are known as compiled programming languages; assembly language is translated by a closely related tool called an assembler.
  • Statically typed: A statically typed programming language is one where the data type of every variable, return value, function parameter, and so on is known at compile time, usually because it is declared in the code. For example, Go is statically typed: variable declarations and function signatures carry a data type, written after the name, like so:
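var Data string
var Data2 int

// A channel type needs an element type too: "chan" alone is not valid Go.
func Run(input1 string, input2 interface{}, channel chan string) (string, error) {
	return input1, nil
}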

Statically typed languages tend to have more robust error checking, because in languages like Go many errors are caught at compile time. When you compile the program (or run it with go run, which compiles it first), Go reports problems such as a mismatched data type or an undefined identifier before any of the code runs.

  • Dynamically typed: In a dynamically typed programming language, the data type belongs to the value rather than to the variable, and is worked out from the value itself. For example, given x = 1, the lexer sees that 1 is not surrounded by double quotes (""), single quotes (''), or backticks (``, which the language in this article uses for unicode strings), and that it is made of digits, so it treats it as an integer literal. The variable x simply holds whatever value it is given. This means that, unlike in a statically typed language, the data type never needs to be written out; it is determined from the values themselves as the program runs. However, this also has a negative impact on error checking and can make code less robust, because type errors will not be found at compile time and will only be caught at runtime.
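To make the lexer's role more concrete, here is a minimal Go sketch (Go being the statically typed example used above) of how a lexer might decide what kind of literal a token is from its spelling alone. The function name classifyLiteral and the kind names STRING, UNICODE, BOOL, INT, FLOAT, IDENT, and ILLEGAL are hypothetical, and the sketch assumes the source has already been split into tokens; it is not SkyLine's actual lexer.

```go
package main

import "fmt"

// classifyLiteral guesses a token's kind from its spelling alone, the way a
// dynamically typed language's lexer might. Kind names are hypothetical.
func classifyLiteral(tok string) string {
	if len(tok) >= 2 {
		first, last := tok[0], tok[len(tok)-1]
		switch {
		case first == '"' && last == '"', first == '\'' && last == '\'':
			return "STRING" // "data" or 'data'
		case first == '`' && last == '`':
			return "UNICODE" // `data`, following this article's backtick convention
		}
	}
	if tok == "true" || tok == "false" {
		return "BOOL"
	}
	// Count digits and dots; any other character means this is not a number.
	digits, dots := 0, 0
	for _, r := range tok {
		switch {
		case r >= '0' && r <= '9':
			digits++
		case r == '.':
			dots++
		default:
			return "IDENT" // a name such as x, looked up at runtime
		}
	}
	switch {
	case digits > 0 && dots == 0:
		return "INT" // 1
	case digits > 0 && dots == 1:
		return "FLOAT" // 2.5
	}
	// e.g. "." or "1.2.3": a real lexer would report an error here
	return "ILLEGAL"
}

func main() {
	for _, tok := range []string{`"data"`, "1", "2.5", "true", "`data`", "x"} {
		fmt.Printf("%-8s -> %s\n", tok, classifyLiteral(tok))
	}
}
```

A real lexer reads characters one at a time instead of receiving pre-split tokens, but the decision at each literal is similar. The interpreter then carries that type along with the value at runtime, which is why x can hold an integer now and a string later.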

Getting a design pattern

The coolest thing about building your own programming language is that the design can be purely your own. I currently have designs out for three programming languages, two of which are in testing and development. Your design is your choice and no one else controls it. However, if you want your language to be used by other people and to be successful, like Python, it might be smart to choose a user-friendly syntax and layout rather than something like this:

  init RPC templating
{
(
[
RPC -> RPCL_Count | RPC.BRICK => {{{
RPC:::Unit::SYMBOL | RPC.BRICK START
@main@
!{"main/Engine"}!
%{"fmt"}%
XQ : RPC::Prepare::LI:A =RPC_PRE.Values[i]=
RPC::ForLen i := 0; i < len(RPC_PRE.Values)| i++ {
XQ
}
RPC::Unit::SYMBOL__RPC.BRICK::END
}}}
]
)
}
end RPC templating

This comes from my second programming language, RPC (Radical Processing Core). You may want something that is your own, but it should also be easy for people to understand, unlike the snippet above, which is extremely confusing to anyone who did not develop the language and will most likely never be touched by anyone in existence XD, even its developer. So how do we arrive at a good design? When we start designing the language (which, for the sake of this article, will be user friendly), we first want to lay out what type of language we want. The following are good questions to ask yourself when developing a language's design or syntax:

  • 1): Is the language statically typed or dynamic?
  • 2): Is the language modern or more traditional? In other words is our language going to be similar to FORTRAN or more like Python?
  • 3): Is the language going to be user friendly?
  • 4): Is the language designed to be well rounded?
  • 5): Is the language designed for customization?
  • 6): Is the language compiled or interpreted?

For the article's sake, here is the language we will be designing. Before I answer these questions, I want to note that this article is meant to guide you and give you a general "template" for writing a design for your programming language; you can always add many more features, design questions, and patterns.

  • 1): The language will be dynamically typed.
  • 2): The language we are going to write will be similar to the Crystal programming language, but dynamically typed (Crystal is statically typed with type inference) and without OOP.
  • 3): Our language will aim to be user friendly, to fit the modern world.
  • 4): The language design should be well rounded, or in other words flexible: our language will have optional sections, keywords, etc.
  • 5): Because of point 4, the language will be designed to let users customize it. In the future, macros and the ability to define your own keywords might be a good idea.
  • 6): Interpreted.

Note: When creating a design for a programming language, it is important to keep it realistic rather than wild, because you will most likely end up changing it during the process. If you have never developed a programming language before, it may be hard to implement complex designs like the one from my second language shown above, RPC (Radical Processing Core).

Building pseudocode for the design

Once you have your idea of what the language is supposed to be, we want to start actually writing code in the language to get a feel for it. I will not go through how to write a parser, lexer, AST, symbol table, error system, type system, macro system, compiler, interpreter, VM, etc. for the language; instead, we will design what the language's code looks like.

Variables

Variables can be extremely difficult to create or extremely simple. Because our language is dynamic and interpreted, we want variables to be easy to declare, like the following:

varname = "data"
variable = 1
variable2 = 2.5
boolean = true
boolean2 = false
unicode = `data`
rune = `someunicode||||||`

This may seem like a good idea, but if we are making a modern programming language and want to allow customization, maybe we should add a declaration keyword, that is, a keyword placed before the variable name. For our language to be understandable and modern, we can use let, because it is easy to understand: it is the keyword that lets a name be a variable and hold a value. In other cases, or in the future, keywords such as CONST (constant), VAR (variable), GLOBAL (global variable), and even LOC (local) could be added to your design, but right now we just want a basic design we can work with. The let keyword will work like so:

let x = "data"
  • How are keywords implemented? Keywords in a programming language are typically names like return, let, allow, var, and const, and a keyword is typically followed directly by a name or value. We all know how keywords behave, but how do they actually work? In the implementation of a language, such as its interpreter or compiler, the parser usually has a helper function such as Peek(). It may go by other names, but the idea is that Peek() looks at the token after the current one, without consuming it, to check what comes next. So say the lexer has picked up a line of code and split it into tokens like so:
{let, x, =, "data"}
  • If the parser detects the keyword let, it checks the next token. If that token is not a known method, keyword, or data type (in our case it is x), it becomes the current token and is used as the variable's name. The parser then calls Peek() again to check whether the next token is the = symbol, and if it is, it takes everything after the = and does what it needs to with it, whether that is working out the data type or parsing a larger expression.
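The let-parsing steps above can be sketched in Go as follows. This is a minimal illustration under stated assumptions: the tokens arrive as a slice of strings, the value is a single token rather than a full expression, and the names Parser, peek, parseLet, LetStatement, and the keywords set are hypothetical rather than taken from any real interpreter.

```go
package main

import (
	"errors"
	"fmt"
	"unicode"
)

// LetStatement is a hypothetical AST node for `let <name> = <value>`.
type LetStatement struct {
	Name  string
	Value string
}

// Parser walks a slice of tokens that the lexer has already produced.
type Parser struct {
	tokens []string
	pos    int // index of the current token
}

// current returns the token at pos, or "" once the input is exhausted.
func (p *Parser) current() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// peek returns the token after the current one without consuming it.
func (p *Parser) peek() string {
	if p.pos+1 < len(p.tokens) {
		return p.tokens[p.pos+1]
	}
	return ""
}

// keywords is a hypothetical set of reserved words that cannot be used as names.
var keywords = map[string]bool{"let": true, "func": true, "Func": true, "return": true, "if": true, "else": true, "for": true}

// isIdent reports whether s can be a variable name: a letter or "_" first,
// then letters, digits, or "_", and not a reserved keyword.
func isIdent(s string) bool {
	if s == "" || keywords[s] {
		return false
	}
	for i, r := range s {
		if r != '_' && !unicode.IsLetter(r) && !(i > 0 && unicode.IsDigit(r)) {
			return false
		}
	}
	return true
}

// parseLet parses `let <name> = <value>` with the current token on "let".
func (p *Parser) parseLet() (*LetStatement, error) {
	if p.current() != "let" {
		return nil, errors.New("expected let")
	}
	if !isIdent(p.peek()) {
		return nil, fmt.Errorf("expected a variable name after let, got %q", p.peek())
	}
	p.pos++ // the name is now the current token
	name := p.current()
	if p.peek() != "=" {
		return nil, fmt.Errorf("expected = after %s, got %q", name, p.peek())
	}
	p.pos++ // "=" is now the current token
	value := p.peek()
	if value == "" {
		return nil, errors.New("expected a value after =")
	}
	// A real parser would parse a whole expression here; this sketch takes one token.
	p.pos += 2 // move past the value
	return &LetStatement{Name: name, Value: value}, nil
}

func main() {
	p := &Parser{tokens: []string{"let", "x", "=", `"data"`}}
	stmt, err := p.parseLet()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("name=%s value=%s\n", stmt.Name, stmt.Value)
}
```

Because peek() never moves pos, the parser can look ahead before committing, which is what lets it report a helpful error such as a missing = instead of misreading the line.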

Functions

Functions (also called subroutines, or methods when they belong to an object) are one of the most important parts of a programming language. However, some function features are hard to implement or can end up a bit wacky. As with variables, we will dissect functions piece by piece. Here are the things we are going to dissect for our design:

  • Function keyword
  • How will the function return data
  • How will the function define parameters or arguments
  • How will the function be defined
  • How will the function be called
  • How will the function separate multiple parameters

In our case, we want our functions to be as customizable as possible, which means there will be more than one way to define, call, and pass arguments to a function.

  • Function keywords: We are going to be like Go here and keep function keywords short and user friendly, so the keyword to define a function will be func or Func.
  • Function definitions: For variables, our language lets people use let or leave it out, and we want functions to work the same way: let is not required, but you can use it if you want. The following design allows users to write either the Func keyword followed by the function's name, or the let keyword followed by the name, =, and the Func keyword, which assigns the function to that name:
Func MethodName() {}
# or we can do it like this
let MethodName = Func() {}

This gives users a function syntax that is more flexible and, in many cases, nicer to write.

  • Function parameters: We want users to be able to choose how they write a function's parameters. In our language, a function's parameters (the names that receive its arguments) will be defined like so:
Func methodname(argument1, argument2, argument3) {}
# or we can separate them using :
Func methodname(argument1 : argument2 : argument3) {}
# We can also run both of these methods with the let keyword
let Methodname = Func(argument, argument2 : argument3) {}

This is a much more flexible design as it gives users the choice to use : or , to separate their arguments.

  • Calling a function: Calling a function is not going to be hard. We could always use CALL as a keyword, but since our language is not traditional like Fortran, we will just call the function by its name followed by its arguments, like so:
Func method() {}

method()

# calling with params

method(x, y : data, data2)
  • Function return: We want to give the option to return data from a function, and given the current design of functions, this can be easy. R, a programming language built for statistics and data analysis, has a very interesting way of returning or outputting data from a function. For example, the following R code defines a function and returns a variable:
center <- function(data, midpoint) {
new_data <- (data - mean(data)) + midpoint
return(new_data)
}

This uses an explicit return. However, R does not require it: a function returns the value of its last evaluated expression, and at the interactive console, typing a variable's name on its own prints its value, like the following:

x <- center(199999, 10000)
x
# typing x on its own prints the value returned by center

We want our functions to be able to output data by simply placing the variable on its own, and to return a value without the return keyword (so that a variable can store it), while still giving the option to use return. In our case, we can use the following code:

let x = Func(data, data2) {
data - data2
}

let newvar = x(20, 10)

newvar

# second option

let x = Func(data, data2) {
return data - data2
}

let newvar = x(20, 10)

newvar

In both versions newvar holds 10: the first returns the value of the last expression without a return keyword, and the second uses return explicitly. See how much nicer that can work within a language?
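Supporting both styles comes down to one rule in the evaluator: a return statement stops the function immediately, and otherwise the value of the last expression evaluated becomes the result, much like R. The Go sketch below is a simplification that assumes integer-only values and represents expressions as Go closures instead of AST nodes; Env, Stmt, ExprStmt, ReturnStmt, and callFunc are hypothetical names.

```go
package main

import "fmt"

// Env maps parameter names to values for one function call.
type Env map[string]int

// Stmt is one statement in a function body (hypothetical, ints only).
type Stmt interface{}

// ExprStmt is an expression used as a statement, e.g. `data - data2`.
type ExprStmt struct{ Eval func(Env) int }

// ReturnStmt is `return <expr>`.
type ReturnStmt struct{ Eval func(Env) int }

// callFunc runs body with the given environment. An explicit return stops
// execution immediately; otherwise the value of the last expression
// statement is the function's result.
func callFunc(body []Stmt, env Env) int {
	result := 0 // stands in for a null value if the body has no expressions
	for _, s := range body {
		switch st := s.(type) {
		case ReturnStmt:
			return st.Eval(env)
		case ExprStmt:
			result = st.Eval(env)
		}
	}
	return result
}

func main() {
	sub := func(e Env) int { return e["data"] - e["data2"] }
	implicit := []Stmt{ExprStmt{sub}}   // data - data2
	explicit := []Stmt{ReturnStmt{sub}} // return data - data2
	args := Env{"data": 20, "data2": 10}
	fmt.Println(callFunc(implicit, args), callFunc(explicit, args)) // prints "10 10"
}
```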

Keywords

Keywords are very important in a language; without them there would be no user-friendly way for a programmer to communicate with the interpreter. Here is a list of keywords and operators we can implement:

return => returns a value from a function
Func => defines a method or function
let => defines a variable
var => defines a variable (an alternative to let)
local => defines a local variable
global => defines a global variable
__data => will define script data naturally
>= => greater than or equal to
== => comparison operator for equals
!= => comparison operator for does not equal
<= => less than or equal to
> => greater than
< => less than
- => subtract
+ => add
/ => divide
* => multiply
% => format identifier such as %s
@ => arrays
&@ => hashes
&& => logical and
if => starts a conditional
else => the alternative branch of a conditional, run when the condition is false
for => for loop

Some of these keywords are optional, such as let, var, local, and global, and so is return inside functions. Remember, we want our language to be very customizable without customizing the way it operates turning into a project of its own. With the keywords and operators listed, we can move on to conditionals.
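Since the keyword and operator list is small, a lexer can handle it with a lookup table for words and a longest-match scan for symbols, so that >= is not read as > followed by =. A minimal Go sketch; the token type names are hypothetical, and = is included for assignment even though it is not in the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords maps reserved words from the list above to token types.
// The token type names are hypothetical.
var keywords = map[string]string{
	"return": "RETURN", "func": "FUNCTION", "Func": "FUNCTION",
	"let": "LET", "var": "VAR", "local": "LOCAL", "global": "GLOBAL",
	"if": "IF", "else": "ELSE", "for": "FOR",
}

// lookupIdent returns the keyword's token type, or IDENT for ordinary names.
func lookupIdent(word string) string {
	if tok, ok := keywords[word]; ok {
		return tok
	}
	return "IDENT"
}

// operators is ordered so that two-character operators are tried before
// their one-character prefixes (">=" before ">", "&@" before "@").
var operators = []string{
	">=", "<=", "==", "!=", "&&", "&@",
	">", "<", "-", "+", "/", "*", "%", "@", "=",
}

// scanOperator returns the longest operator at the start of src, or "".
func scanOperator(src string) string {
	for _, op := range operators {
		if strings.HasPrefix(src, op) {
			return op
		}
	}
	return ""
}

func main() {
	fmt.Println(lookupIdent("let"), lookupIdent("x"))
	fmt.Println(scanOperator(">= 10"), scanOperator("> 10"), scanOperator("&@hash"))
}
```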

Conditionals

Conditionals are another fundamental feature of programming languages. Without them most applications could not exist, because nearly every application has at least one conditional section inside of it. If you are not aware, a conditional statement checks whether a condition is true or false: if it is true, it executes one block of code, and if it is false, it executes another. I could explain a million different ways how conditionals and logic gates work, but for the sake of length and time we will not do that in this article. As we said before, we want our code to be very easy to read and use, so we can work with conditionals like so:

# conditional simple 
if (data == data) {
# do whatever
} else {
# do whatever
}

# or we can remove () in conditionals

if data == data && data == large {
# do something
} else {
# do something
}

These conditionals are quite simple, but they still give the user a range of ways to write a condition: with or without parentheses, and combined with operators such as &&.

EOD (End Of Design)

We have reached a point where our design is not perfect, but it is not horrible either. It reached the goal of this article, which was to design the simplest programming language possible. It may not be able to do anything cool yet, but it is a good start on what we wanted.

Summary

This article explained the most basic terms of programming languages, then dove into steps you can take to create a decent design. I hope it gave you a good idea of how deep and large a language design can get. This article was quite short and did not touch on everything I wanted to, but this may become a future series!

~ Totally Not A Haxxer OUT!

End

Welp, we have reached the end of this article! I know this was quite a short one, but it did not need much depth; I was mostly just sharing my experiences. I hope this article gave you some insight into what it can be like to work on development teams, both privately and publicly. If you want to keep up with my content, do not forget to help support me!

  • SkyLine
  • My GitHub
  • My Instagram

https://www.instagram.com/totally_not_a_haxxer/

  • Other links and URLs

https://wwww.beacons.ai/totally_not_a_haxxer


Cyber Security Educator, Developer, Social media manager, Author, youth education, content creation, engineering, ui/ux, RE