Implementing Easy-Script: A Mini Scripting Language with a Simple Go Interpreter

Anik

In the programming world, creating our own scripting language can be an exciting journey, offering deep insights into how programming languages work under the hood. In this article, we will build our own lightweight scripting language called easy-script, designed with simplicity in mind.

We’ll focus on basic operations like arithmetic calculations and console logging. Script files will have a .es extension, short for easy-script.

Let’s take a look at an example that demonstrates the syntax and functionality of our language:

console.log("Arithmetic Operations:");
console.log("PLUS: 10 + 20 = ", 10 + 20);
console.log("MINUS: 20 - 10 = ", 20 - 10);
console.log("MULTIPLY: 10 * 20 = ", 10 * 20);
console.log("DIVIDE: 20 / 10 = ", 20 / 10);
console.log("MODULO: 25 % 10 = ", 25 % 10);
console.log("POWER: 10 ^ 2 = ", 10 ^ 2);

In the above script, we have a series of console.log statements. Each statement logs a string literal and the result of an arithmetic operation. As we can see, the syntax is similar to JavaScript’s, making it easy to write and understand.

The operations include addition (+), subtraction (-), multiplication (*), division (/), modulus (%), and exponentiation (^). The results of these operations are logged to the console along with some messages.
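
For reference, running this script with the interpreter we build below produces output along these lines (there may be an extra space before each number, because the trailing space inside each string literal and the separator used to join arguments both contribute one):

Arithmetic Operations:
PLUS: 10 + 20 = 30
MINUS: 20 - 10 = 10
MULTIPLY: 10 * 20 = 200
DIVIDE: 20 / 10 = 2
MODULO: 25 % 10 = 5
POWER: 10 ^ 2 = 100

Note that, unlike JavaScript (where ^ is the bitwise XOR operator), easy-script treats ^ as exponentiation, so 10 ^ 2 evaluates to 100.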

Fun Fact: Node.js and popular web browsers like Google Chrome, Firefox, and Safari rely on JavaScript engines (V8, SpiderMonkey, and JavaScriptCore, respectively) that are written largely in C++. These engines use just-in-time (JIT) compilation to turn JavaScript into machine code at runtime, greatly improving execution speed. So, while writing JavaScript, you’re actually leveraging the performance of C++-powered engines!

To bring this language to life, we’ll design and implement an interpreter in Go. The interpreter will read the easy-script file, parse the content into an Abstract Syntax Tree (AST), and then evaluate the AST to execute the script.

In the following sections, we’ll delve into each part of this process: lexing, parsing, AST construction, and evaluation. By the end, we’ll have a functional Go interpreter for easy-script that transforms a high-level script into executable commands. Let’s get started!

Here’s an overview of the entire process:

  • Read the Script: The interpreter begins by reading the script’s source code from a .es file. The file contains a series of statements, which are primarily console.log statements whose arguments can be string literals, integer literals, or arithmetic expressions.
  • Lexical Analysis (Tokenizing): The interpreter conducts lexical analysis on the source code, converting characters into tokens. Tokens are the script’s building blocks, representing keywords (console, log), string literals, integer literals, or operators (+, -, *, /, %, ^).
  • Parsing (Syntax Analysis): In this step, the parser generates an Abstract Syntax Tree (AST) from the tokens produced by the lexer. The AST represents the script as a tree, with each node corresponding to a console.log statement, a string or integer literal, or an arithmetic operation. The structure of the tree reflects the order and hierarchy of operations in the script.
  • Evaluating (Executing): Once the AST is constructed, the interpreter then traverses the tree, evaluating each node based on its type. Each node defines its own Execute() method that describes how to evaluate the node. For example, an addition node adds the values of its left and right child nodes, and a console.log node concatenates the results of its argument nodes and logs the resulting string to the console.
  • Outputting the Result: The console.log statements in the script generate output as the interpreter evaluates the AST. For each such statement, the interpreter logs the results of evaluating its arguments to the console. This provides the output of the script.

By performing these steps, the interpreter transforms the high-level script into a set of low-level operations that the computer can understand and execute.
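
To make these stages concrete, here is a rough trace of a single statement as it passes through the pipeline (the token and node names match the Go implementation that follows):

Source:  console.log("PLUS: 10 + 20 = ", 10 + 20);

Tokens:  CONSOLE  LOG  STRING("PLUS: 10 + 20 = ")  INT("10")  PLUS("+")  INT("20")

AST:     ConsoleLogNode
           ├─ StringNode("PLUS: 10 + 20 = ")
           └─ PlusNode
                ├─ IntNode("10")
                └─ IntNode("20")

Output:  PLUS: 10 + 20 = 30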

To create the interpreter, we can break the code into the following parts:

Part 1: Importing Libraries and Defining Constants

package main

import (
    "fmt"
    "math"
    "os"
    "strconv"
    "strings"
)

// Defines different types of tokens
const (
    TokenConsole  = "CONSOLE"
    TokenLog      = "LOG"
    TokenString   = "STRING"
    TokenInt      = "INT"
    TokenPlus     = "PLUS"
    TokenMinus    = "MINUS"
    TokenMultiply = "MULTIPLY"
    TokenDivide   = "DIVIDE"
    TokenModulo   = "MODULO"
    TokenPower    = "POWER"
)

This is the start of the program. It declares the main package and imports the standard-library packages we need. It also defines constants for the different token types that the lexer will generate: tokens for the console.log statement, string and integer literals, and the various arithmetic operators.

Part 2: Token Struct

// Token struct
type Token struct {
    Type    string
    Literal string
}

Here, the Token struct is defined to hold the details of a token. Each token has a Type (one of the constants defined earlier) and a Literal (the actual text that the token represents).
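
For instance, the + operator in 10 + 20 and the string "Hi" would each be represented by a Token value like the following (a quick sketch using the constants from Part 1):

plusToken := Token{Type: TokenPlus, Literal: "+"}
strToken := Token{Type: TokenString, Literal: "Hi"} // the lexer strips the surrounding quotes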

Part 3: Node Interface and Node Structs

// Node interface
type Node interface {
    Execute() string
}

// Node type for console.log statements
type ConsoleLogNode struct {
    Arguments []Node
}

// Execute for ConsoleLogNode
func (n *ConsoleLogNode) Execute() string {
    args := make([]string, len(n.Arguments))
    for i, arg := range n.Arguments {
        args[i] = arg.Execute()
    }
    return strings.Join(args, " ")
}

// Node type for string literals
type StringNode struct {
    Value string
}

// Execute for StringNode
func (n *StringNode) Execute() string {
    return n.Value
}

// Node type for integer literals
type IntNode struct {
    Value string
}

// Execute for IntNode
func (n *IntNode) Execute() string {
    return n.Value
}

This part of the code defines the Node interface and several structs that implement this interface. Each struct represents a different kind of node in the AST. The Node interface requires an Execute() method, which is how each node type defines its execution behaviour.

The ConsoleLogNode struct represents a console.log statement. Its Arguments field is a slice of Node objects, each representing an argument to the console.log statement. The Execute() method for ConsoleLogNode executes all the argument nodes and joins their results with spaces.

The StringNode and IntNode structs represent string and integer literals, respectively. Their Execute() methods simply return the value of the node.
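
To see how these pieces fit together before we wire up the parser, here is a small hand-built AST fragment and the string its Execute() method returns (a sketch; the parser in Part 6 builds equivalent trees automatically):

stmt := &ConsoleLogNode{
    Arguments: []Node{
        &StringNode{Value: "hello"},
        &IntNode{Value: "42"},
    },
}
fmt.Println(stmt.Execute()) // prints: hello 42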

Part 4: More Node Structs to support Arithmetic Operations

// Node type for addition operation
type PlusNode struct {
    Left  Node
    Right Node
}

// Execute for PlusNode
func (n *PlusNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    return strconv.Itoa(left + right)
}

// Node type for subtraction operation
type MinusNode struct {
    Left  Node
    Right Node
}

// Execute for MinusNode
func (n *MinusNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    return strconv.Itoa(left - right)
}

// Node type for multiplication operation
type MultiplyNode struct {
    Left  Node
    Right Node
}

// Execute for MultiplyNode
func (n *MultiplyNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    return strconv.Itoa(left * right)
}

// Node type for division operation
type DivideNode struct {
    Left  Node
    Right Node
}

// Execute for DivideNode
func (n *DivideNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    return strconv.Itoa(left / right)
}

// Node type for modulo operation
type ModuloNode struct {
    Left  Node
    Right Node
}

// Execute for ModuloNode
func (n *ModuloNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    return strconv.Itoa(left % right)
}

// Node type for power operation
type PowerNode struct {
    Left  Node
    Right Node
}

// Execute for PowerNode
func (n *PowerNode) Execute() string {
    left, _ := strconv.Atoi(n.Left.Execute())
    right, _ := strconv.Atoi(n.Right.Execute())
    result := math.Pow(float64(left), float64(right))
    return strconv.Itoa(int(result))
}

The PlusNode struct represents an addition operation. It has Left and Right fields that are Node objects representing the operands of the addition. The Execute() method for PlusNode executes the left and right nodes, converts their results to integers, adds them, and returns the result.

Similarly, there are node types for the other arithmetic operations: subtraction, multiplication, division, modulo, and exponentiation. Each of these node types also has Left and Right fields representing the operands and an Execute() method that performs the operation on the results of executing the left and right nodes.
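
Because Left and Right are declared as the Node interface rather than as concrete types, these operation nodes compose: a nested expression such as (10 + 20) * 2 can be built by hand, even though the lexer and parser below only ever emit a single binary operation per argument. A quick sketch:

expr := &MultiplyNode{
    Left:  &PlusNode{Left: &IntNode{Value: "10"}, Right: &IntNode{Value: "20"}},
    Right: &IntNode{Value: "2"},
}
fmt.Println(expr.Execute()) // prints: 60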

Part 5: Lex Function

// Lex function to convert the input string into tokens
func Lex(input string) []Token {
    tokens := []Token{}
    statements := strings.Split(input, ";")

    for _, stmt := range statements {
        stmt = strings.TrimSpace(stmt)
        if stmt == "" {
            continue
        }

        startIndex := strings.Index(stmt, "(")
        endIndex := strings.LastIndex(stmt, ")")

        consoleLog := strings.FieldsFunc(stmt[:startIndex], func(r rune) bool {
            return r == ' ' || r == '.'
        })
        arguments := strings.Split(stmt[startIndex+1:endIndex], ",")

        for _, word := range consoleLog {
            if word == "console" {
                tokens = append(tokens, Token{Type: TokenConsole, Literal: word})
            } else if word == "log" {
                tokens = append(tokens, Token{Type: TokenLog, Literal: word})
            }
        }

        for _, arg := range arguments {
            arg = strings.TrimSpace(arg)
            if strings.HasPrefix(arg, "\"") && strings.HasSuffix(arg, "\"") {
                tokens = append(tokens, Token{Type: TokenString, Literal: arg[1 : len(arg)-1]})
            } else if strings.ContainsAny(arg, "+-*%/^") {
                operatorIndex := strings.IndexAny(arg, "+-*%/^")
                num1 := strings.TrimSpace(arg[:operatorIndex])
                operator := strings.TrimSpace(arg[operatorIndex : operatorIndex+1])
                num2 := strings.TrimSpace(arg[operatorIndex+1:])
                tokens = append(tokens, Token{Type: TokenInt, Literal: num1})
                switch operator {
                case "+":
                    tokens = append(tokens, Token{Type: TokenPlus, Literal: operator})
                case "-":
                    tokens = append(tokens, Token{Type: TokenMinus, Literal: operator})
                case "*":
                    tokens = append(tokens, Token{Type: TokenMultiply, Literal: operator})
                case "/":
                    tokens = append(tokens, Token{Type: TokenDivide, Literal: operator})
                case "%":
                    tokens = append(tokens, Token{Type: TokenModulo, Literal: operator})
                case "^":
                    tokens = append(tokens, Token{Type: TokenPower, Literal: operator})
                }
                tokens = append(tokens, Token{Type: TokenInt, Literal: num2})
            } else {
                tokens = append(tokens, Token{Type: TokenInt, Literal: arg})
            }
        }
    }

    return tokens
}

The Lex() function is the lexer of the interpreter. It takes the source code as a string and returns a slice of Token objects. It splits the input into statements on semicolons, pulls the console and log keywords out of the text before the opening parenthesis, and splits the argument list on commas. Each argument then becomes a STRING token (for quoted literals), a single INT token (for plain numbers), or an INT, operator, INT token triple (for simple arithmetic expressions).
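
As an illustration, lexing a single statement yields the token stream below; this is the same listing that the main function in Part 8 prints under “Tokens:”.

tokens := Lex(`console.log("SUM:", 1 + 2);`)
// Type: CONSOLE, Literal: console
// Type: LOG, Literal: log
// Type: STRING, Literal: SUM:
// Type: INT, Literal: 1
// Type: PLUS, Literal: +
// Type: INT, Literal: 2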

Part 6: Parse Function

// Parse function to convert the tokens into AST nodes
func Parse(tokens []Token) []Node {
    nodes := []Node{}

    i := 0
    for i < len(tokens) {
        if tokens[i].Type == TokenConsole && tokens[i+1].Type == TokenLog {
            i += 2

            args := []Node{}
            for i < len(tokens) && tokens[i].Type != TokenConsole {
                if tokens[i].Type == TokenString {
                    args = append(args, &StringNode{Value: tokens[i].Literal})
                } else if tokens[i].Type == TokenInt {
                    if i+2 < len(tokens) && tokens[i+2].Type == TokenInt {
                        switch tokens[i+1].Type {
                        case TokenPlus:
                            args = append(args, &PlusNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        case TokenMinus:
                            args = append(args, &MinusNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        case TokenMultiply:
                            args = append(args, &MultiplyNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        case TokenDivide:
                            args = append(args, &DivideNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        case TokenModulo:
                            args = append(args, &ModuloNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        case TokenPower:
                            args = append(args, &PowerNode{Left: &IntNode{Value: tokens[i].Literal}, Right: &IntNode{Value: tokens[i+2].Literal}})
                        }
                        i += 2
                    } else {
                        args = append(args, &IntNode{Value: tokens[i].Literal})
                    }
                }
                i++
            }

            nodes = append(nodes, &ConsoleLogNode{Arguments: args})
        } else {
            panic("Invalid syntax")
        }
    }

    return nodes
}

The Parse() function is the parser of the interpreter. It takes a slice of tokens and returns a slice of Node objects representing the AST. For each CONSOLE/LOG token pair it builds a ConsoleLogNode, turning the tokens that follow into argument nodes: a StringNode for each string token, an IntNode for a lone integer, and the matching operation node when an integer is followed by an operator and another integer. Note that each argument supports at most a single binary operation between two integer literals; there is no operator precedence or expression nesting.
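
For example, parsing the tokens produced for console.log("SUM:", 1 + 2); yields a single ConsoleLogNode whose arguments mirror the statement (a sketch of the equivalent Go literal):

ast := Parse(Lex(`console.log("SUM:", 1 + 2);`))
// ast is equivalent to:
//
// []Node{
//     &ConsoleLogNode{Arguments: []Node{
//         &StringNode{Value: "SUM:"},
//         &PlusNode{Left: &IntNode{Value: "1"}, Right: &IntNode{Value: "2"}},
//     }},
// }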

Part 7: Eval Function

// Eval function to take a slice of nodes (AST) and evaluate them
func Eval(nodes []Node) {
    for _, node := range nodes {
        fmt.Println(node.Execute())
    }
}

The Eval() function is the evaluator of the interpreter. It takes a slice of Node objects (the AST), evaluates each node by calling its Execute() method, and prints the returned string to the console with fmt.Println.

Part 8: Main Function

// Main function to read the content of a .es file and pass it to the lexer, parser, and finally to the evaluator
func main() {
    if len(os.Args) < 2 {
        fmt.Println("Please provide a file to execute")
        os.Exit(1)
    }

    fileName := os.Args[1]
    if !strings.HasSuffix(fileName, ".es") {
        fmt.Println("Unsupported file type. Please provide a .es file to execute")
        os.Exit(1)
    }

    data, err := os.ReadFile(fileName)
    if err != nil {
        panic(err)
    }

    tokens := Lex(string(data))
    fmt.Println("Tokens:")
    for _, token := range tokens {
        fmt.Printf("Type: %s, Literal: %s\n", token.Type, token.Literal)
    }

    ast := Parse(tokens)
    fmt.Println("\nAbstract Syntax Tree:")
    for _, node := range ast {
        fmt.Printf("%T: %s\n", node, node.Execute())
    }

    fmt.Println("\nOutput:")
    Eval(ast)
}

The main() function is the entry point of our interpreter. It begins by checking the command line arguments to ensure a file path has been provided and validates that the provided file has the correct .es extension. If the file extension is valid, it proceeds to read the file's content. The content is then passed to the Lex() function to generate a list of tokens. The token list is then passed to the Parse() function, which generates the Abstract Syntax Tree (AST). Finally, the AST is passed to the Eval() function, which evaluates the tree and executes the script.
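
Assuming the interpreter above is saved as main.go and the example script from the beginning of the article as example.es (both file names are placeholders), it can be run directly or compiled into a standalone binary:

go run main.go example.es

go build -o easy-script main.go
./easy-script example.es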

While this provides a simplified overview of creating a scripting language and its interpreter, it’s important to understand that building a full-fledged language, compiler, or interpreter involves many more steps and complexities.

In a real-world scenario, you would need to consider aspects such as error handling, optimization for performance, support for more data types and control structures, memory management, garbage collection, and much more. These elements are crucial for ensuring that the language is robust, efficient, and user-friendly.

Creating a language also involves designing a syntax that is both expressive and easy to understand, implementing standard libraries that provide useful functionality, and potentially even building tools like debuggers and IDEs to support the development process.

Moreover, the process of creating a compiler or interpreter can be quite intricate. It involves not only parsing the source code and generating an abstract syntax tree (AST), but also performing semantic analysis, generating intermediate code, optimizing this code, and finally generating the machine code that can be executed by a computer.

This article is just the tip of the iceberg when it comes to language design and compiler/interpreter construction.

Note: The full source code can be found in this GitHub repository: https://github.com/anik-ghosh-au7/easy-script.git
