Creating an Interpreter for Monty Bytecodes Using C Programming Language

Robert Amoah
10 min readJun 17, 2023

--

PART ONE

I am trying to solve an ALX project “0x19. C — Stacks, Queues — LIFO, FIFO” which any SE peer will come across in month 3 of the program. My way of solving this project will be conversational as well as taking you through how I thought of the project and every bit of code I used, and sometimes, why I used them. I won’t be explaining all the concepts, so I request that you revise a bit of the following before you continue. Call them prerequisites. They are:

1. Pointers

2. Structs

3. Linked lists — singly and doubly

4. Stack and queues

For number 4, above, it would be a great deal if you already understood them, but I will take a little time to explain. So will I touch on what the interpreter we are building looks like.

Starting with Monty 0.98, it is a scripting language which is compiled into bytecodes just like python. If you are a peer in ALX’s SE program and in your fourth month, or have gone beyond that, you would have come across bytecodes. These are basically instructions (opcodes) with or without arguments that tell the machine which operations are to be performed. A few of such instructions in relation to Monty 0.98 are push, pall, stack, queue, pop, etc. Some of these instructions do have arguments. An example being: “push 5”. Which means that opcode “push” has an argument which is “5”. I will try to explain each opcode we will implement as and when we implement them. So, don’t be bugged by that for now. Keep reading even if you do not know what the individual opcodes mean, at the moment.

Stacks. What are they? Imaging the CD rack. Well, you cannot imagine that if you are below a certain age, right? 😂. So, let’s look at the diagram below:

A diagram showing how CDs are added to and removed from a CD rack
description of stack

Consider the numbered lines as CDs. Looking at the diagram above, you would see how the CDs get onto the rack one by one. CD, I, gets onto the rack first, followed by II, then III, in that order till V gets put on. This describes the insertion of a node to the end of a linked list (whether singly or doubly). In our codes, which you will soon see, I will be inserting it to the beginning of the linked list. When it comes to the removal of a node, the last to be inserted is the first to be removed. This is seen in diagram C. CD V (the last to enter) is removed first. Thus, last CD that goes in during insertion is the first CD going out when it comes to removal/printing (LIFO). LIFO can also be turned to give FILO, which means the first CD that goes in is the last CD to go out.

For Queues, we have all being in one, one way or the other. Imagine visiting a Pharmacy shop and meeting other customers by the counter. You don’t expect to be served immediately by the Pharmacist. Do you? Rather, you expect the Pharmacist to serve the first customer, then he/she leaves; then the second customer, then he/she leaves; till it gets to your turn (the last customer), then you leave. What this shows us is that the first customer to go into the shop is the first to go out. This is FIFO (first in first out). In a linked list, the first node inserted will be the first node to be removed/printed.

We are going to do this together. Apart from this, I will be taking us through some finished code, but, importantly, through the process of getting to that final code. This involves coding and a lot of refactoring to make the code a bit readable and easier to understand. Simple code. In this part of the series of articles, I will continue with (actually start coding) the header files, and detail the base code structure upon which the implementation of the opcodes will be based.

First thing first. We will be compiling our codes with “gcc -Wall -Werror -Wextra -pedantic -std=c89 *.c -o monty”. Our header file will be “monty.h”.

The starting point of the monty.h header file
monty.h

Per the ALX project, we are to use some structs. I am just going to copy and paste the given structs into the header file. Below, are the given structs.

After pasting the required structs into monty.h
monty.h

Reading the structs will let you know which one to use for opcodes and which one to use for the stacks or queues. To help you out, the arguments of opcodes will be integers and there should be a function that handles the implementation of that opcode. Hence, instruction_s struct will represent an opcode node. Its members being a pointer to “char” which means our opcode is a string. The “f” member is basically a pointer to a function that will handle the implementation of that opcode. When it comes to the stack/queue, we will be using the stack_s struct. “n” will be the integer argument associated with an opcode. The “prev” or “next” members of that struct will be for linking a node to other nodes. You know your doubly linked lists, right? Don’t get scared.

The following are the requirements for the program:

  • Usage: monty file
  • where file is the path to the file containing Monty byte code
  • If the user does not give any file or more than one argument to your program, print the error message USAGE: monty file, followed by a new line, and exit with the status EXIT_FAILURE
  • If, for any reason, it’s not possible to open the file, print the error message Error: Can’t open file <file>, followed by a new line, and exit with the status EXIT_FAILURE
  • where <file> is the name of the file
  • If the file contains an invalid instruction, print the error message L<line_number>: unknown instruction <opcode>, followed by a new line, and exit with the status EXIT_FAILURE
  • where is the line number where the instruction appears.
  • Line numbers always start at 1
  • The monty program runs the bytecodes line by line and stop if either:
  • it executed properly every line of the file
  • it finds an error in the file
  • an error occured
  • If you can’t malloc anymore, print the error message Error: malloc failed, followed by a new line, and exit with status EXIT_FAILURE.
  • You have to use malloc and free and are not allowed to use any other function from man malloc (realloc, calloc, …)

Now let’s talk about how I think about the project, having read the requirements. The compiled program will be run with a file (or path to a file) as an argument. The program should read and execute instructions from the file, line by line. We have been given some requirements to consider in our coding. We are going to create a system that takes an argument (path to a file) then opens the file. Once this is done, we will read each line in the file. For each line read, we will try to get the instruction, if one exists in that line. How will we do this? Think about it for a second. How will we get an instruction from a line? How will we even determine if it’s a valid instruction? Let me help make it easier. The line we read from the file will be a string (char *). We will break this string into chunks/tokens which are strings by removing the spaces and newline characters while storing those tokens into an array (char **). The first string in the array can be compared with the valid opcodes we expect to implement in the program. Once we find a valid match, we can then call the function associated with that opcode. Its simple right? Well let’s see how it goes.

Let us start with our main.c file. We will first deal with the issue of number of arguments.

Initial handling of the of the number of arguments
main.c

For the above code, I checked if the argc is not equal to 2, which means there was either no file path provided, or more than one argument was provided. And I’m using the “dprintf” function to print the required string to “stderr”, which has 2 as its file descriptor. I would like to show, for once what I usually do. I try to compile my code after implementing something to check and correct errors. Below is an image showing you what I missed in the code I just wrote. I forgot to include some standard libraries. For me, I like to include my standard libraries in the header files I use. So, I will do just that. Going forward, you will only see changes I make to certain files, as well as the new codes I write, without any errors. You can assume, and you will be right, that I will have a few errors on my way to the final code. Also, I need to mention that based on the command we are to use to compile our code, the prototypes of some standard library functions will have to be added to our header file, examples being “dprintf” and “getline”. You may have some others depending on which functions we chose to use.

Code compilation errors
standard output

You can test the current code after compiling successfully by checking if commands like: “./monty” and “./monty <file1> <file2>” prints the required string.

You should note a few things about how I would like to proceed. I like to create functions that have readable names. This will make my main function easy to read and understand. Also, certain variables in the program will have to be accessed from various functions. There is going to be possible printing of errors and exiting from the program happening in different parts of the program. With these things in mind, I am inclined to the creation of a global variable which will point to a struct with its members being these values that I would want to access in various functions. One advantage of this is that freeing up allocated memory for any such variable becomes easier because all of them will be freed in one place. I will create the struct and a pointer to it and make it global, later as we go ahead.

Based on what I just said, I will replace the code we just wrote in the “main” function with a “validate_arguments” function which will be in its own file as well. Now we will have main.c and validate_arguments.c files which look like:

refactoring main.c
main.c
ensuring that the number of arguments are two
validate_arguments.c

The next thing I would want to tackle is to get the stream from the provided file. The stream is going to be a pointer to the file (FILE *). I would like to create the global variable which will be a pointer to the struct, as I mentioned earlier. I am starting with the stream (FILE *) and line (char *) as members of the struct. The stream will hold the stream of, or connection to, the opened file. The getline function will read each line into the line member of that struct.

Initial members of the struct arg_s
monty.h
initialize arguments variable as well as member variables
initialize_arguments.c
error handle for failed memory allocation
malloc_failed.c
get stream to opened file with error handling
get_stream.c
get stream, read and print lines in the file
main.c

Having created the arg_s struct, I create a global arguments variable by using “extern arg_t *argument;“ statement in the header file. Outside of the main function, I initialize the arguments pointer to NULL.

Examining the main function, I initialize the arguments pointer with the initialize_arguments function. In that function, I allocate memory to the arguments pointer and set the stream and line members to NULL. When the allocation fails, I handle it with the malloc_failed function.

I then set the stream member to point to the stream of the file using the get_stream function. In that function, I try to open the file for reading using the open standard library function. When opening fails, I handle it with the getting_stream_failed function. Once, the file is opened, I try to get the stream to the file using the fopen standard library function. If we unsuccessfully get the stream, the stream variable will be NULL. Hence, I close the file with the fd (file descriptor of the opened file) and handle the error using the same getting_stream_failed function if the stream from the file is NULL.

Since we will be reading each line in the file until there is nothing to read, I use the “while” loop together with the getline standard library function. This assigns the line read from the stream to the arguments’ member line. I am currently just printing the line to the standard output.

Compiling the code to create a “monty” executable and running the program with the command “./monty bytecodes/000.m”, where “bytecodes” is a directory which will contain some Monty bytecodes for testing and “000.m” being a Monty bytecode file. We correctly get the output below:

output from printing lines in a Monty file
standard output

An important thing to note is that, you will have to update your header file with the necessary prototypes as we code. I believe you can deal with that so I am not showing that here.

Ending the first part of this series, let me now fill in the “main” function with some functions that detail how I would want to handle the remaining parts of the program. It will give you something to think about before checking Part Two of this series.

template of the final main.c
main.c

To make it easier to follow through the series, this is a link to the part:

https://medium.com/@mr_robertamoah/creating-an-interpreter-for-monty-bytecodes-using-c-programming-language-d5757cb08358

--

--

Robert Amoah

Full-stack web developer #laravel #nestjs #vue #react Mobile Application Development #reactnative DevOps and Cloud enthusiast #linux #bash Love Data Science