Creating an Interpreter for Monty Bytecodes Using C Programming Language

Robert Amoah
Jun 17, 2023


PART TWO

If you missed Part One, read it first: https://medium.com/@mr_robertamoah/creating-an-interpreter-for-monty-bytecodes-using-c-programming-language-3c495ba24eaa

We ended Part One of this series with some unimplemented functions. Remember that I mentioned each function lives in a file with the same name. As a reminder, let's look at our main function again.

[Embedded code: main.c, starting point]

Let us start with the line number tracking problem. Each time we read a line successfully, we will increase the line_number member of our global pointer by one. To make this work, I will add an “unsigned int line_number” member to our arg_s struct in the header file and initialize it to 0 in the initialize_arguments function. That’s easy, right? But why am I adding this to the struct? Because I will need the line number in several places in the program, for error messages among other things. The arg_s struct will also get the members tokens, n_tokens, and instruction. The tokens member will store the words of the line read from the file; n_tokens holds the number of words and is used to allocate memory for the tokens array; and instruction is a pointer to an instruction_s struct that will contain a valid opcode from the line read, as well as a pointer f to the function that executes that opcode. This is what our arg_s struct and initialize_arguments function look like at the moment.

[Embedded code: monty.h, updated arg_s struct]
[Embedded code: initialize_arguments.c, updated initialize_arguments function]
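A minimal sketch of what the updated struct and initialize_arguments might look like follows. The stack_s and instruction_s definitions follow the usual Monty project header; the arg_t typedef name, the stream and line members, and the read_line helper (which shows where line_number is incremented after each successful getline call) are assumptions for illustration, not the author's exact code.

```c
#define _POSIX_C_SOURCE 200809L /* exposes getline() */
#include <stdio.h>
#include <stdlib.h>

/* Doubly linked list node used as the stack */
typedef struct stack_s
{
	int n;
	struct stack_s *prev;
	struct stack_s *next;
} stack_t;

/* An opcode paired with the function that executes it */
typedef struct instruction_s
{
	char *opcode;
	void (*f)(stack_t **stack, unsigned int line_number);
} instruction_t;

/* Global argument bag; arg_t, stream and line are assumed names */
typedef struct arg_s
{
	FILE *stream;               /* the opened Monty file */
	char *line;                 /* buffer filled by getline() */
	unsigned int line_number;   /* number of the last line read */
	char **tokens;              /* NULL-terminated words of the line */
	int n_tokens;               /* how many words tokens holds */
	instruction_t *instruction; /* matched opcode and its handler */
} arg_t;

arg_t *arguments = NULL;

/* Allocate the global struct and give every member a known value */
void initialize_arguments(void)
{
	arguments = malloc(sizeof(arg_t));
	if (arguments == NULL)
	{
		fprintf(stderr, "Error: malloc failed\n");
		exit(EXIT_FAILURE);
	}
	arguments->stream = NULL;
	arguments->line = NULL;
	arguments->line_number = 0;
	arguments->tokens = NULL;
	arguments->n_tokens = 0;
	arguments->instruction = NULL;
}

/*
 * read_line - hypothetical helper: read the next line into
 * arguments->line and bump line_number only when getline() succeeds.
 * Return: 1 if a line was read, 0 at end of file or on error.
 */
int read_line(void)
{
	size_t size = 0;

	free(arguments->line);
	arguments->line = NULL;
	if (getline(&arguments->line, &size, arguments->stream) == -1)
		return (0);
	arguments->line_number++;
	return (1);
}
```

Because line_number starts at 0 and is only incremented after a successful read, the first line of the file is reported as line 1. In a main loop, this helper would be used as `while (read_line()) { ... }`.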

Now let us look at how I handled breaking the line up into words, a step usually called tokenization.

[Embedded code: tokenize_line.c, getting tokens from the line read by the getline function]

I first create a copy of the line, which will be used for the tokenization. This matters because strtok modifies the string it tokenizes. You have to allocate memory for this variable (linecpy) before copying line into it. n_tokens is set to 0, and the first while loop counts the words (tokens) in the line: while there is a word, I increment n_tokens. With n_tokens, I allocate memory for the tokens array. I use n_tokens plus one because I want to set the last item in the array to NULL, which makes it easy to loop through the array later without reading memory that was not allocated.

I then copy line into linecpy again for a second tokenization, since the first pass left linecpy cut up by strtok. If you do not already know how the strtok function works, here is a task for you: read about it with “man strtok”. The tokens member will hold a number of words (char *) followed by a NULL. So, for each word in the line, I allocate memory in tokens, always remembering to make room for the null terminating character ('\0'). I then copy the word into the tokens array, get the next token, and increment the “i” counter that I use to index the tokens array. When there are no more words, I assign NULL to the next slot in the tokens array and then free the memory we allocated for linecpy. Note that our delimiter string for tokenizing the line was “ \n”. This means the line is split on every space or newline character, and those characters never end up in the tokens.
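Here is a sketch of the two-pass approach described above. To keep it self-contained, this version takes the line as a parameter, returns the NULL-terminated array, and writes the word count to *n_tokens; that signature is an assumption, since in the article the results are stored in arguments->tokens and arguments->n_tokens instead.

```c
#include <stdlib.h>
#include <string.h>

#define DELIMS " \n"

/*
 * tokenize_line - split line into a NULL-terminated array of words.
 * Hypothetical standalone signature (the article writes into the global
 * arguments struct). Return: the array, or NULL if malloc fails.
 */
char **tokenize_line(const char *line, int *n_tokens)
{
	char *linecpy, *token, **tokens;
	int i = 0;

	/* strtok writes '\0' into its input, so work on a copy */
	linecpy = malloc(strlen(line) + 1);
	if (linecpy == NULL)
		return (NULL);
	strcpy(linecpy, line);

	/* First pass: count the words */
	*n_tokens = 0;
	token = strtok(linecpy, DELIMS);
	while (token != NULL)
	{
		(*n_tokens)++;
		token = strtok(NULL, DELIMS);
	}

	/* One extra slot for the terminating NULL */
	tokens = malloc(sizeof(char *) * (*n_tokens + 1));
	if (tokens == NULL)
	{
		free(linecpy);
		return (NULL);
	}

	/* Second pass: restore the copy and duplicate each word */
	strcpy(linecpy, line);
	token = strtok(linecpy, DELIMS);
	while (token != NULL)
	{
		tokens[i] = malloc(strlen(token) + 1); /* +1 for '\0' */
		if (tokens[i] == NULL)
		{
			while (i > 0)
				free(tokens[--i]);
			free(tokens);
			free(linecpy);
			return (NULL);
		}
		strcpy(tokens[i], token);
		token = strtok(NULL, DELIMS);
		i++;
	}
	tokens[i] = NULL;

	free(linecpy);
	return (tokens);
}
```

For example, the line "push 12\n" produces n_tokens == 2 and the array {"push", "12", NULL}, while a blank line produces n_tokens == 0 and an array holding only NULL.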

We now have the words of the line we read in the tokens member of our arguments pointer. Looking at the main function, the next thing to implement is the get_instruction function.

[Embedded code: get_instruction.c]

I first create an array of all valid instructions. Per the instruction_s struct, each element has an opcode member and a function pointer f that will execute that opcode. So, I populate the instructions array with all the opcodes and their respective functions, which are currently empty. Next, I check whether there is a token, that is, whether the line is not empty. If there is nothing on a line, I simply return without doing anything further. When there are words/tokens, I use a for loop to iterate over the instructions. For each instruction_s struct, I compare its opcode member to the first token in our arguments tokens. Why the first token? Because in Monty, the first word on a line should be an opcode, unless the line is a comment. We will get to comments later. Once we find a match in the instructions array, I set the instruction member of our arguments pointer, because we will need it later in the program. If no valid opcode is found, we exit the program. I do this with the invalid_instruction function.
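The lookup can be sketched as follows. To keep the example self-contained, this version takes the tokens array as a parameter and returns the matching entry, or NULL for a blank line or an unknown opcode, leaving the caller to decide what to do; in the article, the match is stored in arguments->instruction and an unknown opcode goes straight to invalid_instruction. The handful of opcodes shown (push, pall, pint, pop, nop) is an assumed subset of the full table.

```c
#include <stddef.h>
#include <string.h>

typedef struct stack_s
{
	int n;
	struct stack_s *prev;
	struct stack_s *next;
} stack_t;

typedef struct instruction_s
{
	char *opcode;
	void (*f)(stack_t **stack, unsigned int line_number);
} instruction_t;

/* Empty placeholder handlers, as at this stage of the series */
void push(stack_t **stack, unsigned int line_number) { (void)stack; (void)line_number; }
void pall(stack_t **stack, unsigned int line_number) { (void)stack; (void)line_number; }
void pint(stack_t **stack, unsigned int line_number) { (void)stack; (void)line_number; }
void pop(stack_t **stack, unsigned int line_number) { (void)stack; (void)line_number; }
void nop(stack_t **stack, unsigned int line_number) { (void)stack; (void)line_number; }

/*
 * get_instruction - look up tokens[0] in the table of valid opcodes.
 * Return: the matching entry, or NULL for a blank line or unknown opcode.
 */
instruction_t *get_instruction(char **tokens)
{
	/* static, so pointers into the table stay valid after returning */
	static instruction_t instructions[] = {
		{"push", push}, {"pall", pall}, {"pint", pint},
		{"pop", pop}, {"nop", nop}, {NULL, NULL}
	};
	int i;

	if (tokens == NULL || tokens[0] == NULL)
		return (NULL); /* empty line: nothing to do */

	for (i = 0; instructions[i].opcode != NULL; i++)
	{
		if (strcmp(instructions[i].opcode, tokens[0]) == 0)
			return (&instructions[i]);
	}
	return (NULL); /* unknown opcode */
}
```

The table is declared static on purpose: storing a pointer to an element of an ordinary local array in a global such as arguments->instruction would leave a dangling pointer once the function returns. If the table is kept local, the matching entry should be copied (for example, into memory obtained with malloc) before it is stored.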

[Embedded code: get_instruction.c, error handling when no valid instruction_s struct is found]

What did I do there? After printing the required message to stderr, which includes the line number as well as the invalid opcode, I close the stream, free the memory allocated for the words stored in the tokens member, free arguments (I will handle the free_arguments function a bit later), and exit the program. The following are the contents of the close_stream.c and free_tokens.c files.

[Embedded code: free_tokens.c, free up memory allocated to the tokens member of the arguments variable]
[Embedded code: close_stream.c, close the stream of the opened file]

We loop through the tokens and free the memory allocated for each one as long as it is not NULL. Remember that we set the last token to NULL; this sentinel is what tells the loop where to stop. After freeing the words, we free the memory allocated for the tokens array itself and set the tokens member to NULL. In the close_stream function, I close the file using the fclose function and then set arguments->stream to NULL.
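The cleanup path can be sketched as below, covering invalid_instruction together with free_tokens and close_stream. The arg_t struct is trimmed to the members these functions touch, and free_arguments is a hypothetical stand-in because the series implements it later. The error message format "L<line_number>: unknown instruction <opcode>" is the one commonly required by the Monty project spec; treat it as an assumption. Note that the message is printed before free_tokens runs, because tokens[0] is the opcode being reported.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-in for the article's arg_s (only the members used here) */
typedef struct arg_s
{
	FILE *stream;
	unsigned int line_number;
	char **tokens;
} arg_t;

arg_t *arguments = NULL;

/* free_tokens - free each word, then the array; relies on the NULL sentinel */
void free_tokens(void)
{
	int i;

	if (arguments->tokens == NULL)
		return;
	for (i = 0; arguments->tokens[i] != NULL; i++)
		free(arguments->tokens[i]);
	free(arguments->tokens);
	arguments->tokens = NULL;
}

/* close_stream - close the Monty file and clear the stale FILE pointer */
void close_stream(void)
{
	if (arguments->stream == NULL)
		return;
	fclose(arguments->stream);
	arguments->stream = NULL;
}

/* Hypothetical stand-in; the series implements free_arguments later */
void free_arguments(void)
{
	free(arguments);
	arguments = NULL;
}

/* invalid_instruction - report an unknown opcode, clean up, and exit */
void invalid_instruction(void)
{
	fprintf(stderr, "L%u: unknown instruction %s\n",
		arguments->line_number, arguments->tokens[0]);
	close_stream();
	free_tokens();
	free_arguments();
	exit(EXIT_FAILURE);
}
```

Setting tokens and stream back to NULL makes both helpers safe to call twice, which helps when several error paths share the same cleanup.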

The next, and last, thing we will do in this part is implement the run_instruction function. This is fairly simple to understand if you have followed along so far. At this point, the instruction member of the arguments pointer should hold a valid opcode and an associated function pointer f that executes that opcode. We will use this instruction member in run_instruction. Here is what I did.

[Embedded code: run_instruction.c, call the function associated with the opcode from the instruction_s struct]

Let’s examine the f member of the instruction. “void (*f)(stack_t **stack, unsigned int line_number)” shows that f points to a function that returns void and takes two arguments: a pointer to a stack_t pointer, and an unsigned int, which is the number of the line from which the opcode was read. I create a pointer to stack_t and set it to NULL. If there are no tokens, there is nothing to do, so I return from the function. Otherwise, I call the associated function f with the address of the stack_t pointer and the line_number member of the arguments variable. The last thing to note is that all the opcode functions must have this same prototype. For now, all such functions are empty, and their prototypes are in the header file. Let’s look at them.

[Embedded code: monty.h, prototypes of the opcode-implementing functions]
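A minimal sketch of run_instruction as described above follows. One design point is worth hedging on: if the stack_t pointer is a local variable inside run_instruction, every call starts from an empty stack, so a value pushed on one line would be gone by the next. This sketch therefore keeps the stack head in a variable that outlives each call (stack_head, a hypothetical name); the series may handle this differently in a later part. The push_line_number handler is also hypothetical and exists only to make each call observable; it shares the prototype that every opcode function must have.

```c
#include <stdlib.h>

typedef struct stack_s
{
	int n;
	struct stack_s *prev;
	struct stack_s *next;
} stack_t;

typedef struct instruction_s
{
	char *opcode;
	void (*f)(stack_t **stack, unsigned int line_number);
} instruction_t;

/* Minimal stand-in for the article's arg_s (only the members used here) */
typedef struct arg_s
{
	char **tokens;
	unsigned int line_number;
	instruction_t *instruction;
} arg_t;

arg_t *arguments = NULL;

/* Assumed: the stack head outlives each call, so values persist between lines */
stack_t *stack_head = NULL;

/* run_instruction - call the handler matched for the current line */
void run_instruction(void)
{
	if (arguments->tokens == NULL || arguments->tokens[0] == NULL)
		return; /* blank line: nothing to execute */
	if (arguments->instruction == NULL)
		return;
	arguments->instruction->f(&stack_head, arguments->line_number);
}

/*
 * push_line_number - hypothetical demo handler with the shared prototype;
 * it pushes the line number so the effect of each call is visible.
 */
void push_line_number(stack_t **stack, unsigned int line_number)
{
	stack_t *node = malloc(sizeof(stack_t));

	if (node == NULL)
		return;
	node->n = (int)line_number;
	node->prev = NULL;
	node->next = *stack;
	if (*stack != NULL)
		(*stack)->prev = node;
	*stack = node;
}
```

Running the demo handler on line 1 and then on line 2 leaves two nodes on the stack (2 on top of 1), which shows that the stack survives between calls.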

Get ready for Part Three, where we will implement the functions associated with the opcodes. Let’s go 💪.

This is the link to the next part of the series: https://medium.com/@mr_robertamoah/creating-an-interpreter-for-monty-bytecodes-using-c-programming-language-47c9bab6c4be
