Shreyas Malewar
Published in GDSC GHRCE
5 min read · Aug 28, 2020


How is our code executed?

“Hello, World!” has a special place in our hearts. It is the first program most of us write when learning a new programming language, and somehow, magically, it is the fastest program we ever get running with absolutely zero errors (by the way, if you are a newbie, do not get your hopes up: that dream never comes true when writing actual software). After a couple of years of programming, we realize that we learned to program in an abstraction-oriented environment, i.e., without knowing exactly what is going on under the hood. So in this article we shall see “How is our code executed?”

The One where it all began — Integrated Development Environment

The Integrated Development Environment, often referred to as an IDE, is a piece of software used for writing software. I know that sounds circular.

The difference is that we use an IDE to create something. An IDE is a very powerful tool for writing and debugging programs. IDEs are usually language-specific, though some can be used with more than one language. We generally use IntelliJ IDEA and Eclipse for Java, PyCharm and Spyder for Python, and Visual Studio and Turbo C for C. There is a popular misconception that “an IDE is necessary for software development.” I would correct this premise to “an IDE is important for software development,” since one can work with the most straightforward text editor as long as a compiler for the language is installed on the system. We will discuss compilers in the upcoming sections.

The One where Magic happens — World of Compilers

A compiler has a wide range of responsibilities, and we will look at each of them in turn.

Preprocessor

Right after we press the “Run” button in our IDE, the entire high-level code is passed through a preprocessor. The thing about preprocessors (considering C and C-like languages) is that they love “#”: a line beginning with “#” is a preprocessor directive. Whenever the preprocessor reads “#” followed by “include”, it replaces that line with the contents of the header file named in the include statement; this process is known as file inclusion. And when it reads “#” followed by “define”, it records a macro and replaces every later occurrence of the macro's name with its definition; this is known as macro expansion.

#include <stdio.h>   -> File Inclusion
#define MAX 10       -> Macro Expansion

The preprocessor does not translate anything itself. It pulls in the files needed for compilation and converts the high-level code into pure high-level code, that is, code in which no directives remain. The compiler then processes this code further.
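To make macro expansion a little more concrete, here is a minimal sketch. The macro SQUARE and the function names are made up for illustration and are not part of the original example. With GCC or Clang, the -E option stops after preprocessing and prints the resulting pure high-level code, which is a handy way to see what the preprocessor actually did.

```c
#include <stdio.h>   /* file inclusion: the header's text is pasted in here */

#define MAX 10                 /* object-like macro: MAX is replaced by 10 */
#define SQUARE(x) ((x) * (x))  /* function-like macro (hypothetical example) */

/* After preprocessing, the return statement reads: return ((10) * (10)); */
int max_squared(void)
{
    return SQUARE(MAX);
}

/* After preprocessing, MAX below is simply the literal 10. */
void print_max(void)
{
    printf("MAX is %d\n", MAX);
}
```

The compiler proper never sees the names MAX or SQUARE; by the time it runs, only the expanded text is left.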

Compiler

A compiler is broken down into six major components.

1. Tokenizer (Lexical Analyzer)

2. Parser (Syntax Analyzer)

3. Semantic Analyzer

4. Intermediate Code Generator

5. Code Optimizer

6. Target Code Generator

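Since every working party needs a manager to make sure things are done correctly, the Symbol Table Manager works alongside all six components: it records every identifier together with its attributes, such as type and scope, so that each phase can look them up. (Alas! The components do not have work-from-home privileges.)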

1. Tokenizer: The pure high-level code is passed through the tokenizer, which converts the program into a series of tokens. Each token is classified as a keyword, constant, special symbol, operator, string, or identifier. In our “Hello World” program, the tokenizer would dissect the code in the following way:

Keyword - int, return

Identifier - main, printf

Special Symbol - ( ) { } ;

String - Hello, World!

This process is also known as lexical analysis, since it deals with the vocabulary of the language, i.e., the words and symbols a program is built from. Grammar is the job of the next component.

2. Parser: The stream of tokens is passed through the parser; this step is known as syntax analysis. It checks whether the token stream has the correct syntax by building a parse tree of the program. For example, it checks whether parentheses and curly braces occur in matching pairs and whether each statement ends with a semicolon. If the stream of tokens fails syntax analysis, the parser responds with a suitable error.

3. Semantic Analyzer: “My table will drive a car to Delhi tomorrow” is grammatically perfect but makes no sense. Semantic analysis catches the same kind of problem in code.

int number = "letter";

As far as the syntax is concerned, the above code is a correct way to initialize an integer, but it makes no sense, as we cannot store a string in an integer. Common semantic errors are Type Mismatch, Multiple Declaration, Undeclared Variable, and Reserved Identifier Misuse.
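Three of these errors can be illustrated with a tiny symbol table. This is a sketch under heavy simplification: there are only two types, a single scope, and a fixed-size table, and all names (SymbolTable, declare, check_assignment and so on) are invented for the example.

```c
#include <string.h>

typedef enum { TYPE_INT, TYPE_STRING } Type;

typedef enum {
    SEM_OK,
    SEM_MULTIPLE_DECLARATION,
    SEM_UNDECLARED_VARIABLE,
    SEM_TYPE_MISMATCH,
    SEM_TABLE_FULL
} SemResult;

typedef struct {
    char name[32];        /* names longer than 31 characters are truncated */
    Type type;
} Symbol;

typedef struct {
    Symbol entries[64];
    size_t count;         /* must be set to 0 before first use */
} SymbolTable;

static const Symbol *lookup(const SymbolTable *table, const char *name)
{
    for (size_t i = 0; i < table->count; i++)
        if (strcmp(table->entries[i].name, name) == 0)
            return &table->entries[i];
    return NULL;
}

/* Records a declaration such as `int number;`. */
SemResult declare(SymbolTable *table, const char *name, Type type)
{
    if (lookup(table, name) != NULL)
        return SEM_MULTIPLE_DECLARATION;
    if (table->count == sizeof table->entries / sizeof table->entries[0])
        return SEM_TABLE_FULL;

    Symbol *s = &table->entries[table->count++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->type = type;
    return SEM_OK;
}

/* Checks an assignment such as `number = "letter";`. */
SemResult check_assignment(const SymbolTable *table, const char *name,
                           Type value_type)
{
    const Symbol *s = lookup(table, name);
    if (s == NULL)
        return SEM_UNDECLARED_VARIABLE;
    if (s->type != value_type)
        return SEM_TYPE_MISMATCH;
    return SEM_OK;
}
```

With number declared as TYPE_INT, checking an assignment of a TYPE_STRING value to it returns SEM_TYPE_MISMATCH, which mirrors the example above.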

4. Intermediate Code Generator: At this step we are almost halfway through the compilation process. Intermediate code is neither high-level code nor machine code. It lies somewhere in between, hence the name intermediate.

The compiler can represent intermediate code as an Abstract Syntax Tree, in Reverse Polish (postfix) Notation, or as Three Address Code.

  • Abstract Syntax Tree
Expression - a * (b + c)
      *
     / \
    a   +
       / \
      b   c
  • Reverse Polish Notation
Expression - a * (b + c)
Step 1 -> a * b c +
Step 2 -> a b c + *
  • Three Address Code
Expression - a * (b + c)
Step 1 -> t1 = b + c
Step 2 -> t2 = a * t1

Note that intermediate code is machine-independent, so the front end of a compiler can be reused: supporting a new machine needs only a new back end rather than a full new compiler.
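The two textual forms can be produced by a short program. The sketch below assumes single-character operands, the four operators + - * /, and well-formed input; the function names to_postfix and emit_three_address are invented here. It uses the classic operator-stack method for the postfix conversion and then walks the postfix string to emit one temporary per operator.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int precedence(char op)
{
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;                       /* '(' and anything else */
}

/* Converts an infix expression to postfix.
 * `out` must have room for strlen(infix) + 1 characters.
 * Assumes fewer than 128 operators are pending at any time. */
void to_postfix(const char *infix, char *out)
{
    char ops[128];
    size_t top = 0, n = 0;

    for (const char *p = infix; *p != '\0'; p++) {
        char c = *p;
        if (isalnum((unsigned char)c)) {
            out[n++] = c;                       /* operands go straight out */
        } else if (c == '(') {
            ops[top++] = c;
        } else if (c == ')') {
            while (top > 0 && ops[top - 1] != '(')
                out[n++] = ops[--top];
            if (top > 0)
                top--;                          /* discard the '(' */
        } else if (precedence(c) > 0) {
            while (top > 0 && precedence(ops[top - 1]) >= precedence(c))
                out[n++] = ops[--top];
            ops[top++] = c;
        }                                       /* whitespace is skipped */
    }
    while (top > 0)
        out[n++] = ops[--top];
    out[n] = '\0';
}

/* Writes three address code for a postfix expression into `out`.
 * Returns the number of temporaries used, or -1 on malformed input
 * or if `out` is too small. */
int emit_three_address(const char *postfix, char *out, size_t cap)
{
    char stack[64][8];              /* operand names: a, b, t1, t2, ... */
    size_t top = 0, used = 0;
    int temps = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = postfix; *p != '\0'; p++) {
        char c = *p;
        if (isspace((unsigned char)c))
            continue;
        if (isalnum((unsigned char)c)) {
            if (top == 64)
                return -1;
            snprintf(stack[top++], sizeof stack[0], "%c", c);
        } else {
            char left[8], right[8];
            if (top < 2)
                return -1;          /* operator without two operands */
            strcpy(right, stack[--top]);
            strcpy(left, stack[--top]);
            temps++;
            int w = snprintf(out + used, cap - used, "t%d = %s %c %s\n",
                             temps, left, c, right);
            if (w < 0 || (size_t)w >= cap - used)
                return -1;
            used += (size_t)w;
            snprintf(stack[top++], sizeof stack[0], "t%d", temps);
        }
    }
    return temps;
}
```

For the expression a * (b + c), to_postfix produces abc+* (the same result as above, without the spaces), and emit_three_address turns that into the two lines t1 = b + c and t2 = a * t1.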

5. Code Optimizer: It improves the performance of the code, for example by removing unused variables and eliminating code that plays no part in execution. There are two types of code optimization: Machine Dependent Optimization and Machine Independent Optimization.
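As a rough before-and-after picture, the two functions below show the kind of rewrite an optimizer might perform. The transformation here is done by hand for illustration, the function names are made up, and whether a real compiler does exactly this depends on the compiler and its settings.

```c
/* Before: the code as the programmer wrote it. */
int seconds_per_day_before(void)
{
    int unused = 42;                 /* never read: can be removed */
    int hours = 24;
    int result = hours * 60 * 60;    /* all operands are known constants */

    if (0) {                         /* always false: the body is dead code */
        result = -1;
    }
    return result;
}

/* After: what remains once the unused variable and the dead branch are
 * removed and the constant expression 24 * 60 * 60 is computed ahead of
 * time. */
int seconds_per_day_after(void)
{
    return 86400;
}
```

Both functions return the same value, which is the point: an optimization must change how the work is done, never the result.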

6. Target Code Generator: As the name “Target Code” suggests, this generator is responsible for translating the optimized intermediate code into the assembly language of the target machine.

Assembler

The assembler converts assembly language code into “relocatable” machine code, i.e., 0s and 1s. Note the word relocatable in inverted commas: the addresses in this code are not yet final.

Loader

If your code has reached the loader and you have read this far in the article, you deserve big congratulations. The loader is responsible for placing the machine code, once it has been linked into an executable, into the Random Access Memory (RAM), where it will be executed.

Linker

Machine code can be placed anywhere in RAM, since it is relocatable (remember that from the assembler). That also means the addresses inside it are not yet final, and references to code that lives elsewhere, such as printf in the standard library, are still unresolved. The linker combines the object code with the libraries it uses, resolves these references to actual addresses, and produces a single executable. Linking is the last step of building a program, and its output is what the loader places in RAM.
