From C to machine code — demystifying CPU mechanics

8 min read · Dec 5, 2019

Have you ever wondered how your computer takes human-written C code, compiles it to machine code and executes the operation you just programmed? For me those internal mechanics were a “miracle” for a long time: you hit compile, wait several seconds and voilà, your application runs!

Code in interpreted languages like Python or JavaScript can be executed immediately, without any intermediate compilation step, on almost any operating system of your choice.

But what happens underneath? This article strives to give you some deeper insight into compilers, computer architectures, assembly language and the mysterious machine code. After reading it you will feel more comfortable using compilers and talking about machine code, instruction sets and architecture features.

Prerequisites

If you want to follow this article and reproduce the steps I do, install the SDCC compiler suite for Intel MCS51-like microcontrollers (e.g. 8051). In order to call the sdcc command from your shell, add the installation path to your PATH environment variable.

For writing code, you can use any plain text editor of your choice. To compile the program into the build folder, run sdcc main.c -o ./build/.

You can execute the compiled code (.ihx) in the EdSim51 simulator. The Java-based tool emulates an 8051-like virtual CPU, so you can go through your instructions step by step and observe the behavior of the different registers. Furthermore, there are some peripherals, like 7-segment displays, if you want to write some fancy assembly programs.

In this article I will frequently refer to the original MCS51 datasheet, so don’t hesitate to download the PDF file!

Now we are prepared for our journey, grab your coffee and let’s go!

Why 8051?

The MCS51 family was developed by Intel in 1980 for use in embedded systems like vending machines or the electric windows in your car. It is a lightweight 8-bit processor, where 8-bit refers to the width of the data its arithmetic logic unit (ALU) operates on. Additionally, the 8051 has various features like a serial interface (UART), multiple timers, interrupts and the ability to connect to external (off-chip) data and code memory. For a more in-depth description, click here.

The second reason for selecting the MCS51 is that the family’s architecture is not as complex as that of modern CPUs with their multi-level caching and pipelining mechanisms. Understanding the 8051 execution model will give you at least a high-level overview of how modern CPUs work internally. It is a fascinating world connecting human logic written in a high-level language with nanometer-scale electrical circuits.

Below you see the high level architecture of the MCS51 (screenshot from that datasheet):

A highly important concept is that of different memory types. The 8051 has a 4 KB ROM block (code memory) and 128 bytes of RAM (data memory). The former stores your application code as a sequence of bytes; the latter is used for variables defined in your code, e.g. intermediate results of a complicated calculation.

(Modern CPUs follow the principle where both code and data can be stored in a single location, e.g. RAM or cache, see here, but for now we separate both worlds)

In a nutshell: the CPU sequentially reads the code memory (also called program memory) and executes different instructions like addition, data copy or bit shifting. The results are saved to RAM registers for further processing.

Let’s take a closer look at how a fundamental add instruction is represented in code memory. A very common operation is to take a value from a particular address in data memory and add a second value to it. We will describe this process in assembly language:

The first mov instruction (mov A, _P1) copies a single byte from the data address _P1=0x90 to the accumulator, a special register within the CPU where all calculations take place. The following add instruction (add A, #0xFB) adds the hexadecimal number 0xFB, which represents -5 in 8-bit two’s complement, to the accumulator. The last instruction (mov r7, A) copies the result from the accumulator back to RAM. The destination is r7 (register 7), which is a “special” memory location in RAM. Instructions involving registers take only half of the code memory space compared to similar instructions involving arbitrary addresses in internal RAM. For example, add A, r7 takes only 1 byte in ROM, whereas add A, 0xA7 needs 2.
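To make the size difference concrete, here is a minimal C sketch that encodes both variants of the add instruction by hand. The function names are made up for this illustration; the opcodes (0x28 + n for add A, Rn and 0x25 for add A, direct) are taken from the MCS51 instruction table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "add A, Rn" (n = 0..7): a single byte, opcode 0x28 + n.
 * Returns the number of bytes written to out. */
size_t encode_add_a_reg(uint8_t n, uint8_t out[2])
{
    assert(n < 8);                    /* a register bank only has r0..r7 */
    out[0] = (uint8_t)(0x28 + n);
    return 1;
}

/* Encode "add A, direct": opcode 0x25 followed by the address byte.
 * Returns the number of bytes written to out. */
size_t encode_add_a_direct(uint8_t addr, uint8_t out[2])
{
    out[0] = 0x25;
    out[1] = addr;
    return 2;
}
```

Calling encode_add_a_reg(7, buf) yields the single byte 2F, while encode_add_a_direct(0xA7, buf) yields the two bytes 25 A7.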

If efficiency is your holy grail and code space is limited, details like this matter.

Let’s decompose the three lines of assembly code from the basic.asm snippet into machine code which an 8051 can interpret.

The table of available instructions can be found on page 2–21 in the MCS51 datasheet (table on the left).

The mov A, addr instruction has the instruction code (or opcode) E5 and takes 2 bytes in code memory: the opcode followed by the address, here 90.

The add command is also 2 bytes long: the opcode 24 followed by the value to add, here FB.

The last mov is a quick and short (1 byte) operation with opcode FF.

Let’s translate the sequence of commands we just wrote into a sequence of bytes in program memory: E59024FBFF or, if you wish, the even less readable binary 1110010110010000001001001111101111111111.
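If you want to check the binary string yourself, the following C sketch converts the five bytes into their bit pattern. The array and function names are chosen for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five machine-code bytes derived in the text. */
static const uint8_t program[] = { 0xE5, 0x90, 0x24, 0xFB, 0xFF };

/* Write len bytes as '0'/'1' characters, most significant bit first.
 * out must have room for len * 8 + 1 characters.
 * Returns the number of characters written (without the terminator). */
size_t bytes_to_binary(const uint8_t *bytes, size_t len, char *out)
{
    size_t pos = 0;

    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            out[pos++] = ((bytes[i] >> bit) & 1) ? '1' : '0';
        }
    }
    out[pos] = '\0';
    return pos;
}
```

With a 41-character buffer, bytes_to_binary(program, sizeof program, buf) fills buf with the 40-bit string quoted above.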

You will find this sequence somewhere in the program memory of the CPU.

That was tough! Give your brain some minutes of rest to digest. Instructions and the operands of those instructions (like source and destination addresses) are located in program memory. In EdSim51 you can toggle between data and program memory by clicking on the memory button:

Level up!

Writing assembly code is tough: even simple for-loops are a challenge to program in a fail-safe and efficient way. Efficiency here means using the minimal number of CPU cycles and the minimal amount of memory to accomplish a task. That’s why higher-level languages like C exist: to make our lives easier.

For the sake of the example we will use the following dummy code:

We read the value at port P1, subtract 5 and assign the result back to P1. We do that 3 times in a loop before the program quits.
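The original listing is not reproduced here, so the following is only a sketch consistent with the description, not the exact addition.c. It assumes SDCC’s 8051.h header, which declares the P1 port register, and the __SDCC macro that current SDCC versions define. The helper name subtract_five_three_times and the host stand-in for P1 are additions made for this illustration, so the same file also compiles with a regular PC compiler.

```c
#ifdef __SDCC
#include <8051.h>               /* SDCC header declaring the P1 port register */
#else
#include <assert.h>             /* host only: lets you check the result on a PC */
static unsigned char P1 = 0xFF; /* host stand-in for the port register */
#endif

/* Read P1, subtract 5, write the result back. Repeat three times. */
void subtract_five_three_times(void)
{
    unsigned char i;

    for (i = 0; i < 3; i++) {
        P1 = P1 - 5;
    }
}

#ifdef __SDCC
void main(void)
{
    subtract_five_three_times();
}
#endif
```

Because of the extra helper function, the assembly SDCC generates for this sketch will not match the author’s listing line by line, but the loop body should still boil down to the same mov, add, mov sequence.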

Running sdcc addition.c -o ./build/ in the terminal creates multiple files in the build folder. In this tutorial we take a look at addition.asm and addition.ihx. The .asm file is the compiler’s “translation” from C to assembly operations. Let’s take a look at what the compiler produced:

8051 (distilled) assembly representation of the addition.c code snippet. I deleted some compiler metadata for better readability.

The logic within the for-loop is implemented in lines 27–29:

logic within the loop

Instead of using the subb subtract operation, the compiler "misused" the add operation and exploited the negative number representation implemented in most CPU architectures. Since two's complement is used, the compiler can simply rewrite x = x - 5 as x = -5 + x, or x = 0xFB + x.
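The two's complement trick is easy to verify on any machine. The C sketch below (function names are illustrative) builds the 8-bit two's complement of 5 and shows that adding it is the same as subtracting 5 once the result wraps at 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit two's complement of v: invert all bits, add 1, keep the low 8 bits. */
uint8_t twos_complement8(uint8_t v)
{
    return (uint8_t)(~v + 1u);
}

/* Compute x - 5 the way the compiler did: x + 0xFB, wrapping at 8 bits. */
uint8_t sub5_via_add(uint8_t x)
{
    assert(twos_complement8(5) == 0xFB);  /* -5 is represented as 0xFB */
    return (uint8_t)(x + twos_complement8(5));
}
```

For example, sub5_via_add(0x90) gives 0x8B, and sub5_via_add(3) wraps around to 0xFE, which is -2 in two's complement.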

Since those three instructions shown above are located in code memory one after the other, we are able to translate them into machine code:

This instruction sequence has to be placed somewhere in the ROM. Let’s open the addition.ihx file (which describes the program data flashed to ROM) and find the sequence E59024FBF590 in line 4:
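An .ihx file is in Intel HEX format: every line is a record of the form :LLAAAATT<data>CC, with a byte count, a 16-bit load address, a record type, the data bytes and a checksum that makes all bytes sum to zero modulo 256. The following C sketch decodes one such record. The function names are made up for this illustration, and the sample record in the usage note uses a hypothetical load address (0x0062), not the one from the actual addition.ihx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode one Intel HEX record (:LLAAAATT<data>CC).
 * Copies the data bytes to data (at most cap bytes) and stores the load
 * address in *addr. The record type is read but not interpreted.
 * Returns the number of data bytes, or -1 for a malformed record or a
 * wrong checksum. */
int ihx_decode(const char *rec, uint16_t *addr, uint8_t *data, size_t cap)
{
    assert(rec != NULL && addr != NULL && data != NULL);

    size_t len = strlen(rec);
    if (len < 11 || rec[0] != ':' || (len - 1) % 2 != 0)
        return -1;

    size_t nbytes = (len - 1) / 2;  /* count + address(2) + type + data + checksum */
    uint8_t count = 0;
    uint8_t sum = 0;
    *addr = 0;

    for (size_t i = 0; i < nbytes; i++) {
        int hi = hex_digit(rec[1 + 2 * i]);
        int lo = hex_digit(rec[2 + 2 * i]);
        if (hi < 0 || lo < 0)
            return -1;

        uint8_t byte = (uint8_t)(hi * 16 + lo);
        sum = (uint8_t)(sum + byte);

        if (i == 0) {
            count = byte;
            if (nbytes != (size_t)count + 5 || count > cap)
                return -1;
        } else if (i == 1 || i == 2) {
            *addr = (uint16_t)((*addr << 8) | byte);
        } else if (i >= 4 && i < nbytes - 1) {
            data[i - 4] = byte;
        }
    }
    return sum == 0 ? (int)count : -1;  /* all bytes must sum to 0 mod 256 */
}
```

Feeding it the hypothetical record :06006200E59024FBF5907F returns 6 data bytes, E5 90 24 FB F5 90, loaded at address 0x0062.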

The compiler translates our C code into a byte stream that the CPU can interpret and that is stored in code memory (ROM). The CPU parses the byte sequence and executes the desired operations. This is also the reason why code compiled on Windows can run on both AMD and Intel CPUs: even if the internal circuit design of those CPUs is different, the instructions (like mov), their operation codes and their functionality are the same.

Processors which support the same operation codes share a common instruction set (like x86 for PC processors from AMD and Intel). In contrast to the 8051, modern CPUs keep code and data in the same memory (e.g. RAM). Since data and code are interchangeable, some security issues arise: an attacker could potentially execute harmful code disguised as application data, provided they know some internals (like OS and hardware) of the target system.

Down to the silicon!

Now we’ll go a step further and observe what happens in the CPU after we have flashed our machine code to program memory and pressed the reset button (causing a rising edge at the RST pin).

The program counter (PC) points to the first instruction stored in program memory (at address 0x00), which the control unit fetches and decodes. If E590 is the first command, the control unit will copy the content of port 1 (accessible at address 0x90) to the accumulator.

Parsing and executing mov a, 0x90 instruction, high level overview

The same process in little more depth:

After reset, the program counter (PC) is set to 0x00.
The value of the PC is applied to the program address register (PAR). The PAR selects the byte stored at 0x00 in program memory, which is then made available on the internal bus.
The control unit stores the instruction code in the instruction register.
The control unit now knows that the E5 instruction needs an additional byte from program memory. The next byte in ROM is 90.
The content of address 0x90 is moved to the accumulator.
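The steps above can be sketched as a tiny fetch-decode-execute loop in C. This is a strongly simplified model, not a real 8051 emulator: it knows only the three opcodes from the first example (E5, 24 and FF), treats the whole direct address space as one array, assumes register bank 0 (so r7 lives at address 0x07) and ignores flags and timing. All type and function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_SIZE 16

typedef struct {
    uint8_t  rom[ROM_SIZE];  /* code memory (tiny, for the sketch) */
    uint8_t  direct[256];    /* direct address space: RAM and ports like P1 at 0x90 */
    uint8_t  acc;            /* accumulator */
    uint16_t pc;             /* program counter, 0 after reset */
} Cpu;

/* Read the byte the PC points to and advance the PC. */
static uint8_t fetch(Cpu *cpu)
{
    assert(cpu->pc < ROM_SIZE);
    return cpu->rom[cpu->pc++];
}

/* Execute one instruction.
 * Returns 0 on success, -1 for an opcode this sketch does not know. */
int cpu_step(Cpu *cpu)
{
    uint8_t opcode = fetch(cpu);          /* goes to the "instruction register" */

    switch (opcode) {                     /* decode */
    case 0xE5: {                          /* mov A, direct: needs one more byte */
        uint8_t addr = fetch(cpu);
        cpu->acc = cpu->direct[addr];
        return 0;
    }
    case 0x24: {                          /* add A, #immediate */
        uint8_t imm = fetch(cpu);
        cpu->acc = (uint8_t)(cpu->acc + imm);
        return 0;
    }
    case 0xFF:                            /* mov r7, A (bank 0: address 0x07) */
        cpu->direct[0x07] = cpu->acc;
        return 0;
    default:
        return -1;
    }
}
```

Loading E5 90 24 FB FF into rom, writing 0x0A to direct[0x90] and calling cpu_step three times leaves 0x05 in both the accumulator and r7, with the PC at 5.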

The mapping “instruction-code → particular action” is implemented in hardware as a special circuit within the timing-and-control unit. It sets the right bits to access memory or execute a particular operation using the ALU module.

Roundup

Coding can be fun, and it’s great to be able to write applications in Python or C without thinking about hardware specifics. Imagine having to rewrite your logic in CPU-specific instructions! Luckily, given a compiler for the target, you can nowadays run the same C code on CPUs of different architectures and manufacturers, and the standardization of instruction sets (like x86 or RISC-V) allows binaries to be distributed across different devices.

I hope you have learned something new. Even if you don’t need this knowledge in your daily (coding) life, same as I don’t, it is sometimes useful to understand what happens underneath your keyboard.

Special thanks

Thanks to Anna F. for improving my English skills and reviewing my text. Also big thanks to Christian H. for his technical review.