How Does a Modern Microprocessor Work?
A look at a made up RISC microprocessor called Calcutron-33, to learn how a microprocessor works.
Are you interested in better understanding articles written about new microprocessors such as Apple’s M1, AMD Ryzen, or even RISC-V?
- Why is one microprocessor faster than another?
- Why does another consume a lot of power?
- What Does RISC and CISC Mean in 2020?
- ARM, x86 and RISC-V Microprocessors Compared
To be able to read articles answering such questions it helps understanding what a microprocessor does. What an instruction is, a decoder, a register, an instruction-set architecture and so on.
And no, this is not nearly as hard as it may sound. I am going to teach you the assembly language of a very simple but modern microprocessor called Calcutron-33. What is assembly language? Microprocessors (CPU) cannot be instructed in plain English. They need to be told what to do using what is called machine language. Assembly code is how humans write machine code.
Before you go: “I have never even heard about Calcutron-33! Why don’t you teach me something useful like Intel x86 or ARM?”
I definitely would have if that was easy. However a modern x86 microprocessor from Intel or AMD has over 1503 different assembly instructions. Internally they have a myriad of advance features, which means the fundamentals of what a CPU does completely drowns in layer upon layer of complexity.
Thus Calcutron-33 is a much better choice because it allows us to focus on what is important. The CPU only has 10 different instructions to learn. These have been carefully selected so we captured the essence of what a modern RISC-like processor is about.
Once you know Calcutron-33, it will be much easier to read and learn about real microprocessors. You don’t want to learn everything about them, but by knowing the basics, you understand better what to focus on and what to ignore.
There are other imaginary CPUs to learn such as Little Man Computer, but Calcutron-33 has been inspired by modern CPU designs such as RISC-V. I believe this is far more relevant in 2020.
Looking at the inside of a CPU (Central Processing Unit) with a microscope does’t tell you much about how it works. It is not like a mechanical device with gears, pipes or things we humans can easily grasp by merely looking at them.
To reason about the workings of a CPU, it helps to imagine its as organized like a large office, warehouse or factory. It has different functional units doing different tasks. For instance in a factory you have places where car parts are welded, painted or assembled.
A large office is also organized in this manner. To imagine a CPU it actually helps to think more about how a office in the 1940s or earlier worked. Large complicated scientific or business calculations could not be performed by computers because they did not exist. Instead a complicated task would be subdivided into simpler subtasks that human computers could calculate.
Yes, that is a fun little factoid. A computer was originally a human paid to perform calculations.
A microprocessor is similar. It is made up of different functional units which each can do one simple task. Like an office and unlike a factory, a microprocessor is about processing information, not assembly of physical parts.
Below is a diagram of a simple microprocessor. I am going to walk you through how to read and understand this diagram.
The colored arrows are what we call busses. They are the transport lanes within the CPU. If you think about a factory, much of what happens inside is simply transportation of partially finished components from one functional unit to another. That could happen by trolleys, fork lifts or conveyer belts. You could think of each of these blue, green and red lines as conveyer belts shipping stuff around the CPU. Except they are moving numbers or control signals.
The gray boxes, such as the Decoder, ALU and Program Counter, are functional units which carry out specialized tasks. You could think of them as a special office or desk. On each of these desks an employee specialize in carrying out one narrow and specialized task, such as adding two numbers received.
When done, each functional unit will usually produce some output which is transmitted along one of the bus lanes to another functional unit for further processing, or memory for storage.
Now that you got a birds-eye view, let us look more in detail about the individual parts.
Reading and Writing to Memory
For the Calcutron-33 computer you can think of memory as a wall of numbered mailboxes, seen in the picture below. Each mailbox can contain only one number which may be a maximum of 4 digits. Hence
9999 is the largest number you can store in any mailbox. Every mailbox has an address. The address is a two-digit number (not 3 as in picture), which identifies that particular mailbox.
What does this analogy have to do with a real computer? Computer memory or RAM (Random Access Memory) is also divided into multiple memory cells which can hold numbers of some maximum number of digits. The maximum value will depend on the number system used. For instance a 4-digit decimal number can hold much larger number values, than a 4-digit binary number (4-bit number).
Decimal numbers have the digits 0–9. A digital computer in contrast use binary digits 0–1. That is because this is easy to represent in electrical terms. A 0 would be low voltage and 1 would mean high voltage. To not be overly pedantic most chips specify low and high as some sensible range. 0 to 1.5 volts for low is common. 3.5 to to 5 volts commonly means high.
The curious can read more: Logic Levels.
The largest number we can hold with four decimal digits is 9999 (10⁴ -1). For a binary number the largest four digit number is 1111 Remember the max digit value is 1. Translated into the decimal number system, this is 15 (2⁴-1).
Learn more about other number systems: Teaching Kids About Alternative Number Bases.
Anyway you don’t need to care about the binary number system, since the Calcutron-33 operates on decimal numbers just like humans. It is able to do that, because it is an imaginary computer. Although computers using decimal number systems have existed but they where mechanical rather than electrical.
The RAM (Random Access Memory) of a computer also identifies every memory cell with a unique number, the address.
When you launch a program on your computer, it gets loaded into memory. It consists of a series of instructions. Running a program involves fetching each instruction in the program in sequence and performing that instruction.
Registers — Memory Cells Used in Calculations
On the diagram you can see a Program Counter, Instruction Register and a Registers box. All of these are registers. Actually the Registers box is not a single register but a collection of 9 different general purpose registers.
So what is a register and why are they important? Registers are simply memory cells which can hold a number. What differentiate them from memory cells in main memory is that you can use them in operations. For instance if you want to perform an addition, the numbers being added, have to be found in a register. They cannot be in main memory.
Registers are not an entirely new idea. The concept existed for mechanical calculators as well. A register is simply a collection of digits which could be used in an operation. For instance on the mechanical calculator below, one pulled on levers to specify an input number. This formed the Input Register. Once you cranked the handle on the right, that number would be added to the accumulator register at the bottom.
Once we have looked at how data is pulled from memory into registers, we can look at how registers are used in calculations.
How a Program is Read From Memory
The Program Counter (PC) is a register containing a two digit number. It is set to address of the first instruction of the loaded program. This number is transported along the green bus, the address bus, to Memory. That gets the memory cell with this address selected. The number in the selected memory cell is transported along the blue line to the Instruction Register.
We can think of this as a guy picking up a note with the address of a mailbox in a shelf labeled Program Counter. Then he walks over to the mailboxes. Locate the mailbox with the two-digit number on his note. He copies the number he finds onto a piece of paper and brings that over to another shelf labeled Instruction Register.
Of course in reality there is no “guy” walking around. Instead there is a series of copper traces carrying electrical signals from one location of the CPU to another.
How do we know where this number should go? Indeed, because if we look at the whole diagram, you would have noticed that the blue data bus is connected to lots of different functional units (gray boxes). Electric current doesn’t have a brain. It cannot know where it is supposed to go. We cannot explain that until we get to the gray box labeled Decoder.
After this whole sequence of events are carried out, the Program Counter will get incremented. Thus if our program started at address 3, we will increment the Program Counter to 4 and fetch the next instruction in memory.
Input and Output
Reading input from the user or other sources is done in a very similar fashion to reading data from memory cells. Input can be things such as a keyboard, mouse, network cable or hard drive. Usually inputs are deal with as if they where memory locations. With Calcutron-33, addresses from 90 to 99 represent input and output devices and not memory locations. But this will become clearer when we look at the assembly instructions.
The instruction read from memory is just a 4-digit number. It must be interpreted to carry out a task. This is done by the Decoder. The instruction is sent to the decoder through the blue data bus.
Once the decoder as decoded an instruction it will activate and de-activate one or more of its outgoing read control lines.
These are the missing part of the puzzle, explaining how it is determined where numbers flow on the blue data bus. An analogy I think helps in the case is to think about the blue data bus lines as a pipes with water or pressurized gas flowing through them.
The red control lines are electrical wires that toggle on and off values at the openings of each functional unit. Thus when the next step is to read the next program instruction, it is the Decoder which opens the valve to the instruction register. We could perhaps think of the memory as a gas tank. The Decoder opens a valve on the Memory tank. Thus pressurized gas will flow from the “Memory tank” and into the “Instruction Register tank.” It will not flow in anywhere else because all the other tank inputs valves have been closed.
In reality it is electrons flowing through copper traces, and not gasses. And electrons are not directed into specific functional units due to values. Rather electronic components serving the same purpose called multiplexers are used.
For simplicity I have not drawn multiplexers in my diagram, but they are typically all over the place and you will find them in many CPU diagrams.
Arithmetic Logic Unit (ALU)
Of course we cannot just move numbers around. We got to actually do something with them. The Arithmetic Logic Unit (ALU) is the calculator of the microprocessor (CPU).
But it is a pretty dumb calculator. It is more similar to the mechanical calculators of old than a modern calculator. It only really knows how to perform additions, subtractions and perform shift operations.
What is a shift operation? Shifting is to add or remove digits from a number. A left shift of 40 gives 400. Two right shifts of 400 gives 4. Basically it is just multiplying or dividing by 10. Or rather it depends on the number base you use. For a binary computer shifts are multiplying or dividing by two.
These operations are relatively easy to build electronically with transistors as well as mechanically with gears. No mechanical calculator was ever able to perform multiplication or division directly. Rather multiplication was performed by repeated additions. Division by repeated subtractions.
Early microprocessors worked the same way. Calcutron-33 works this way. Advanced modern microprocessors have special functional units called multipliers which perform multiplication. A lot of early electronic calculators likely performed multiplication too by running small programs that performed repeated additions.
Anyway let us clarify how this ALU works. Unlike a calculator there is no human which can specify the inputs by pressing buttons. Instead we use registers. There are 9 registers named
x9. This is pretty common naming conventions on a lot of real CPUs.
The decoder uses its control lines to activate two of these registers which become the inputs to the ALU. The decoder also use its control lines to select what function the ALU should perform, whether it should be an addition, subtraction or shift.
It also opens one register for input, so that the result from the calculation can flow back to the Register File. It is common to call the collection of registers from
x9 for Register File. Don't mistake this for file in a filesystem. The two terms are unrelated.
Okay now we know how the various functional units interact. It is time to look at how we write instructions to use these functional units in meaningful ways.
Machine Code and Assembly Code
If the decoder decodes the instruction
8243 it will cause it to load the number stored in memory address
43 into register
1123 in contrast will cause the contents of register
x3 to be added and the result stored in register
These numbers are on machine code form. They are just numbers which is fine for the computer, but not very practical for us humans. Thus it is common use human readable abbreviations called mnemonics. Assembly code is written using mnemonics. In assembly code the previous instructions become:
LD x2, 43
ADD x1, x2, x3
Machine code instructions are usually encoded in particular ways. Especially for RISC CPUs one tries to have a very regular pattern in how this is done.
Instruction Set Architecture (ISA) vs Micro Architecture
Instructions such as
ADD as well as the available registers, how they are encoded in machine code format, number of digits in a register and so on are all part of what we call the Instruction Set Architecture (ISA) of a microprocessor. So if a CPU had all the same instructions as Calcutron-33, but say 4 registers or 40 registers instead of the nine Calcutron-33 has then it wouldn’t be be the same ISA. Likewise if the number of registers and the instructions where the same but memory cells contains 10-digit numbers instead of 4, then it would also be different. An ISA is basically a contract between the programmer and the microprocessor makers. It specifies the exact behavior of a microprocessor seen from the perspective of a software developer.
However internally a microprocessor can be quite different. What the busses are like internally, how many decoders, arithmetic logic units it has and so on is part of what we call the Micro Architecture. This is invisible to the programmer, but can affect the performance of the microprocessor.
Let me pick a couple of examples. Both AMD and Intel make microprocessors with the same ISA called x86–64. However their internal micro architecture is entirely different. But this doesn’t matter to programmers. A program written for an Intel processor will run just as well on an AMD processor. There may simply be some performance differences.
Calcutron-33 Machine Code Format
- Every instruction is 4 decimal digits.
- The first digit is the opcode, which says what the instruction does such as add, subtract or load.
- The second digit is a register operand. Usually the destination register for whatever operation is performed.
- The last two digits will vary in meaning depending on opcode. For arithmetic operations, they will usually be two registers, used as input. The last one may be an immediate value from 0 to 9. For branches, store and load instructions , the last two digits will be a memory address.
Assembly Instruction Set
Once I have explained the assembly instructions, we can actually start to write some small programs that do useful things.
I am going to give a list of assembly mnemonics, their operands (the inputs they need) and how assembly instructions get encoded into machine code.
If the first digit is a 1, then we got an add instruction. If the first digit is a 2, we got a subtraction instruction and so on. To get a more compact overview of how these instructions work, I am going to use some naming conventions which are very common in microprocessor manuals.
rd is not the name of an actual register on the CPU, but simply refers to some register from
r2 which is the destination register for the operation performed. For an addition that means, the result is stored in this register.
rt means source registers. These are the inputs to an operation. Don't worry this will be clear with a simple example.
ADD — Addition
ADD rd, rs, rt | 1dst | rd ← rs + rt
What this described is that the result of the operation is stored in the first specified register. The
rs register is the 3rd digit and the
rt register is the 4th digit.
This is a description of how the assembly code instruction is mapped into machine code. We see that the first digit must be a
1 for this to be an
ADD instruction. The letters
t are place holders to show which digit is the
rd ← rs + rt
This is a description of how the operation works. The arrow
← points to where the result gets stored. Let us do a simple example to make sure you understood the notation:
ADD x2, x4, x8 ; x2 ← x4 + x8
The colon indicates a comment, and everything after it can be ignored. In this case we destination register
x2 and the two source register
x8. We add the numbers in
x8 and store the result in
SUB — Subtraction
rt from register
rs and store result in register
SUB rd, rs, rt | 2dst | rd ← rs - rt
SUBI — Immediate Subtraction
k from register
rs and store result in register
SUBI rd, rs, k | 2dsk | rd ← rs - k
LSH — Left Shift
k digits and store in
rd. Left shift means to add a zero digit. So 3 left shifted two digits is 300.
LSH rd, rs, k | 4dsk |
Let us to an example where we assume x2 contains the number 12.
LSH x1, x2, 2
x1 will now contain the number 1200.
RSH — Right Shift
k digits and store in
rd. To shift right means to chop of digits at the end. 400 shifted right two digits is 4. If shifted only one digit, it is 40.
RSH rd, rs, k | 5dsk |
BRZ — Branch Zero
jump to address
aa if register
rd is zero.
BRZ rd, aa | 6daa |
BGT — Branch Greater Than
jump to address
aa if register
rd > 0 (positive).
BGT rd, aa | 7daa |
LD — Load Register
rd with contents of memory at address
LD rd, aa | 8daa |
Let us do an example. of loading the contents of memory location 12 into register
LD x2, 12
Basically our CPU goes to “mailbox” labeled 12, finds the number inside and puts it into register
ST — Store Register
rs in memory at address
ST rs, aa | 9saa |
HLT — Halt Program
HLT | 0000
Pseudo Assembly Instructions
For almost all microprocessors you will find what we call pseudo assembly instructions. These are instructions which don’t really exist in the instruction-set architecture (ISA) of a CPU. Rather they are convenient shorthands for other instructions.
For instance if we wanted to move the contents of register x2 to x1 we could write:
ADD x1, x2, x0
This may seem surprising as I have not mentioned register
x0 before. The reason is because it doesn’t really exist as an actual register. Register can be used as an operand anywhere where a register is expected, however
x0 will always be zero, and any number moved to it will be discarded.
This is a popular trick on several RISC processor architectures such as MIPS, ARM A64 and RISC-V. Because this is so useful the Calcutron-33 assembler has a shorthand for this:
MOV x1, x2
However it is not a real instruction since the machine code is the same as for
Let us look at the other pseudo instructions:
MOV rd, rs │ ADD rd, rs, x0 │ rd ← rs
CLR rd │ ADD rd, x0, x0 │ rd ← 0
DEC rd │ SUBI rd, rd, 1 │ rd ← rd + 1
BRA aa │ BRZ x0, aa │
Writing Programs and Running Programs
Programs are read by the Calcutron-33 from start to finish but certain parts of the code can get repeated many times because one performs jumps from one line in the program back to a previous line.
However programs end when the reach the
HLT instruction or we attempt to read from input and there is no more data to read. We can mark places in the code we can jump to with labels. Labels are written using the format:
mylabel:. Let us look at a very simple program to demonstrate this:
LD x1, 90
LD x2, 90
ADD x1, x2
ST x1, 91
This program continuously reads two numbers from input and stores them in register
x2. These are added and result is written to output. We could e.g. assume that on address 90 we have connected a puched taped reader. You can see an example of a modern version of this below. In our imaginary computer we assume that every time a number is read with
LD, the tape advances to the next number. Hence we never read the same number twice despite reading from different memory addresses.
We use the different addresses to distinguish between different devices. E.g. the tape reader is at address 90, while and LED display for numbers is on address 91.
The line at the bottom of the program, which says
BRA loop means we jump to the label marked
loop which means the whole program starts from the beginning.
NOTE: There will be more information about how to actually run Calcutron-33 programs here in the future. This story is still a bit work in progress. My apologies in advance.
Here is another simple example which repeatedly doubles its inputs.
LD x1, 90
ADD x1, x1
ST x1, 91
Multiply By Eight
Multiply each input by 8. This is the first step to understand how addition can be used for multiplication. You can see that multiplication is really about just controlling how many times you are adding.
LD x1, 90
ADD x1, x1
ADD x1, x1
ADD x1, x1
ST x1, 91
Here is a bit more complex program where we are using conditional branching more actively. What this program does it that it read two inputs into
x2 and output the larger of the two values.
LD x1, 90
LD x2, 90
SUB x1, x1, x2
BGT x1, first
ST x2, 91
In this case we perform uses
SUB so that
x1 ← x1 - x2. The following
BGT x1, first instruction, is basically doing the equivalent of:
if x1 > 0 goto first
x1 > x2 then
x1 - x2 > 0, so this is logical. So depending on whether the first or last number if bigger we goto label
second and output result. Afterwards we jump back to the beginning with
Early microprocessors did not have any instructions for multiplying, just like Calcutron-33. But that was not a problem because you can implement multiplication through repeated additions.
The code below shows a simple way of doing this. The numbers to multiply are placed in
x2. Say we multiple 4 with 3. That means adding 4 three times. The way we do this is by subtracting 1 from
x2 on each repetition. This is to keep track of how many times we need to add
x1 to the final result stored in
We repeatedly jump back to the label
multiply to keep adding. This is done by using
BGT (Branch Greater Than) to check of
x2 is greater than zero. As long as it is we keep jumping back to
multiply. Since x2 is reduced by one on each repetition we it will eventually be 0 at which point we use
ST to write out the result.
LD x1, 90
LD x2, 90
ADD x3, x1
BGT x2, multiply
ST x3, 91
However this approach is not very efficient. Imagine multiplying 90 with 90. You would have to perform 90 additions.
Fortunately there is a much faster approach using a combination of additions and shifts. Shift for a decimal computer is multiplication or divisions by 10. In a binary computer it is multiplications or divisions by 2. Shifts are always really fast to perform on a computer.
The core idea is that you can represent any multiplication as a combination of shifts and additions. Let me do a few simple examples.
30 × 40 = 300 + 300 + 300 + 300 = 1200
Here we shifted 30 left once and added it 4 times.
30 × 23 = 30 + 30 + 30 + 300 + 300 = 690
First we added up the ones 3 times. Then we added up the tens twice. To be able to add tens, we do a single left shift.
If you look at the code below you will see both left and right shifts. The right shifts are used to peel off individual digits. E.g. when you do
30 × 40, you need to separate out the 4 digit to keep track of how many times we add 30. If you right shift 40, you get the 4 digit isolated.
LD x2, 90 ; First number to multiply.
LD x3, 90 ; Second number. Treated as counter
CLR x1 ; accumulator for result. Clear it out.
RSH x4, x3, 1 ; Push right most digit of x3 into x4
ADD x1, x1, x2 ; Add first input to accumulator
DEC x4 ; Decrement counter for number of additions
BGT x4, multiply ; Repeat while x4 > 0
LSH x2, x2, 1 ; Left shift. x2 made 10x larger
BGT x3, nextdigit ; check if all digits have been processed
ST x1, 91
What Have You Learned?
Let us do a kind of summary because this whole story is quite long and I don’t expect you to remember all of this. It is useful however to grasp what is of most fundamental importance here.
The key things I wanted you do get is that:
- A CPU does lots of small really simple instructions. It is only stuff like moving numbers from one location to another, adding them and comparing them. Then there are control flow, such as jumping to other parts of the program to repeat things.
- The code that the CPU understands is just a long stream of numbers. Assembly code is a just a human readable form of this.
How are Real Microprocessor Different?
While other CPU architectures may seem significantly more complicated a lot of this derives from providing faster ways of what you can already achieve with this super simple CPU. You can already perform multiplication by just writing program for it.
However real modern CPUs have special instructions for performing multiplication. Our CPU doesn’t have a way of multiply floating point numbers such as 4.25 or 58.32. Again real CPUs have special functional units for this. So instead of just the ALU, there are several other functional units, such a a multiplier or FPU for floating point arithmetic.
And of course in real processors instructions are not 4 digit decimal numbers but typically 32-bit binary numbers. That means multiple bits will have to be used to encode things such as the source and destination register of an operation.
Real CPUs also have more convoluted ways of reading things from memory. Memory exists in multiple stages. You got very fast memory called L1 cache. Then slightly slower memory called L2 cache, then even slower memory called L3 cache and finally you get to main memory which is the slowest.
When the CPU gets something from memory it puts it in the cache, so that next time it is needed it can look in the cache first to see if it is already there. But because cache is small, stuff that hasn’t been used in a long time may have gotten booted out.