What is an exe file anyway?
I’m sure all of us have opened an exe file at least once in our lives, but have you ever asked yourself what it is? The more curious of you might have tried to open it in notepad only to be greeted by this screen:
Scary, isn’t it? Well, today we’ll see what all of this gibberish means. To do so, we first have to get familiar with what assembly actually is.
The assembly language
Those of you who are programmers might know what assembly is, but have probably never bothered with it. And probably for a good reason! But wait, why is assembly so important for understanding executables? Well, they’re kind of the same thing.
You might have heard that computers are only made of 1s and 0s, and that’s kinda true. The CPU reads those 1s and 0s in order and, depending on the pattern it finds, performs what are called “instructions”. These are very simple commands like “add two numbers” or “move this here”. Assembly is just a human-readable way of writing them: each assembly instruction gets converted into the binary pattern of the CPU instruction it stands for.
Let’s look at a basic assembly program to understand better. In order to better show how it works, I will use a simple assembly language instead of using x86 assembly. The assembly I’ll be using for the program is 6502 assembly, an old yet elegant language that is very easy to follow. Here is a program to add 2 numbers:
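LDA #$5    ; load the value 5 into register A
LDX #$3    ; load the value 3 into register X
STA $AABB  ; store A at address $AABB
STX $CCDD  ; store X at address $CCDD
LDA #$0    ; reset A to 0
ADC $AABB  ; A = A + value at $AABB (+ carry flag, assumed clear)
ADC $CCDD  ; A = A + value at $CCDD, so A ends up holding 8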
Let’s go through this step by step. First come the LD instructions, which load a value into A and X respectively. The # marks a literal value and the $ means the number is written in hexadecimal. What are A and X, you may ask? They are registers. In case you haven’t done your computer architecture, registers are very small pieces of memory that sit inside the CPU. They still exist in CPUs today, but while the 6502 only has 3 general-purpose ones (A, X and Y), a modern 64-bit Intel CPU has 16 of them. So we load the number 5 into A and 3 into X, then proceed with ST.
ST means STORE, so we store the contents of A at address AABB and the contents of X at CCDD. An address is a location in memory that works like a home address: think of RAM as a city with many houses, where each house has its own address. It’s the same with computers, but instead of homes we have bytes. What we do here is store the numbers 5 and 3 in the houses at addresses AABB and CCDD. We then reset A to 0 before performing the addition.
Lastly, we ADD. ADC stands for “add with carry”: it adds the contents of an address to A and keeps the result in A. The first ADC adds the 5 stored at AABB and the second adds the 3 stored at CCDD, so A now holds the value 8. (ADC also adds the CPU’s carry flag, which we assume is clear here; a careful program would clear it with CLC first.) Neat, huh?
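If the city-of-houses picture feels abstract, here is a minimal Python sketch of it. It assumes the 6502’s 64 KiB address space and models RAM as a plain bytearray; the names ram, a and x are just made up for this illustration.

```python
# Model RAM as 65,536 one-byte "houses", with addresses 0x0000 to 0xFFFF.
ram = bytearray(0x10000)

# The two registers after the LDA and LDX instructions.
a, x = 5, 3

# STA $AABB and STX $CCDD: drop each register's value at its address.
ram[0xAABB] = a
ram[0xCCDD] = x

print(ram[0xAABB])  # 5
print(ram[0xCCDD])  # 3
```

Every other house in the city still holds 0, since nothing was stored there.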
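To check the walkthrough, here is a toy interpreter for just the instructions used in the program. It is a sketch under stated assumptions, not a real 6502 emulator: the mnemonic names like LDA_IMM are invented for this example, the carry flag is assumed to start clear, and decimal mode and all other flags are ignored.

```python
def run(program):
    """Run (mnemonic, operand) pairs on a toy model of the 6502."""
    a = x = 0
    carry = 0                      # assumption: carry flag starts clear
    ram = bytearray(0x10000)       # 64 KiB of memory, all zeros
    for mnemonic, operand in program:
        if mnemonic == "LDA_IMM":      # LDA #value
            a = operand
        elif mnemonic == "LDX_IMM":    # LDX #value
            x = operand
        elif mnemonic == "STA":        # STA address
            ram[operand] = a
        elif mnemonic == "STX":        # STX address
            ram[operand] = x
        elif mnemonic == "ADC":        # ADC address (add with carry)
            total = a + ram[operand] + carry
            a = total & 0xFF           # A is only 8 bits wide
            carry = total >> 8         # overflow goes into the carry flag
        else:
            raise ValueError(f"unknown instruction: {mnemonic}")
    return a, x, ram


PROGRAM = [
    ("LDA_IMM", 0x05),
    ("LDX_IMM", 0x03),
    ("STA", 0xAABB),
    ("STX", 0xCCDD),
    ("LDA_IMM", 0x00),
    ("ADC", 0xAABB),
    ("ADC", 0xCCDD),
]

a, x, ram = run(PROGRAM)
print(a)  # 8
```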
Now all we have to do is assemble the program, try to open it and… it doesn’t run. As I said, this is for the 6502 CPU, which is not compatible with modern computers. Well, let’s try to open it with Notepad and see how it looks!
Binaries
Still not very pretty, is it? That’s because Notepad is trying to read the file as if it were text, when it actually contains numbers. Remember when I said that everything in a PC is 0s and 1s? That is also true of text! Each number has a corresponding character, and this is what you get when a file full of seemingly random numbers is displayed as characters.
To read it properly we need a program called a hex editor, which you can also find online. If we take a look now it is much more comprehensible:
If we were to feed this file to a 6502 CPU, it would execute it just as our computers execute exe files. Now what do these numbers mean? Some of them are called opcodes, and they are the instructions that we talked about earlier. The rest are the values and addresses those instructions work on. Even without knowing how these work you might recognize some familiar numbers: 05, 03, BB, AA. We wrote these in our program earlier! (The address AABB shows up as BB AA because the 6502 stores the low byte of an address first.) To decode the others, let’s look at an opcode table.
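You can see the number-to-character mapping for yourself with a few lines of Python. The program bytes below are my own hand-assembly of the 6502 program using the standard opcode table, so treat them as an assumption about what the assembler produced. Latin-1 is used here only because it maps every byte to some character; Notepad may pick a different encoding and show different symbols.

```python
# Text is numbers: each character of "Hi!" is stored as one byte.
print(list("Hi!".encode("ascii")))  # [72, 105, 33]

# Assumed machine code for the 6502 program (hand-assembled).
program = bytes.fromhex("A9 05 A2 03 8D BB AA 8E DD CC A9 00 6D BB AA 6D DD CC")

# Reading those same bytes "as text" gives one character per byte,
# most of them odd symbols or invisible control characters.
text = program.decode("latin-1")
print(repr(text))
```

The very first byte, A9, happens to be the copyright sign in Latin-1, which is exactly the kind of random-looking symbol a text editor would display.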
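If you don’t have a hex editor at hand, a rough equivalent of its view can be sketched in Python. The hexdump function is a made-up helper, and the program bytes are again my own hand-assembly of the 6502 program, so treat them as an assumption about what the assembler produced.

```python
def hexdump(data, width=8):
    """Return a hex-editor-style view: an offset, then `width` bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        rows.append(f"{offset:04X}  {hex_part}")
    return "\n".join(rows)


# Assumed machine code for the 6502 program (hand-assembled).
program = bytes.fromhex("A9 05 A2 03 8D BB AA 8E DD CC A9 00 6D BB AA 6D DD CC")

print(hexdump(program))
# 0000  A9 05 A2 03 8D BB AA 8E
# 0008  DD CC A9 00 6D BB AA 6D
# 0010  DD CC
```

The left column is the position of the first byte of each row, written in hexadecimal like everything else.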
Now, an opcode table is a big list of opcodes and their corresponding instructions. If we look for the first number, A9, we find:
Would you look at that! It is the LDA instruction. So A9 followed by a number x means “load x into A”. Now for the next instruction, A2:
And again, we have the LDX 3 from our program! Now I feel like you know where this is going.
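Looking up every byte by hand gets tedious, so here is a small Python sketch that does the same lookup for the whole file. The table only covers the five opcodes this program uses (a real opcode table is much larger), the names OPCODES and disassemble are invented for this example, and the program bytes are my own hand-assembly of the 6502 program.

```python
# A tiny slice of the 6502 opcode table: opcode -> (mnemonic, operand kind).
OPCODES = {
    0xA9: ("LDA", "immediate"),  # operand is a 1-byte literal value
    0xA2: ("LDX", "immediate"),
    0x8D: ("STA", "absolute"),   # operand is a 2-byte address, low byte first
    0x8E: ("STX", "absolute"),
    0x6D: ("ADC", "absolute"),
}


def disassemble(data):
    """Turn raw machine code back into assembly text, one line per instruction."""
    lines = []
    i = 0
    while i < len(data):
        mnemonic, kind = OPCODES[data[i]]
        if kind == "immediate":
            lines.append(f"{mnemonic} #${data[i + 1]:02X}")
            i += 2
        else:
            address = data[i + 1] | (data[i + 2] << 8)  # low byte, then high byte
            lines.append(f"{mnemonic} ${address:04X}")
            i += 3
    return lines


# Assumed machine code for the 6502 program (hand-assembled).
program = bytes.fromhex("A9 05 A2 03 8D BB AA 8E DD CC A9 00 6D BB AA 6D DD CC")

for line in disassemble(program):
    print(line)
# LDA #$05
# LDX #$03
# STA $AABB
# STX $CCDD
# LDA #$00
# ADC $AABB
# ADC $CCDD
```

What comes out is the program we started with, which is the whole point: the file and the assembly are two ways of writing the same thing.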
In short, exe files are HUGE versions of this little program that can contain millions of these opcodes for your computer to execute. Well, now you know what an exe file is!