ASSEMBLY

Ridadogrul
13 min read · May 8, 2023


Contents
Introduction to Assembly
Registers and Components of the CPU
Bitwise Operations
Assembly Mnemonics
Sign Extension
Mul and Div
Conditionals
Shift and Test
Endianness
The Stack
Functions
Enter and Leave
Arrays
Load Effective Address (LEA)
Buffer Overflows

INTRODUCTION TO ASSEMBLY
How does a processor work?
Begin with logic gates: AND, NOT, OR
Each gate receives binary input: 1 or 0
We have logical operations: 1 AND 1 = 1, …
Construct components for adding binary numbers
With logic gates, we can create an Adder
Adders are digital circuits used to perform addition
Half-Adder: two input signals, outputs carry and sum
Full-Adder: adds binary numbers and accounts for values carried in and out.
You can implement multiplication and division and shifts
Now we have a variety of operations acting on one or more inputs
Division, multiplication, addition, subtraction, logical AND, logical OR, logical XOR, shift left, shift right.
Now we want to use them
Bring in inputs and perform operations on them
CPU has registers
Encode operations in a bit pattern
0001 = perform NOT
0010 = perform AND
0100 = perform Addition

For example, we want to add 6 and 2
Send it instruction for add: 0100
Send it 6 and 2 in binary: 0110 and 0010
So: 0100 0110 0010
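
The encoding above can be sketched in C. This is only a toy model: the 4-bit opcodes come from the list above and are not real x86 encodings, and the names encode and execute are made up for illustration.

```c
#include <stdint.h>

/* Toy 4-bit opcodes from the list above (not real x86 encodings). */
enum { OP_NOT = 0x1, OP_AND = 0x2, OP_ADD = 0x4 };

/* Pack a 4-bit opcode and two 4-bit operands into one 12-bit word. */
uint16_t encode(uint8_t opcode, uint8_t a, uint8_t b)
{
    return (uint16_t)(((opcode & 0xFu) << 8) | ((a & 0xFu) << 4) | (b & 0xFu));
}

/* Decode the word and execute it the way the toy CPU would. */
uint8_t execute(uint16_t word)
{
    uint8_t opcode = (word >> 8) & 0xF;
    uint8_t a = (word >> 4) & 0xF;
    uint8_t b = word & 0xF;

    switch (opcode) {
    case OP_NOT: return (uint8_t)(~a & 0xF);       /* only the first operand is used */
    case OP_AND: return a & b;
    case OP_ADD: return (uint8_t)((a + b) & 0xF);  /* result kept to 4 bits */
    default:     return 0;                         /* unknown opcode: ignored here */
    }
}
```

With these definitions, encode(OP_ADD, 6, 2) gives 0x462, which is 0100 0110 0010 in binary, and execute(0x462) gives 8.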

Now we have a sequence of operations
This is machine language, and it is hard to write by hand! So we develop assembly languages
An assembler translates assembly into machine language
Have mnemonics that map to machine code
ADD EAX,2
ADD 6 2

How to Program
Processor uses an operation code (OPCODE)
Tells the CPU what to do
Binary prefix
ADD
SUB
MUL
MOV
Each clock increments the Program Counter
Load next instruction: figure out opcode and data
Repeat
What is Assembly?
Second lowest level language
Written for a specific architecture
i386
ARM
SPARC
Cell Processor(PS3)
Basic commands
Assembler translates to binary native code
Linker combines our code with libraries
We’ll use the compiler to link our code
Different Assemblers
MASM — Microsoft Macro Assembler
TASM — Turbo Assembler
NASM — This is what we’ll use — Netwide Assembler
GNU Assembler
Why Program in Assembly
Helps you to understand the lowest level architecture
Lets you
Write fast efficient code
Device Drivers
Embedded Systems
Real-Time Systems
Compilers
Reverse Engineering
Hacking shell code :D
Architecture
Level 1
Based on electrical properties
May be modified by Quantum computing
Level 2
Defined by the processor makers
INTEL, SPARC, ARM
Level 3
Assembly instructions get transformed into ISA instructions
Data Representation
Computers only store bits, 1 and 0
When programming in assembly you need to understand how to convert to hex and 2’s complement
Binary (base 2)
Bit n is worth 2^n in decimal
To go from decimal to binary use the division method
1 byte = 8 bits
Hexadecimal (base 16)
0–9 A-F
Values from 0–15
More compact than binary
1 byte — convenient
2 hex digits
For multi-digit values use base 16 multipliers
Digit N at position M is worth N x 16^M
Example: A0 = 10 x 16¹ + 0 x 16⁰ = 160
2’s Complement
Convenient way to add and subtract numbers
Uppermost bit (MSB) = 0: it is a positive binary number
Uppermost bit = 1: it is negative
The algorithm to negate a number
Invert the bits
Add 1
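
The two steps can be sketched in C on an 8-bit value. The function names are made up for illustration.

```c
#include <stdint.h>

/* Negate an 8-bit value using the two-step rule: invert the bits, add 1. */
uint8_t twos_complement(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}

/* A value is negative in two's complement when its top bit (MSB) is 1. */
int is_negative(uint8_t x)
{
    return (x & 0x80u) != 0;
}
```

For example, twos_complement(7) gives 0xF9, the byte that represents -7, and applying it again gives 7 back.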
Characters
ASCII
8 bits for characters
For letters/numbers MSB=0
Assembly can convert from char to byte
http://www.ascii-code.com/
UTF-8
UTF-16
Unicode / UTF-32

REGISTERS AND COMPONENTS OF THE CPU
Computer Organization
Memory
It all starts with the bit- 1 or 0
Basic unit of memory is the byte
8 bits- minimum amount of data to move
ASCII chars take up 1 byte
A way to represent data
Unicode uses 1, 2, 4 bytes
Assembly units
Word 2 bytes
Double word 4 bytes
Quadword 8 bytes
CPU
Executes machine code
Fetch
Decode
Execute
Store result
Computer
Consist of registers
Flags: something happened
Memory address
IO function
Write data
Input data
USB, HDDs, Monitor, Keyboard, Wireless, Ethernet
Families x86
8088, 8086
16-bit registers
AX, BX, CX, DX, SI, DI, BP, SP, CS, DS, SS, ES, IP
80286
Protected mode: the OS runs in protected mode
User programs run with fewer privileges: they can’t access another process’s memory space
80386
added 32 bit mode
extends the registers: EAX, EBX, —

80486
faster
MMX
new instructions for multimedia.
Registers
Very high speed memory-clock speed
2.6 GHz = 2.6 billion clock cycles per second
Very little memory
The larger registers are broken into multiple parts.
EAX = 32 bits
AX = Low 16 bits of EAX
AL = Low 8 bits of AX or EAX
AH = High 8 bits of AX
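
As a rough sketch, the same split can be done in C with masks and shifts on a 32-bit value standing in for EAX. The function names are made up for illustration; AH is bits 8 to 15 of the register.

```c
#include <stdint.h>

/* AX: the low 16 bits of EAX. */
uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL: the low 8 bits of AX (and of EAX). */
uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* AH: the high 8 bits of AX, i.e. bits 8..15 of EAX. */
uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
```

With EAX = 0x12345678 this gives AX = 0x5678, AL = 0x78 and AH = 0x56.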
General Purpose
EAX, EBX, ECX, EDX

Register Usage
EAX (accumulator /data and returns)
ECX Counter -For/While loop Counters
ESP (extended stack pointer)
ESI (source), EDI (destination)
data transfer
EBP frame pointer (when we use the stack) — local vars and arguments.
EDX:EAX used together for mul and div
32 bit x 32 bit = Up to 64 bit number

IP Register
IP/EIP — Control execution
Points to the current instruction to be executed
Cannot be modified directly
Increments after instruction is executed
Exceptions: JMP, etc
Controlling EIP means controlling the program, which is why it matters for security / exploitation

Flags
Store info about a previous instruction that was executed
+ Carry (CF)
+ Overflow (OF)
+ Sign Flag (SF)
+ Zero Flag (ZF)
+…
Segment Registers
Segments are used to store info about where elements are located
CS code segment, read only
DS data segment, read/write
SS stack segment
ES, FS, GS extra segments

Virtual Memory
Process of mapping memory addresses used by a program(virtual addresses) to physical address in computer memory.
Appears to process as a contiguous block of memory
OS manages mapping of virtual to real address space
OS divides RAM into pages
Load pages into and out of RAM on demand
Reduces
Load time
Amount of RAM used by program
Slows
Access to RAM may require page faults
Causes a delay when pages are loaded (lazily)
Benefits
Memory isolation (security)
Use more memory than is physically available (paging)
Applications do not have to manage shared memory space
Can help prevent relative addressing (more to come)

When process accesses memory, consults a page table
Tells process which physical address to use
Don’t map each byte, instead use pages
Processor can enforce rules on how memory is accessed
We have sections in the PE because processor needs to treat modules differently
At load time, memory manager sets access rights: r, w, x
Permissions come from settings in section header
Sections typically start on a fresh page

Paging
Memory management scheme.
Allows for retrieval of data from secondary storage (HDD) for use in primary storage (RAM).
OS retrieves data from secondary storage in same-size blocks called pages.
This allows for a contiguous virtual address space and a non-contiguous physical address space

Page Faults
Program attempts to access page not in RAM
OS takes over and loads the desired page -expensive
May need to write a page to secondary storage.
Not all memory is loaded or page is swapped to disk.
4096 byte block
Access causes page fault
process is suspended
page is loaded
process can be resumed
Is a type of interrupt

Interrupts
Interrupt the flow of the program
Hardware: keyboard, timer (clock), disk, network
Software: illegal memory access, timer (tasks per core)
Errors (called traps): div by zero, illegal instruction (doesn’t exist or you don’t have permission), illegal memory access (i.e. seg fault)

BITWISE OPERATIONS
Bitwise Operations
Bitwise AND: both bits have to be 1 to produce a 1.
Bitwise OR: if either bit is a 1, it produces a 1.
Bitwise XOR: the bits must be different to produce a 1 (exclusive OR). Used for encryption.
Bitwise NOT: flips each bit, negating each input bit.
Test: performs an AND operation but ONLY sets the flags. Does not destroy register data.
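
A small C sketch of the same operations on bytes. The function names are made up for illustration, and test_zero_flag models only the zero flag of the test instruction, not the other flags.

```c
#include <stdint.h>

uint8_t bit_and(uint8_t a, uint8_t b) { return a & b; }   /* 1 only where both are 1 */
uint8_t bit_or(uint8_t a, uint8_t b)  { return a | b; }   /* 1 where either is 1 */
uint8_t bit_xor(uint8_t a, uint8_t b) { return a ^ b; }   /* 1 where the bits differ */
uint8_t bit_not(uint8_t a)            { return (uint8_t)~a; } /* flip every bit */

/* Like test: AND the operands, report only whether the result was zero.
   Neither operand is modified. */
int test_zero_flag(uint8_t a, uint8_t b) { return (a & b) == 0; }
```

With a = 1100 and b = 1010 in binary: AND gives 1000, OR gives 1110, XOR gives 0110. XORing a byte twice with the same key returns the original byte, which is why XOR shows up in simple encryption schemes.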

x86 Processor Design

Arithmetic logic unit (ALU)
This does the adding, subtracting, etc.
Floating point unit
Floating point notation — only accurate to a certain degree
Data bus
Lets the FPU talk to registers, etc.
Registers
Small, fast memory
Clock
Cycle is the smallest amount of time to execute an instruction (some take many more)

Branch prediction
Try to guess which branch a program will take
Fetch, Decode, Execute, Store, Pipeline

Instruction Execution Cycle
Fetch
Decode
load from (order of magnitude slower)
cache L1, L2, L3
main memory (RAM)
hard disk
Internet
Execute
Store
Programs
Bytes stored on a disk
OS searches the path.
if program exists.
load into RAM
create process with PID
execute process (thinks it is unique)
OS handles resources
Disk I/O, keyboard input, display, task switching

Task Switch
OS switches rapidly between processes
processes may be waiting
• disk • network •user input
high priority processes preempt low priority
Save context-registers, flags, state of files
load new context — same as above
Commence running

RISC vs CISC
Reduced Instruction Set
Smaller op codes
More instructions needed per task
PowerPC in the past, now it is ARM
ARM is mobile devices, power matters!

Complex Instruction Set
Larger op codes
Fewer instructions needed per task
x86- this uses a lot of power
Less efficient than ARM

ASSEMBLY MNEMONICS

Assembly Mnemonics

Machine Language
Not very readable
Changes with different processors
Example: To add EAX to EBX and store result in EAX
03C3
Or you could write in binary:
0 3 C 3
0000 0011 1100 0011

Build it yourself:
No abstractions
No classes
No data structures
No functions
No loops
No if statements
No variables

MIPS R2000 Assembly Language
Registers
Operations
Memory
System calls

Benefits of Assembly Mnemonics
Same instruction
Add eax, ebx
More readable
higher level
can remain the same for different machines
First portion is called the mnemonic
Followed by operands
Mnemonic operand(s)

Assembler Vs Compiler
Assembler
Approximately 1 to 1 assembly to machine code
Machine dependent
X86, ARM, SPARC, X64
Compiler
1 high-level line generates many machine / assembly instructions
More portable: C abstracts details & will run on different platforms
x = x + 7; // generates several lines of ASM
Will use both

Netwide Assembler
NASM for short
We will be using GCC and C to create our programs

Operands
We’ll typically modify a register or a memory location
Register
Memory location
Immediate (hard coded) — value in code
Implied
Increment affects the register

Instructions
Examples:

mov ebx, 7
;moves 7 into the ebx register
;ebx and 7 are operands

mov eax, ebx
;move the value in ebx to eax
inc eax
;increment eax
; eax ++

dec eax
;decrement eax

→ common format

[label:] mnemonic [operand(s)] [; comment]

Mnemonics
mov
add
sub
mul
jmp
call

Comment
semicolon till the end of the line
; this is a nasm comment

Directives

Not actual instructions
Tell the assembler something to do
Set the size of the stack.
define memory
define constants.

%define

like #define in C

%define SIZE 10

mov eax, SIZE ; move 10 to eax
Same as

mov eax, 10 ; move 10 to eax

Data directives
We’ll use these frequently

db
define byte => 1 byte or 8 bits
Smallest amount of space you can allocate
ASCII char

dw
define word => 2 bytes or 16 bits
Unicode, two chars, etc.

dd
define double word => 4 bytes or 32 bits
Pointers on a 32-bit machine

dq
define quad word => 8 bytes or 64 bits

Identifiers
Used for vars, constants, procedures or labels
NASM is Case sensitive

Addition and Subtraction
Sample program
mov eax, 100h ; eax = 100h
add eax, 400h ; eax = 500h
sub eax, 200h ; eax = 300h
dump_regs 1
Listing
Gives information such as
Offset information
Binary code commands
Useful when finding errors

When functions are called
Function addresses are filled in at a later time
Either statically or dynamically linked
Statically linked, you’ll see an address
Dynamically, it uses a table for lookup

View data in executable
Prints out the actual data in a file
Uses hex by default
Also converts back to binary

SIGN EXTENSION

Changing Sizes
Easy to make something smaller
Convert word to byte: remove the upper 8 bits

Increase Size
Have to worry about sign
If it is signed, want to keep the sign

movsx (sign extend)
mov bl, -7 ; bl = F9
movsx eax, bl ; fill with F’s as top bit is 1: eax = FFFF FFF9
F = 1111

movsx ax, bl ; same as above, into 16 bits: ax = FFF9
mov bl, 7 ; bl = 07
movsx eax, bl ; fill with 0’s as top bit is 0: eax = 0000 0007

movzx (zero extend)
Move with zero extend: when expanding, always fill with 0’s

mov bl, -7 ; bl = F9
movzx eax, bl ; eax = 0000 00F9
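
To see the two extensions side by side, here is a minimal C sketch that widens a byte to 32 bits both ways. The function names are made up for illustration; the sign extension is written out with masks to make the "fill with F’s" step visible.

```c
#include <stdint.h>

/* Like movsx eax, bl: if the top bit of the byte is 1, fill the upper
   24 bits with 1s, otherwise with 0s. */
uint32_t sign_extend8(uint8_t bl)
{
    return (bl & 0x80u) ? (0xFFFFFF00u | bl) : bl;
}

/* Like movzx eax, bl: the upper 24 bits are always 0. */
uint32_t zero_extend8(uint8_t bl)
{
    return bl;
}
```

For the byte F9 (-7), sign_extend8 gives FFFF FFF9 (still -7 as a 32-bit value), while zero_extend8 gives 0000 00F9 (249).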

MUL AND DIV
Arithmetic
2’s complements allows both signed and unsigned addition and subtraction to work the same way
If we are using 16-bit registers, a carry out of the top bit does not fit in the register
Carry flag will get set
If we are using 32-bit registers
Same idea: a carry out of the top (32nd) bit sets the carry flag

Multiplication
mul
For unsigned numbers
imul
signed
Divide
div
unsigned division
idiv
signed division

div source
8 bit
AX / source: quotient in AL, remainder in AH
16 bit
DX:AX / source: quotient in AX, remainder in DX
32 bit
EDX:EAX / source: quotient in EAX, remainder in EDX
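
The 32-bit case can be modelled in C with a 64-bit dividend. This is a sketch only: the struct and function names are made up, and returning 0 stands in for the divide error the CPU raises on a zero divisor or a quotient that does not fit in EAX.

```c
#include <stdint.h>

/* The two registers that 32-bit div reads and writes. */
typedef struct { uint32_t eax; uint32_t edx; } div_regs;

/* Model of unsigned `div source` with a 32-bit source.
   Dividend is EDX:EAX; quotient goes to EAX, remainder to EDX.
   Returns 1 on success, 0 where the CPU would raise a divide error. */
int div32(div_regs *r, uint32_t source)
{
    if (source == 0)
        return 0;

    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient = dividend / source;
    if (quotient > 0xFFFFFFFFu)
        return 0;                       /* quotient does not fit in EAX */

    r->edx = (uint32_t)(dividend % source);
    r->eax = (uint32_t)quotient;
    return 1;
}
```

Starting with EDX = 0 and EAX = 100, dividing by 7 leaves EAX = 14 and EDX = 2. Forgetting to clear EDX before div is a common source of unexpected divide errors, since whatever is in EDX becomes the upper half of the dividend.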

CONDITIONALS
Control Structures
To make high level languages Turing complete, we need to be able to do selection

Compare
cmp reg1, reg2 / immed
Does subtraction
reg1-reg2
Result is NOT stored anywhere (use SUB instead)
Flags get set (we use these for conditionals)
CF, OF, ZF
One arg is register other one can be immediate value or another register
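
As an illustration of what cmp does, this C sketch performs the subtraction, throws the result away and keeps only four flags. The struct and function names are made up, and only ZF, SF, CF and OF are modelled.

```c
#include <stdint.h>

typedef struct { int zf, sf, cf, of; } cmp_flags;

/* Model of `cmp a, b`: compute a - b, keep only the flags. */
cmp_flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;              /* result is discarded afterwards */
    cmp_flags f;

    f.zf = (diff == 0);                 /* operands were equal */
    f.sf = (int)((diff >> 31) & 1u);    /* top bit of the result */
    f.cf = (a < b);                     /* unsigned borrow */
    /* signed overflow: operands had different signs and the result's
       sign differs from the sign of a */
    f.of = (int)((((a ^ b) & (a ^ diff)) >> 31) & 1u);
    return f;
}
```

Conditional jumps then read these flags: for example, a jump-if-equal checks ZF, and an unsigned jump-if-below checks CF.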

Branching
Jump commands
-Conditional — based on a condition’s truth value
-Unconditional

Looping
We can implement for- like loops
loop label: decrements ECX, jumps to label if ECX != 0
loope or loopz: jumps if ECX != 0 and ZF = 1
loopne or loopnz: jumps if ECX != 0 and ZF = 0

SHIFT AND TEST
Shifting
Logical shift: shl & shr
Move all the bits to the left or right
Incoming bits are always zero
Can shift by N bits, set in CL or use a constant
Last bit shifted out is stored in the Carry Flag (CF)

Why we use shift?
Allows us to quickly
Multiply by 2
Divide by 2
Faster than doing a mul or div: shifts are simple operations built into the processor, mul/div are more complex instructions
A logical shift right does not give the right result for negative signed values

Arithmetic Shift
Quickly mul & div by 2
Ensure the sign bit is treated correctly
Shift right (sar): keeps the sign. New bits shifted in at the top are copies of the sign bit, so a negative number stays negative
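
A minimal C sketch of the difference between a logical and an arithmetic right shift. The function names mirror the mnemonics but are just illustrative helpers; the shift count n is assumed to be between 0 and 31, and the carry flag is not modelled.

```c
#include <stdint.h>

/* Logical shifts: incoming bits are always zero. */
uint32_t shl(uint32_t x, unsigned n) { return x << n; }
uint32_t shr(uint32_t x, unsigned n) { return x >> n; }

/* Arithmetic right shift: incoming bits are copies of the sign bit. */
uint32_t sar(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0)
        shifted |= ~(0xFFFFFFFFu >> n);  /* set the top n bits */
    return shifted;
}
```

Shifting -8 (FFFF FFF8) right by one gives 7FFF FFFC with shr, a large positive number, but FFFF FFFC with sar, which is -4 as expected.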

Rotate Shifts
The bits that fall off go back to the other side
Carry Flag is still set
rol
ror
Rotate With Carry
Use CF as part of the wheel
rcl
rcr
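
C has no rotate operator, so a rotate is usually written as two shifts ORed together. A sketch of rol and ror on 32-bit values follows; the function names are made up, and the carry flag (and therefore rcl/rcr) is not modelled.

```c
#include <stdint.h>

/* Rotate left: bits that fall off the top come back in at the bottom. */
uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31u;                            /* keep the count in 0..31 */
    return n ? ((x << n) | (x >> (32u - n))) : x;
}

/* Rotate right: bits that fall off the bottom come back in at the top. */
uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? ((x >> n) | (x << (32u - n))) : x;
}
```

For example, rotating 0x12345678 left by 8 bits gives 0x34567812: the top byte wraps around to the bottom instead of being lost.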

ENDIANNESS
Why does it matter?
It matters when we use multi-byte data: dw, dd
Big Endian used in network traffic headers
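
To make the byte order concrete, this C sketch stores the same 32-bit value into a byte array both ways. The function names are made up for illustration. x86 itself stores multi-byte values in little-endian order, least significant byte first.

```c
#include <stdint.h>

/* Big endian (network order): most significant byte first. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little endian (x86 memory order): least significant byte first. */
void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Storing 0x12345678 gives the bytes 12 34 56 78 in big endian and 78 56 34 12 in little endian, which is what you see when you dump a dd value from memory on x86.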

THE STACK

Main Memory
Data : static /global values
Code : instructions to be executed
Heap : dynamically allocated memory (ie malloc)
Stack : local vars and functions arg. Control program flow

Virtual Memory
Every application has its own unique address space, which it sees as a continuous block of memory. Physically it is not continuous but fragmented.
Virtual memory: the maximum is the number of addresses a 32-bit pointer can hold (4 GB)
Address space- virtual memory available to each process
Organized into 6 sections:
Environment: environment vars and command line args
Stack: function args, ret values and automatic variables
Heap: free store, dynamic allocation
2x Data Sections: initialized / uninitialized static and global variables
Text Section : code

What is the Stack?
Stack is a region of memory pointed to by the register ESP
Push and Pop things onto and off of the stack
LIFO operations

What will we use it for?
Mostly for function calls
Stores local variables
Stores parameters
Allows for function context to be saved
Temporary space local variables, etc
Stack will get overwritten

Operations

push x
Subtracts 4 from esp. Moves the data to the location pointed at by esp
x is a register or immediate value
pop x
Reads the value at the location pointed at by esp into the register x given in the pop instruction, then adds 4 to esp
You need to coordinate your push/pop
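
The push and pop steps can be simulated in C with a byte array and an offset standing in for ESP. This is a toy model with made-up names and no bounds checking; it only shows that the stack grows toward lower addresses and that values come back in reverse order.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;               /* offset into mem; STACK_BYTES means empty */
} toy_stack;

void stack_init(toy_stack *s) { s->esp = STACK_BYTES; }

/* push: subtract 4 from esp, then store the value where esp points. */
void push32(toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: load the value where esp points, then add 4 to esp. */
uint32_t pop32(toy_stack *s)
{
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing 1 and then 2 moves esp down by 8 bytes; popping returns 2 first and then 1 (LIFO), and esp ends up where it started.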

Useful Library Function
dump_stack x, y, z
x is an integer label
y is the number of dwords below ebp
z is the number of dwords above ebp

Make sure that ESP and EBP are the same before the call
call function_Name
Function name is a label…
Push the address of the next instruction onto the stack
Sets EIP to the function’s location (address)
ret
Return from a function
Pop the return address off the stack
printf
Push args in reverse order
Format string is pushed last

FUNCTIONS

Creating and Calling Functions
Call instruction calls a function
Pushes return address of the next instruction onto the stack
Jumps to the address of the function: EIP is set to the function address
Ret instruction will return from a function
Pops off the address on the top of the stack
Sets EIP to that address

Functions and Arguments
As with printf, push the arguments before calling the function
Use addresses relative to ESP to reach the arguments
mov eax, [esp+4] ; first argument
mov eax, [esp+8] ; second argument
mov eax, [esp+0Ch] ; third argument …

Extended Base Pointer (EBP)
Points to the base of a function
Save old value of EBP
Set EBP to point to ESP
[EBP + 8] points to the first parameter
[EBP + 4] points to the return address
[EBP] points to the old value of EBP
Benefit: Don’t have to adjust when things get pushed or saved

Local Variables
Used within a function
The value “disappears” when the function is finished
Local variables are i and sum
int foo(int n)
{
    int sum = 0;
    for (int i = 0; i <= n; i++)
    {
        sum += i;
    }
    return sum;
}

In Assembly?
Use ebp to base the local variables
First local in ebp-4
Second local in ebp-8

Use sub esp, x ; where x is the number of bytes to reserve (number of dwords * 4)
Move data from local to register to manipulate it
mov eax, [ebp-4]
inc eax
mov [ebp-4], eax
Equal to x++

Things to remember
Each stack operation is a memory access, roughly 10x slower than a register operation. Compilers will optimize to use registers for variables, if they are able.

ENTER AND LEAVE

Writing the prolog and epilog by hand is tedious
The enter and leave instructions bridge that gap
They do the prolog/epilog for an EBP-based stack frame
enter 0,0
Entering a function does push ebp
mov ebp, esp
Enter Extras
First argument: the amount of space you want to reserve for local variables (in bytes). enter 8,0 allocates space for 8 bytes (2 dwords)
Second argument: nesting level, not used except by some compilers. Always use 0

Leave
Leave a function does mov esp, ebp
pop ebp
Benefit: Easier to write

ARRAYS

A single variable that can hold multiple values in a contiguous block of memory. Each element must be of the same type & size
You can calculate how to get to an index by knowing:
Address of first element
Number of bytes for each element
Index of the element

Creating Arrays
In the DATA section
Use db, dw, etc
Can also use TIMES
In BSS section
Uninitialized
Use resb, resw, etc
In the STACK
Subtract from ESP
In the HEAP (ie malloc)

Accessing Elements
There is no direct way: no [] as with C
Addresses must be computed

Element address = base address + index*size
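
The formula is what C does behind the scenes for arr[i]. A small sketch that computes the address by hand follows; the function name is made up for illustration.

```c
#include <stddef.h>

/* Element address = base address + index * size (all in bytes). */
const void *element_address(const void *base, size_t index, size_t size)
{
    return (const unsigned char *)base + index * size;
}
```

For an array of 4-byte ints, element 2 sits 8 bytes past the base, so element_address(arr, 2, sizeof arr[0]) is the same address as &arr[2]. In assembly the same computation is typically written as an addressing expression such as [base + index*4].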

LOAD EFFECTIVE ADDRESS (LEA)

We need to calculate the address of a local variable
Very common when calling subroutines/func
What does the following do?
mov eax, ebp+8
You can’t do that (it does not assemble), so we try this:
mov eax, [ebp+8]
This moves the value stored at that address, not the address itself

You can use lea
LEA: calculates the address, not the data at that address
lea eax, [ebp + 8]
This gives us the address, or offset
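
A loose C analogy for the difference: reading frame[slot] is like mov with brackets (it loads the value), while &frame[slot] is like lea (it only computes where the value lives). The function names and the frame array are made up for illustration.

```c
/* Like mov eax, [ebp+8]: load the value stored in the slot. */
int load_value(const int *frame, int slot)
{
    return frame[slot];
}

/* Like lea eax, [ebp+8]: compute the slot's address, no memory read. */
const int *load_address(const int *frame, int slot)
{
    return &frame[slot];
}
```

The address version is what you need when a subroutine has to write into a local variable, for example when passing a pointer to a local as an argument.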

BUFFER OVERFLOWS

Buffer overrun or buffer overflow
Move more data into a buffer than the buffer was allocated for

Example: we have an 8 byte buffer, move 10 bytes of data into that buffer — what happens?
Overrun buffer’s boundary and overwrite adjacent memory locations
Action may be allowed, may change behavior of the program, may cause program to crash
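
The overflow happens because nothing checks the size before copying. A minimal sketch of a copy that refuses to overrun is shown below; the function name is made up, and it is one possible defensive pattern rather than a complete fix.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes into dst only if they fit.
   Returns 1 on success, 0 if the copy would overrun the buffer. */
int checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (n > dst_size)
        return 0;               /* would write past the end of dst */
    memcpy(dst, src, n);
    return 1;
}
```

With an 8-byte buffer, a request to copy 10 bytes is rejected instead of overwriting the 2 bytes of adjacent memory. An unchecked memcpy or strcpy would perform the write anyway.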

Stack-Based Buffer Overflows
Stack Frame: when a function is called, a structure called the stack frame is pushed onto the stack
EIP jumps to first instruction of the function
Stack frame contains: local variables, return address, arguments
