AArch64 Instruction Set

RISC, ARM, Registers, Instruction, Stack Frame

Vince

Published in

vswe

5 min read · Feb 8, 2021

RISC (Reduced Instruction Set Computer)

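Contrast with CISC (Complex Instruction Set Computer).
David Patterson observed that only about 20% of an instruction set is used frequently, yet those instructions account for about 80% of a program. He therefore argued that hardware should focus on accelerating the common instructions, and that complex operations should be composed from them.
RISC property 1: fixed instruction length. The advantage is that decoding is easy, which simplifies the design and makes pipelining effective. The drawback is that programs may grow larger, occupy more memory, and take longer to load into the processor.
RISC property 2: general-purpose registers. Any register can be used for any purpose, which also simplifies compiler design, although the registers are still split into integer and floating-point sets.
RISC property 3: load-store architecture. Computation instructions never operate on memory directly; all computation happens in registers, and data moves between registers and memory through separate load and store instructions.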

ARM (Advanced RISC Machine)

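Armv8-A has two execution states, AArch64 and AArch32.
AArch64: executes the A64 instruction set and uses 64-bit registers, but instructions are still a fixed 32 bits long.
AArch32: executes the A32 and T32 instruction sets and uses 32-bit registers. A32 is a fixed-length 32-bit instruction set; T32, previously known as Thumb, was originally built on 16-bit instructions.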

Registers in AArch64

General-purpose integer registers (R0~R30, X0~X30, W0~W30)

The architecture provides 31 general purpose registers (R0..R30).
Each register can be used as a 64-bit X register (X0..X30), or as a 32-bit W register (W0..W30). These are two separate ways of looking at the same register.
When a W register is written, the top 32 bits of the corresponding 64-bit X register are zeroed.

Source: this register diagram shows that `W0` is the bottom 32 bits of `X0`, and `W1` is the bottom 32 bits of `X1`.
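The X/W relationship can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not real register access; the helper names `write_w` and `read_w` are made up for illustration.

```python
MASK32 = (1 << 32) - 1

def write_w(old_x: int, w_value: int) -> int:
    """Model a write to Wn: return the new 64-bit value of Xn.

    old_x is deliberately ignored, because a W write zeroes the top
    32 bits of the X register instead of preserving them.
    """
    return w_value & MASK32

def read_w(x_value: int) -> int:
    """Model a read of Wn: the bottom 32 bits of Xn."""
    return x_value & MASK32

x0 = 0xFFFF_FFFF_FFFF_FFFF       # X0 starts with every bit set
x0 = write_w(x0, 0x1234_5678)    # write W0
print(hex(x0))                   # 0x12345678, upper half is now zero
```

Note that this differs from x86-style partial register writes, where the untouched upper bits are sometimes preserved.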

Floating-point/SIMD/NEON registers

There is a separate set of 32 registers (V0..V31) used for floating point and vector operations. These registers are 128-bit, but like the general-purpose registers, can be accessed in several ways. Bx is 8 bits, Hx is 16 bits and so on to Qx which is 128 bits.
These registers can also be referred to as V registers. When the V form is used, the register is treated as being a vector. This means that it is treated as though it contains multiple independent values, instead of a single value.

In AArch32 (Armv7), by contrast, the NEON unit can view the same register bank as:

sixteen 128-bit quadword registers, Q0-Q15
thirty-two 64-bit doubleword registers, D0-D31.
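To make the AArch64 Bx to Qx views concrete, here is a small Python model that treats one 128-bit register as an integer. The function names and the sample value are invented for this sketch; the only architectural facts assumed are that each scalar view is the low-order part of the V register and that the vector form splits the register into equal lanes.

```python
VIEW_BITS = {"B": 8, "H": 16, "S": 32, "D": 64, "Q": 128}

def scalar_view(v: int, view: str) -> int:
    """Return the low-order bits of a 128-bit register for a B/H/S/D/Q view."""
    return v & ((1 << VIEW_BITS[view]) - 1)

def lanes(v: int, lane_bits: int) -> list[int]:
    """Split a 128-bit register into lanes, lane 0 being the lowest bits."""
    mask = (1 << lane_bits) - 1
    return [(v >> (i * lane_bits)) & mask for i in range(128 // lane_bits)]

# Sample contents: byte k of the register holds the value k.
v0 = 0x0F0E0D0C_0B0A0908_07060504_03020100

print(hex(scalar_view(v0, "H")))         # 0x100
print(hex(scalar_view(v0, "S")))         # 0x3020100
print([hex(x) for x in lanes(v0, 32)])   # four 32-bit lanes
```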

System registers (names end with `_ELx`, e.g. `_EL1`, `_EL2`, `_EL3`)

System registers are used to configure the processor and to control systems such as the MMU and exception handling.
System registers cannot be used directly by data processing or load/store instructions. Instead, the contents of a system register need to be read into an X register, operated on, and then written back to the system register.
There are two specialist instructions for accessing system registers: MRS and MSR.

MRS Xd, <system register> // reads the system register into Xd
MSR <system register>, Xn // writes Xn to the system register
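The read, modify, write-back pattern can be sketched in Python. This is a toy model: the dictionary standing in for the system register file, the initial value of `SCTLR_EL1`, and the helper `set_bits` are all assumptions made for illustration, and bit 0 is picked arbitrarily.

```python
MASK64 = (1 << 64) - 1

# Hypothetical register file with a made-up initial value.
system_registers = {"SCTLR_EL1": 0x0}

def mrs(name: str) -> int:
    """Model MRS Xd, <system register>: read the register into a GPR."""
    return system_registers[name]

def msr(name: str, value: int) -> None:
    """Model MSR <system register>, Xn: write a GPR back to the register."""
    system_registers[name] = value & MASK64

def set_bits(name: str, bits: int) -> int:
    x0 = mrs(name)   # MRS X0, <name>
    x0 |= bits       # ORR X0, X0, #bits  (ordinary data processing on X0)
    msr(name, x0)    # MSR <name>, X0
    return x0

set_bits("SCTLR_EL1", 1 << 0)
print(hex(mrs("SCTLR_EL1")))
```

The point of the sketch is that the OR happens on the general-purpose register, never on the system register itself.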

Data Processing

Arithmetic and logic operations

MADD (Multiply-Add): multiplies two register values, adds a third register value, and writes the result to the destination register.

MLA (Multiply-add to accumulator)
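Logical shift left, LSL: ASL and LSL are equivalent, since both fill the vacated low bits with 0.

Arithmetic shift right, ASR, fills the vacated high bits with the original sign bit. For example, 10000000 shifted right arithmetically by 7 becomes 11111111. Logical shift right, LSR, simply fills the left side with 0, giving 00000001.
The operation below relates to signed (two's complement) numbers. Positive numbers and 0 are represented as the number itself; a negative number is obtained by inverting every bit of the corresponding positive number and adding 1. Applying the two's complement operation to any number a, positive or negative, yields -a.
A negative number's two's complement bit pattern can be read as its distance from 0: the closer to 0, the larger the number, so -1 is 1111 in 4 bits. This is also why positive and negative two's complement numbers can be added and subtracted directly (adding a positive number moves a negative number closer to 0).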
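The shift and negation rules above can be checked with a short Python sketch that works on fixed-width bit patterns. The helper names `lsr`, `asr`, and `negate` and the `bits` parameter are chosen for this example only.

```python
def lsr(value: int, shift: int, bits: int = 8) -> int:
    """Logical shift right: vacated high bits are filled with 0."""
    mask = (1 << bits) - 1
    return (value & mask) >> shift

def asr(value: int, shift: int, bits: int = 8) -> int:
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    mask = (1 << bits) - 1
    value &= mask
    result = value >> shift
    if value >> (bits - 1):                       # sign bit set
        result |= (mask << (bits - shift)) & mask  # fill the top with 1s
    return result

def negate(value: int, bits: int = 8) -> int:
    """Two's complement negation: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask

print(format(asr(0b10000000, 7), "08b"))   # 11111111
print(format(lsr(0b10000000, 7), "08b"))   # 00000001
print(format(negate(1, bits=4), "04b"))    # 1111, i.e. -1 in 4 bits
```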

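So why emit add r0, r0, r0, lsr #31? It handles the case where ASR is applied to a negative odd number. lsr #31 leaves only the sign bit, so the add contributes 1 for negative values and 0 otherwise.

For example, with 4-bit values:
-6 (1010) with ASR #1 gives -3 (1101).
That looks fine, but
-5 (1011) with ASR #1 also gives -3 (1101).
The result was unintentionally rounded away from zero, so we need (-5 + 1) first:
-4 (1100) with ASR #1 gives -2 (1110).
Bingo! When the rightmost bit is 1 and gets discarded, a negative number ends up further from 0,
so adding 1 first makes up that distance (a neat move by the compiler).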
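The following Python sketch replays the trick on 32-bit values. It assumes the add is followed by asr r0, r0, #1, which is what the walkthrough above implies, and compares the result with division that truncates toward zero. All helper names are made up for this example.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def asr32(value: int, shift: int) -> int:
    """Arithmetic shift right on a 32-bit pattern.

    Python's >> on a negative int is already an arithmetic shift.
    """
    return (to_signed32(value) >> shift) & MASK32

def div2_asr_only(n: int) -> int:
    """asr r0, r0, #1 alone: rounds toward negative infinity."""
    return to_signed32(asr32(n & MASK32, 1))

def div2_with_fixup(n: int) -> int:
    r0 = n & MASK32
    r0 = (r0 + (r0 >> 31)) & MASK32   # add r0, r0, r0, lsr #31
    r0 = asr32(r0, 1)                 # asr r0, r0, #1 (assumed follow-up)
    return to_signed32(r0)

for n in (-6, -5, 5, 6):
    print(n, div2_asr_only(n), div2_with_fixup(n))
```

For -5 the shift alone gives -3, while the fixed-up sequence gives -2, matching what a truncating signed division by 2 would produce.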

Floating point

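FMOV Wd, Sn ; copy the bits of a single-precision register to a 32-bit general-purpose register (no conversion)
FMOV Sd, Wn ; copy the bits of a 32-bit general-purpose register to a single-precision register (no conversion)
FJCVTZS Wd, Dn ; Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero, i.e. the fractional part is truncated.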
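The difference between copying bits and converting a value is easy to see in Python. This sketch uses `struct` to reinterpret the same 32 bits, which is the effect FMOV has between S and W registers, and uses `int()` to illustrate rounding toward zero for in-range values only. It does not model the JavaScript-specific overflow behaviour of FJCVTZS. The function names are invented for this example.

```python
import math
import struct

def fmov_w_from_s(value: float) -> int:
    """Model FMOV Wd, Sn: the float's 32 raw bits land in the integer register."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def fmov_s_from_w(bits: int) -> float:
    """Model FMOV Sd, Wn: the integer's 32 bits are reinterpreted as a float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

print(hex(fmov_w_from_s(1.0)))      # 0x3f800000, not 1
print(fmov_s_from_w(0x40000000))    # 2.0

# Rounding toward zero is not the same as rounding down for negatives.
print(int(-2.7), math.floor(-2.7))  # -2 -3
```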

Loop

x86: [ebp-4] is sum, [ebp-8] is i, and [ebp+8] is arg1, count. cmp sets the flags, and a conditional jump then tests the flags to decide the condition.
arm: r0 is sum, r3 is i, and r2 is count. cmp and a conditional branch decide the condition.
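As a rough illustration of the compare-then-branch structure, here is a Python rendering that keeps the arm register roles from the text (r0 as sum, r3 as i, r2 as count). The loop body `sum += i` and the loop condition `i < count` are assumptions, since the original listing is not reproduced here.

```python
def sum_loop(count: int) -> int:
    r0 = 0       # sum
    r3 = 0       # i
    r2 = count   # count
    while True:
        # cmp r3, r2: compare i with count and record the outcome as a flag
        i_less_than_count = r3 < r2
        # conditional branch: leave the loop once i >= count
        if not i_less_than_count:
            break
        r0 += r3   # assumed loop body: sum += i
        r3 += 1    # i++
    return r0

print(sum_loop(5))   # 10
```

The compare and the branch are separate steps on both architectures: the compare only records a result, and the branch that follows decides where execution goes.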

NEON Instructions

NEON extends the SIMD (Single Instruction Multiple Data) concept by defining groups of instructions that operate on vectors stored in 64-bit D (doubleword) registers and 128-bit Q (quadword) registers.
Using vector instructions can produce a very large performance boost for some operations.

Source: the NEON instruction VADD.I16 Q0, Q1, Q2 performs eight 16-bit integer additions in parallel.
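A lane-by-lane model of VADD.I16 in Python, as a sketch: each Q register is represented as a list of eight 16-bit lanes, and every lane is added independently with wraparound. The function name and the sample values are made up for illustration.

```python
def vadd_i16(q1: list[int], q2: list[int]) -> list[int]:
    """Model VADD.I16 Qd, Qn, Qm: eight independent 16-bit additions."""
    if len(q1) != 8 or len(q2) != 8:
        raise ValueError("a Q register holds eight 16-bit lanes")
    # Each lane wraps modulo 2**16; no carry crosses into the next lane.
    return [(a + b) & 0xFFFF for a, b in zip(q1, q2)]

q1 = [1, 2, 3, 4, 5, 6, 7, 0xFFFF]
q2 = [10, 10, 10, 10, 10, 10, 10, 1]
q0 = vadd_i16(q1, q2)
print(q0)   # [11, 12, 13, 14, 15, 16, 17, 0]
```

The last lane shows why this is not one wide 128-bit addition: the overflow from 0xFFFF + 1 is dropped instead of carrying into a neighbouring lane.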

NEON supports 8-bit, 16-bit, 32-bit, and 64-bit signed and unsigned integers, 32-bit single-precision floating-point elements, and 8-bit and 16-bit polynomials.
VMULL.S16 Q0, D2, D3 multiplies the elements of D2 (four signed 16-bit integers) by the corresponding elements of D3 in parallel and stores the products in the 128-bit Q0 as four 32-bit values.
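The widening behaviour of VMULL.S16 can be sketched the same way: four signed 16-bit inputs per D register, four signed 32-bit products in the Q register. The function name and inputs below are hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def vmull_s16(d_n: list[int], d_m: list[int]) -> list[int]:
    """Model VMULL.S16 Qd, Dn, Dm: four 16-bit x 16-bit -> 32-bit products."""
    if len(d_n) != 4 or len(d_m) != 4:
        raise ValueError("a D register holds four 16-bit lanes")
    for lane in d_n + d_m:
        if not INT16_MIN <= lane <= INT16_MAX:
            raise ValueError("lane does not fit in a signed 16-bit element")
    # The product of two signed 16-bit values always fits in signed 32 bits,
    # which is why the destination lanes are twice as wide.
    return [a * b for a, b in zip(d_n, d_m)]

d2 = [300, -300, 2, INT16_MIN]
d3 = [300, 300, -3, INT16_MIN]
print(vmull_s16(d2, d3))   # [90000, -90000, -6, 1073741824]
```

300 * 300 = 90000 already exceeds the signed 16-bit range, so a same-width multiply would lose the result; the long form keeps it.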

Addressing

Function calls

X86 Stack Frame

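The stack grows from high addresses toward low addresses. Because ESP always points to the top of the stack, a function starts by saving the old EBP and then copying the current ESP into EBP. The whole function then addresses its data as offsets from EBP (EBP-4 is the first local variable). When the function ends, the old EBP is restored, and the return value is passed back in EAX.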

High address (stack bottom)
  Function arg N            4N+4(%ebp)
  Function arg 2             +12(%ebp)
  Function arg 1              +8(%ebp)
  Return address (saved eip)  +4(%ebp)
  Old EBP value                 (%ebp)
  Local variable              -4(%ebp)
                                (%esp)
Low address (stack top)

push ebp       ; save the current ebp
mov ebp, esp   ; set EBP to the current stack pointer
sub esp, n     ; reserve n bytes for the function's local variables (may include padding)

This way EBP also serves as a frame pointer.
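To see how the offsets fall out of the prologue, the sketch below simulates the stack with a Python dictionary keyed by address. It assumes 4-byte slots and a caller that pushes arguments right to left, which is consistent with arg 1 sitting closest to the return address. The starting addresses, the return address, and the function name are arbitrary values chosen for illustration.

```python
def call_and_prologue(args, local_bytes, esp=0x1000, ebp=0x2000,
                      return_address=0x400000):
    """Simulate the caller's pushes, the call, and the callee's prologue."""
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4             # the stack grows toward lower addresses
        memory[esp] = value

    for arg in reversed(args):   # assumed: last argument is pushed first
        push(arg)
    push(return_address)         # call: pushes the saved eip
    push(ebp)                    # push ebp
    ebp = esp                    # mov ebp, esp
    esp -= local_bytes           # sub esp, n
    return memory, ebp, esp

memory, ebp, esp = call_and_prologue([10, 20, 30], local_bytes=8)
print(memory[ebp + 8], memory[ebp + 12], memory[ebp + 16])   # 10 20 30
print(hex(memory[ebp + 4]), hex(memory[ebp]))                # saved eip, old ebp
```

With this layout, argument k is found at ebp + 4k + 4, and the locals occupy the space between esp and ebp.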

ARM Stack Frame

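ldr (load register): loads data from memory into a register.
str (store register): stores a register back to memory.
ARM's fp (frame pointer) is analogous to x86's EBP. At the start of the function, the original fp is first saved to the stack (#-4), then sp is copied into fp, and then a 12-byte stack frame is created.
Jumps use the B family of instructions; the BL instruction stores the return address in the LR register (Link Register).
Arguments are passed in R0~R7. If there are more than eight, the remaining ones are pushed onto the stack starting from the last argument, and the return value comes back in R0~R1. (R0~R7 is the AArch64 convention, where the registers are named X0~X7; the 32-bit ARM calling convention passes only the first four arguments in R0~R3.)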

Register
R0: argument 1
R1: argument 2
  (omitted)
R7: argument 8
LR: return address

High address (stack bottom)
  Argument 11
  Argument 10
  Argument 9
Low address (stack top)
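The register-then-stack split in the layout above can be expressed as a small Python function. This is a simplified sketch that treats every argument as one integer-sized slot; the function name and the `num_reg_args` parameter are hypothetical.

```python
def place_arguments(args, num_reg_args=8):
    """Split arguments between R0..R7 and the stack.

    Returns (registers, stack), where stack[0] is the slot at the lowest
    address, i.e. the top of the stack.
    """
    registers = {f"R{i}": value for i, value in enumerate(args[:num_reg_args])}
    # The extras are pushed starting from the last argument, so the first
    # stack-passed argument ends up on top.
    stack = list(args[num_reg_args:])
    return registers, stack

registers, stack = place_arguments(list(range(1, 12)))   # arguments 1..11
print(registers["R0"], registers["R7"])   # 1 8
print(stack)                              # [9, 10, 11], top of stack first
```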

x86 eip (Instruction Pointer) vs arm pc (Program Counter)

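In most processors, the instruction pointer is incremented immediately after an instruction is fetched; it holds the address of the next instruction the processor will fetch (fetch, not execute).

The disassembly does not show x86 placing the saved eip on the stack, because the call instruction pushes eip onto the stack automatically. Later, ret pops the value at the top of the stack and stores it into eip.
The arm disassembly does not show it either, because it is hidden inside bl. bl (Branch with Link) does two things: first it writes the address of the next instruction into lr, and then it writes the address of the hello() function into pc.
arm can use the adr instruction to load a pc-relative address into a target register; using . as the operand is equivalent to reading the value of pc.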

ADR Xd, .
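The two steps performed by bl, and the matching return, can be modelled in a few lines of Python. In this sketch `bl_address` means the address of the bl instruction itself, instructions are assumed to be 4 bytes wide, and the concrete addresses are invented for illustration.

```python
INSTRUCTION_BYTES = 4   # A64 and A32 instructions are 4 bytes wide

def bl(bl_address: int, target: int) -> tuple[int, int]:
    """Model bl: return (new_pc, lr)."""
    lr = bl_address + INSTRUCTION_BYTES   # step 1: next instruction -> lr
    new_pc = target                       # step 2: callee address -> pc
    return new_pc, lr

def ret(lr: int) -> int:
    """Model the return: execution resumes at the address held in lr."""
    return lr

hello = 0x8000                          # hypothetical address of hello()
pc, lr = bl(bl_address=0x1010, target=hello)
print(hex(pc), hex(lr))                 # 0x8000 0x1014
print(hex(ret(lr)))                     # 0x1014
```

Unlike x86 call, nothing is written to the stack here; a function that makes further calls has to save lr itself before the next bl overwrites it.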
