A Summary of x86 String Instructions

This post consists of three parts:

  1. The logic behind x86 string instructions.
  2. All the information from (1) squeezed into a table.
  3. A real-life example.

The Logic

The Prefix + Instruction Combo

First, let’s make the distinction between string instructions (MOVS, LODS, STOS, CMPS, SCAS) and repetition prefixes (REP, REPE, REPNE, REPZ, REPNZ).

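Possible combinations of repetition prefixes (dark blue) and string instructions (light blue).

In short, REP is used with MOVS, LODS and STOS, while REPE/REPZ and REPNE/REPNZ are used with CMPS and SCAS, the only two of the five instructions that set the zero flag.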

Termination Conditions

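  • REP: repeat until ECX equals 0 (ECX is decremented once per iteration).
  • REPE, REPZ: repeat as long as ECX is not 0 and the zero flag is set; that is, stop when ECX reaches 0 or the zero flag is cleared. The two prefixes mean exactly the same.
  • REPNE, REPNZ: repeat as long as ECX is not 0 and the zero flag is clear; that is, stop when ECX reaches 0 or the zero flag is set. The two prefixes mean exactly the same.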
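The termination rules above can be sketched in C. This is a minimal emulation, not real CPU code: the `StringRegs` struct, `repne_scasb` and `repe_cmpsb` are hypothetical names made up for illustration, pointers stand in for ESI/EDI, and the direction flag is assumed clear so the pointers only move forward. The loops follow the order the repeat prefixes use: check ECX, run one iteration of the string instruction (SCASB and CMPSB are described in the next section), decrement ECX, then check the zero flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register state used by the repeated compare instructions.
   Pointers stand in for ESI/EDI; the direction flag is assumed clear. */
typedef struct {
    uint32_t ecx;         /* repeat counter */
    const uint8_t *esi;   /* source pointer (used by CMPSB) */
    const uint8_t *edi;   /* destination pointer */
    uint8_t al;           /* value SCASB compares against */
    int zf;               /* zero flag: 1 when the last comparison was equal */
} StringRegs;

/* One SCASB iteration: compare AL with [EDI], set ZF, advance EDI. */
static void scasb(StringRegs *r) {
    r->zf = (r->al == *r->edi);
    r->edi += 1;
}

/* One CMPSB iteration: compare [ESI] with [EDI], set ZF, advance both. */
static void cmpsb(StringRegs *r) {
    r->zf = (*r->esi == *r->edi);
    r->esi += 1;
    r->edi += 1;
}

/* REPNE SCASB: stop when ECX reaches 0 or a match sets ZF. */
void repne_scasb(StringRegs *r) {
    while (r->ecx != 0) {
        scasb(r);
        r->ecx -= 1;
        if (r->zf) break;
    }
}

/* REPE CMPSB: stop when ECX reaches 0 or a mismatch clears ZF. */
void repe_cmpsb(StringRegs *r) {
    while (r->ecx != 0) {
        cmpsb(r);
        r->ecx -= 1;
        if (!r->zf) break;
    }
}
```

Two details fall out of this ordering: ECX is tested before the first iteration, so a prefixed instruction with ECX = 0 does nothing, and after a mismatch both pointers already sit one element past the differing bytes.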

String Instructions

The first three letters of each instruction tell us what it does, and the “S” in all of them stands for (how surprising) “String”. Each mnemonic ends with a letter giving the operand size: ‘B’ for byte, ‘W’ for word (2 bytes) and ‘D’ for double-word (4 bytes). After every iteration, ESI and/or EDI advance by that size: forward when the direction flag (DF) is clear, backward when it is set.
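The size suffix and the direction flag (DF) together decide how far ESI and EDI move after each iteration: by the operand size, upward when DF is clear and downward when it is set. A tiny sketch of that rule, using hypothetical helper names:

```c
/* Hypothetical helper: operand size in bytes encoded by the mnemonic's last letter. */
int operand_size(char suffix) {
    switch (suffix) {
    case 'B': return 1;   /* byte, e.g. MOVSB */
    case 'W': return 2;   /* word, e.g. MOVSW */
    case 'D': return 4;   /* double-word, e.g. MOVSD */
    default:  return 0;   /* not one of the suffixes covered here */
    }
}

/* Amount added to ESI/EDI after each iteration:
   +size when the direction flag is clear (df == 0), -size when it is set. */
int pointer_delta(char suffix, int df) {
    int size = operand_size(suffix);
    return df ? -size : size;
}
```

For example, `pointer_delta('D', 0)` is 4 and `pointer_delta('W', 1)` is -2.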

  • MOV moves data from the source string (pointed to by ESI) to the destination string (pointed to by EDI).
  • CMP compares data between the source and destination strings. In x86, comparison is a subtraction whose result is discarded and whose only effect is on the EFLAGS register; CMPS computes [ESI] - [EDI].
Strings pointed to by the ESI, EDI registers.
  • LOD loads data from the string pointed to by ESI into EAX (AX or AL for the word and byte variants).
  • STO stores data from EAX (AX or AL) into the string pointed to by EDI.
  • SCA scans the string pointed to by EDI and compares each element to EAX (AX or AL), again affecting EFLAGS; SCAS computes EAX - [EDI].
REPE CMPSB for Trump’s Rescue.
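The data-moving instructions can also be emulated in a few lines of C. In this hypothetical sketch, ESI and EDI are offsets into a flat byte array standing in for memory, only the byte variants are modeled, and `df` plays the direction flag; all names are made up for illustration. It shows that REP MOVSB behaves like copying ECX bytes, REP STOSB like filling ECX bytes with AL, and that LODSB replaces only the low byte of EAX.

```c
#include <stdint.h>

/* Hypothetical emulation state: ESI and EDI are offsets into 'mem',
   a flat byte array standing in for memory. Only byte variants are modeled. */
typedef struct {
    uint8_t *mem;
    uint32_t ecx, esi, edi, eax;
    int df;   /* direction flag: 0 = increment pointers, 1 = decrement */
} MoveRegs;

/* Move a pointer register by one byte in the direction selected by DF. */
static uint32_t advance(uint32_t reg, int df) {
    return df ? reg - 1 : reg + 1;
}

/* MOVSB: copy the byte at ESI to EDI, then advance both. */
static void movsb(MoveRegs *r) {
    r->mem[r->edi] = r->mem[r->esi];
    r->esi = advance(r->esi, r->df);
    r->edi = advance(r->edi, r->df);
}

/* STOSB: store AL (the low byte of EAX) at EDI, then advance EDI. */
static void stosb(MoveRegs *r) {
    r->mem[r->edi] = (uint8_t)(r->eax & 0xFFu);
    r->edi = advance(r->edi, r->df);
}

/* LODSB: load the byte at ESI into AL, keeping EAX's upper bytes, then advance ESI. */
void lodsb(MoveRegs *r) {
    r->eax = (r->eax & ~0xFFu) | r->mem[r->esi];
    r->esi = advance(r->esi, r->df);
}

/* REP MOVSB: runs exactly ECX times, like memcpy for non-overlapping ranges. */
void rep_movsb(MoveRegs *r) {
    while (r->ecx != 0) {
        movsb(r);
        r->ecx -= 1;
    }
}

/* REP STOSB: runs exactly ECX times, like memset with the value in AL. */
void rep_stosb(MoveRegs *r) {
    while (r->ecx != 0) {
        stosb(r);
        r->ecx -= 1;
    }
}
```

Unlike REPE and REPNE, plain REP never looks at the zero flag, which is why these loops only test ECX.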

Cheat Sheet

Cheat sheet for x86 Assembly’s string instructions.

A Real-Life Example

Lately, we started doing CTFs at work (Trusteer, IBM Security). I stumbled upon a crack-me challenge from reversing.kr, which contained the following function. Try to figure out what this function is while we reverse-engineer it together.

  • The string pointed to by EDI is scanned (REPNE SCASB), and each character is compared to AL, which holds zero.
  • This happens until ECX equals zero or until a null terminator is found.
  1. Each character pointed to by EDI is compared (REPE CMPSB) to the corresponding one pointed to by ESI.
  2. This happens until ECX equals zero (that is, the destination string has been fully consumed) or until the zero flag is cleared (that is, a difference between the strings is detected).
  • If the strings are equal, the function returns zero (ECX XORed with itself).
  • Otherwise, since CMPSB advances both pointers after each comparison, [ESI-1] and [EDI-1] hold the pair of characters that differed. If the character in [ESI-1] has a higher ASCII value than the one in [EDI-1], the function returns 0xffffffff, or -1. This happens when the source string is lexicographically bigger than the destination string.
  • Otherwise, the function returns NOT 0xfffffffe (its bitwise complement), which is 1.
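The steps above can be put together in C. This is a hypothetical rendering of the behavior described in the bullets, not the challenge's actual code: the name `mystery_function` and its parameter names are made up, `dst` stands for the string pointed to by EDI and `src` for the one pointed to by ESI, and it assumes ECX holds the length of `dst` including its null terminator when REPE CMPSB starts.

```c
#include <stddef.h>

/* Hypothetical C rendering of the crack-me function's logic.
   'dst' plays the role of EDI and 'src' the role of ESI. */
int mystery_function(const char *dst, const char *src) {
    const unsigned char *edi = (const unsigned char *)dst;
    const unsigned char *esi = (const unsigned char *)src;

    /* Step 1 (REPNE SCASB with AL = 0): count the bytes of dst up to and
       including its null terminator; assumed to be the value left in ECX. */
    size_t ecx = 0;
    while (edi[ecx] != 0) ecx++;
    ecx++;

    /* Step 2 (REPE CMPSB): compare while ECX != 0 and the bytes match. */
    int zf = 1;
    while (ecx != 0) {
        zf = (*esi == *edi);
        esi++;
        edi++;
        ecx--;
        if (!zf) break;
    }

    /* Step 3: derive the return value from ZF and the last compared bytes. */
    if (zf) return 0;                  /* strings are equal */
    if (esi[-1] > edi[-1]) return -1;  /* source character is bigger */
    return 1;                          /* NOT 0xfffffffe, i.e. 1 */
}
```

Because the terminator is part of the compared range, a shorter string loses at the position where its null byte meets a non-null character. With `dst` as the first argument, the sign of the result follows the same convention as the C library's `strcmp(dst, src)`: negative when `dst` sorts first, positive when it sorts last.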

Ophir Harpaz

@ophirharpaz on Twitter. Security researcher at Guardicore. Reverse engineering enthusiast. Author of https://begin.re.