Format String Vulnerabilities: Explained From the Bottom Up for 32-bit Linux (part 1)

n80fr1n60
Aug 15, 2021

Introduction:

From time to time I revisit vulnerability classes to see whether I can still explain them in layman's terms to beginners who have never heard of them or who still struggle to understand how and why they work the way they do. This time I try to explain format string exploits, and I am doing it online so that I have a handy reference; otherwise it gets lost in the digital wasteland on my hard drive.

I know there are already many posts that explain this vulnerability. These (future) posts are mostly written for my own reference.

The series is split into an introductory part (this one) and a part that focuses on different attacks (what you can actually do with a program that has this vulnerability). After that, I will update it for the 64-bit calling convention and show how that influences the vulnerability.

Background:

Format string vulnerabilities surfaced around the year 2000. Like all vulnerability classes, the techniques were refined over the years; for writing to memory locations in particular, readily available formulas have been derived. The class had largely disappeared from view until 2019, when the write-up “Attacking SSL VPN - Part 1: PreAuth RCE on Palo Alto GlobalProtect, with Uber as Case Study!” appeared. Since then it can be considered resurrected.

Functions that use format strings (C based), aka the format functions:

The format string vulnerability is a bug predominantly found in uses of the printf() family of functions. These functions convert data of different types and print it to a string or file stream, formatted according to the format string (more on what this mysterious format string is later in the text). They take a variable number of arguments, depending on how many format specifiers the format string itself contains.

Printf() functions: printf(), fprintf(), sprintf(), snprintf(), vprintf(), vfprintf(), vsprintf(), vsnprintf()

What is a format string:

Let's get some definitions:

The Format String is the argument of the format function and is an ASCII Z string which contains text and format parameters [https://owasp.org/www-community/attacks/Format_string_attack].

format string refers to a control parameter used by a class of functions in the input/output libraries of C and many other programming languages. The string is written in a simple template language: characters are usually copied literally into the function’s output, but format specifiers, which start with a % character, indicate the location and method to translate a piece of data (such as a number) to characters.

[https://en.wikipedia.org/wiki/Printf_format_string]

But what exactly does this mean?

The format string tells the function how the text that (in the case of printf) will be printed should be formatted. Each format specifier starts with “%”, followed by a conversion character (optionally preceded by flags, width and precision). It indicates where in the string/stream the data element should be inserted and how its data type should be converted and displayed.

The format string itself is made up of format specifiers and string literal data.

Let's see an example with printf:

printf("The fox jumps over %d dogs \n", 2);

So the string is printed as: The fox jumps over 2 dogs (%d is the int format specifier, and the data element 2 replaces the %d). printf() can be seen as a kind of conversion function, turning “primitive data types” (int, char, float …) into a string representation. A non-exhaustive list of specifiers can be found in https://web.ecs.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf.

So what is the vulnerability?

Problems arise if an attacker is able to provide the format string to a “format function”. This changes the intended behaviour of the “format function”, because the supplied format specifiers are not expected and the matching arguments are missing (the stack layout for printf() is discussed later). The values that get “converted” are therefore whatever data happens to be on the stack at the time of the attack. This can lead to nearly arbitrary reads and writes, leaks of the stack cookie, and so on. Take a mental note already: all printf()-like functions locate the data element for each format specifier relative to the format string argument itself. On 32-bit, the first argument sits one word (4 bytes) higher in memory than the pointer to the format string.

The Problem:

The problem lies in the fact that format functions can take any number of arguments. As we already know, the conversion that takes place is controlled by the format string. The function retrieves the data elements from the stack as requested by the format string, whether or not the caller actually passed them.

So how does a format function like printf() work?

We will compile a small sample project. If you are on 64-bit Linux, please install the 32-bit build support first: sudo apt-get install gcc-multilib. Without it you get errors like: usr/include/stdio.h:27:10: fatal error: ‘bits/libc-header-start.h’ file not found.

For compiling we call: clang -m32 -O1 print.c -o print

Before anyone reading the following says this is not a format string vulnerability: true, it is not, because we are not doing something like this (taken from https://web.ecs.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf):

char user_input[100];
scanf("%s", user_input); /* getting a string from the user */
printf(user_input); /* vulnerable: the user-supplied input is used directly as the format string */

It is more like a mismatch between the printf() format specifiers and the arguments provided to printf().

For demonstrating how this works initially it makes no difference, so we can use the test snippet.

Later, when we get to the setup for reading “arbitrary” memory locations, we will need to revisit this.

For the time being, this will be our test program to find out how printf() works:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

int target;

int vuln() {
    int a = 0xba;
    int b = 0xbe;
    /* Three conversions but only two arguments: %08x deliberately has no matching argument. */
    printf("a has value %d, b has value %d, c is at address: %08x\n", a, b);
    return 0;
}

int main() {
    vuln();
    return 0;
}

We load it into pwndbg and check the disassembly of the vuln() function:

───────────[ DISASM ]

0x8049170 <vuln> sub esp, 0x10
0x8049173 <vuln+3> push 0xbe
0x8049178 <vuln+8> push 0xba
0x804917d <vuln+13> push 0x804a008
0x8049182 <vuln+18> call printf@plt <printf@plt>

───────────[ STACK ]

At the time of the call to printf() at 0x8049182, the stack looks like this (see above).

Just let the program run to completion:

"a has value 186, b has value 190, c is at address: 080491bd"

We can infer from this that the data elements to be “converted” start right above the format string. This makes sense given the calling convention in use (cdecl): the arguments are pushed onto the stack from right to left, so the pointer to the format string ends up at the lowest stack address. (Note: the top of the stack is at the lower memory address.)

So now we have a rough understanding of format strings. For later, note that we have an indirection at the top of the stack, aka the format string (0xffffd070 -> 0x804a008).

Some more theory on format specifiers

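The format specifiers in the table below are by no means exhaustive, just a selection of the most common ones (https://web.ecs.syr.edu/~wedu/Teaching/cis643/LectureNotes_New/Format_String.pdf):

Parameter   Meaning                                   Passed as
%d          decimal (int)                             value
%u          unsigned decimal (unsigned int)           value
%x          hexadecimal (unsigned int)                value
%s          string ((const) (unsigned) char *)        reference
%n          number of bytes written so far (* int)    reference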

As we can see, some of these are passed as “Value” and some are passed as “Reference”. What's the deal? As a refresher:

Let's explain this theoretically and then with a small sample:

If you use a format specifier such as “%p” (prints a pointer value, so the “conversion” does not try to “resolve” it), it only requires a value, so whatever is on the stack is printed. The same goes for “0x%08x”, 8-digit hexadecimal formatting with zero padding (as seen in the example above: 0x080491bd).

If you feed it a “%s”, it follows the indirection and tries to print whatever it finds at the address it just fetched from the stack. In our case it would start printing the bytes at 0x080491bd until it hits a “\0”, aka the zero terminator. With an arbitrary stack value, however, the program is likely to crash (segfault), because the memory the dereference tries to access is not mapped or belongs to kernel space (i.e. the process lacks access rights).

Here is an example with the same sample as above, but with the printf() replaced:

printf ("a has value %d, b has value %d, c is --> %s <--\n", a, b);

───────────[ DISASM ]

0x8049170 <vuln> sub esp, 0x10
0x8049173 <vuln+3> push 0xbe
0x8049178 <vuln+8> push 0xba
0x804917d <vuln+13> push 0x804a008
0x8049182 <vuln+18> call printf@plt <printf@plt>

───────────[ STACK ]

At the time of the call to printf() at 0x8049182, the stack looks like this (see above).

Let's have a look at what we find at 0x80491bd, taking everything up to the first \x00. We issue the following command in pwndbg:

x/64bx 0x80491bd

and get this:

Now the same as characters (note: GDB displays characters outside the 7-bit ASCII range with the octal escape ‘\nnn’)

x/64c 0x80491bd

Now the same, interpreted as a string:

x/s 0x80491bd
0x80491bd <__libc_csu_init+29>: "\215 \235 \b \377 \377\377\215\205\004\377\377\377)\303\301\373\002t%",<incomplete sequence \366\215\266>

0xB6 is “\266” in octal. The string stops at the first “\0”, the zero terminator that C string functions rely on.

We modify our printf() again to look like:

printf ("a has value %d, b has value %d, c is --> %s %s %s <--\n", a, b);

───────────[ STACK ]

a has value 186, b has value 190, c is: gibberish that cannot be printed here :-/ On the system tested it just printed “(null)”; on other systems it might segfault. According to the C standard, the behavior is undefined.

Conclusion:

As this is already a pretty long article, that’s it for part 1. In part 2 we dive into the actual use cases like:

  • Map/view the stack (e.g. leak the stack cookie)
  • View memory at any mapped location (that we have access to)
  • Overwriting nearly arbitrary memory
