Common malware attacks: PDF backbone analysis

Published in OUSPG · 5 min read · Jun 23, 2020

This story is part of a blog series providing walkthroughs of different Dockerized tools used in digital forensics. The utilities presented are part of the CinCan project, which aims to aid computer forensics. You can also find other materials and blogs related to the project here.

Hidden malware in PDF files

Information exchange is everywhere nowadays, and sending and receiving documents of different types, including PDF files, is common practice. Because of this, the PIDIEF malware family is likely infecting more and more computers. The usual infection mechanism is a document that, when opened, exploits a vulnerability enabling JavaScript execution. This also means that new malware or a backdoor can be installed and run remotely. Next, we will focus on a recently detected sample PDF that contains hidden code.

Introducing the CinCan command

For this walk-through, we will use cincan, a command-line interface for conveniently running security analysis tools in Docker containers. Because the tools are containerized, their footprint on the system is low and the analysis can start right away without further installs. Detailed install instructions are here; the only prerequisites are Docker and Python 3.6+.

First hands-on and analysis of the file

Now that we’re all set, we can start with the malware analysis. As you may have guessed, there is something suspicious about the sample PDF file. The question to answer is: what does it contain besides the normal data of a PDF document? To get a quick peek at the file’s contents, we use PDFiD, which is very useful for triaging PDF files before performing extensive analysis on them. This saves time and offers some clues about what kind of dangers the file might hide.

We will use the PDFiD tool from the CinCan repository with the command:

cincan run cincan/pdfid sample.pdf

As the output shows, the file contains 2 /JS and 3 /JavaScript objects, which raises a red flag because they might contain malicious code.
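To make the idea of keyword triage concrete, here is a minimal Python sketch that counts a few suspicious names in the raw bytes of a PDF. It is not how PDFiD is implemented; the function name `count_pdf_names`, the short keyword list, and the sample bytes are assumptions made for illustration. In particular, the PDF format allows characters in a name to be written as #xx hex escapes, and this sketch does not handle that.

```python
import re

# A short, illustrative keyword list (an assumption, not PDFiD's full list).
SUSPICIOUS_NAMES = ("/JS", "/JavaScript", "/OpenAction", "/AA")


def count_pdf_names(data: bytes) -> dict:
    """Count how often each suspicious name appears in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops a short name from matching the start of a longer one.
        pattern = re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    # Hypothetical PDF fragment, not taken from the sample analysed here.
    fragment = (
        b"1 0 obj << /OpenAction 2 0 R >> endobj "
        b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj"
    )
    print(count_pdf_names(fragment))
```

To try it on a real file, read the file in binary mode and pass the bytes to `count_pdf_names`. Any non-zero count for /JS or /JavaScript is only a hint that a closer look is worthwhile, not proof of malicious content.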

JavaScript extraction and removing the code obfuscation

Let’s use peepdf to go a bit deeper into the potentially malicious code in the /JS and /JavaScript objects:

cincan run cincan/peepdf sample.pdf -C "extract js > javascripts.txt"

With this command, we launch CinCan’s peepdf tool and tell it to extract all JavaScript code into javascripts.txt:

The GIF uses a slightly different command, but you should use the one provided above!

That was fast! If we check the resulting file, we can see that two objects are present in the PDF file, both containing some interesting JavaScript code that doesn’t seem to make much sense at first glance:

What is interesting in the second object is that variables are assigned “random” strings without any apparent reason and added up into variable A. It might mean nothing, but once we scroll a bit lower, we can see that the eval function takes string variable A as a parameter and executes it. Using this mechanism, the code can create its own functions dynamically without being detected by a simple parser.
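The effect of splitting code into string fragments can be illustrated with a small Python sketch. The fragments below are made up for the example (the real sample uses different strings), and `naive_scan` is a hypothetical stand-in for a simple signature check. The point is only that a keyword which never appears literally in the script text does appear once the fragments are joined, which is the string that eval would receive.

```python
# Hypothetical fragments; the real sample uses different, longer strings.
fragments = ["ut", "il.pr", "int", "f(", "payload", ")"]


def naive_scan(source: str, keyword: str) -> bool:
    """Return True if the keyword appears literally in the given text."""
    return keyword in source


# What a simple scanner sees: each fragment sits in its own string literal.
script_text = "; ".join(f'var v{i} = "{frag}"' for i, frag in enumerate(fragments))

# What the interpreter would receive after concatenation (the role of variable A).
reassembled = "".join(fragments)

if __name__ == "__main__":
    print(naive_scan(script_text, "util.printf"))   # False: keyword is split up
    print(naive_scan(reassembled, "util.printf"))   # True: keyword is whole again
```

This is why printing the reassembled string, rather than scanning the original script, is the more reliable way to see what the code really does.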

We can use a cool trick to discover the actual executed code by changing the eval function into a printing one, e.g. into console.log(A), which, instead of executing the command, will just print the reassembled code. We will use CinCan’s busybox tool for substituting the string with this command:

cincan run busybox sed 's/eval(A)\;/console.log(A);/;' javascripts.txt > new_javascript.txt
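If sed is not at hand, the same substitution can be sketched in a few lines of Python. This is an equivalent written for illustration, not part of the CinCan tooling; the function name `defang_eval` is an assumption. Like the sed expression above, it replaces only the first `eval(A);` on each line.

```python
def defang_eval(js_source: str) -> str:
    """Replace the first `eval(A);` on each line with `console.log(A);`."""
    lines = js_source.splitlines(keepends=True)
    return "".join(
        line.replace("eval(A);", "console.log(A);", 1) for line in lines
    )


if __name__ == "__main__":
    # Hypothetical two-line script standing in for the extracted JavaScript.
    demo = 'var A = "al" + "ert(1)";\neval(A);\n'
    print(defang_eval(demo))
```

To apply it to the extracted code, read javascripts.txt into a string, pass it through `defang_eval`, and write the result to new_javascript.txt.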

This will output the modified code in a new file, new_javascript.txt. For the actual execution, we use Playcode here, an online tool for running JavaScript in the browser. We will take the full code from the second object and try to execute it:

Good news! We now see some JavaScript code, but also an error after it. The error tells us that the function jrhbncrwi() is not defined, yet if we check the printed JS code, we find the definition of that very function. We can deduce that the code generates its own functions and executes them afterward in order to better hide what it does.

Let’s copy the function jrhbncrwi() to the top of our code and execute again:

And now we see the full code that was executed:

And a little formatting shows us the actual malicious code:

At a quick glance, we can deduce that the program exploits a buffer overflow in the util.printf() function, which forces the machine to execute the shellcode stored in the payload variable.

Going further with the results

As an extra, now that we have the shellcode, we can actually attempt to gain some insight into the actual program. We copy the value of the payload variable into a text file shellcode.txt and transform it into an executable by using CinCan’s Shellcode2Exe tool:

docker run --rm -v $(pwd):/samples cincan/shellcode2exe /samples/shellcode.txt

The output is a file, shellcode.exe, that appears to be built for the Windows operating system.
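It can also be useful to look at the raw bytes of the payload. The sketch below assumes the payload string is written as %uXXXX escapes, a form often seen in JavaScript-based exploits; the post does not show the actual format of this sample, so treat that as an assumption. The function name `decode_percent_u` and the demo string are hypothetical. Each escape is a 16-bit value, and the bytes are emitted low byte first, matching how those values are laid out in memory on x86.

```python
import re


def decode_percent_u(escaped: str) -> bytes:
    """Decode %uXXXX escapes into bytes, low byte first for each 16-bit value."""
    out = bytearray()
    for hex_word in re.findall(r"%u([0-9A-Fa-f]{4})", escaped):
        out += int(hex_word, 16).to_bytes(2, "little")
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical payload string, not the one from the analysed sample.
    demo = "%u9090%u1234"
    print(decode_percent_u(demo).hex())  # 90903412
```

Anything in the input that is not a %uXXXX escape is ignored by this sketch, so a payload in a different encoding would simply produce empty output.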

Well done! We now have the shellcode that the exploit injects through the buffer overflow, wrapped as an executable. If we want to push this a little further, we can attempt a quick decompilation using CinCan’s Ghidra Decompiler:

cincan run cincan/ghidra-decompiler decompile shellcode.exe

After the initial prompt, we can also see some code that looks like very badly named C code.

This blog post was about detecting malicious code in a PDF file and, after the walk-through with CinCan’s PDFiD and peepdf tools, we have already started to gain some understanding of how the exploit works. In the following blog posts, we will explore different dockerized CinCan tools for a better analysis of the malware and improved reverse engineering of the shellcode.

Thank you!