Dissecting Malicious Office Docs: A Quick Guide

Muhammad Moiz Arshad
7 min read · Nov 30, 2021

Microsoft Office has become a fixture in almost every corporate environment. Beyond its regular functionality, the suite also lets users automate recurring, repetitive tasks through Office macros.

Macros remove a great deal of redundant data-manipulation work, but they also let adversaries turn the same feature to their advantage: dropping malware, creating files, executing malicious code, and so on. Macro-laden documents, delivered through social engineering, are among the most frequently used attack vectors for gaining initial access to computer systems.

What are macros?

Macros are chunks of code written in VBA (Visual Basic for Applications) that are embedded in an Office document to automate tasks. Anyone can add macros to a document with minimal hassle, because they were designed to be useful to ordinary end users, and that ease of use is also what makes them low-hanging fruit for attackers. From script kiddies and amateur attackers to Advanced Persistent Threat (APT) actors, macros are a long-standing favorite.

Outline

The purpose of this article is to give you a basic understanding of how to determine whether an Office document is malicious, assuming the reader has a basic ability to read code.

Understanding Format

Microsoft has the following two standard file formats for different Office documents:

  • OLE: Object Linking and Embedding
    OLE is based on the Compound File Binary Format and resembles a small file system, containing named storages and streams. The main stream depends on the application: ‘WordDocument’ for Word, ‘Workbook’ for Excel and ‘PowerPoint Document’ for PowerPoint. OLE was the default format in pre-2007 Office versions, and later versions can still open and save it for backward compatibility. Its files use extensions such as .doc, .xls and .ppt.
  • OOXML: Office Open XML
    An OOXML file is a ZIP archive that bundles several parts, mostly XML files. When a document contains macros, the VBA project is stored inside the archive as an OLE file (typically vbaProject.bin), so OOXML still relies on OLE behind the scenes. It is the successor to the OLE format and has its own file extensions, e.g., .docx, .xlsx, .pptx, with macro-enabled variants such as .docm, .xlsm and .pptm. It has been the default format since Office 2007.
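
The two formats can be told apart from the first few bytes of a file. The sketch below is a minimal illustration using only the Python standard library; the function names are made up for this example, and it assumes that a ZIP archive containing a [Content_Types].xml part is an OOXML package and that a part whose name ends in vbaProject.bin indicates embedded macros.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                      # local file header of a ZIP archive


def identify_container(data: bytes) -> str:
    """Classify raw file bytes as 'OLE', 'OOXML', 'ZIP' or 'unknown'."""
    if data.startswith(OLE_MAGIC):
        return "OLE"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        # Assumption: OOXML packages carry a [Content_Types].xml part.
        return "OOXML" if "[Content_Types].xml" in names else "ZIP"
    return "unknown"


def has_vba_project(data: bytes) -> bool:
    """Return True if a ZIP-based document contains a vbaProject.bin part."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in archive.namelist()
        )
```

Reading a sample with open(path, "rb").read() and passing the bytes to identify_container gives a quick first hint about which branch of the analysis applies, independent of the file extension.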

Pre-requisites

Before we move on, we need Python installed on our system. The other tools required to analyze documents can be installed with the following commands:

$ pip install oletools msoffcrypto-tool olefile
$ curl "https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/oledump.py" -o oledump.py
$ curl "https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/plugin_http_heuristics.py" -o plugin_http_heuristics.py
$ curl "https://raw.githubusercontent.com/DidierStevens/DidierStevensSuite/master/msoffcrypto-crack.py" -o msoffcrypto-crack.py

Getting Started

To achieve our purpose, we will use oletools and oledump to extract and collect artifacts, and analyze a malware sample: an initial-stage Emotet Excel document (SHA1 hash: 0a08675863a971c5ca74724d633e729773914c96) that was observed in one of the Emotet malspam campaigns. So let’s jump right in!
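
Before analyzing a downloaded sample, it helps to confirm that it is the file we think it is. The following is a small sketch using Python's hashlib; the helper names are illustrative, and the expected value is the SHA1 hash quoted above.

```python
import hashlib

# SHA1 of the Emotet sample, as quoted in this article.
EXPECTED_SHA1 = "0a08675863a971c5ca74724d633e729773914c96"


def sha1_of_bytes(data: bytes) -> str:
    """Return the hex SHA1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA1) -> bool:
    """Compare a file's SHA1 with the expected hash, ignoring case."""
    return sha1_of_file(path).lower() == expected.lower()
```

Calling matches_expected("doc_name.xls") returns True only if the local file is byte-for-byte the sample discussed here.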

Oleid

To start with, we will look at oleid, an oletools utility that displays specific characteristics of an Office document, such as file format, encryption status, presence of macros and external relationships. This gives us a good starting point to decide whether we want to analyze further or not.

Simply type oleid doc_name.xls in a terminal or cmd to analyze the doc with oleid.

Figure 1: Oleid sample output

Oleid rates the risk of the doc using basic but effective heuristics, which tell us whether to analyze it further or mark it as safe at this point.

Macroraptor

Like oleid, macroraptor (mraptor) uses heuristics to determine how suspicious a file is. Its Flags column summarizes what the macros in the file appear to do, alongside an overall verdict. Run it with the command mraptor doc_name.xls

Figure 2: Mraptor sample output

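The flag ‘A’ stands for AutoExec, meaning the macro has a trigger that runs it automatically (for example, when the document is opened), ‘W’ stands for writing to the file system or memory, and ‘X’ stands for executing a file or payload outside the VBA context. Mraptor considers a macro suspicious when ‘A’ is combined with ‘W’ or ‘X’, so a document showing all three flags deserves a closer look.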
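
The way these flags combine into a verdict can be sketched in a few lines. This is an illustration of the rule mraptor documents (suspicious when A and (W or X)), not mraptor's own code; the function name and the returned dictionary are made up for this example, and the input is assumed to be a flags string such as "AWX" or "A-X".

```python
def interpret_flags(flags: str) -> dict:
    """Interpret an mraptor-style flags string such as 'AWX' or 'A-X'."""
    flags = flags.upper()
    auto_exec = "A" in flags   # macro has an auto-execution trigger
    writes = "W" in flags      # macro writes to the file system or memory
    executes = "X" in flags    # macro executes a file or payload
    return {
        "auto_exec": auto_exec,
        "write": writes,
        "execute": executes,
        # Rule documented by mraptor: suspicious when A and (W or X).
        "suspicious": auto_exec and (writes or executes),
    }
```

For example, interpret_flags("AWX")["suspicious"] is True, while a macro that only writes and executes without an automatic trigger ("-WX") is not flagged by this rule.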

Oledump

Now that we know the file may contain embedded VBA code, we will break it down using oledump. To view the structure and streams of the doc, use the command oledump.py doc_name.xls

Figure 3: Oledump sample output

If you look closely, apart from listing the streams, oledump also marks some of them with an indicator: M for a stream containing VBA macro code and m for a VBA stream containing attributes only. We are interested in the streams marked M, and we can select a specific stream to dump with the -s flag. Here the command becomes oledump.py doc_name.xls -s A9, as A9 is the stream identifier.
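
When a document has many streams, picking out the ones marked M can be scripted. The sketch below parses a saved oledump listing with a regular expression; the sample listing inside it is invented for illustration (it is not the output from the Emotet sample), and the pattern assumes lines of the form identifier, optional indicator, size and quoted stream name.

```python
import re
from typing import List, NamedTuple


class StreamEntry(NamedTuple):
    stream_id: str
    indicator: str  # 'M', 'm' or '' when oledump prints no indicator
    size: int
    name: str


# Assumed line shape: " A9: M   12345 'VBA/Module1'"
LINE_PATTERN = re.compile(
    r"^\s*(?P<id>\w+):\s+(?P<flag>[A-Za-z!]?)\s+(?P<size>\d+)\s+'(?P<name>.*)'\s*$"
)


def parse_listing(listing: str) -> List[StreamEntry]:
    """Turn oledump-style listing text into structured entries."""
    entries = []
    for line in listing.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # container lines such as 'A: xl/vbaProject.bin' are skipped
            entries.append(StreamEntry(
                match["id"], match["flag"], int(match["size"]), match["name"]
            ))
    return entries


def macro_stream_ids(listing: str) -> List[str]:
    """Return identifiers of streams flagged 'M' (VBA macro code found)."""
    return [e.stream_id for e in parse_listing(listing) if e.indicator == "M"]


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """\
A: xl/vbaProject.bin
 A1:       468 'PROJECT'
 A2:        86 'PROJECTwm'
 A3: m     991 'VBA/Sheet1'
 A9: M   12345 'VBA/Module1'
"""
```

With this made-up listing, macro_stream_ids(SAMPLE_LISTING) yields the single identifier A9, which is the value to pass to the -s flag.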

Figure 4: Oledump sample output for specified stream

As the above screenshot shows, the output is not very readable, because VBA source code is stored in compressed form inside the stream. Adding the -v flag makes oledump decompress the VBA code before printing it.

oledump.py doc_name.xls -s A9 -v

Figure 5: Oledump decompressed sample output for specified stream

We can copy the raw VBA code into any text editor for further analysis. We will not try to decode every section of this code, only the parts that yield actionable intelligence, such as IoCs to block.

Figure 6: Extracted VBA code from Excel file

A quick static analysis of the code shows obfuscated payloads on lines 52 and 54, and line 55 reveals that the obfuscation is a simple string substitution: the junk string “DaI” is scattered through the payload. Removing every occurrence of “DaI” deobfuscates the payload, after which we can check whether the result is understandable.
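
Removing the junk string by hand is error-prone for long payloads, so it is worth scripting. A minimal sketch follows; the function name is made up, and the obfuscated string in the usage note is invented for illustration rather than taken from the sample.

```python
def strip_marker(obfuscated: str, marker: str = "DaI") -> str:
    """Remove every occurrence of the junk marker from the payload.

    This mirrors a single-pass string replacement with an empty string.
    """
    return obfuscated.replace(marker, "")
```

For instance, strip_marker("pDaIowDaIersDaIhell") returns "powershell". The marker is a parameter because other samples of the same family may use a different junk string.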

Figure 7: Extracted payload from VBA macro after deobfuscation

Now the payload is readable and starts to make sense. The code tries to download a second-stage payload from about 7 different remote sources and then attempts to execute it as a process. We can extract these domains and links and have them blocked organization-wide to reduce the risk to some extent.

Figure 8: URL IoCs extracted from deobfuscated payload — second stage payload
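
Pulling the URLs out of a deobfuscated payload can also be automated. The sketch below uses a deliberately simple regular expression and "defangs" each URL so it can be shared in reports without being clickable; the pattern, the function names and the example.com addresses in the usage note are assumptions for illustration, not the IoCs from this sample.

```python
import re
from typing import List

# Simplified pattern: stops at whitespace, quotes and common list separators.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>,;*)]+", re.IGNORECASE)


def extract_urls(text: str) -> List[str]:
    """Return unique URLs in order of first appearance."""
    seen = []
    for url in URL_PATTERN.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


def defang(url: str) -> str:
    """Make a URL non-clickable, e.g. http://a.b -> hxxp://a[.]b"""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")
```

Given the made-up text 'x = "http://one.example.com/a.dll,https://two.example.net/b/"', extract_urls returns the two URLs separately, and defang turns the first into hxxp://one[.]example[.]com/a[.]dll. A pattern this loose can over- or under-match, so the extracted list should still be reviewed before it goes into a blocklist.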

Although we have covered most of the analysis of the Emotet sample and determined it to be malicious, we will look at a few more utilities that can prove quite useful in other cases.

Olevba

We can think of olevba as a combination of mraptor and oledump: it extracts the VBA code and also detects suspicious VBA keywords in it. Run it with the command olevba doc_name.xls

Figure 9: Olevba sample output for raw VBA macro code
Figure 10: Olevba sample output for heuristic analysis of doc

The output of olevba further confirms the suspicious nature of the doc file.
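
To get a feel for how keyword-based detection works, here is a heavily simplified sketch. The keyword table is a small hand-picked illustration and is not olevba's actual rule set; the category names and the function name are likewise assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hand-picked examples only; real tools use far larger keyword lists.
KEYWORDS = {
    "AutoExec": ("AutoOpen", "Auto_Open", "Document_Open", "Workbook_Open"),
    "Write": ("FileCopy", "SaveToFile", "CreateTextFile"),
    "Execute": ("Shell", "WScript.Shell", "ShellExecute"),
}


def scan_vba(source: str) -> List[Tuple[str, str]]:
    """Return (category, keyword) pairs found in VBA source, ignoring case."""
    hits = []
    for category, words in KEYWORDS.items():
        for word in words:
            # Word boundaries avoid matching inside longer identifiers.
            pattern = r"\b" + re.escape(word) + r"\b"
            if re.search(pattern, source, re.IGNORECASE):
                hits.append((category, word))
    return hits
```

Running scan_vba on VBA code extracted from a document gives a rough list of behaviors worth inspecting first. Plain keyword matching is easily defeated by obfuscation like the “DaI” substitution seen earlier, which is why deobfuscation and manual review remain necessary.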

Msoffcrypto

Msoffcrypto-crack is another Office doc analysis tool, built on msoffcrypto-tool, that is useful when a doc is encrypted and its password is not known. It tries to recover the password with a dictionary attack. Use the command msoffcrypto-crack.py doc_name.xls to try the default password list, or add the -p flag to supply a custom dictionary.
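
The idea behind a dictionary attack is simple enough to sketch. The code below is not how msoffcrypto-crack.py is implemented; it is a minimal illustration with made-up function names. The checker relies on the msoffcrypto-tool library (OfficeFile, load_key and decrypt) and assumes that a wrong password causes load_key or decrypt to raise an exception, which may vary with the library version and file format.

```python
import io
from typing import Callable, Iterable, Optional


def dictionary_attack(
    candidates: Iterable[str],
    try_password: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate accepted by try_password, else None."""
    for candidate in candidates:
        candidate = candidate.strip()
        if candidate and try_password(candidate):
            return candidate
    return None


def make_msoffcrypto_checker(path: str) -> Callable[[str], bool]:
    """Build a password checker for one encrypted Office file."""
    import msoffcrypto  # third-party: pip install msoffcrypto-tool

    def try_password(password: str) -> bool:
        with open(path, "rb") as handle:
            office_file = msoffcrypto.OfficeFile(handle)
            try:
                office_file.load_key(password=password)
                # Decrypt into memory; the output is discarded.
                office_file.decrypt(io.BytesIO())
            except Exception:
                # Assumption: failure here means the password was wrong.
                return False
        return True

    return try_password
```

A word list can then be tried with dictionary_attack(open("passwords.txt"), make_msoffcrypto_checker("doc_name.xls")), where passwords.txt is a hypothetical file with one candidate per line. Keeping the loop separate from the checker makes it easy to swap in a different verification method.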

What have we covered so far?

Following are the highlights of what we have covered in this blog so far:

  • Introduction and importance of macros.
  • Understanding of OLE and OOXML formats.
  • Usage of oletools and other utilities.
  • Basic static analysis of VBA code.
  • Extraction of IoCs for blocking them.

Cheat Sheet

  • Oleid
    oleid doc_name.xls
  • Mraptor
    mraptor doc_name.xls
  • Oledump
    oledump.py doc_name.xls
    oledump.py doc_name.xls -s stream_id
    oledump.py doc_name.xls -s stream_id -v
  • Olevba
    olevba doc_name.xls
  • Msoffcrypto-crack
    msoffcrypto-crack.py doc_name.xls

Credits

Lastly, I would like to credit some of the resources that motivated me to write this article, which are worth going through for more insight:
