MS Office File Formats — Advanced Malicious Document (Maldoc) Techniques

Authors: Kirk Sayre (@bigmacjpg), Harold Ogden (@haroldogden) and Carrie Roberts (@OrOneEqualsOne)

Welcome to the first post in a series on Advanced Malicious Document (Maldoc) Techniques. This post covers the basic file formats used by MS Office and some of their security implications, laying the groundwork for the posts that follow. For some other exciting content, be sure to read the follow-on Evasive VBA and VBA Stomping articles.

The file format of an MS Office document can affect how anti-virus solutions handle the file. The older Office 97–2003 format is based on OLE (Object Linking and Embedding), a complex binary container format that is not easy to manipulate by hand. Newer versions of Office, on the other hand, save documents as a compressed archive (a zip file) of smaller individual files that together describe the document. Most of these files are human-readable XML, but in some cases, such as macro-enabled documents, the archive can also contain OLE files. Of particular interest in a macro-enabled document is the OLE file named vbaProject.bin, which contains the macro code to be executed.
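To see this mix of formats for yourself, the Python sketch below opens a zip-based Office file with the standard-library zipfile module and reports every archive member whose content begins with the 8-byte OLE signature. It is a minimal sketch, assuming a well-formed zip; it only identifies OLE members and does not parse them. The function name find_ole_members is hypothetical, chosen for illustration. Because it checks content rather than file names, it also reports an OLE file that has been given a misleading name.

```python
import zipfile

# First 8 bytes of every OLE (Compound File Binary) file.
OLE_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def find_ole_members(path):
    """Return names of archive members whose content starts with the OLE signature.

    Hypothetical helper for illustration. Works on zip-based Office files
    (.docx, .docm, ...) given as a path or a binary file object. The check
    looks at content, not names, so a renamed vbaProject.bin is still listed.
    """
    hits = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as member:
                # Only the first few decompressed bytes are needed.
                if member.read(len(OLE_SIGNATURE)) == OLE_SIGNATURE:
                    hits.append(info.filename)
    return hits
```

For a freshly saved macro-enabled Word document, this would typically list word/vbaProject.bin; documents with embedded objects may list additional OLE members.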

The file extension for an Office 97–2003 Word document is normally .doc, while newer versions of Office use .docx for standard documents and .docm for documents containing macros. When saving in the new zip-based format, MS Office will not keep macros in a file unless it has the .docm extension. To get around this, you can simply rename the file with another tool after saving it from Office, or you can save in the older Office 97–2003 format instead.

Renaming a .docm to .doc by hand (using File Explorer, for example) does not change the file's “magic number”, the series of bytes at the beginning of a file that is intended to identify its type. For a .docm, the magic number is 504b0304, which indicates a zip archive. Renaming such a file to .doc leaves the magic number unchanged, and the file continues to work normally in MS Office. Files in the older Office 97–2003 .doc format instead begin with the OLE signature d0cf11e0a1b11ae1, whose first bytes are a cute hex rendering of “docfile”. A mismatch between file extension and magic number is unusual and may indicate malicious intent by the author.
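A defender can check for this mismatch directly. The sketch below reads the first bytes of a file, classifies it as zip or OLE using the two magic numbers above, and compares the result with what the extension implies. The extension table covers only the Word extensions discussed here and is an assumption of the sketch; the function names are hypothetical. Note that Word will also open other formats (RTF, for example) saved under .doc, and this sketch would flag those as mismatches too.

```python
from pathlib import Path

# Leading bytes that identify each container format.
ZIP_MAGIC = bytes.fromhex("504b0304")          # "PK\x03\x04": Office 2007+ (.docx/.docm)
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE: Office 97-2003 (.doc)

# Container each Word extension is expected to use (illustrative subset only).
EXPECTED = {".doc": "ole", ".docx": "zip", ".docm": "zip"}


def container_type(path):
    """Classify a file as 'zip', 'ole' or 'unknown' from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE_MAGIC):
        return "ole"
    return "unknown"


def extension_mismatch(path):
    """Return True when the extension and the magic number disagree."""
    expected = EXPECTED.get(Path(path).suffix.lower())
    if expected is None:
        return False  # extension not covered by this sketch
    return container_type(path) != expected
```

A .docm renamed to .doc keeps its zip magic number, so extension_mismatch reports it, while a genuine Office 97–2003 .doc does not trigger it.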

As mentioned earlier, the .docm format is a zip archive of many other files. Of particular interest is the vbaProject.bin file, found inside the “word” directory of an unzipped macro-enabled Word document, as shown below.

The vbaProject.bin file contains the macro code to be executed. This is the default location and filename MS Word uses when saving, but the name can be changed by hand to hinder anti-virus detection or reverse engineering. Philippe Lagadec's paper OpenDocument and Open XML Security describes this evasion technique along with others. The steps required to rename vbaProject.bin are as follows (from Philippe's paper).

1. rename “vbaProject.bin” to “no_macros_here.txt”

2. update relationships in “word/_rels/document.xml.rels”

3. in “[Content_Types].xml”, replace “bin” by “txt”
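From the defender's side, these steps show why a filename check is not enough: Word finds the VBA project by following the relationship in word/_rels/document.xml.rels, not by looking for a file called vbaProject.bin. The sketch below follows that relationship instead. It assumes the relationship Type URI ends in "/vbaProject" and resolves relative targets against the "word/" folder; the function name is hypothetical, and the sketch does not handle every edge case of relationship-target resolution.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_PART = "word/_rels/document.xml.rels"
# Relationship elements live in the OPC package-relationships namespace.
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def vba_project_parts(path):
    """Return archive paths that document.xml.rels declares as the VBA project.

    Hypothetical helper for illustration. Because it follows the relationship
    Type rather than a file name, it still finds the macro storage after
    vbaProject.bin has been renamed.
    """
    with zipfile.ZipFile(path) as zf:
        try:
            rels_xml = zf.read(RELS_PART)
        except KeyError:
            return []  # no main-document relationships part
    parts = []
    for rel in ET.fromstring(rels_xml).iter(REL_TAG):
        # Assumption: the VBA project relationship Type URI ends in "/vbaProject".
        if not rel.get("Type", "").endswith("/vbaProject"):
            continue
        target = rel.get("Target", "")
        if target.startswith("/"):
            # Absolute targets are relative to the package root.
            parts.append(target.lstrip("/"))
        else:
            # Relative targets resolve against the source part's folder, "word/".
            parts.append(posixpath.normpath(posixpath.join("word", target)))
    return parts
```

Applied to the renamed sample described in this post, a sketch like this would point at word/ver.txt rather than word/vbaProject.bin, which can then be handed to an OLE/VBA analysis tool.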

As a demonstration of anti-virus evasion, a document containing a standard attack macro was uploaded to VirusTotal twice: once unmodified, and once with the vbaProject.bin file renamed to ver.txt. The VirusTotal results are shown below.

VT Hashes: f77e0716c2994607e2de7ea41d04687c9b2533883753560ea4f4d1d29ba41bbd and 29a7cbdb40398f633377e8ad99d5ecced5e3b8f087ef2c0c019835bd3bc1f7b2

The number of detections dropped by six, but the behavior of the document did not change. Including a benign or misleading dummy vbaProject.bin file alongside the renamed one could also distract an analyst from the true malicious payload.

Another interesting file in the archive is vbaData.xml, which contains the macro names. As a defender, you can use this to your advantage to disable a macro that is configured to run automatically when the document is opened. See the follow-on VBA Stomping blog post for details on how to do this.
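As a first triage step, the sketch below lists the macro names recorded in word/vbaData.xml and flags those ending in one of Word's auto macro names (AutoExec, AutoNew, AutoOpen, AutoClose, AutoExit). It assumes the names are stored in attributes whose local name is macroName (as in entries such as wne:macroName="PROJECT.NEWMACROS.AUTOOPEN") and matches on the local name so it does not depend on a particular namespace prefix. The function names are hypothetical, and the sketch only reads the file; it does not implement the disabling technique covered in the follow-on post.

```python
import zipfile
import xml.etree.ElementTree as ET

# Word's built-in auto macro names, lower-cased for comparison.
AUTO_EXEC = {"autoexec", "autonew", "autoopen", "autoclose", "autoexit"}


def macro_names(path, part="word/vbaData.xml"):
    """List macro names recorded in vbaData.xml (hypothetical helper).

    Assumption: names live in attributes whose local name is 'macroName';
    the namespace URI is ignored so the prefix used in the file does not matter.
    """
    with zipfile.ZipFile(path) as zf:
        try:
            root = ET.fromstring(zf.read(part))
        except KeyError:
            return []  # no vbaData.xml in this archive
    names = []
    for elem in root.iter():
        for key, value in elem.attrib.items():
            # Namespaced attribute keys look like "{uri}macroName".
            if key.rsplit("}", 1)[-1] == "macroName":
                names.append(value)
    return names


def auto_exec_macros(path):
    """Return recorded macro names whose last dotted component is an auto macro."""
    return [n for n in macro_names(path)
            if n.rsplit(".", 1)[-1].lower() in AUTO_EXEC]
```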