What other security products WON’T tell you about malicious archives

Assemblyline Blog Entry #7

Kevin Hardy-Cooper, P.Eng.
10 min read · Nov 15, 2023
Photo by Leonard Laub on Unsplash

⚠️⚠️⚠️ CAUTION ⚠️⚠️⚠️

This document describes malware analysis in Assemblyline. Malware analysis must be performed in an isolated environment.

In the previous blog entry of the Assemblyline series “One last HackTheBox Business 2023 CTF Forensic Challenge”, my colleague gdesmar discussed how they used Assemblyline to complete the HackTheBox Business 2023 CTF!

In this post, we discuss how an archive-centred malware campaign seen in 2022 triggered large improvements to Assemblyline and CAPE Sandbox!

This article requires an understanding of how to interpret Assemblyline results, which can be gained by reading the documentation.

Context

In the summer of 2022, Microsoft began blocking Office macros in files from the internet by default, which caused malware authors to pivot to a new attack vector: archives.

https://news.sophos.com/en-us/2022/10/12/are-threat-actors-turning-to-archives-and-disk-images-as-macro-usage-dwindles/

In the autumn of 2022, suspicious archive files were spotted in Assemblyline, coming from all over the Government of Canada (GC). At the time, Assemblyline had difficulty analyzing archives, which left GC networks vulnerable. Fortunately, the reverse engineering team alerted me to this lack of detection.

Attack Chain

The attack chain consisted of two stages:

Stage 1

  • A user receives an email with a PDF file attachment. This file sometimes contains a password.
  • The user opens the PDF file attachment, which contains a link hidden in its “Annotations” that opens automatically.
  • This link pointed to a web page that contained malicious JavaScript.

Stage 2a

  • The JavaScript writes an archive file to disk (HTML smuggling). This archive file is occasionally password-protected and contains at least a Windows shortcut file, an EXE file, and a DLL file.

Stage 2b

  • The user enters the password if required and opens the archive file.
  • Inside the archive, the user sees a Windows shortcut file that uses a fake icon to trick the user into thinking that the file is something else.
  • The user opens the Windows shortcut file.
  • The Windows shortcut file runs a command that starts the legitimate EXE file via a relative path.
  • The legitimate EXE loads a malicious DLL found in the archive file via DLL side-loading.
  • A DLL function is run that beacons out to the command-and-control server for the next set of instructions.
Assemblyline struggled in the attack chain

Assemblyline struggled at multiple points in the attack chain as illustrated by the diagram above, so we added some improvements!

Improvements!

Extract the PDF link

My cyber analyst colleague found that a link was embedded in the “Annotations” section of the PDF. At the time this was a new technique, and we were not extracting the embedded URI from samples that used it. To address this, my colleague improved Assemblyline’s PDF analysis service to use the “pikepdf” Python library to extract the URI.
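To make the idea concrete, here is a minimal sketch of pulling URIs out of PDF annotations. It is not the code from the Assemblyline service: the function names are hypothetical, and it assumes a recent pikepdf release where each page exposes its underlying dictionary as page.obj. The helper works on anything with a dictionary-style get method, so it can be tried without a PDF at hand.

```python
from typing import Iterable, Iterator


def uris_from_annotations(annotations: Iterable) -> Iterator[str]:
    """Yield the /URI target of every annotation that carries a URI action."""
    for annot in annotations:
        action = annot.get("/A")  # the annotation's action dictionary, if any
        if action is None:
            continue
        uri = action.get("/URI")
        if uri is not None:
            yield str(uri)


def extract_pdf_uris(path: str) -> list[str]:
    """Collect annotation URIs from every page of a PDF (needs pikepdf installed)."""
    import pikepdf  # third-party; imported lazily so the helper above has no dependency

    found: list[str] = []
    with pikepdf.open(path) as pdf:
        for page in pdf.pages:
            annots = page.obj.get("/Annots")  # assumption: pikepdf >= 2, where pages wrap an object
            if annots is not None:
                found.extend(uris_from_annotations(annots))
    return found
```

Calling extract_pdf_uris on a sample returns every link target found in its annotations, which can then be handed to whatever decides if a link is worth fetching.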

An example of a sample from this campaign (see hashes at the bottom for more examples):

PDF improvement achieved!

Now we have a link, what do we do?

Download the web page

Back in the autumn of 2022, Assemblyline was very “file-oriented”, and could not automatically fetch files served at suspicious URI locations. This was a large area of improvement that my colleague addressed with a new Assemblyline service called URLDownloader which tries to download all suspicious links found in a submission.

For this sample, we found the suspicious URI, used the URLDownloader service to download the file served at that location, and resubmitted the resulting HTML file to Assemblyline for further analysis.
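The download-then-resubmit step can be sketched roughly as follows. This is an illustration only, not the URLDownloader implementation: fetch_for_resubmission, its parameters, and the returned dictionary are all hypothetical names, and a real service would fetch from inside an isolated environment.

```python
import hashlib
import urllib.request


def _default_opener(uri: str):
    # A timeout keeps an unresponsive server from stalling the analysis.
    return urllib.request.urlopen(uri, timeout=30)


def fetch_for_resubmission(uri: str, opener=_default_opener, max_bytes: int = 10 * 1024 * 1024) -> dict:
    """Download the content served at `uri` and describe it for resubmission."""
    with opener(uri) as response:
        content = response.read(max_bytes)  # cap the read so a huge response cannot exhaust memory
    return {
        "uri": uri,
        "sha256": hashlib.sha256(content).hexdigest(),
        "content": content,
    }
```

The opener is injectable so that the function can be exercised against canned responses instead of live, possibly malicious, infrastructure.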

Diagram illustrating how a URI would be analyzed in Assemblyline

The above diagram is not accurate in terms of the file hosted at the URL and is meant for illustrative purposes only.

Write the archive file

This HTML file will now be analyzed by our JavaScript-oriented service, JsJaws. JsJaws uses a Node environment to emulate JavaScript code in a “virtual machine” via an open-source tool called MalwareJail. Let’s look at the de-obfuscated JavaScript code (thanks Synchrony!):

Malicious JavaScript code
  1. At a glance, this JavaScript contains a large base64-encoded string, text.
  2. This value is then sent to a custom method called b64toBlob which most likely converts this base64-encoded string into a Blob with the MIME type application/zip. Looks like an archive is going to be dropped!
  3. This Blob is then sent to the method msSaveOrOpenBlob along with the file name it is to be saved under.

I found that the Blob object and the msSaveOrOpenBlob method were not present in JsJaws’ Node environment at the time of the incident, so I updated the MalwareJail tool environment to extract the file after writing it to disk.

With these changes, JsJaws can emulate the JavaScript successfully to extract the archive to disk in the service, and then send that file through Assemblyline for further analysis!
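The decoding that the JavaScript performs can also be approximated statically. The sketch below is not how JsJaws works (JsJaws emulates the script); it is a simplified stand-in that looks for large base64 string literals and keeps those that decode to a ZIP file. The function name and the 200-character threshold are assumptions chosen for illustration.

```python
import base64
import re

# Long runs of base64 characters inside a quoted JavaScript string literal.
BASE64_LITERAL = re.compile(r"[\"']([A-Za-z0-9+/]{200,}={0,2})[\"']")
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature at the start of a ZIP archive


def find_smuggled_zips(javascript: str) -> list[bytes]:
    """Decode large base64 string literals and keep those that look like ZIP files."""
    payloads = []
    for match in BASE64_LITERAL.finditer(javascript):
        try:
            decoded = base64.b64decode(match.group(1), validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue
        if decoded.startswith(ZIP_MAGIC):
            payloads.append(decoded)
    return payloads
```

A static pass like this breaks as soon as the string is split, concatenated, or otherwise obfuscated, which is one reason emulating the script is the more robust approach.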

I wrote a few signatures that would flag any JavaScript file that wrote an archive to disk using these methods so that Assemblyline would be able to score this behaviour in the future.

See hashes at the bottom for more examples of HTML smuggling!

Archives

These archive files were not the standard ZIP file format but rather ISOs, UDFs, and VHDs.

A search in Assemblyline for specific archive file types
The results of the above search

Identification

Occasionally, Assemblyline had issues correctly identifying some of the archives, so my colleague fixed this. His solution made it into another open-source project called SFlock via a pull request as well!
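To show why these formats are easy to confuse, here is a simplified sketch of telling them apart by their on-disk markers. It is not the fix that went into Assemblyline or SFlock. It assumes 2048-byte sectors, only inspects the first few descriptors of the volume recognition area, and returns hypothetical labels.

```python
SECTOR = 2048
VRS_START = 16 * SECTOR  # the volume recognition area begins at byte 32768
UDF_IDS = {b"NSR02", b"NSR03"}  # descriptors that mark a UDF file system


def identify_disk_image(data: bytes) -> str:
    """Return 'vhd', 'udf', 'iso' or 'unknown' for a raw image (simplified)."""
    # A VHD ends with a 512-byte footer that starts with the cookie "conectix".
    if data[-512:-504] == b"conectix":
        return "vhd"

    # Each descriptor occupies one sector; its 5-byte identifier sits at offset 1.
    identifiers = []
    for offset in range(VRS_START, min(len(data), VRS_START + 16 * SECTOR), SECTOR):
        identifiers.append(data[offset + 1 : offset + 6])

    # UDF images often carry ISO 9660 descriptors as well, so check UDF first.
    if any(ident in UDF_IDS for ident in identifiers):
        return "udf"
    if b"CD001" in identifiers:
        return "iso"
    return "unknown"
```

Checking for the UDF descriptors before the ISO 9660 one matters: a check that stops at the first CD001 descriptor would label a UDF image as ISO.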

Correct identification for UDF files

Extraction

We are identifying the unusual archives correctly now but a new problem arose, aaargh! The Extract service was unable to extract some of these archives because the tool it uses, 7-Zip, relies on the file extension to determine the file type and how to extract files from it. Since Assemblyline ignores the submitted file extension, 7-Zip would always think UDF files are ISO (which they sort of are, but we’re not going to get into that) and it wouldn’t extract the right things. So my colleague came to the rescue and made a special case of ISO/UDF to rename the file for 7-Zip 😅
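The renaming trick can be sketched like this. The mapping, the function name, and the choice of extensions are assumptions for illustration and do not reflect the exact special case in the Extract service. Only the 7-Zip switches x, -o and -y are used, and the command is built but not run.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from an identified type to the extension handed to 7-Zip.
EXTENSION_FOR_TYPE = {"iso": ".iso", "udf": ".udf", "vhd": ".vhd"}


def prepare_for_7zip(sample: Path, identified_type: str, workdir: Path) -> list[str]:
    """Copy the sample under a name with a type-revealing extension and build a 7z command."""
    extension = EXTENSION_FOR_TYPE.get(identified_type, "")
    if extension:
        renamed = workdir / (sample.name + extension)
        shutil.copyfile(sample, renamed)
    else:
        renamed = sample  # unknown type: leave the file as submitted
    output_dir = workdir / "extracted"
    # `x` extracts with full paths, `-o` sets the output directory, `-y` answers prompts with yes.
    return ["7z", "x", f"-o{output_dir}", "-y", str(renamed)]
```

The returned list can be passed to subprocess.run in an environment where the 7z binary is available.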

Password-Protection

Occasionally we saw an archive that was password-protected, and the password was found in one of the previous stages such as in the PDF or on the web page hosting the malicious JavaScript.

I worked with @gdesmar to extract the text that was visible to the user and parse it so that potential passwords could be found. This list of potential passwords could then be sent to the appropriate Assemblyline service, which would try them when analyzing a password-protected archive file.
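As a rough illustration of that idea, the sketch below collects the text a user would see in an HTML page and turns it into an ordered list of password guesses. The class and function names and the ranking heuristic are assumptions, not the parsing we actually shipped.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.chunks.append(data)


def candidate_passwords(html: str) -> list[str]:
    """Return password guesses, most likely first, taken from the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)

    candidates = []
    # Words that directly follow a "password" label are the best guesses.
    for match in re.finditer(r"(?i)\bpassword\b\s*(?:is\b)?\s*[:=]?\s*(\S+)", text):
        if match.group(1) not in candidates:
            candidates.append(match.group(1))
    # Every other visible token is kept as a fallback guess.
    for token in text.split():
        if token not in candidates:
            candidates.append(token)
    return candidates
```

Trying the labelled guesses first keeps the number of extraction attempts low when the page states the password outright.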

The samples found at the bottom under “HTML Smuggling” will drop archive files similar to those mentioned in this “Archives” section.

DLL Side-Loading

Inside these archive files, which Assemblyline was now able to extract, were a series of files that consisted of at least a Windows shortcut file, a legitimate EXE file, and a malicious DLL file.

Let’s cherry-pick an archive file for demonstration:

File tree view of ISO file

By looking at the Characterize service analysis for the Windows shortcut file, we can see that there is a suspicious command line argument involved.

Suspicious command line arguments found in the Windows shortcut

The Characterize service was also able to extract a Batch file from this Windows shortcut:

Batch file extracted from Windows shortcut

If we look at the contents of that Batch file, we can see that when the LNK is run, the OneDriveUpdater.exe is started:

Batch file contents

Something interesting to note is that this ISO file contains two legitimate files.

Legitimate and illegitimate files in ISO

Looking at the LNK file’s command to start OneDriveUpdater.exe, we can see that it uses a relative path to access this EXE. Then the illegitimate version.dll is loaded via DLL side-loading. This DLL decrypts the OneDrive.Update file and beacons out for the next payload.

This is a “great” anti-analysis technique for a variety of security products that extract items from archives and analyze each item in a silo. Since these files were dependent on each other, they would not execute correctly in isolation and no suspicious behaviour would be detected.

This is exactly what was happening in Assemblyline 😢, with each file in the archive being dynamically analyzed in a silo. We use CAPE Sandbox for our dynamic analysis.

Extracted files analyzed in silos
  1. The LNK would be extracted and sent to CAPE, our dynamic analysis service, where it would fail to run because it requires the EXE in the same folder (as seen in the Batch file contents).
  2. The EXE is a legitimate file, so nothing of interest here.
  3. The legitimate DLL would do nothing of interest here.
  4. The illegitimate DLL would run in CAPE but would do nothing of interest when we tried to run all its export entries, since it requires the encrypted file OneDrive.Update before it does anything interesting.
  5. We don’t analyze the encrypted file in CAPE since it cannot be accurately identified.

CAPE?!

At the time, CAPE only had support for ZIP files, and the ZIP execution module relied on a built-in Python library to perform the ZIP extraction, which was tailored to ZIP files and did not work for all archive files.

To solve this, I put together some Python code that could execute 7-Zip in the detonation environment so that it could extract any archive, password-protected or not, that it received.

I built on the limited logic that the ZIP execution module used to execute certain files. I looked at certain file extensions to determine what an “interesting file” to execute would be. The ZIP execution module would only run the first interesting file it found but given how the files in this campaign relied on each other so heavily, I thought it would be best if every interesting file was run so that all possible outcomes could be observed.

This feature of running every interesting file in an archive provided redundancy in case the file meant to be the entry point to the execution was not deemed interesting by the module.
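Selecting every interesting file instead of only the first one can be sketched as below. The extension list, the function name, and the ordering are assumptions for illustration. This is not the code of the CAPE module.

```python
from pathlib import Path

# Hypothetical list of extensions considered worth executing.
INTERESTING_EXTENSIONS = {".bat", ".dll", ".exe", ".js", ".lnk"}


def interesting_files(extracted_dir: Path) -> list[Path]:
    """Return every file in the extracted archive whose extension looks executable."""
    files = [
        path
        for path in extracted_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in INTERESTING_EXTENSIONS
    ]
    # Sort by extension, then name, so the run order is deterministic.
    return sorted(files, key=lambda path: (path.suffix.lower(), path.name.lower()))
```

Because all files stay side by side in the extracted folder, whichever of them is the real entry point can still find the files it depends on when its turn comes.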

Improved CAPE execution

With this new analysis module, we run the DLLs and EXE, which do nothing interesting on their own, and then we run the LNK, which starts the EXE, which in turn loads the DLL and runs its export. This export then decrypts the encrypted file in the extracted folder, which allows it to beacon. This is good!

Since sharing is caring, I contributed the general archive support feature back to CAPE so that anyone who uses Assemblyline OR CAPE can benefit!

Initial pull request for generic archive support

With the use of this module, Assemblyline can send any archive file to CAPE for analysis. CAPE will extract the archive file to the disk of the detonation environment and execute all interesting files. For this campaign, the result is that CAPE can execute the Windows shortcut file in the archive, which is the starting point for running the rest of the attack chain successfully.

As the campaign has evolved, new requirements popped up to improve detection, and you can see the contributions I’ve made here.

Conclusion

Improvements to Assemblyline for the attack chain

This campaign provided a bunch of opportunities for the Canadian Centre for Cyber Security to react defensively by improving our tools. These improvements included:

  • Extracting links embedded in PDF Annotations
  • Creating a new service to download content hosted at suspicious URLs
  • Enhancing our Node.js emulation environment to extract archives written to disk
  • Adding YARA rules to our identification engine for specific archive types
  • Pulling out possible passwords seen in HTML files to be used when extracting files from a password-protected archive
  • Adding support for DLL side-loading in CAPE Sandbox via a generic archive module

Shout out to ❤️ our reverse-engineering team and cyber analysts ❤️ for keeping the Assemblyline team up to date on the latest campaign techniques and samples! This allows us to keep Assemblyline at the forefront of automated malware defence.

The next article is about a recent OneNote campaign!

Sample Hashes

“URL in Annotations” examples

15b8bd65d4ecbecb3e9ff4d09d1f9f6f4f74639dedbf8bad8a243ee0e0200450

a604bf25d4d538da392ffc1bb6457910c60bcaf6680bde68eff95be9cf294726

“HTML Smuggling” examples

1b61b16dd4b7f6203d742b47411ca679f1f5734ed01534a37a126263f84396c0

067c9df0d10e74dee52333ce0bace20de4229ff992039cc9f4d02ea10aae72b2

6256a2ed7d1d8855726840da3409d0dc611c663ae9d0cb299ae4176a469c10d8

“Archives” examples

784aa3d8bab6af41ee0f2cb6cb9d3b02a80b1d80d5c270cd8b0abebc6eb2c32a

63ade90920f3c771336089bd7fe255a76d81781c761347e8016d81eadd5ae687

427235e614298da9464b3308fdc81c1734b82f9f09bf6008881748e843d09ac8

121d73e43400fe5a870587386f67630392dd8051317fba9e3d3374f019269c12

99ca59fa203daa48793d933f40ccee3bdee5d55b1738f537334292001d17b9ab

“DLL Side-loading” example

1fc7b0e1054d54ce8f1de0cc95976081c7a85c7926c03172a3ddaa672690042c

All images unless otherwise noted are by the author.

Kevin Hardy-Cooper, P.Eng.

Dynamic Analysis Lead for Assemblyline @ Canadian Centre for Cyber Security🍁