The Great OneNote Scramble of 2023
Assemblyline Blog Entry #8
⚠️⚠️⚠️ CAUTION ⚠️⚠️⚠️
This document describes malware analysis in Assemblyline. Malware analysis must be performed in an isolated environment.
In the previous blog post “What other security products WON’T tell you about malicious archives”, we discussed how a 2022 malware campaign centred around archives triggered major improvements to Assemblyline and CAPE Sandbox.
This article covers a more recent campaign from 2023 that focused on OneNote files, and how the Assemblyline team quickly adapted to improve detection.
This article requires an understanding of how to interpret Assemblyline results, which can be gained by reading the documentation.
Context
There was a malware campaign in early 2023 that used OneNote files as the delivery vector for embedded scripts.
I saw a lot of these files come into Assemblyline from across the Government of Canada, and at the start of the campaign, the reverse engineering team alerted our team that Assemblyline was not detecting them well. Go REVENG Go!
I have actually already written an article on the current-day analysis of a sample from this campaign, but this article will walk through how we got to that stage.
Attack Chain
The general attack flow consisted of:
- A OneNote file that contained an embedded obfuscated HTML file.
- The obfuscated HTML file would write a batch file to disk and then run that batch file.
- The batch file would download the next payload (a DLL file) and run this DLL with a specific export function.
At each stage of this attack chain, Assemblyline’s analysis struggled 😿. The issues and their solutions are outlined below:
Issue #1
Attack Chain Step: a OneNote file that contained an embedded obfuscated HTML file
Assemblyline Problem
Embedded files were not being extracted from OneNote files.
Assemblyline Solution
The Extract service had supported extracting embedded files from OneNote files since 2021. However, this capability only worked for some OneNote files.
For example, with 1fc8c811303e4fd19b3921fced69757d21133d928a51b7fbdc9b696d2e5a6954, the file tree in Assemblyline may have looked like this:
My cyber analyst colleague researched different open-source OneNote tools to compare their output. Based on this research, we determined that a new service tailored to OneNote files was required to parse metadata and extract embedded files, so my colleague developed one: the OneNote service!
Now, with this new service built on the pyOneNote project, Assemblyline was able to reliably extract embedded files from OneNote files ✅.
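To give a feel for what extracting embedded files from a OneNote file involves, here is a minimal sketch that carves payloads by searching for the FileDataStoreObject header GUID defined in Microsoft’s [MS-ONESTORE] specification and reading the 64-bit length that follows it. This is not the pyOneNote API or the OneNote service’s code: the function name carve_embedded_files is hypothetical, and the sketch skips the footer GUID check and all of the metadata parsing that a real parser performs.

```python
import struct
import uuid

# FileDataStoreObject header GUID from [MS-ONESTORE]; GUIDs are stored little-endian on disk.
FILE_DATA_HEADER = uuid.UUID("bde316e7-2665-4511-a4c4-8d4d0b7a9eac").bytes_le
# guidHeader (16) + cbLength (8) + unused (4) + reserved (8) come before FileData.
PREFIX_SIZE = 16 + 8 + 4 + 8


def carve_embedded_files(data: bytes) -> list[bytes]:
    """Return the FileData payload of every FileDataStoreObject found in `data`."""
    files = []
    offset = data.find(FILE_DATA_HEADER)
    while offset != -1 and offset + PREFIX_SIZE <= len(data):
        # cbLength: unsigned 64-bit little-endian size of FileData.
        (length,) = struct.unpack_from("<Q", data, offset + 16)
        start = offset + PREFIX_SIZE
        end = start + length
        if end <= len(data):
            files.append(data[start:end])
            next_search = end
        else:
            # Truncated or bogus length: skip past this header and keep scanning.
            next_search = offset + len(FILE_DATA_HEADER)
        offset = data.find(FILE_DATA_HEADER, next_search)
    return files
```

A byte-pattern carver like this is brittle compared with a structure-aware parser, which is part of why a dedicated service was the better long-term answer.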
Issue #2
Attack Chain Step: the obfuscated HTML file writes a batch file to disk and then runs it
Assemblyline Problems
There were multiple problems at this step:
- HTML* and batch files were not being identified correctly due to obfuscation techniques or missing Yara rules.
- When the HTML* file was emulated in the JsJaws service, the batch file was not being written to disk.
- The HTML* file was not being sent to dynamic analysis.
*A variety of file types were used at this step, such as HTML, JavaScript, JScript and encoded Visual Basic. For simplicity, I use the term HTML throughout, but I could be referring to any of these file types.
Assemblyline Solutions
Multiple problems require multiple solutions!
Solution for Problem #1
Addressing the identification issue requires an understanding of how file identification works in Assemblyline. I investigated the contents of the HTML and batch files that were not being identified correctly to determine which high-confidence pieces of code could be included in the Yara rules.
Once I found code snippets I was confident in, I made a series of updates to the custom.yara file that is used for identifying certain files ✅.
These updates were pushed over the course of a month because the obfuscation techniques used in the campaign kept evolving.
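As a rough illustration of the idea behind these identification rules, the Python sketch below mimics a YARA-style condition (“at least 2 of these strings”) for recognizing a batch file. The patterns, the threshold and the looks_like_batch function are hypothetical examples of the kind of high-confidence snippets one might look for; they are not the contents of Assemblyline’s custom.yara.

```python
import re

# Hypothetical high-confidence indicators of a (possibly obfuscated) batch file.
# Illustrative only: these are NOT the patterns used in Assemblyline's custom.yara.
BATCH_INDICATORS = [
    re.compile(rb"^\s*@echo\s+off", re.IGNORECASE | re.MULTILINE),
    re.compile(rb"^\s*set\s+\w+=", re.IGNORECASE | re.MULTILINE),
    re.compile(rb"%\w+:~\d+,\d+%"),  # substring expansion, common in obfuscated batch
    re.compile(rb"^\s*goto\s+:?\w+", re.IGNORECASE | re.MULTILINE),
]


def looks_like_batch(data: bytes, threshold: int = 2) -> bool:
    """Emulate a YARA '<threshold> of them' condition over the indicators above."""
    hits = sum(1 for pattern in BATCH_INDICATORS if pattern.search(data))
    return hits >= threshold
```

Requiring several independent indicators, rather than one, is what keeps a rule like this from misidentifying unrelated text files.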
Solution for Problem #2
HTML files are sent to the JsJaws service to emulate any JavaScript. However, this campaign’s HTML files used a combination of embedded JavaScript and VBScript to hand off functions and variables to one another.
The ideal solution would be to develop or use a generic script emulator such as the one described in this talk. But alas, that is not open-source and time was of the essence, so I had to look at other options.
The combination of JavaScript and VBScript in an HTML file is meant for browser execution, which is available in a dynamic analysis environment. However, dynamic analysis is expensive in terms of time and resources, so I developed a static technique that uses regular expressions to convert the simple VBScript functions seen early in the campaign into equivalent JavaScript, which was then written inline so that the file could be analyzed in JsJaws’s NodeJS emulation environment.
By doing so, I was able to emulate these files to the point where the batch file was written to disk and extracted ✅.
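The sketch below shows the general shape of a regular-expression-based VBScript-to-JavaScript conversion. It assumes the simplest case: a VBScript Function whose body consists only of assignments, where assigning to the function’s own name (as the last statement) sets the return value and ` & ` concatenates strings. vbs_to_js is a hypothetical helper, not the code in JsJaws; anything outside that narrow shape is left behind as a JavaScript comment.

```python
import re

# Matches a simple VBScript function:
#   Function <name>(<args>)
#     <body lines>
#   End Function
VBS_FUNCTION = re.compile(
    r"^[ \t]*Function[ \t]+(?P<name>\w+)[ \t]*\((?P<args>[^)]*)\)[ \t]*\r?\n"
    r"(?P<body>.*?)"
    r"^[ \t]*End[ \t]+Function[ \t\r]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def _convert_body(name: str, body: str) -> str:
    js_lines = []
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        target, sep, expr = line.partition("=")
        target = target.strip()
        # VBScript uses & for string concatenation; JavaScript uses +.
        expr = expr.strip().replace(" & ", " + ")
        if not sep or not target.isidentifier():
            # Anything beyond a simple assignment is out of scope for this sketch.
            js_lines.append(f"  // unsupported: {line}")
        elif target.lower() == name.lower():
            # Assigning to the function's own name sets its return value.
            js_lines.append(f"  return {expr};")
        else:
            js_lines.append(f"  var {target} = {expr};")
    return "\n".join(js_lines)


def vbs_to_js(source: str) -> str:
    """Rewrite each simple VBScript function in `source` as a JavaScript function."""
    def convert(match):
        name = match.group("name")
        args = ", ".join(a.strip() for a in match.group("args").split(",") if a.strip())
        body = _convert_body(name, match.group("body"))
        return f"function {name}({args}) {{\n{body}\n}}"
    return VBS_FUNCTION.sub(convert, source)
```

For example, `Function Join2(a, b)` / `Join2 = a & b` / `End Function` becomes `function Join2(a, b) { return a + b; }` (spread over three lines), which a NodeJS-based emulator can then execute alongside the page’s JavaScript.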
I recognized that this initial solution would not handle complex VBScript interacting with JavaScript in an HTML page, so I would also need the help of dynamic analysis.
Solution for Problem #3
The Assemblyline instance that processes files seen on the Government of Canada network sends files to dynamic analysis only if the user submitting the files specifically requests it, or if a post-processing action is configured to do so. The post-processing action that resubmits files to dynamic analysis is a slight variation of the default action and sends files that score 500 or more in the initial round of analysis.
Therefore, JsJaws would need to assign a score of 500 to files exhibiting this behaviour. Handing off variables that contain URL strings from VBScript to JavaScript (or vice versa) and then writing these values to disk is very suspicious, so I added a heuristic to JsJaws to detect this ✅.
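A minimal sketch of such a check, working directly on the raw HTML: it assumes VBScript blocks are marked by a `language` or `type` attribute containing “vbscript”, and treats calls such as CreateTextFile, Write/WriteLine and SaveToFile as “writing to disk”. The function name and patterns are hypothetical and much cruder than a production heuristic; this is not the JsJaws implementation.

```python
import re

SCRIPT_BLOCK = re.compile(
    r"<script(?P<attrs>[^>]*)>(?P<code>.*?)</script\s*>", re.IGNORECASE | re.DOTALL
)
# A variable assigned a string literal that contains a URL, e.g. u = "http://..."
URL_ASSIGNMENT = re.compile(r"""\b(\w+)\s*=\s*["'][^"']*https?://""", re.IGNORECASE)
# Hypothetical "write to disk" sinks.
FILE_WRITE = re.compile(r"CreateTextFile|\.Write(?:Line)?\b|SaveToFile", re.IGNORECASE)


def cross_language_url_handoff(html: str) -> bool:
    """True if a URL-holding variable from one language is used where the other writes files."""
    vbs_blocks, js_blocks = [], []
    for block in SCRIPT_BLOCK.finditer(html):
        is_vbs = "vbscript" in block.group("attrs").lower()
        (vbs_blocks if is_vbs else js_blocks).append(block.group("code"))

    # Check both directions: VBScript -> JavaScript and JavaScript -> VBScript.
    for sources, sinks in ((vbs_blocks, js_blocks), (js_blocks, vbs_blocks)):
        for code in sources:
            for var in URL_ASSIGNMENT.findall(code):
                used = re.compile(rf"\b{re.escape(var)}\b")
                if any(used.search(sink) and FILE_WRITE.search(sink) for sink in sinks):
                    return True
    return False
```

Because the check only looks at how values cross between the two languages, it does not depend on being able to convert the VBScript, which is the property that made the real heuristic useful once the VBScript grew more complex.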
Later in the campaign, I saw the malware authors’ VBScript skills improving beyond my simple regular expression-based conversions, so this heuristic played a key role in detection within Assemblyline because it could be applied to HTML files regardless of the VBScript complexity.
As time progressed, my colleague noticed that this heuristic was being raised on benign files in Assemblyline, producing false positives. As a result, I refactored the heuristic so that raising it triggers a post-process action, rather than scoring the file 500.
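The difference between the two approaches can be modelled in a few lines. This is a hypothetical model for illustration only: the InitialResult class, the heuristic identifier and send_to_dynamic are invented names, and Assemblyline’s real post-process actions are configured in the system rather than written as Python.

```python
from dataclasses import dataclass, field

RESUBMIT_SCORE = 500  # threshold used by the resubmission action described above
HANDOFF_HEURISTIC = "cross_language_url_handoff"  # hypothetical heuristic identifier


@dataclass
class InitialResult:
    score: int
    heuristics: set[str] = field(default_factory=set)
    user_requested_dynamic: bool = False


def send_to_dynamic(result: InitialResult, heuristic_triggers_action: bool) -> bool:
    """Decide whether a file goes on to dynamic analysis after the first round."""
    if result.user_requested_dynamic or result.score >= RESUBMIT_SCORE:
        return True
    # After the refactor, the heuristic triggers the action directly, so it no longer
    # has to give the (possibly benign) file a malicious-level score of 500.
    return heuristic_triggers_action and HANDOFF_HEURISTIC in result.heuristics
```

With the original design, a benign file that raised the heuristic scored 500 and looked malicious in its results; with the refactor, the same file is still sent to dynamic analysis but keeps a low score.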
Issue #3
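Attack Chain Step: the batch file downloads the next payload (a DLL file) and runs it with a specific export function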
Assemblyline Problems
There were a couple of problems at this step:
- We needed to flag the URL seen in the batch file as suspicious so that the next payload would be automatically downloaded via the URLDownloader service.
- The DLL payload contained more than one hundred export functions. We needed to run the DLL with a specific export function.
Assemblyline Solutions
Solution for Problem #1
The URL seen in the batch file was not being flagged in Assemblyline as suspicious. The Batch Deobfuscator service was flagging the URL as being used to download an external file, but the batch file alone did not contain enough indicators to say “hey, this URL is suspicious”.
Since JsJaws had enough indicators to flag the HTML files as suspicious, and the URL appeared in plaintext in those files, I had enough to raise the suspicious_url_found signature, which in turn scored the URL so that it would be downloaded by the URLDownloader service ✅.
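The reasoning above (a URL found in a child file becomes suspicious when it also appears in plaintext in a parent file that is already flagged) can be sketched as a simple set intersection. The URL regex and the urls_to_flag function are hypothetical simplifications, not the JsJaws logic behind suspicious_url_found.

```python
import re

# Deliberately simple URL matcher: scheme followed by non-space, non-delimiter characters.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)


def urls_to_flag(parent_text: str, parent_is_suspicious: bool, child_urls: set[str]) -> set[str]:
    """Return child-file URLs that also appear in plaintext in a suspicious parent file."""
    if not parent_is_suspicious:
        return set()
    parent_urls = set(URL_PATTERN.findall(parent_text))
    return parent_urls & child_urls
```

Borrowing suspicion from the parent this way lets a weakly indicated URL in the batch file inherit the stronger evidence gathered on the HTML file.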
Solution for Problem #2
The campaign relied on the DLL file being downloaded by the batch script and run with a specific export function. The export function commonly seen in this campaign was named “Wind” and was found at the very end of the list of export functions available in the DLL. This list of export functions was exceptionally large (over 100!), which caused problems with how DLL files were run in dynamic analysis.
At the time, when dynamic analysis services received a DLL file, they enumerated its export functions and chose the first X exports to run during analysis (see max_dll_exports in the CAPE service). In Assemblyline’s production deployments, this number was set to 50. Because the export lists of these campaign DLLs were much longer than 50 entries, the dynamic analysis services were not running the correct export, so no malicious behaviour was seen from the DLLs 😿
The reverse engineering team and I collaborated on a method for selecting which exports to run: it takes the first and last 10% of the export list and adds the “most unique” export names, as measured with Jaccard similarity. This worked very well for the campaign and also applied to malware seen before ✅.
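Here is a minimal sketch of that kind of selection. The Jaccard similarity of two sets is |A ∩ B| / |A ∪ B|. The choices made here are assumptions rather than the CAPE service’s implementation: each export name is represented as its set of lowercase character bigrams, “most unique” means the lowest mean Jaccard similarity to every other name, and select_exports is a hypothetical function name.

```python
def char_bigrams(name: str) -> frozenset[str]:
    """Represent an export name as its set of lowercase character bigrams."""
    name = name.lower()
    grams = {name[i:i + 2] for i in range(len(name) - 1)}
    return frozenset(grams or {name})  # one-character names fall back to themselves


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|A & B| / |A | B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_exports(exports: list[str], max_exports: int = 50,
                   edge_fraction: float = 0.1) -> list[str]:
    """Pick exports to run: the first/last slices plus the most unique remaining names."""
    if len(exports) <= max_exports:
        return list(exports)

    # Always keep the first and last ~10% of the export list (capped to fit the budget).
    edge = min(max(1, int(len(exports) * edge_fraction)), max_exports // 2)
    chosen = list(range(edge)) + list(range(len(exports) - edge, len(exports)))

    # Rank the remaining exports by mean Jaccard similarity to all other names;
    # the lowest average similarity is treated as the "most unique".
    grams = [char_bigrams(name) for name in exports]

    def mean_similarity(i: int) -> float:
        total = sum(jaccard(grams[i], grams[j]) for j in range(len(exports)) if j != i)
        return total / (len(exports) - 1)

    taken = set(chosen)
    remaining = sorted((i for i in range(len(exports)) if i not in taken), key=mean_similarity)
    chosen.extend(remaining[: max_exports - len(chosen)])
    return [exports[i] for i in chosen]
```

With a 122-entry export list where “Wind” is the last entry, the earlier first-50 behaviour (`exports[:50]`) never reaches it, while this selection keeps it through the last-10% slice and also picks up any name that looks nothing like its neighbours.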
Additional Improvements
At this point, Assemblyline was detecting files associated with this campaign well, so kudos to everyone involved 👏.
Later, my colleague developed a new Assemblyline service, the Ancestry service, which looks at the extraction paths of files in the submission file tree. In this campaign, the extraction path would have looked like this: OneNote → HTML/JavaScript → Batch. For malware detection, the Ancestry service can use a signature set that looks for suspicious extraction paths and flags them. Since this campaign was top-of-mind for me, I immediately added a signature that flags the campaign’s extraction path as another layer of detection ✅.
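The idea of an extraction-path signature can be sketched as matching a sequence of allowed file types against a contiguous run of a file’s ancestry. The signature format and path_matches are hypothetical and do not reflect the Ancestry service’s actual signature syntax; the file type names follow the Assemblyline types listed in the Sample Hashes section, plus code/html and code/javascript.

```python
# Hypothetical signature: each step lists the file types accepted at that position.
ONENOTE_SCRIPT_BATCH = [
    {"document/office/onenote"},
    {"code/html", "code/javascript", "code/jscript", "code/vbe"},
    {"code/batch"},
]


def path_matches(ancestry: list[str], signature: list[set[str]]) -> bool:
    """True if the signature matches a contiguous run of the extraction path."""
    n = len(signature)
    return any(
        all(ancestry[start + k] in signature[k] for k in range(n))
        for start in range(len(ancestry) - n + 1)
    )
```

Matching on a contiguous run means the signature still fires when the OneNote file is itself nested inside an archive.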
Conclusion
This campaign provided many opportunities for the Canadian Centre for Cyber Security to react defensively by improving Assemblyline, whether through new services, heuristics, signatures, or identification rules.
Shout out to ❤️ our reverse-engineering team and cyber analysts ❤️ for keeping the Assemblyline team up to date on the latest campaign techniques and samples! This allows us to keep Assemblyline at the forefront of automated malware defence.
The next article is about lessons that Assemblyline learned from being compared with other open-source malware analysis tools!
Sample Hashes
document/office/onenote
1fc8c811303e4fd19b3921fced69757d21133d928a51b7fbdc9b696d2e5a6954
code/vbe
9d0270c93c27c24c5bc8e0ffb26f84a0fb8c55318738eb3d085441d727be6ec2
code/jscript
3b72f2d001a4190efb0a1d448e3c1573995d4d4bd15033d506c9474f86345e10
code/batch (default.bat)
af3f4e2e9e007ff1fad6edaebe67a58e495c7cf81abad5e18adc2f2267577f9f
code/batch (extracted_wscript.bat)
beca5f817a8befe6020633f113d54d16856ecaf594e95a2f527461f339e6e63d
All images unless otherwise noted are by the author.