Can HackTheBox Business CTF 2023 Forensic Challenges beat Assemblyline? (2/3)

Assemblyline Blog Entry #5

9 min read · Oct 23, 2023

⚠️⚠️⚠️ CAUTION ⚠️⚠️⚠️

This document describes malware analysis in Assemblyline. Malware analysis must be performed in an isolated environment.

The following is the second part of a write-up detailing the solutions for forensic challenges that were part of the HackTheBox Business CTF 2023 competition. In the first part, Assemblyline helped us solve some basic challenges, but in this part, we’ll see some more complex ones that helped us improve Assemblyline for everybody’s threat hunting (and for the next CTF!).

Forensics #3: Hypercraft

The third forensic challenge starts the same way as the earlier two challenges, by uploading a ZIP file to Assemblyline. Keep in mind that if you have a VirusTotal API key set up in your Assemblyline instance, you can create a submission by SHA256 hash and use VirusTotal as an external source without handling the file yourself.

I uploaded the file manually for this submission, so the file name of the submitted file is forensics_hypercraft.zip instead of the hash of the ZIP file. As usual, we wait until the services finish doing their job and then look for anything interesting in the file tree or the heuristics that are raised. This time, I saw a lot of non-interesting things. :(

The first issue in this Assemblyline analysis is that hypercraft.eml was identified as text/plain instead of document/email, as all EML files should be. This prevented the file from being routed to the EmlParser service, which would have extracted as much information as possible from it. This is an Identify problem.

Identify relies on three different methods to determine the type of a file, and none of them rely on the file extension, as we all know that file extensions are lies! The two main methods are File Magic, to which we add our own custom magic, and Mime Types. We have some added logic in certain cases, like using a specific GUID to further identify Office documents, but we mainly use File Magic and Mime Types to get a result as fast as possible. In specific cases, such as a text/plain file, we go further and use custom Yara rules to try to determine a more precise file type. More information and examples about Identify are available in our online documentation. It is very difficult to curate a good set of rules that can distinguish a PowerShell script from a Batch script calling PowerShell commands, or a plain JavaScript file from an HTML page containing JavaScript tags. Improving our detection is always a work in progress.

You can find all that identification-related information for each file in the File Detail section at the top of the Submission Details/Report page. Here’s our mis-identified hypercraft.eml file:

Back to our file tree, a service extracted a file named 4c5a7613b5_b64_decoded that was identified as code/html, from which another file named all_b64_7c11d76.txt was extracted, of type unknown. Both of those files were extracted by a service named FrankenStrings, which tries to go over every file and find interesting strings. Sometimes it gets lucky and serves as a safety net to extract different content. It searches for several types of encodings, from base64 to hex strings, and also looks for more than just strings, such as obfuscated/embedded executables. On the other hand, it can return results that are not understandable, like that unknown file which is just a blob of bytes.
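To give a rough idea of the base64 part of that search, here is a minimal Python sketch. It is not FrankenStrings' actual implementation: the function name find_base64_blobs, the 24-character minimum, and the sample input are all assumptions made for illustration.

```python
import base64
import binascii
import re

# Hypothetical threshold: only consider runs of at least 24 base64 characters,
# optionally followed by padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{24,}={0,2}")


def find_base64_blobs(data: bytes) -> list[bytes]:
    """Return the decoded content of every plausible base64 run in data."""
    decoded = []
    for match in B64_RUN.finditer(data):
        candidate = match.group(0)
        # Valid base64 always has a length that is a multiple of four.
        if len(candidate) % 4 != 0:
            continue
        try:
            decoded.append(base64.b64decode(candidate, validate=True))
        except (binascii.Error, ValueError):
            continue
    return decoded


payload = b"<html><body>hello from a test page</body></html>"
sample = b'var x = "' + base64.b64encode(payload) + b'"; // too short: QUJD'
print(find_base64_blobs(sample))
```

A search like this is deliberately greedy, which is why it sometimes surfaces a useful embedded page and sometimes just a blob of bytes with no recognizable type.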

If we look at the HTML page, our main interest is going to be turned toward the main JavaScript analyzer in Assemblyline, the JsJaws service. JsJaws did not produce a very verbose output when analyzing this file:

The use of charCodeAt() and atob() is often found in obfuscated scripts, and from experience, the click() function can be used to trigger a file download. Strange, JsJaws is usually rather good, but JavaScript is tricky and malware/CTF challenge authors are known to be trickier… Taking a look at the file itself in the File Viewer, we can see a few giant blobs of base64-encoded content, a few JavaScript functions, and some obfuscated strings:

Opening a node.js console, we can declare the variable wwzaligy and the two functions, pbmbiaan and afegnsku, that are reused throughout the obfuscation. Now we can start grabbing any call to pbmbiaan and paste it to our console to have a better idea of what’s going on.

Manually deobfuscated strings from the JavaScript file

With words like Blob, octet/stream, .zip, createObjectURL and revokeObjectURL (a combination that has been seen to trigger a file download), we should expect a ZIP file to be dropped by running this HTML file. Higher up in the code, we can see:

var qjyfawbg = pbmbiaan(document.getElementById('jzasjnpc').getAttribute(pbmbiaan('ExACBA==','wqve')),document.getElementById("begjwbvi").getAttribute(pbmbiaan('ExACBA==','wqve')));

Which can be cleaned to:

var qjyfawbg = pbmbiaan(
    document.getElementById('jzasjnpc').getAttribute('data'),
    document.getElementById("begjwbvi").getAttribute('data')
);
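The decoded attribute name can be double-checked outside of node.js. The body of pbmbiaan is not reproduced here, so the following Python sketch assumes, based only on the ('ExACBA==', 'wqve') pair above, that it base64-decodes its first argument and XORs the result with a repeating key. The name decode_string is hypothetical.

```python
import base64
from itertools import cycle


def decode_string(encoded: str, key: str) -> str:
    """Base64-decode the input, then XOR each byte with the repeating key.

    Assumption: this mirrors what pbmbiaan does, inferred from a single
    input/output pair rather than from the original function body.
    """
    raw = base64.b64decode(encoded)
    return "".join(chr(byte ^ ord(k)) for byte, k in zip(raw, cycle(key)))


print(decode_string("ExACBA==", "wqve"))  # data
```

If that assumption holds, the same helper could be pointed at the other pbmbiaan calls instead of pasting them into a console one by one.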

We can see that qjyfawbg is then zgkhasgq’d, and it keeps going:

window.ojrzzuan = zgkhasgq(qjyfawbg, 512);

We have enough proof that we should get a ZIP file, so it is time to find out what went wrong in Assemblyline. It took a few tries and some JavaScript magic, but the specific methods that were missing from the JsJaws service have been implemented. The ZIP file can now be extracted and sent to Assemblyline, where we can find another JavaScript file thanks to the Extract service:

With this Arodorian Hypercraft.pdf.js file, another issue was found in JsJaws, and because of this issue, JsJaws did not yield much. For the time being, I went back to the node.js console and started pasting the different sections of the JavaScript file. Here is a version of the JavaScript file, without most of the empty lines and useless comments:

After declaring the first four lines, which concatenate some strings, we can see that it’s wasting time and manipulating the value in uwetjyhi. The final value of ooqajrjz that will be executed in cyfgvptr starts with:

By writing this to a new file, and sending that new file to Assemblyline, we can keep going deeper down the rabbit hole. Here is the next file tree!

Yet another new file tree, with “MAX DEPTH REACHED” error

We can see that we have hit the maximum extraction depth in our instance of Assemblyline. This limit is there to stop runaway services and to prevent enormous submissions from using too much processing time in your deployment. In this case, it looks like a module keeps extracting a file that is similar, but not identical, to its parent; had the files been identical, Assemblyline’s deduplication mechanism would have stopped the recursion. In the JsJaws output for the JavaScript file, we can see some interesting information:

This output explains the extracted boxjs_cmds.ps1 script in the file tree. That PowerShell script contains a one-liner:

IEx(NEW-oBJeCT SYsTeM.iO.COmpResSion.dEfLaTestReAm( [SySTem.IO.meMOrYStReAm] [convert]::FromBase64String(<DATA>),[SyStEM.IO.COMPreSSION.cOMPRessIONmodE]::DECOMPReSS)| FOrEach{NEW-oBJeCT iO.sTReAMREaDEr( $_,[SYsTeM.TExt.eNcodiNg]::AsCii ) } ).reaDTOEnd( )
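Stripped of its random casing, the one-liner base64-decodes <DATA>, inflates it with a DeflateStream, reads the result as ASCII, and passes it to Invoke-Expression. A minimal Python sketch of the same decoding steps, without the execution, could look like this. The helper names inflate_b64 and deflate_b64 are hypothetical, and the sample text is made up since <DATA> is elided above.

```python
import base64
import zlib


def inflate_b64(data: str) -> str:
    """Base64-decode, inflate raw DEFLATE data, and decode the result as ASCII."""
    compressed = base64.b64decode(data)
    # wbits=-15 selects a raw DEFLATE stream with no zlib or gzip header,
    # which is the format .NET's DeflateStream reads.
    return zlib.decompress(compressed, wbits=-15).decode("ascii")


def deflate_b64(text: str) -> str:
    """Inverse helper, only used here to build a test input."""
    compressor = zlib.compressobj(wbits=-15)
    raw = compressor.compress(text.encode("ascii")) + compressor.flush()
    return base64.b64encode(raw).decode("ascii")


layer = deflate_b64("Write-Host 'next layer'")
print(inflate_b64(layer))
```

Decoding a layer this way only returns text; nothing is executed, which is what you want when the next layer is unknown.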

Instead of doing it by hand, we can look at the contents of the first extracted deobfuscated.ps1 file from the Overpower service using the File Viewer:

All sub-deobfuscated.ps1 scripts have an added ##### DECOMPRESS CONTENT ##### section when compared to their parent file… and that is a bug we also fixed. Now let’s take a look at that crazy PowerShell script. The output of Overpower did not give us much information, but we can open a PowerShell prompt and start going over the script by declaring the first few Set-Item and Set-Variable cmdlets and the UYcxq function. Somehow the RcDAtCaJT function is not working when declared, but ¯\_(ツ)_/¯ :

Looking at all those encoded strings, we can see what they resolve to:

Manually deobfuscated strings from the PowerShell script

That first string is a URI that we should extract. That is the kind of information that we need to raise as an Indicator of Compromise, even if in this case, the .htb TLD is not a valid TLD, and usually more of a trick by HTB to tell us that we reached the end of the line. Toward the end, we can see the flag for the challenge: HTB{l0ts_of_l4Y3rs_iN_th4t_1}

But as happy as we are to have this flag, that is not the end of the challenge for us. What is that base64-encoded string we see just before the flag? If we base64-decode it, it gives us:

c = WScript.Arguments(0)
set s = CreateObject("WScript.Shell")
s.Run "powershell.exe -exec bypass " & c,0

Toward the end of the script, we can see the use of ScheduledTask elements, which we should be raising to the users:

ScheduledTask object creation and registration
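As an illustration of what raising these elements could involve, here is a small Python sketch that lists ScheduledTask-related cmdlet names in a script regardless of casing. It is not how Overpower does it: the regular expression, the function name scheduled_task_cmdlets, and the sample script are assumptions.

```python
import re

# Matches Verb-ScheduledTask* names such as New-ScheduledTaskAction or
# Register-ScheduledTask, ignoring the random casing used by obfuscators.
SCHEDULED_TASK = re.compile(r"\b[a-z]+-ScheduledTask[a-z]*\b", re.IGNORECASE)


def scheduled_task_cmdlets(script: str) -> list[str]:
    """Return the distinct, lowercased ScheduledTask cmdlet names found in script."""
    return sorted({m.group(0).lower() for m in SCHEDULED_TASK.finditer(script)})


sample_script = (
    "$a = nEw-SchEduLedTaskAction -Execute 'wscript.exe'\n"
    "REGISTER-scheduledtask -TaskName 'x' -Action $a\n"
    "Write-Host done"
)
print(scheduled_task_cmdlets(sample_script))
```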

By integrating fixes for the biggest issues mentioned thus far in this challenge, both the JsJaws and Overpower services were vastly improved. If we submit that first EML file again with these fixes in Assemblyline, we get a much better experience, which is what Assemblyline is designed to achieve:

Clean file tree with all interesting files

JsJaws is now able to extract that ZIP file directly, as well as an associated image, which is a classic in phishing circles:

From that ZIP file, we get the usual Arodorian Hypercraft.pdf.js file, but this time, it resolves the internal JavaScript deobfuscation and gives the boxjs_cmds.ps1 right away. From that boxjs_cmds.ps1, we get the deobfuscated.ps1 file, but since it was changed to not include the original script, only the new decoded content, the analysis does not keep going recursively. In the result sections, the biggest addition is to Overpower, which now raises the fact that there are ScheduledTask-related elements.

ScheduledTask highlights in Overpower’s results

We know from our earlier work that the 677C554705D1F045.vbs file is generated with the writeallbytes function. We should extract that file and add it to the File Tree, along with tagging that generated URI http://stolenplans.htb/r/alf/B17BD4381FB737A8. Regarding that Identify issue, where hypercraft.eml is identified as text/plain, we can look at the file itself and see the headers:

If we open RFC 822 at Appendix A, section 3.1, we can see the minimum required headers:

This custom challenge file therefore does not follow the standard, which is why our main EML identification rule cannot identify it correctly.

Looking at our rules, we can see that in the past, we had a similar issue with non-standard EML files, as we have a second identification rule:

rule document_email_2 {
    meta:
        type = "document/email"
        score = 10
    strings:
        $ = /(^|\n)MIME-Version: /
        $ = /(^|\n)Content-Type: /
        $ = "This is a multipart message in MIME format."
    condition:
        all of them
}

This rule is based on Microsoft’s sample MIME message, but we can modify it to handle the case where the courtesy string is not present. Using the following three strings instead will allow us to identify both Microsoft’s example and the hypercraft.eml file:

    strings:
        $ = /(^|\n)From: /
        $ = /(^|\n)MIME-Version: /
        $ = /(^|\n)Content-Type: multipart\/mixed;\s*boundary=/
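The effect of that change can be approximated with Python's re module. This is an emulation, not Yara itself, and it does not use Identify. The SAMPLE headers are a made-up stand-in for a message without the courtesy string, not the actual content of hypercraft.eml, and matches_all is a hypothetical helper that mimics the "all of them" condition.

```python
import re

# The three strings of document_email_2, as quoted above.
OLD_RULE = [
    re.compile(rb"(^|\n)MIME-Version: "),
    re.compile(rb"(^|\n)Content-Type: "),
    re.compile(re.escape(b"This is a multipart message in MIME format.")),
]

# The three replacement strings.
NEW_RULE = [
    re.compile(rb"(^|\n)From: "),
    re.compile(rb"(^|\n)MIME-Version: "),
    re.compile(rb"(^|\n)Content-Type: multipart/mixed;\s*boundary="),
]


def matches_all(patterns, data: bytes) -> bool:
    """Mimic Yara's 'all of them': every pattern must match somewhere."""
    return all(pattern.search(data) for pattern in patterns)


# Hypothetical headers without the courtesy string.
SAMPLE = (
    b"From: sender@example.com\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="frontier"\n'
    b"\n--frontier\n"
)

print(matches_all(OLD_RULE, SAMPLE), matches_all(NEW_RULE, SAMPLE))
```

The trade-off is that the new strings are tied to multipart/mixed, so a non-standard message with a different Content-Type would still need another rule.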

Working through that challenge did give Assemblyline a few good punches, but it did not get beaten. Those were valuable lessons that are going to benefit everyone.

There’s always more to improve in Assemblyline, and if you have a good sample that you can share, we’re always interested in challenges. Contact us on Discord or our main Assemblyline issue tracker whenever you can!

In the meantime, head over to the next blog post for One last HackTheBox Business CTF 2023 Forensic Challenge (3/3)!

All images unless otherwise noted are by the author.

Written by gdesmar