Life as a threat investigator

Soon Chai
Published in CSIT tech blog
Jun 16, 2022 (7 min read)

Introduction

The cyber threat landscape has grown in complexity over the past few years. This is evident from the wide variety of threats (e.g. phishing, mobile phone hacking, website compromise, ransomware attacks on critical infrastructure) that we, as threat investigators, encounter on a daily basis in CSIT.

We focus our investigations and discovery work on threats that could impact Singapore. We seek to expand leads on threat actors to improve our ability to detect their malicious activities. Our investigations usually end with the derivation of new IOCs (indicators of compromise), Yara rules, Snort rules and post-processing scripts. At times, especially when we uncover new and interesting techniques, we put our analysis together into a technical report. These findings are then shared with other government agencies so that they can perform threat hunting in their own networks.

In this article, I’ll be sharing our investigation into an interesting phishing malware discovered by Joe Slowik, a researcher from DomainTools. Before proceeding any further, I recommend reading his article “COVID-19 Phishing With a Side of Cobalt Strike” and the documentation on Yara, an open source tool used by malware researchers to identify and classify malware samples.

What is phishing and why does it concern me?

Phishing is a type of social engineering attack in which the attacker impersonates a trusted entity to trick victims into opening emails or text messages and clicking malicious links or attachments, which can lead to malware being installed on the victims' machines.

In the article from Joe Slowik, the attacker used a COVID-19 vaccine theme to entice people to open the malicious Excel document attached to the email. You might be thinking to yourself that you wouldn’t have fallen for this and that the bad guys won’t be interested in you anyway.

Indeed, in scams, the threat actors don’t really care who you are. They simply send out thousands of emails with malicious links or attachments and still earn a sizeable amount of money even if only a small percentage of recipients fall for them. These threat actors just need to keep up with a wide range of trending themes and reuse their malware to make money.

However, in some cases, the attackers might be interested in the organization you work for. They will focus their efforts on individuals like you, a targeted approach known as spear phishing. All they need is one unsuspecting employee to gain access to the intellectual property they are after.

Investigating the phishing document

Joe Slowik and his team did a great job of explaining how the malware was stored in the malicious Excel document and what happens when the victim opens it. The researchers also found a few more samples on VirusTotal by searching for unique strings (“findstr” and “TVNDRgAAAA”) that are present in the Excel document, and uncovered another C2 (command-and-control) domain that belongs to the same threat actor.
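The two search strings are easy to check for locally as well. The snippet below is a minimal sketch rather than the researchers' actual query: the helper name `contains_markers` is hypothetical, and a raw byte search like this only works where the strings are stored uncompressed in the file. It also shows why “TVNDRgAAAA” is a distinctive marker: it is what Base64 produces for the bytes "MSCF" followed by zero bytes, which is how a Microsoft cabinet archive begins.

```python
import base64

# The two strings reported as present in the malicious Excel document.
MARKERS = (b"findstr", b"TVNDRgAAAA")


def contains_markers(data: bytes) -> bool:
    """Return True when every marker string occurs somewhere in data.

    Hypothetical helper: a plain substring search over the raw bytes.
    """
    return all(marker in data for marker in MARKERS)


# A cabinet (.cab) file starts with the signature "MSCF" and four zero bytes.
cab_header_b64 = base64.b64encode(b"MSCF\x00\x00\x00\x00")
print(cab_header_b64)  # b'TVNDRgAAAAA='
```

A document containing both strings is therefore a reasonable candidate for carrying a Base64-encoded cabinet archive, although a match on its own proves nothing about maliciousness.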

We decided to conduct our own investigation to see whether we could find more IOCs from the loader FSPMAPI.dll and the payload wasmedic.NCEx.nu.etl that were mentioned in the article. As the researchers did not provide the hashes (fingerprints that uniquely identify binaries by their content) of these two binaries, we had to extract them from the Excel document ourselves. Interestingly, VirusTotal showed that no antivirus engine detected the loader FSPMAPI.dll at the time of upload.
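Once a binary has been extracted, its hashes can be computed with a few lines of Python. This is a generic sketch using the standard `hashlib` module; the function names and the file path in the usage note are illustrative and not taken from our tooling.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Compute the digests commonly used to look up a sample."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def hash_file(path: str) -> dict:
    """Read a file from disk and return its digests."""
    with open(path, "rb") as fh:
        return file_hashes(fh.read())
```

For example, `hash_file("FSPMAPI.dll")` (assuming the extracted loader was saved under that name) returns the values that can be searched for on VirusTotal or shared as IOCs.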

FSPMAPI.dll

The threat actor used an unusual scheme to obfuscate the strings inside the DLL, as shown in the pictures below. String obfuscation in general is a very common technique that malware writers employ to avoid detection and hinder analysis. You can refer to techniques T1027 and T1140 of the MITRE ATT&CK framework for more information.

Each of the obfuscated strings is stored as a series of 16-byte numbers within the binary, with their order jumbled up. The ciphertext contains the encoded keystream as well as the encoded string.

Running the above code from 0x10003816 to 0x10003862 yields the ciphertext of “kernel32.dll” as shown below. Each byte of the ciphertext is stored as a DWORD (4 bytes, little-endian byte order).

An illustration of how the string kernel32.dll is decoded:

The code that de-obfuscates the encoded strings is located in the function sub_10001F90:

This is the Yara rule that we developed for this de-obfuscation function:

rule Loader
{
    meta:
        description = "FSPMAPI.dll string obfuscation function"
    strings:
        // 8b 45 08          mov eax, [ebp+a_pBuffer]
        // 8d 4c 24 14       lea ecx, [esp+30h+l_pKeyStream]
        // 8a 04 98          mov al, [eax+ebx*4]
        // 2a c3             sub al, bl
        // fe c0             inc al
        // 0f b6 c0          movzx eax, al
        // 50                push eax
        // e8 ?? ?? ?? ??    call XXXX
        $loop1 = { 8b 45 08 8d 4c 24 14 8a 04 98 2a c3 fe c0 0f b6 c0 50 e8 ?? ?? ?? ?? }

        // 83 7c 24 28 10    cmp [esp+30h+var_8], 10h
        // 8d 4c 24 14       lea ecx, [esp+30h+l_pKeyStream]
        // 8b c7             mov eax, edi
        // 8b 5d 08          mov ebx, [ebp+a_pBuffer]
        // 0f 43 4c 24 14    cmovnb ecx, [esp+30h+l_pKeyStream]
        // 99                cdq
        // f7 7c 24 0c       idiv [esp+30h+l_iKeyStreamLength]
        // 8a 04 b3          mov al, [ebx+esi*4]
        // 2a 04 0a          sub al, [edx+ecx]
        // 8b 4c 24 10       mov ecx, [esp+30h+l_pPlaintext]
        // 0f b6 c0          movzx eax, al
        // 50                push eax
        // e8 ?? ?? ?? ??    call XXXX
        $loop2 = { 83 7c 24 28 10 8d 4c 24 14 8b c7 8b 5d 08 0f 43 4c 24 14 99 f7 7c 24 0c 8a 04 b3 2a 04 0a 8b 4c 24 10 0f b6 c0 50 e8 ?? ?? ?? ?? }
    condition:
        all of them
}
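Reading the instruction comments in the rule above, the two loops appear to do the following: the first recovers each keystream byte as ciphertext[i] - i + 1, and the second subtracts the keystream (repeated cyclically) from the remaining ciphertext bytes. The Python below is a sketch of that reading, not a verified reimplementation of sub_10001F90. It assumes that the DWORDs have already been put back in order, that the keystream length is known to the caller (`key_len` is a hypothetical parameter), and that the string data starts right after the keystream. `encode_string` is a made-up inverse used only to build test input.

```python
import struct


def decode_string(blob: bytes, key_len: int) -> bytes:
    """Decode an obfuscated string stored as one little-endian DWORD per byte."""
    count = len(blob) // 4
    if key_len <= 0 or key_len > count:
        raise ValueError("key_len must be between 1 and the number of DWORDs")

    # Each ciphertext byte occupies a whole DWORD; only the low byte matters.
    values = [v & 0xFF for v in struct.unpack("<%dI" % count, blob[:count * 4])]

    # $loop1: mov al,[eax+ebx*4] / sub al,bl / inc al
    keystream = [(values[i] - i + 1) & 0xFF for i in range(key_len)]

    # $loop2: mov al,[ebx+esi*4] / sub al,[edx+ecx], with edx = counter % key_len
    data = values[key_len:]
    return bytes((b - keystream[j % key_len]) & 0xFF for j, b in enumerate(data))


def encode_string(plain: bytes, keystream: bytes) -> bytes:
    """Hypothetical inverse of decode_string, for building test data only."""
    values = [(k + i - 1) & 0xFF for i, k in enumerate(keystream)]
    values += [(p + keystream[j % len(keystream)]) & 0xFF
               for j, p in enumerate(plain)]
    return struct.pack("<%dI" % len(values), *values)
```

Round-tripping a known string, for instance `decode_string(encode_string(b"kernel32.dll", b"\x11\x22\x33"), 3)`, is a quick way to check that the two loops are modelled consistently before trying the function on bytes lifted from the DLL.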

wasmedic.NCEx.nu.etl

Besides the string obfuscation, another interesting thing about this malware is the way the Cobalt Strike Beacon payload was packed into wasmedic.NCEx.nu.etl to help it bypass antivirus detection. Cobalt Strike is a very popular red team platform that gives penetration testers access to a large variety of attack capabilities. Sadly, it is widely misused by threat actors to gain initial access to their victims’ networks.

Just like the obfuscated strings, the ciphertext also starts with the encoded keystream which is immediately followed by the encoded data. However, this is where the similarities end. Each byte of the ciphertext is no longer stored as a DWORD and the encoded data is broken up into 3 blocks as shown in the picture below.

The code that de-obfuscates the encoded payload is located in the function sub_10002CE0 of the loader FSPMAPI.dll. Instead of showing the disassembly of the function, I have written a Python script that can de-obfuscate the wasmedic.NCEx.nu.etl file.

def tobyte8(val, nbits):
    """Wrap val into the unsigned range [0, 2**nbits)."""
    return (val + (1 << nbits)) % (1 << nbits)


def decrypt_file(ciphertextfile, plaintextfile):
    with open(ciphertextfile, 'rb') as fh:
        data = fh.read()

    # The first byte holds the length of the encoded keystream that follows.
    keystream_ciphertext_len = data[0]
    print("Encrypted keystream length = ", hex(keystream_ciphertext_len))

    keystream_ciphertext = data[1:1+keystream_ciphertext_len]

    # Only the even-indexed bytes carry keystream; each is offset by 0xC.
    keystream = bytearray()
    for i in range(keystream_ciphertext_len):
        if i % 2 == 0:
            # Wrap to a byte so small values do not go negative.
            keystream.append(tobyte8(keystream_ciphertext[i]-0xC, 8))

    print("Keystream = ", keystream)
    keystreamMod = len(keystream)
    print("keystream Mod = ", keystreamMod)

    # The remaining data is split into 3 blocks of equal length.
    block_length = (len(data)-1-keystream_ciphertext_len) // 3
    print("Block length = ", block_length)

    block1_offset_start = 1 + keystream_ciphertext_len
    block2_offset_start = block1_offset_start + block_length
    block3_offset_start = block2_offset_start + block_length

    block1 = bytearray(data[block1_offset_start:block1_offset_start+block_length])
    block2 = bytearray(data[block2_offset_start:block2_offset_start+block_length])
    block3 = bytearray(data[block3_offset_start:block3_offset_start+block_length])

    # Blocks 1 and 3 are stored reversed; block 2 is stored in order.
    print("Decrypting block 1 ...")
    block1.reverse()
    for i in range(block_length):
        block1[i] = tobyte8(block1[i]-keystream[i%keystreamMod], 8)

    print("Decrypting block 2 ...")
    for i in range(block_length):
        block2[i] = tobyte8(block2[i]-keystream[i%keystreamMod], 8)

    print("Decrypting block 3 ...")
    block3.reverse()
    for i in range(block_length):
        block3[i] = tobyte8(block3[i]-keystream[i%keystreamMod], 8)

    with open(plaintextfile, 'wb') as fh:
        fh.write(block1)
        fh.write(block2)
        fh.write(block3)
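The script above can be sanity-checked without a live sample by building a synthetic file in the same layout and decoding it again. The sketch below mirrors the script's logic in memory (`decode_payload`) and adds an inverse (`encode_payload`). Both names are hypothetical, and the filler byte written at the odd keystream positions is an arbitrary choice, since the script simply ignores those positions.

```python
def _shift(block: bytes, keystream: bytes, sign: int) -> bytes:
    """Add (sign=+1) or subtract (sign=-1) the cyclic keystream, byte-wise."""
    n = len(keystream)
    return bytes((b + sign * keystream[i % n]) & 0xFF for i, b in enumerate(block))


def decode_payload(data: bytes) -> bytes:
    """In-memory equivalent of the decrypt_file logic."""
    ks_len = data[0]
    # Even-indexed bytes of the encoded keystream, each offset by 0xC.
    keystream = bytes((b - 0xC) & 0xFF for b in data[1:1 + ks_len][::2])
    body = data[1 + ks_len:]
    n = len(body) // 3
    b1, b2, b3 = body[:n], body[n:2 * n], body[2 * n:3 * n]
    # Blocks 1 and 3 are reversed before the keystream is removed.
    return (_shift(b1[::-1], keystream, -1)
            + _shift(b2, keystream, -1)
            + _shift(b3[::-1], keystream, -1))


def encode_payload(plain: bytes, keystream: bytes) -> bytes:
    """Hypothetical inverse used only to produce test input."""
    if len(plain) % 3 or not keystream or len(keystream) > 127:
        raise ValueError("need len(plain) divisible by 3 and 1..127 key bytes")
    n = len(plain) // 3
    p1, p2, p3 = plain[:n], plain[n:2 * n], plain[2 * n:]
    ks_enc = bytearray()
    for k in keystream:
        ks_enc += bytes([(k + 0xC) & 0xFF, 0x00])  # 0x00 is arbitrary filler
    return (bytes([len(ks_enc)]) + bytes(ks_enc)
            + _shift(p1, keystream, +1)[::-1]
            + _shift(p2, keystream, +1)
            + _shift(p3, keystream, +1)[::-1])
```

If `decode_payload(encode_payload(plain, key))` returns `plain` for a few inputs, the layout assumptions (length byte, interleaved keystream, three equal blocks with the first and third reversed) are at least internally consistent.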

Wrapping it up…

With the Yara rules that we developed, we can scan our malware repositories and VirusTotal for related samples. We could also try to decode candidate files with the Python script. If a file decodes into something meaningful, we are very likely looking at a variant of the same malware!
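To illustrate what such a sweep looks for, the sketch below approximates the two hex strings of the Loader rule with Python regular expressions, treating each `??` as a single wildcard byte. Yara remains the right tool for the job; this is only meant to show what the byte patterns match. The function names and the folder argument are hypothetical.

```python
import re
from pathlib import Path

# Hex strings copied from the Loader rule; "??" matches any single byte.
LOOP1 = "8b 45 08 8d 4c 24 14 8a 04 98 2a c3 fe c0 0f b6 c0 50 e8 ?? ?? ?? ??"
LOOP2 = ("83 7c 24 28 10 8d 4c 24 14 8b c7 8b 5d 08 0f 43 4c 24 14 99 "
         "f7 7c 24 0c 8a 04 b3 2a 04 0a 8b 4c 24 10 0f b6 c0 50 e8 ?? ?? ?? ??")


def hex_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Yara-style hex string into a compiled bytes regex."""
    parts = [b"." if tok == "??" else re.escape(bytes([int(tok, 16)]))
             for tok in pattern.split()]
    # DOTALL lets a wildcard also match the byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)


PATTERNS = [hex_pattern_to_regex(p) for p in (LOOP1, LOOP2)]


def looks_like_loader(data: bytes) -> bool:
    """Mimic the rule's 'all of them' condition on a byte buffer."""
    return all(rx.search(data) is not None for rx in PATTERNS)


def scan_folder(folder: str) -> list:
    """Return the paths of all files under folder that match both patterns."""
    return [str(path) for path in sorted(Path(folder).rglob("*"))
            if path.is_file() and looks_like_loader(path.read_bytes())]
```

Anything returned by `scan_folder` is only a lead: a match means the file contains the same instruction sequences, which still has to be confirmed by analysis.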

Conclusion

As threat investigators, we triage malware samples quickly to decide whether each one is a known or unknown threat. The samples are scanned by antivirus and detonated in sandboxes to extract network and host-based IOCs. Such efforts are required to detect and block potentially malicious activities before any damage is done.

We only dive deeper if the malware sample is unknown or involved in high-profile incidents. We analyze the malicious code to look for custom algorithms and implementations. Why are we doing this? We believe malware writers tend to reuse libraries in their malware, especially code that helps them bypass security products. Once such unique code is located in the binaries, we create Yara signatures on its byte patterns to help us discover new malware samples possibly related to the same threat actor.

Malware analysis is just one of the many things that we are doing at CSIT to generate additional leads. We are also very big on automation and trying to use machine learning to put ourselves out of work! I will leave it to my colleagues to share more in future posts.

Interested in working at CSIT? We are hiring. Check out our Careers page!

References

  1. Joe Slowik, DomainTools, “COVID-19 Phishing With a Side of Cobalt Strike”
  2. Alyssa Rahman, Mandiant, “Defining Cobalt Strike Components So You Can BEA-CONfident in Your Analysis”
  3. Yara, “Welcome to Yara’s documentation”
  4. MITRE ATT&CK, “T1027: Obfuscated Files or Information”
  5. MITRE ATT&CK, “T1140: Deobfuscate/Decode Files or Information”
