Emotet: Dangerous Malware Keeps on Evolving
Research into the latest Emotet variant by Symantec’s Threat Engineering Team has revealed details about which compression algorithm the gang behind Emotet has customized and is using in its code.
Authors: Nguyen Hoang Giang, Mingwei Zhang
Emotet is one of the most dangerous malware threats active today. Emotet (Trojan.Emotet) began life as a banking Trojan but evolved several years ago into a malware loader for other threats: Emotet infects a machine and then downloads another threat, e.g. the TrickBot information stealer, onto the infected system. Emotet is now one of the biggest threat distributors out there, renting its infrastructure out to all sorts of other threats, including ransomware, information stealers, and cryptocurrency miners.
We wrote a detailed blog about Emotet’s evolution in 2018, which gives you more background on the threat and its development.
Recently our threat engineering team noticed some updates to Emotet (Version 5, 20200201), and performed some analysis on the malware sample to see exactly what was going on.
The first thing we noticed is that Emotet has updated its techniques for obfuscating its control flow. This anti-analysis technique makes it more difficult to analyze the binaries and to track modifications between variants. The second change is in the communication protocol between the bots and their command and control (C&C) servers. A detailed technical analysis of these changes follows.
Control Flow Flattening
This Emotet binary (unpacked) is using an obfuscation technique called Control Flow Flattening, which works as follows:
- Each basic block is assigned a number.
- The obfuscator introduces a block number variable, indicating which block should execute.
- Each block, instead of transferring control to a successor with a branch instruction, as usual, updates the block number variable to its chosen successor.
- The ordinary control flow is replaced with a switch statement over the block number variable, wrapped inside of a loop.
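To make the transformation concrete, here is a toy Python illustration (not code recovered from Emotet): a simple loop written normally, and the same logic rewritten so that every basic block is numbered and a dispatcher loop picks the next block through a `state` variable. Real obfuscators such as Obfuscator-LLVM operate on compiled code and usually use less predictable block numbers; this sketch only shows the shape of the result.

```python
def collatz_steps(n: int) -> int:
    """Original, structured control flow (n must be >= 1)."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


def collatz_steps_flattened(n: int) -> int:
    """The same logic after control flow flattening.

    Each basic block is numbered. Instead of branching directly to its
    successor, a block stores the successor's number in `state`, and one
    dispatcher loop (an if/elif chain standing in for a C switch) runs
    whichever block `state` names.
    """
    state = 0   # block number variable
    steps = 0
    while True:
        if state == 0:    # entry block
            steps = 0
            state = 1
        elif state == 1:  # loop header: continue while n != 1
            state = 2 if n != 1 else 6
        elif state == 2:  # branch on parity
            state = 3 if n % 2 == 0 else 4
        elif state == 3:  # even: halve
            n //= 2
            state = 5
        elif state == 4:  # odd: 3n + 1
            n = 3 * n + 1
            state = 5
        elif state == 5:  # loop latch: count the step
            steps += 1
            state = 1
        elif state == 6:  # exit block
            return steps
```

Both functions return the same values (for example, 8 steps for n = 6), but in the flattened version every block becomes a successor of the dispatcher, so the loop and the if/else are no longer visible in the control flow graph.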
Encrypted Strings and Resources
All strings and other resources (such as the RSA public key) are encrypted and only decrypted at runtime. The decryption algorithm is described in Figure 2.
We can use IDAPython to create a script to decrypt the data:
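import struct
import idc  # available inside IDA's IDAPython environment

def decrypt_data(ea):
    # First two dwords: key1, and key1 XOR the plaintext length
    key1 = idc.get_wide_dword(ea)
    key2 = idc.get_wide_dword(ea + 4)
    size = key1 ^ key2
    # Round the length up to a whole number of dwords
    data_size = (size + 3) & 0xFFFFFFFC
    ea += 8
    s = b''
    # Each ciphertext dword is XOR-ed with key1
    for i in range(data_size // 4):
        dec_dword = idc.get_wide_dword(ea + i * 4) ^ key1
        s += struct.pack('<I', dec_dword)
    # Drop the padding beyond the real length
    return s[:size]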
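Outside IDA, the same algorithm can be applied to a raw memory dump or file buffer. The sketch below assumes, as the IDAPython script implies, that the two key dwords and the ciphertext are little-endian dwords laid out back to back. `decrypt_blob` and `encrypt_blob` are our own helper names; the encryptor exists only to generate test data.

```python
import struct


def decrypt_blob(buf: bytes, offset: int = 0) -> bytes:
    """Decrypt one encrypted string/resource blob starting at `offset`.

    Layout: dword key1, dword key2 (key1 ^ key2 = plaintext length),
    then the data XOR-ed dword by dword with key1.
    """
    key1, key2 = struct.unpack_from("<II", buf, offset)
    size = key1 ^ key2
    data_size = (size + 3) & ~3          # round up to a multiple of 4
    out = bytearray()
    for i in range(data_size // 4):
        (dword,) = struct.unpack_from("<I", buf, offset + 8 + i * 4)
        out += struct.pack("<I", dword ^ key1)
    return bytes(out[:size])             # drop the dword padding


def encrypt_blob(plain: bytes, key1: int) -> bytes:
    """Inverse operation, useful for building test data."""
    size = len(plain)
    padded = plain + b"\x00" * (-size % 4)
    out = bytearray(struct.pack("<II", key1, key1 ^ size))
    for i in range(0, len(padded), 4):
        (dword,) = struct.unpack_from("<I", padded, i)
        out += struct.pack("<I", dword ^ key1)
    return bytes(out)
```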
Dynamic API Resolve
In this version, Emotet resolves APIs on demand: it looks up the hashes of the API name and DLL name only when it needs to use a function, instead of resolving them all at once as previous versions did. This behavior is similar to code seen in Dridex and in the BitPaymer/DoppelPaymer ransomware:
The customized hash function is calculated as follows:
def api_hash(api_name, xor_key=0x165308FE):
    i = 0
    for c in api_name:
        # Equivalent to i * 65599 + ord(c), kept to 32 bits
        i = ((i << 16) + (i << 6) + ord(c) - i) & 0xFFFFFFFF
    # xor dword (0x165308FE) varies between binaries and differs
    # between the functions resolving API name hashes and module name hashes
    return i ^ xor_key
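The formula above is the classic sdbm string hash (h * 65599 + c, since (h << 16) + (h << 6) - h = 65599h) XOR-ed with a per-binary constant. To turn the hashes found in a sample back into names, an analyst can hash a list of candidate export names and build a lookup table. In this sketch the export list is a stand-in; in practice it would come from parsing the DLLs' export tables with a PE parser such as pefile. The helper names are ours.

```python
def api_hash(name: str, xor_key: int = 0x165308FE) -> int:
    """sdbm-style hash (h * 65599 + c) truncated to 32 bits, then XOR-ed."""
    h = 0
    for c in name:
        h = ((h << 16) + (h << 6) + ord(c) - h) & 0xFFFFFFFF
    return h ^ xor_key


def build_hash_table(names, xor_key: int = 0x165308FE) -> dict:
    """Map hash -> name for every candidate name (e.g. a DLL's exports).

    If two names collide, the later one wins; real tooling should keep a
    list of names per hash instead.
    """
    return {api_hash(name, xor_key): name for name in names}


def resolve_hash(hash_value: int, table: dict):
    """Return the name whose hash matches, or None if it is not in the table."""
    return table.get(hash_value)
```

Because the XOR constant differs between binaries and between the API-name and module-name hash functions, a table must be rebuilt with the right constant for each sample and each function.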
Upon first infection, the Emotet sample runs through two stages. In the first stage, the sample runs as the first instance, performs some setup, and checks the victim system; it then launches the second instance. The second instance runs the second stage, in which it communicates with the C&C server addresses embedded in its binary.
First Stage — Dropper Instance
When it first runs, Emotet tries to decrypt information from additional DLLs that it requires to load. Among the DLLs it uses are:
Next, Emotet gets the volume serial number of the Windows partition. It uses this serial number to create a series of mutex and event handles with the following object names (%X is the volume serial number formatted as uppercase hexadecimal):
- Global\\I%X — MutexI
- Global\\M%X — MutexM
- Global\\E%X — EventE
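For hunting, these object names can be precomputed from a host's volume serial number (on Windows, GetVolumeInformationW returns it). A minimal sketch, assuming Emotet's `%X` behaves like the C printf specifier, i.e. uppercase hex with no zero padding; the function name is ours.

```python
def emotet_object_names(volume_serial: int) -> dict:
    """Build the synchronization object names listed above.

    `%X` renders the serial as uppercase hex without zero padding, so a
    serial such as 0x00AB12CD yields 'AB12CD', not '00AB12CD'.
    """
    serial = volume_serial & 0xFFFFFFFF
    return {
        "MutexI": "Global\\I%X" % serial,
        "MutexM": "Global\\M%X" % serial,
        "EventE": "Global\\E%X" % serial,
    }
```

An endpoint script could then check whether a mutex or event with one of these names exists on the host.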
EventE and MutexM are used as a pair to synchronize the first and second instances (via the API SignalObjectAndWait), ensuring that the second instance only connects to the C&C servers after the first instance has exited.
Check Privilege and Delete Old Variant
Emotet checks its privilege level by calling the API OpenSCManagerW with the parameter SC_MANAGER_ALL_ACCESS. If this call succeeds, the sample considers itself to be running with high privilege.
The sample then decrypts a list of words and uses the volume serial number to calculate and select two words from that list. It combines them to get the filenames of old Emotet binaries that were dropped by previous versions of Emotet. Depending on whether it is running with high privilege or not, the old binary with that filename is deleted from CSIDL_SYSTEMX86 or CSIDL_LOCAL_APPDATA, respectively.
If the sample is running with high privilege, it checks whether its own filename contains a number; if so, it tries to delete any other files with the same filename. This latest Emotet sample appears to be trying to remove traces of previous versions.
Check Setting From Registry
Next, Emotet checks for a registry value whose name is the volume serial number, under:
- HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Explorer — If Emotet is running with high privilege
- HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer — If Emotet is running with low privilege
The data of this registry value is the encrypted filename of the Emotet binary that the current version drops. It is XOR-encrypted, with the volume serial number as the key. If this registry value does not exist, the Emotet sample is running in its first stage, and it sets this value for use by the second instance.
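The exact way the 32-bit serial is applied as an XOR key is not spelled out here, so the sketch below assumes the common pattern of repeating the serial's four little-endian bytes across the data. Treat `xor_with_serial` as a hypothetical helper to adjust once the scheme is confirmed against a sample. Because XOR is its own inverse, the same function both encrypts and decrypts.

```python
import struct


def xor_with_serial(data: bytes, volume_serial: int) -> bytes:
    """XOR `data` with the volume serial number used as a repeating key.

    Assumption: the key is the serial's 4 bytes in little-endian order,
    repeated over the whole buffer.
    """
    key = struct.pack("<I", volume_serial & 0xFFFFFFFF)
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))
```

On a live system, the raw value data could be read with Python's `winreg.QueryValueEx` from the Explorer key listed above and passed to this function.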
Filename of New Dropped File
To drop the Emotet binary for the second stage, Emotet needs a filename. This sample scans CSIDL_SYSTEM and considers only filenames with the .dll or .exe extension. This is similar to the BitPaymer ransomware, which also clones an executable file from the system directory and hides its binary in an alternate data stream (ADS) of the cloned file. The new filename is saved to the registry as described previously.
Emotet then builds the path it will drop itself to. If it is running with high privilege, it drops the binary into CSIDL_SYSTEMX86; otherwise, it drops it into CSIDL_LOCAL_APPDATA (using the API SHFileOperation with FO_MOVE).
The path of the dropped file is:
- “(CSIDL_SYSTEMX86|CSIDL_LOCAL_APPDATA)\\%s\\%s.exe” % (new_filename, new_filename)
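The naming scheme lends itself to a small sketch for building the dropped path and spotting it during triage. The helper names are ours, `ntpath` is used so Windows paths behave the same on any OS, and the case-insensitive extension check is an assumption. How Emotet picks one name from the candidates is not described, so that step is left out.

```python
import ntpath

CANDIDATE_EXTENSIONS = (".dll", ".exe")


def candidate_stems(filenames):
    """Keep .dll/.exe names from a system directory listing, without the
    extension (the dropped path reuses the bare name twice)."""
    stems = []
    for name in filenames:
        stem, ext = ntpath.splitext(name)
        if ext.lower() in CANDIDATE_EXTENSIONS:
            stems.append(stem)
    return stems


def dropped_path(base_dir: str, new_filename: str) -> str:
    """Build <base_dir>\\<name>\\<name>.exe, following the format above."""
    return ntpath.join(base_dir, new_filename, new_filename + ".exe")


def looks_like_drop_path(path: str) -> bool:
    """Triage heuristic: the parent folder has the same name as the .exe."""
    folder, filename = ntpath.split(path)
    stem, ext = ntpath.splitext(filename)
    return ext.lower() == ".exe" and ntpath.basename(folder).lower() == stem.lower()
```

On 64-bit Windows, CSIDL_SYSTEMX86 resolves to C:\Windows\SysWOW64. The heuristic also matches many legitimate installs (Vendor\App\App.exe is common), so it is only meaningful combined with the two drop locations above.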
When it is running with high privilege, Emotet gains persistence by creating a service for the dropped file:
- Binary path: “CSIDL_SYSTEMX86\\%s\\%s.exe” % (new_filename, new_filename)
- Service name: new_filename
- Service display name: new_filename
Otherwise, Emotet gains persistence in the registry by setting a registry value (only after it has first connected to its C&C servers):
- Value name: new_filename
- Value data: “CSIDL_LOCAL_APPDATA\\%s\\%s.exe” % (new_filename, new_filename)
Finally, Emotet launches the second instance by calling the API CreateProcessW on the dropped file.
Second Stage — Bot Instance
The second instance goes through the same steps as the first instance until it reads the saved filename from the registry. Then it performs some checks before communicating with the C&C servers:
- Checks whether its current filename matches the filename saved in the registry. If not, it behaves as in the first stage and launches another instance.
- Checks whether its parent process is services.exe, meaning it is running as a service. If it is, it behaves as in the first stage and launches another instance.
The communication protocol has changed in this version. We describe it in detail below.
Get C&C IP/Port List and RSA Public Key, and Generate AES Key
In this version, the IP addresses and ports of the C&C servers are still stored in the binary as 8-byte blocks.
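The field layout inside each 8-byte block is not shown here. The parser below assumes a 32-bit IP address, a 16-bit port, and two unused bytes, with the IP stored as a little-endian integer whose most significant byte is the first octet. Both assumptions, and the helper name, are ours; verify them against a real sample before relying on the output.

```python
import socket
import struct


def parse_c2_blocks(blob: bytes):
    """Parse 8-byte C&C entries into (ip, port) tuples.

    Assumed entry layout: little-endian uint32 IP (first octet in the most
    significant byte), little-endian uint16 port, 2 unused bytes.
    Trailing bytes that do not fill a whole entry are ignored.
    """
    servers = []
    for off in range(0, len(blob) // 8 * 8, 8):
        ip_value, port, _unused = struct.unpack_from("<IHH", blob, off)
        ip = socket.inet_ntoa(struct.pack(">I", ip_value))
        servers.append((ip, port))
    return servers
```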
Emotet decrypts the RSA public key from its encrypted data. The key is in PEM format:
-----BEGIN PUBLIC KEY-----
-----END PUBLIC KEY-----
The sample then generates an AES-128-CBC session key handle and an SHA-1 hash object handle. The RSA public key, AES-128-CBC key, and SHA-1 hash are used together to secure the connection between Emotet samples and the C&C servers.
Set Up POST Request
Next, the Emotet sample starts building a plaintext packet for the request. This packet holds the basic information from which the subsequent packets are generated. The plaintext packet has the following format:
uint8_t victim_id[victim_id_size]; // hostname + hex volume serial number
uint32_t bot_id; // 0x1343B09 or 20200201 in decimal – it looks like the date format of 2020/02/01
uint32_t unknown_id; // 0x7D0 or 2000 in decimal
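Two details of this structure are easy to check in code: the decimal reading of bot_id as a date, and the byte order of the fields. The sketch assumes little-endian integers (native on x86) and serializes only the three fields listed above, in order; the victim ID value and the function names are made up for illustration.

```python
import struct
from datetime import datetime

BOT_ID = 0x1343B09   # 20200201 in decimal
UNKNOWN_ID = 0x7D0   # 2000 in decimal


def pack_listed_fields(victim_id: bytes, bot_id: int = BOT_ID,
                       unknown_id: int = UNKNOWN_ID) -> bytes:
    """Serialize only the fields listed above, in order, little-endian.

    Any other fields of the real structure are not modeled here.
    """
    return victim_id + struct.pack("<II", bot_id, unknown_id)


def bot_id_as_date(bot_id: int) -> datetime:
    """Read the decimal form of bot_id as YYYYMMDD."""
    return datetime.strptime(str(bot_id), "%Y%m%d")
```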
Emotet then compresses this plaintext packet using a compression algorithm that we could not identify at first. After digging deeper into this algorithm, we were able to pinpoint which compression library the actors behind Emotet are using and what they customized in it; we cover this later in the post. First, let's look at how the compressed data is repacked into a command packet.
Client Command Packet
The command packet is built from the compressed data of the plaintext packet:
uint32_t cmd_id; // 0x01 for register command
Finally, Emotet encrypts the command packet with AES-128-CBC to generate the final packet, which is sent to the C&C servers via HTTP POST. The final packet has the following format:
uint8_t enc_session_key[0x60]; // reversed order of encrypted session key (AES-128-CBC key is encrypted by RSA)
uint8_t comp_data_hash[0x14]; // SHA-1 hash of Command Packet (before encryption)
uint8_t encrypted_data; // AES-128-CBC encrypted data of Command Packet
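Because the field sizes are fixed, a captured request can be split mechanically. A 0x60-byte RSA ciphertext implies a 768-bit modulus, since RSA output is as long as the modulus. The Windows CryptoAPI returns RSA ciphertext in little-endian byte order, which is likely why Emotet reverses it. The sketch below handles only the framing: the RSA and AES-128-CBC outputs are passed in as placeholder byte strings, and the function names are ours.

```python
import hashlib

ENC_KEY_LEN = 0x60   # RSA-encrypted AES key (96 bytes, i.e. a 768-bit modulus)
HASH_LEN = 0x14      # SHA-1 digest length


def build_final_packet(rsa_encrypted_key: bytes, command_packet: bytes,
                       aes_ciphertext: bytes) -> bytes:
    """Assemble the final packet layout listed above.

    `rsa_encrypted_key` is given in the byte order CryptoAPI produces;
    it is stored reversed in the packet.
    """
    if len(rsa_encrypted_key) != ENC_KEY_LEN:
        raise ValueError("expected a 0x60-byte RSA ciphertext")
    digest = hashlib.sha1(command_packet).digest()  # hash of the plaintext command packet
    return rsa_encrypted_key[::-1] + digest + aes_ciphertext


def split_final_packet(packet: bytes):
    """Split a captured final packet back into its three fields."""
    if len(packet) < ENC_KEY_LEN + HASH_LEN:
        raise ValueError("packet too short")
    enc_key = packet[:ENC_KEY_LEN]
    digest = packet[ENC_KEY_LEN:ENC_KEY_LEN + HASH_LEN]
    return enc_key, digest, packet[ENC_KEY_LEN + HASH_LEN:]
```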
At this point, Emotet has the data it will post to the C&C server. Next, it sets up the fields in the POST header.
Generate URI Path
The URI path is composed of 1–6 substrings. Each substring is 4–19 bytes long and randomly generated from the characters [A-Za-z1-9]. The referrer is built from the IP address of the C&C server and the same generated URI path.
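A sketch of the generator, useful for producing test traffic or sanity-checking a network signature. Joining the substrings with "/" plus a trailing "/", the referrer's URL scheme, and the use of Python's `random` module are our assumptions; Emotet uses its own random number source, and the helper names are ours.

```python
import random
import string

URI_ALPHABET = string.ascii_letters + "123456789"   # [A-Za-z1-9]


def random_token(rng: random.Random, alphabet: str, lo: int = 4, hi: int = 19) -> str:
    """Random string of lo..hi characters (inclusive) drawn from alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(lo, hi)))


def make_uri_path(rng: random.Random) -> str:
    """1 to 6 random substrings; the '/' separators are an assumption."""
    parts = [random_token(rng, URI_ALPHABET) for _ in range(rng.randint(1, 6))]
    return "/" + "/".join(parts) + "/"


def make_referer(c2_host: str, uri_path: str) -> str:
    """Hypothetical referrer built from the C&C address and the same URI path."""
    return "http://" + c2_host + uri_path
```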
In this version, instead of sending the final packet as a plain POST body, Emotet submits it encoded as multipart/form-data. The multipart/form-data has the following format:
Content-Type: multipart/form-data; boundary=%s\r\n\r\n--%S\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\nContent-Type: application/octet-stream\r\n\r\n\r\n--%S--
The boundary is generated from random numbers and written in the following format:
Form Name and Attachment Filename
The form name is generated randomly: it is 4–19 bytes long, with characters selected from [A-Za-z]. The attachment filename is generated with the same algorithm.
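Putting these pieces together, the sketch below assembles the multipart body from the format string shown earlier. The boundary generation scheme is not reproduced here, so the boundary is taken as a parameter. Placing the final packet between the part headers and the closing boundary is our reading of the format string, and the helper names and example values are ours.

```python
import random
import string

FORM_ALPHABET = string.ascii_letters   # [A-Za-z]


def random_form_token(rng: random.Random) -> str:
    """4-19 random letters, used for both the form name and the attachment filename."""
    return "".join(rng.choice(FORM_ALPHABET) for _ in range(rng.randint(4, 19)))


def build_multipart(final_packet: bytes, boundary: str, form_name: str,
                    filename: str):
    """Return (Content-Type header value, body) following the format above.

    Assumption: the binary final packet sits between the part headers'
    blank line and the closing boundary.
    """
    content_type = "multipart/form-data; boundary=%s" % boundary
    body = (
        ("--%s\r\n" % boundary).encode("ascii")
        + ('Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
           % (form_name, filename)).encode("ascii")
        + b"Content-Type: application/octet-stream\r\n\r\n"
        + final_packet
        + ("\r\n--%s--" % boundary).encode("ascii")
    )
    return content_type, body
```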
Now Emotet is ready to POST data to a C&C server. If the C&C server is live and replies, Emotet parses the reply, decrypts it with AES-128-CBC, and then verifies the SHA-1 hash of the decrypted data using the RSA public key.
The response packet is formatted as follows:
uint8_t signature[0x60]; // signature data to be verified
uint8_t decrypted_data_hash[0x14]; // SHA-1 hash of encrypted data after decryption
uint8_t encrypted_data; // AES-128-CBC encrypted data of the C&C’s responded message
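The reply can be framed the same way as the request. In this sketch, `decrypt` is a caller-supplied placeholder for AES-128-CBC decryption with the session key, and only the SHA-1 comparison against the hash field is implemented; checking the 0x60-byte signature with the RSA public key would need a cryptography library and is omitted. The function names are ours.

```python
import hashlib
from typing import Callable, Optional

SIG_LEN = 0x60
HASH_LEN = 0x14


def split_response(packet: bytes):
    """Split a C&C reply into (signature, hash, encrypted_data)."""
    if len(packet) < SIG_LEN + HASH_LEN:
        raise ValueError("response too short")
    return (packet[:SIG_LEN],
            packet[SIG_LEN:SIG_LEN + HASH_LEN],
            packet[SIG_LEN + HASH_LEN:])


def open_response(packet: bytes, decrypt: Callable[[bytes], bytes]) -> Optional[bytes]:
    """Decrypt the reply and accept it only if its SHA-1 matches the hash field.

    `decrypt` stands in for AES-128-CBC with the session key. RSA
    verification of the signature is deliberately left out of this sketch.
    """
    _signature, expected_hash, encrypted = split_response(packet)
    plaintext = decrypt(encrypted)
    if hashlib.sha1(plaintext).digest() != expected_hash:
        return None
    return plaintext
```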
After receiving valid decrypted data, Emotet decompresses it and parses the commands and data from the decompressed packet. The decompressed data is a chain of blocks, each holding one of the C&C server's command packets:
uint32_t cmd_packet1_size; // size of cmd_packet1 data
Each block of the C&C server’s command packets is formatted as follows:
uint32_t id; // plugin id
uint32_t command; // command id
uint32_t payload_size; // payload size
uint8_t payload[payload_size]; // payload data
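The decompressed chain can be walked with a short parser. It assumes little-endian uint32 fields (native on x86) and that each block size covers the complete command packet that follows it; the class and function names are ours.

```python
import struct
from typing import List, NamedTuple


class CncCommand(NamedTuple):
    plugin_id: int
    command: int
    payload: bytes


def parse_command_chain(data: bytes) -> List[CncCommand]:
    """Walk [uint32 size][command packet] blocks until the data is exhausted."""
    commands = []
    off = 0
    while off < len(data):
        if off + 4 > len(data):
            raise ValueError("truncated block size")
        (block_size,) = struct.unpack_from("<I", data, off)
        off += 4
        block = data[off:off + block_size]
        if len(block) != block_size or block_size < 12:
            raise ValueError("truncated or malformed command packet")
        off += block_size
        plugin_id, command, payload_size = struct.unpack_from("<III", block, 0)
        payload = block[12:12 + payload_size]
        if len(payload) != payload_size:
            raise ValueError("payload shorter than payload_size")
        commands.append(CncCommand(plugin_id, command, payload))
    return commands
```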
This Emotet sample currently handles three commands from the C&C server:
- Command ID 01: Downloads an executable file and executes it
- Command ID 02: Downloads an executable file, checks whether there is an active Terminal Services session other than the session the Emotet sample is running in, and then launches the downloaded executable in that session:
  - Calls API WTSQueryUserToken to obtain the primary access token of the user logged on to that session
  - Calls API CreateProcessAsUser to launch the process with that token
- Command ID 03: Downloads a module/plugin, loads it, and calls its main function
As mentioned earlier, this Emotet sample uses a compression library that we could not identify at first, both to compress packets sent to the C&C server and to decompress packets received from it. This is one of the recent changes in the current Emotet sample: previous versions used the zlib library. Although we could easily reverse engineer and reimplement the compression and decompression routines from the current Emotet binary, identifying exactly which library it uses helps us understand more about the sample and implement compression and decompression the same way the sample does.
The compression library turns out to be the well-known LibLZF. However, the people behind Emotet modified the library when they integrated it into the bot's current code base. We highlight these modifications below.
Browsing the source code of LibLZF, we noticed this comment in the file lzf_c.c:
Interestingly, this library's decompression routine does not depend on the hash function used by the compression routine. The hash function is defined in lzf_c.c (the source code of the compression routine):
In the current Emotet binary, LibLZF is compiled with the following modifications:
In file lzfP.h:
- ULTRA_FAST is set to 0
- VERY_FAST is set to 0
In file lzf_c.c:
- The block of code shown in Figure 21 is commented out
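Since the LibLZF data format does not depend on the hash function, and the ULTRA_FAST/VERY_FAST settings and the commented-out block all affect the compression side, a stock LZF decoder should in principle be able to read Emotet's compressed data. As a dependency-free alternative to a compiled binding, here is a pure-Python sketch of the decoding loop implemented in LibLZF's lzf_d.c. It is our own port and has not been tested against real Emotet traffic.

```python
def lzf_decompress(data: bytes) -> bytes:
    """Decode an LZF stream (the format produced by LibLZF's lzf_compress).

    Control byte < 0x20: literal run of (ctrl + 1) bytes.
    Otherwise: back reference of length (ctrl >> 5) + 2, where a value of 7
    is first extended by one extra byte; the distance back is
    ((ctrl & 0x1F) << 8) + next byte + 1.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        ctrl = data[i]
        i += 1
        if ctrl < 0x20:                      # literal run
            length = ctrl + 1
            if i + length > n:
                raise ValueError("truncated literal run")
            out += data[i:i + length]
            i += length
            continue
        length = ctrl >> 5                   # back reference
        if length == 7:
            if i >= n:
                raise ValueError("truncated length byte")
            length += data[i]
            i += 1
        if i >= n:
            raise ValueError("truncated offset byte")
        ref = len(out) - ((ctrl & 0x1F) << 8) - data[i] - 1
        i += 1
        if ref < 0:
            raise ValueError("back reference before start of output")
        for _ in range(length + 2):          # byte by byte: source may overlap output
            out.append(out[ref])
            ref += 1
    return bytes(out)
```

For example, the stream `b"\x02abc\x80\x02"` is a 3-byte literal run followed by a 6-byte back reference 3 bytes behind the output position, and decodes to `b"abcabcabc"`.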
Based on this finding, we took a project that provides Python bindings for LibLZF, applied the changes described above, and compiled it. Figure 22 shows the result of testing the compiled Python module on data we dumped from the Emotet sample's memory:
Over time, the Emotet botnet has evolved and will no doubt continue to evolve in the future. It has proven itself to be an extremely effective weapon for cyber criminals, and is one of the most dangerous botnets active today.
Control flow flattening: https://github.com/obfuscator-llvm/obfuscator/wiki
Control flow flattening write-up by Rolf Rolles: http://www.hexblog.com/?p=1248
LibLZF library (by Marc Lehmann): http://oldhome.schmorp.de/marc/liblzf.html