Emotet: Dangerous Malware Keeps on Evolving

Research into the latest Emotet variant by Symantec’s Threat Engineering Team has revealed details about which compression algorithm the gang behind Emotet has customized and is using in its code.

Threat Intel
Mar 30, 2020 · 12 min read

Authors: Nguyen Hoang Giang, Mingwei Zhang

Emotet is one of the most dangerous malware threats active today. Emotet (Trojan.Emotet) began life as a banking Trojan but evolved several years ago to act as a malware loader for other threats: Emotet infects a machine and then downloads another threat, e.g. the TrickBot information stealer, onto the infected system. Emotet is now one of the biggest threat distributors out there, renting its infrastructure out to all sorts of other threats, including ransomware, information stealers, and cryptocurrency miners.

We wrote a detailed blog about Emotet’s evolution in 2018, which gives you more background on the threat and its development.

Recently, our threat engineering team noticed some updates to Emotet (Version 5, 20200201) and analyzed the malware sample to see exactly what was going on.

What’s New?

The first thing we noticed is that Emotet has updated its techniques for obfuscating its flow of code. This anti-analysis technique makes it more difficult to analyze and track modifications between variant binaries. The second thing is a change in the communication protocol between the botnet and its command and control (C&C) servers. A detailed technical analysis of these changes follows.

Anti-Analysis

This Emotet binary (unpacked) is using an obfuscation technique called Control Flow Flattening, which works as follows:

  • Each basic block is assigned a number.
  • The obfuscator introduces a block number variable, indicating which block should execute.
  • Each block, instead of transferring control to a successor with a branch instruction, as usual, updates the block number variable to its chosen successor.
  • The ordinary control flow is replaced with a switch statement over the block number variable, wrapped inside of a loop.

Figure 1. Control flow graph of a function obfuscated with the Control Flow Flattening technique
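
To make the four steps above concrete, here is a minimal Python sketch of a small function before and after flattening. It is not taken from the Emotet binary: the function names, the block numbering, and the choice of a GCD routine are made up for illustration, and Python's if/elif chain stands in for the switch statement.

```python
def gcd_plain(a: int, b: int) -> int:
    """Ordinary control flow: a loop with a direct branch."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    """The same logic after control flow flattening (illustrative only).

    Each basic block has a number, and a dispatcher loop selects the
    next block from `state` instead of branching to it directly.
    """
    state = 0  # block number variable: which block executes next
    while True:  # dispatcher loop wrapping the "switch"
        if state == 0:    # block 0: loop condition
            state = 1 if b != 0 else 2
        elif state == 1:  # block 1: loop body
            a, b = b, a % b
            state = 0
        elif state == 2:  # block 2: exit block
            return a
```

Both functions return the same result, but in the flattened version every block has the dispatcher as its only visible successor, which is why the control flow graph loses its recognizable shape.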

All strings and other resources (such as the RSA public key) are encrypted and are only decrypted at runtime. The decryption algorithm is described in Figure 2.

Figure 2. How Emotet decrypts strings and resources

We can use IDAPython to create a script to decrypt the data:

In this version, Emotet resolves each API by looking up hashes of the API name and DLL name only when the API is needed, instead of loading all APIs at once as previous versions did. We observed that this behavior is similar to the code of Dridex or the BitPaymer/DoppelPaymer ransomware:

Figure 3. How Emotet loads and calls API ExitProcess by looking up hash values of kernel32 and ExitProcess
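
The general idea of resolving an API by hash can be sketched in a few lines of Python. This is only an illustration of the technique: `toy_hash` is a placeholder and is not Emotet's customized hash function, and the helper names are hypothetical.

```python
def toy_hash(name: str) -> int:
    """Placeholder 32-bit hash. This is NOT Emotet's customized function."""
    h = 0
    for ch in name.lower():
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def resolve_by_hash(export_names, wanted_hash, hash_fn=toy_hash):
    """Walk a list of export names and return the first one whose hash matches."""
    for name in export_names:
        if hash_fn(name) == wanted_hash:
            return name
    return None


def build_lookup(export_names, hash_fn=toy_hash):
    """Precompute a hash -> name table, handy for labeling constants in a disassembly."""
    return {hash_fn(name): name for name in export_names}
```

For analysis, swapping `toy_hash` for a re-implementation of the sample's real hash function and feeding `build_lookup` the export names of the relevant DLLs would let an analyst map hash constants back to API names.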

The customized hash function is calculated as follows:

Main Work

Upon first infection, the Emotet sample runs through two stages. During the first stage, the sample runs as a first instance, does some setup, and checks the victim system; it then executes the second instance. The second instance runs in the second stage, where it communicates with the C&C server addresses embedded in its binary.

First Stage — Dropper Instance

When it first runs, Emotet decrypts information about the additional DLLs that it needs to load. Among the DLLs it uses are:

  • urlmon.dll
  • userenv.dll
  • wininet.dll
  • shell32.dll
  • crypt32.dll
  • advapi32.dll
  • wtsapi32.dll

Then, Emotet gets the volume serial number of the Windows partition. This volume serial number is used to create a series of mutex and event handles, with object names as follows (%X is the format of the volume serial number):

  • Global\\I%X — MutexI
  • Global\\M%X — MutexM
  • Global\\E%X — EventE
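
As a small illustration, the three object names can be derived from a volume serial number as follows. This sketch assumes the doubled backslash in the list above is C string escaping for a single backslash; the function name and dictionary keys are ours.

```python
def sync_object_names(volume_serial: int) -> dict:
    """Build the mutex/event names from a 32-bit volume serial number."""
    serial = "%X" % (volume_serial & 0xFFFFFFFF)  # same %X formatting as in the list
    return {
        "MutexI": "Global\\I" + serial,
        "MutexM": "Global\\M" + serial,
        "EventE": "Global\\E" + serial,
    }
```

For example, `sync_object_names(0x1A2B3C4D)["MutexM"]` yields `Global\M1A2B3C4D`, which is the kind of string to look for when checking a machine for these objects.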

A pair, EventE and MutexM, is created for synchronization between the first instance and the second instance (using the SignalObjectAndWait API), to ensure that the second instance can only connect to the C&C servers once the first instance has exited.

Emotet checks its running privilege by calling the OpenSCManagerW API with the parameter SC_MANAGER_ALL_ACCESS. If this API call is successful, the sample is considered to be running with high privilege.

Then the sample decrypts a list of words and uses the volume serial number to calculate and select two words from that list. It then combines them to get the filenames of old Emotet binaries that were dropped by Emotet’s previous version. Depending on whether it is running with high privilege or not, the old binary with that filename will be deleted from CSIDL_SYSTEMX86 or CSIDL_LOCAL_APPDATA.

If the sample is running with high privilege, it checks whether its running filename contains a number; if so, it tries to delete any other files with the same filename. It looks like this latest sample of Emotet is trying to remove traces of the previous version(s).

For its next step, Emotet checks for a registry value whose name is the volume serial number, in:

  • HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Explorer — If Emotet is running with high privilege
  • HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer — If Emotet is running with low privilege

The data of this registry value is the encrypted filename of the Emotet binary that will be dropped by the current version. It is XOR-encrypted, with the volume serial number as the key. If this registry value is not available, it means that the Emotet sample is running in its first stage, and it will set this value for use in the second instance.

Figure 4. Data is saved in the registry as the encrypted filename of dropped file
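
A minimal sketch of decoding such a registry value is shown below. The text above only states that the data is XOR-encrypted with the volume serial number; how the key bytes are applied is not spelled out, so the repeating 4-byte little-endian key used here is an assumption, as is the function name.

```python
import struct


def xor_with_serial(data: bytes, volume_serial: int) -> bytes:
    """XOR data with the volume serial number.

    Assumption: the 32-bit serial is used as a repeating 4-byte
    little-endian key. XOR is symmetric, so the same call both
    encrypts and decrypts.
    """
    key = struct.pack("<I", volume_serial & 0xFFFFFFFF)
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))
```

If the decoded bytes do not look like a filename, the byte order or key schedule assumed here is the first thing to revisit against the sample.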

To drop the Emotet binary for the second stage, Emotet needs a filename. The sample scans CSIDL_SYSTEM and selects only filenames with the extensions .dll and .exe. This is similar to the BitPaymer ransomware, which also clones an executable file in the system directory to hide its binary in an alternate data stream (ADS) of the cloned file. The new filename is saved to the registry as previously described.

Figure 5. Emotet scans for filenames of .exe and .dll files in %SYSTEM% to generate filename of dropped file
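
The filtering step can be sketched as follows. The directory is passed in as a parameter, and the way a single name is finally chosen is not described above, so the random pick in `pick_stem` is purely a placeholder; both function names are hypothetical.

```python
import random
from pathlib import Path


def candidate_stems(directory: str) -> list:
    """Return the sorted, de-duplicated base names of .exe and .dll files."""
    stems = {
        p.stem.lower()
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in {".exe", ".dll"}
    }
    return sorted(stems)


def pick_stem(stems, rng=None):
    """Placeholder selection: the sample's real selection rule is not known here."""
    if rng is None:
        rng = random.Random()
    return rng.choice(stems)
```

Because the resulting name belongs to a legitimate system file, a dropped binary named this way blends in with ordinary Windows components, which is worth keeping in mind when triaging.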

Emotet then builds the path to drop itself to. If it is running with high privilege, it drops the binary to CSIDL_SYSTEMX86; otherwise, it drops it to CSIDL_LOCAL_APPDATA (using the SHFileOperation API with the parameter FO_MOVE).

Figure 6. Emotet calls API SHFileOperation to clone itself to destination path

The path of the dropped file is:

  • “(CSIDL_SYSTEMX86|CSIDL_LOCAL_APPDATA)\\%s\\%s.exe” % (new_filename, new_filename)
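
For hunting purposes, the expected location can be computed from the format string above. In this sketch the two base directories are supplied by the caller rather than resolved from the CSIDL values, and the function name is ours.

```python
import ntpath


def expected_drop_path(high_privilege: bool, systemx86_dir: str,
                       local_appdata_dir: str, new_filename: str) -> str:
    """Build <base>\\<new_filename>\\<new_filename>.exe using Windows path rules."""
    base = systemx86_dir if high_privilege else local_appdata_dir
    return ntpath.join(base, new_filename, new_filename + ".exe")
```

Using `ntpath` keeps the output in Windows form even when the script runs on another platform.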

When it is running with high privilege, Emotet gains persistence by creating a service for the dropped file:

  • “CSIDL_SYSTEMX86\\%s\\%s.exe” % (new_filename, new_filename)

Otherwise, Emotet gains persistence in the registry by setting a registry value (only after it has first connected to its C&C servers):

  • HKEY_CURRENT_USER\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run

Finally, Emotet launches the second instance by calling the CreateProcessW API on the dropped file.

Second Stage — Bot Instance

The second instance goes through the same steps as the first instance until it is able to get the saved data of the filename from the registry. Then it performs some checks before communicating with the C&C servers:

  • Checks whether the current filename is the same as the filename saved in the registry. If not, it runs similarly to the first stage to launch another instance.
  • Checks whether its parent process is services.exe, meaning it is running as a service. If it is, it runs similarly to the first stage to launch another instance.

The communication protocol is changed in this version. We will describe this in detail below.

In this version, the IP addresses and ports of the C&C servers continue to be stored in the binary as 8-byte blocks.

Figure 7. IP(s) and port(s) of C&C servers are embedded in binary
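
A table of 8-byte blocks like this can be parsed with a short script once it has been dumped from the binary. The internal layout of each block is not given above, so this sketch assumes a little-endian 32-bit IPv4 address, a little-endian 16-bit port, and two bytes of padding, with an all-zero address marking the end of the table. All of these are assumptions to be checked against the sample.

```python
import ipaddress
import struct


def parse_c2_table(blob: bytes) -> list:
    """Parse 8-byte (address, port, padding) blocks into (ip, port) tuples.

    Assumed layout per block: <I address> <H port> <H padding>,
    all little-endian; a zero address terminates the table.
    """
    entries = []
    for off in range(0, len(blob) - 7, 8):
        addr, port, _pad = struct.unpack_from("<IHH", blob, off)
        if addr == 0:
            break
        entries.append((str(ipaddress.IPv4Address(addr)), port))
    return entries
```

If the decoded addresses come out with their octets reversed, the address field is most likely stored in the opposite byte order from the one assumed here.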

Emotet retrieves the RSA public key from encrypted data. The RSA public key is in PEM format:

Then the sample will generate an AES-128-CBC session key handle and an SHA-1 hash key handle. The RSA public key, AES-128-CBC Key, and SHA-1 hash are combined to secure the connection between Emotet samples and the C&C servers.

Figure 8. Emotet is retrieving IP/Port list and generating crypto key handles to secure communication

Plaintext Packet

Next, the Emotet sample starts building a plaintext packet for the request. This packet holds the basic information from which the subsequent packets are generated. The plaintext packet has the following format:

Figure 9. Emotet is generating VICTIM_ID based on computer name and volume serial number
Figure 10. Emotet is calculating system info value based on OS version, product type, and processor architecture
Figure 11. Emotet gets session ID from its process ID
Figure 12. Emotet enumerates running process names (no duplicated process names, no process names where parent process ID is 0, and not including Emotet’s process name) and puts them in a buffer, separated by commas, in ASCII
Figure 13. Emotet enumerates ID of plugins, which it downloads from C&C and executes. Plugin ID(s) are saved to buffer as DWORD(s).
Figure 14. Full buffer of plaintext packet, which Emotet created
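
The process-list and plugin-ID fields described in the captions for Figures 12 and 13 can be reproduced with a few lines of Python, which helps when checking a decoded packet against a known process listing. This is a sketch under assumptions: name comparison is treated as case-sensitive, no trailing separator is added, and DWORDs are packed little-endian. The function names are ours.

```python
import struct


def build_process_list(processes, own_name: str) -> bytes:
    """processes: iterable of (process_name, parent_pid) tuples.

    Skips duplicates, entries whose parent PID is 0, and our own
    process name, then joins the rest with commas as ASCII.
    """
    seen = set()
    names = []
    for name, parent_pid in processes:
        if parent_pid == 0 or name == own_name or name in seen:
            continue
        seen.add(name)
        names.append(name)
    return ",".join(names).encode("ascii", errors="replace")


def build_plugin_ids(plugin_ids) -> bytes:
    """Pack each plugin ID as a 32-bit DWORD (little-endian assumed)."""
    return b"".join(struct.pack("<I", pid & 0xFFFFFFFF) for pid in plugin_ids)
```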

Emotet then compresses this plaintext packet by using an unknown compression algorithm. After digging deeply into this compression algorithm, we can point out which compression library the actors behind Emotet are using and what they customized in that library. Let’s keep going and look at how the compressed data is repacked to a command packet.

Client Command Packet

The command packet is composed of the compressed data of the plaintext packet:

Finally, Emotet performs encryption on the command packet (by AES-128-CBC) to generate a final packet, which is posted to the C&C servers through HTTP POST. The format of the final packet is as follows:

Figure 15. Emotet repackages the final packet before embedding it to a POST header request

At this point, Emotet has the data ready to post to the C&C server. Next, it sets up fields in the POST header.

Generate URI Path

This URI path is composed of 1–6 substrings. Each substring is 4–19 bytes and is randomly generated by selecting from [A-Za-z1–9]. The referrer path is the same as the URI path.

Figure 16. Emotet generates subpaths for the POST request
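
The generation rule described above is easy to model, which is useful when writing network signatures or test traffic. In this sketch the character set follows the text literally (letters plus the digits 1 to 9), while the "/" separator and the leading slash are assumptions; the names are hypothetical.

```python
import random
import string

# Character set as stated in the text: [A-Za-z1-9] (note: no "0").
URI_ALPHABET = string.ascii_letters + "123456789"


def random_segment(rng, alphabet=URI_ALPHABET, min_len=4, max_len=19) -> str:
    """One randomly generated substring of 4 to 19 characters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def random_uri_path(rng=None) -> str:
    """Join 1 to 6 random substrings into a path ('/' separator assumed)."""
    if rng is None:
        rng = random.Random()
    count = rng.randint(1, 6)
    return "/" + "/".join(random_segment(rng) for _ in range(count))
```

A matching detection pattern under the same assumptions would be a path of one to six segments, each 4 to 19 characters drawn from that alphabet.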

Referrer Path

The referrer is built from the IP address of the C&C server and the URI path generated above.

Multipart/Form-Data

Currently, instead of sending the final packet as a simple POST body, Emotet submits that data by encoding it in multipart/form-data. The multipart/form-data has the following format:

Boundary

The boundary is generated from random numbers and written in the following format:

Figure 17. Emotet generates boundary string

Form Name and Attachment Filename

The form name is 4–19 bytes long and is randomly generated by selecting from [A-Za-z]. The attachment filename is generated with the same algorithm.

Figure 18. Emotet generates form name and filename of multipart/form-data
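
Putting the pieces of this section together, a request body of this general shape can be sketched as follows. Only the random form name and filename (4 to 19 letters) come from the description above. The boundary layout used here (a run of dashes followed by random digits) and the `application/octet-stream` part type are assumptions, and the framing is simply standard multipart/form-data; the function names are ours.

```python
import random
import string


def random_name(rng, min_len=4, max_len=19) -> str:
    """Random name of 4 to 19 characters drawn from [A-Za-z]."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def build_multipart(payload: bytes, rng=None):
    """Wrap payload as a single file part; returns (content_type, body)."""
    if rng is None:
        rng = random.Random()
    # Hypothetical boundary layout; the sample's exact format is not reproduced here.
    boundary = "-" * 20 + "".join(rng.choice(string.digits) for _ in range(16))
    form_name = random_name(rng)
    filename = random_name(rng)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{form_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + payload + tail
```

The returned content type goes in the `Content-Type` request header, and the body carries the encrypted final packet as the single attached "file".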

Now, Emotet is ready to POST data to a C&C server. If the C&C server address is live and replies to the message, Emotet will parse the reply, decrypt it with AES-128-CBC, and then verify the SHA-1 hash of the decrypted data using the RSA public key.

The response packet is formatted as follows:

Figure 19. Emotet receives a packet from the C&C, decrypts and verifies it

After receiving valid decrypted data, Emotet decompresses it and then parses commands and data from the decompressed packet. Data from the decompressed packet is a chain of blocks of the C&C server’s command packets:

Each block of the C&C server’s command packets is formatted as follows:

Currently, this Emotet sample receives three commands from the C&C server:

  • Command ID 01: Downloads an executable file and executes it
  • Command ID 02: Downloads an executable file, checks whether there is an active Terminal Service session other than the session of the running Emotet sample, then launches the downloaded executable in that active Terminal Service session
  • Command ID 03: Downloads a module/plugin, loads it and calls to its main function

Compression/Decompression Library

As we already mentioned, this Emotet sample is using an unknown compression library to compress packets to send to the C&C server and to decompress packets received from the C&C server. This is one of the recent changes seen in the current Emotet sample as, in previous versions, Emotet was using the zlib library. Although we can easily reverse engineer and re-implement the compression and decompression routines from the current Emotet binary, uncovering exactly which compression library it is using helps us understand more about the sample and implement compression and decompression exactly as the sample does.

The compression library is actually a well-known library named LibLZF. But the people behind Emotet made some modifications to that library when they integrated it into the current code base of this bot. We will highlight the modifications implemented below.

Browsing the source code of LibLZF, we noticed this comment in file lzf_c.c:

Figure 20. LibLZF defines how to calculate values

Interestingly, this library’s decompression routine does not depend on the hash function used in the compression routine. The hash function is calculated in lzf_c.c (source code of compression routine):

Figure 21. How the hash table is calculated in the compression routine of LibLZF

And in the current Emotet binary, LibLZF is compiled with these modifications:

In file lzfP.h:

  • Set ULTRA_FAST to 0
  • Set VERY_FAST to 0

In file lzf_c.c:

  • Comment out the block of code shown in Figure 21

From this finding, we grabbed a project that is binding LibLZF to Python, changed the code as described above, and compiled it. Figure 22 shows the result of our testing with that Python compiled module on data we dumped from the Emotet sample’s memory:

Figure 22. Verify compressed/decompressed data with Python compiled module
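
Because the decompression routine does not depend on the hash function, a plain LZF decompressor should be enough to read such data without rebuilding the modified library. Below is a minimal pure-Python sketch of the standard LZF stream format as we understand it from LibLZF's lzf_d.c. It is not the Python binding we used, and it has not been verified here against Emotet traffic.

```python
def lzf_decompress(data: bytes) -> bytes:
    """Decompress a raw LZF stream (no header, no length prefix)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        ctrl = data[i]
        i += 1
        if ctrl < 32:
            # Literal run: the next ctrl + 1 bytes are copied as-is.
            run = ctrl + 1
            if i + run > n:
                raise ValueError("truncated literal run")
            out += data[i:i + run]
            i += run
        else:
            # Back reference: 3-bit length, 13-bit offset into the output.
            length = ctrl >> 5
            if length == 7:  # extended length stored in the next byte
                if i >= n:
                    raise ValueError("truncated back reference")
                length += data[i]
                i += 1
            if i >= n:
                raise ValueError("truncated back reference")
            offset = ((ctrl & 0x1F) << 8) | data[i]
            i += 1
            start = len(out) - offset - 1
            if start < 0:
                raise ValueError("back reference before start of output")
            # Copy byte by byte: source and destination may overlap.
            for k in range(length + 2):
                out.append(out[start + k])
    return bytes(out)
```

For example, the four bytes `00 61 20 00` decode to `aaaa`: one literal `a`, followed by a back reference that copies three more bytes starting from it.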

Conclusion

Over time, the Emotet botnet has evolved and will no doubt continue to evolve in the future. It has proven itself to be an extremely effective weapon for cyber criminals, and is one of the most dangerous botnets active today.

IOC

Analyzed Sample:

aa0cbe599839db940f6cc2f4ca1383dbb9937b8c7dd6460847c983523cd63c39

References/Further Reading

Control flow flattening: https://github.com/obfuscator-llvm/obfuscator/wiki

Control flow flattening write-up by Rolf Rolles: http://www.hexblog.com/?p=1248

LibLZF library (by Marc Lehmann): http://oldhome.schmorp.de/marc/liblzf.html

https://research.checkpoint.com/2018/emotet-tricky-trojan-git-clones/

https://www.cert.pl/en/news/single/whats-up-emotet/
