Tracking Advanced Persistent Threats (APTs) via Shared Code

You don’t need to worry about tracking malware variants if you have full faith in your anti-virus and other defenses. However, if you think there is a possibility that these defenses may not stand up to a concerted attack by an advanced adversary, wouldn’t you want to know who is attacking you and why?

A malware sample can tell you about the adversary behind it. How? By tracing the sample’s code to previous attacks by the same threat actor.

Using its ability to trace shared malware code across very large repositories, Cythereal MAGIC connected malware samples shared by US CYBERCOM to attacks on the Italian Military in 2018 and on TV5 Monde in 2017, both attributed to APT28 (see below). All done automatically, without any reverse engineering or deep malware analysis.

Code Reuse in Malware

Security companies report hundreds of thousands of new malware samples a day. Think about it. How is it realistically possible to write that many new programs every day?

Well, most of the malware found every day is not new. A so-called new malware sample is most often a new version of an old one, much like Microsoft Office 2019 is a new version of Microsoft Office 2016. Alternatively, it may be an obfuscated variant, created (loosely speaking) by shuffling the program code so that the malware does the same thing but its bytes look different.

Threat researchers routinely take advantage of this reuse of code by threat actors. They treat evidence of shared code between the malware used in two attacks as evidence of a connection between the threat groups behind those attacks.

Here are some high profile stories:

Because the use and reuse of code in malware is dictated by engineering and economic constraints, it follows that malware code can serve as a more stable indicator for connecting and tracing attacks over a longer period of time than indicators of compromise (IoCs) such as IP addresses, domain names, and malware hashes.
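To make the contrast concrete, here is a minimal Python sketch of why a file hash is a brittle indicator while shared code is not. The two byte strings are made-up stand-ins for a sample and a lightly patched variant, and the 4-byte n-gram overlap is only an illustrative similarity measure chosen for this example. It is not a description of how MAGIC or any other product compares code.

```python
import hashlib

def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical "samples": the variant differs from the original by one byte.
original = bytes(range(256)) * 4
patched = bytearray(original)
patched[500] ^= 0xFF
variant = bytes(patched)

# A cryptographic hash changes completely when any byte changes...
hash_match = hashlib.sha256(original).digest() == hashlib.sha256(variant).digest()
# ...while almost all of the byte n-grams are still shared.
code_overlap = jaccard(ngrams(original), ngrams(variant))

print("hashes equal:", hash_match)
print("n-gram overlap:", round(code_overlap, 3))
```

A hash-based indicator would treat these two files as unrelated, while the content-based measure still reports them as nearly identical.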

Finding Malware Variants via Code Reuse

Further reading: Weaponizing Malware Code Sharing with Cythereal MAGIC.

Cythereal MAGIC (Malware Genomic Correlation System) is a “content-based malware retrieval system,” analogous to “content-based image retrieval systems,” such as Google Images. It uses an entire malware sample as the search query to find other, similar malware.
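Content-based retrieval of this kind can be pictured as an index keyed by content features rather than by file name or hash. The sketch below is a toy illustration only: the `FeatureIndex` class, the 4-byte n-gram features, and the sample names are all assumptions made for the example, and nothing here describes MAGIC’s internals.

```python
from collections import Counter, defaultdict

def features(data: bytes, n: int = 4) -> set:
    """Toy content features: the set of n-byte windows in the sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

class FeatureIndex:
    """Hypothetical inverted index: feature -> names of samples containing it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._sizes = {}  # sample name -> number of distinct features

    def add(self, name: str, data: bytes) -> None:
        feats = features(data)
        self._sizes[name] = len(feats)
        for f in feats:
            self._postings[f].add(name)

    def query(self, data: bytes, top: int = 5) -> list:
        """Rank indexed samples by Jaccard overlap with the query sample."""
        feats = features(data)
        shared = Counter()
        for f in feats:
            for name in self._postings.get(f, ()):
                shared[name] += 1
        scored = []
        for name, common in shared.items():
            union = len(feats) + self._sizes[name] - common
            scored.append((common / union, name))
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [(name, round(score, 3)) for score, name in scored[:top]]

index = FeatureIndex()
index.add("sample_a", b"connect;beacon;exfiltrate;sleep;" * 3)
index.add("sample_b", b"connect;beacon;exfiltrate;wipe;" * 3)
index.add("unrelated", b"render;resize;save-as-png;" * 3)

# The whole sample is the query; only samples sharing features are scored.
results = index.query(b"connect;beacon;exfiltrate;sleep;" * 3)
print(results)
```

Because the query only touches samples that share at least one feature with it, the lookup avoids comparing the query against every file in the repository one pair at a time.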

Figure 1. Using MAGIC to trace malware connected via shared code

Figure 1 shows the process of using a malware sample as a query to search for other malware. There are two workflows, supporting two use cases.

  • MAGIC Search: Find similar malware in the MAGIC database.
  • YARA Search: Use MAGIC-generated YARA rules to find variants in other threat repositories.

For the first, it is just a matter of uploading a malware sample to MAGIC and letting it do the rest. For the second, MAGIC generates YARA rules, which are then used to search other repositories, such as VirusTotal, Hybrid Analysis, and AlienVault OTX, for related samples.
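The second workflow boils down to scanning a repository with a rule made of byte patterns. YARA is a separate tool with its own rule language, so the Python sketch below only mimics the spirit of a YARA condition such as “2 of them”. The `ByteRule` class, the patterns, and the repository contents are hypothetical and are not taken from any MAGIC-generated rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ByteRule:
    """Simplified stand-in for a pattern rule: fire if enough patterns appear."""
    name: str
    patterns: tuple   # byte strings to look for
    min_hits: int     # how many distinct patterns must be present

    def matches(self, data: bytes) -> bool:
        hits = sum(1 for p in self.patterns if p in data)
        return hits >= self.min_hits

def scan(rule: ByteRule, repository: dict) -> list:
    """Return the names of all files in the repository that the rule matches."""
    return sorted(name for name, data in repository.items() if rule.matches(data))

# Hypothetical rule: any two of these three byte patterns.
rule = ByteRule(
    name="hypothetical_family",
    patterns=(b"\x55\x8b\xec\x83", b"beacon_interval", b"\xde\xad\xbe\xef"),
    min_hits=2,
)

# Hypothetical repository: file name -> file contents.
repository = {
    "variant_1": b"\x00\x55\x8b\xec\x83\x10beacon_interval=60",
    "variant_2": b"beacon_interval=5\xde\xad\xbe\xef",
    "benign": b"hello world \x55\x8b\xec\x83",
}

matches = scan(rule, repository)
print(matches)
```

Requiring several patterns rather than one is what lets a rule tolerate variants that drop or change a single pattern while still ignoring files that share only one common byte sequence.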

Further reading: YARA: The pattern matching swiss knife for malware researchers (and everyone else)

The current state of practice for finding malware variants, apart from YARA, is to combine BinDiff, ssdeep, and imphash. These tools differ in accuracy and scalability. BinDiff is highly accurate but does not scale, since it performs pairwise comparison. ssdeep and imphash scale well but have poor accuracy on obfuscated malware. A typical workflow for finding variants of a malware sample in a repository is to shortlist candidates using ssdeep and imphash, and then filter the shortlist with BinDiff.
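The shortlist-then-filter workflow can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: `import_hash` is an MD5 over the import names in the spirit of imphash (not the implementation found in any particular library), `difflib.SequenceMatcher` substitutes for a real binary differ, which compares program structure rather than raw bytes, and the samples and the 0.8 threshold are invented.

```python
import hashlib
from difflib import SequenceMatcher

def import_hash(imports: list) -> str:
    """MD5 over the ordered, lower-cased import names (imphash-like stand-in)."""
    joined = ",".join(name.lower() for name in imports)
    return hashlib.md5(joined.encode()).hexdigest()

def pairwise_similarity(a: bytes, b: bytes) -> float:
    """Costly pairwise comparison; a crude stand-in for a binary differ."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def find_variants(query: dict, repository: list, threshold: float = 0.8):
    key = import_hash(query["imports"])
    # Stage 1: cheap shortlist by exact import-hash match.
    shortlist = [s for s in repository if import_hash(s["imports"]) == key]
    # Stage 2: expensive pairwise check, run only on the shortlist.
    confirmed = []
    for s in shortlist:
        score = pairwise_similarity(query["code"], s["code"])
        if score >= threshold:
            confirmed.append((s["name"], round(score, 2)))
    return [s["name"] for s in shortlist], confirmed

# Hypothetical samples.
query = {"name": "query",
         "imports": ["kernel32.CreateFileA", "ws2_32.connect"],
         "code": bytes(range(64))}
repository = [
    {"name": "near_copy",   # same imports, three bytes patched
     "imports": ["KERNEL32.CreateFileA", "WS2_32.connect"],
     "code": bytes(range(30)) + b"\xff\xff\xff" + bytes(range(33, 64))},
    {"name": "lookalike",   # same imports, unrelated code
     "imports": ["kernel32.CreateFileA", "ws2_32.connect"],
     "code": bytes(range(100, 164))},
    {"name": "repacked",    # identical code, different import list
     "imports": ["kernel32.LoadLibraryA"],
     "code": bytes(range(64))},
]

shortlist, confirmed = find_variants(query, repository)
print("shortlist:", shortlist)
print("confirmed:", confirmed)
```

In this toy run the expensive comparison rejects `lookalike`, but `repacked` never reaches it: its code is identical to the query, yet the cheap filter drops it because the import list differs. That is the kind of miss the workflow is prone to when a variant has been packed or obfuscated.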

MAGIC improves upon the current state of practice: it offers the accuracy of BinDiff with better scalability than ssdeep.

APT Malware from CYBERCOM

Cyber National Mission Force (#CNMF), a unit of US Cyber Command, shared a collection of eight malware samples on VirusTotal between November 2018 and June 2019. Although #CNMF shared the malware without any attribution, threat researchers have attributed the collection to the advanced persistent threat group APT28, also known as Fancy Bear, Sofacy, Sednit, and Pawn Storm (see Kaspersky and Zone Alarm).

The eight #CNMF samples include variants of each of the three tools known to be used by APT28: Lojack, X-Agent, and X-Tunnel.

Further reading: APT28

Case Study — Tracking variants of #CNMF malware

We used MAGIC to trace variants of the malware in the #CNMF collection. Our search led to malware used in several attacks attributed to APT28. Readers interested in the complete report are invited to contact us (see instructions below).

Here are highlights of our findings:

  • A search of the MAGIC database using #CNMF malware retrieved 16 variants of Lojack, of which three samples were first seen over 10 years ago and one about 12 years ago.
  • A YARA search using the rule generated by MAGIC from a Lojack variant in the #CNMF collection produced over 20 Lojack variants in the community database of Hybrid Analysis.
  • MAGIC traced code connections from X-Tunnel samples in the #CNMF collection to two variants of X-Tunnel on Hybrid Analysis, of which one was reportedly used in the attack on TV5 Monde in 2017.
  • For the X-Tunnel EXE from the #CNMF collection, a MAGIC search found five variants in the MAGIC database, and a search with a MAGIC-generated YARA rule found two variants on Hybrid Analysis.
  • No X-Tunnel DLL variants were found in the MAGIC database, but a search on Hybrid Analysis using a MAGIC-generated YARA rule produced two variants of X-Tunnel, of which one was reportedly used in the attack on the Italian Military in 2018.

Why care about malware variants?

The big question is, what is the practical value of finding variants? What new information do you gain from variants that you cannot get from a single sample?

You don’t need to worry about malware variants if you have full faith in your anti-virus and other defenses. However, if you think there is a possibility that your defenses may not stand up to a concerted attack by an advanced adversary, wouldn’t you want to know who is attacking you and why?

Further reading: Decaying Indicators of Compromise

And why would you want to know who is attacking you? So that you can learn about their TTPs (Tactics, Techniques, and Procedures) and harden your defenses to defeat or mitigate the attack. For instance, if you determine that you are being attacked by APT28, you would gather all the IoCs associated with it and load them into your breach detection tools, such as Carbon Black.

A single malware sample doesn’t tell you whether there is a concerted attempt to breach your organization by an advanced adversary. It is worse for zero-day malware, that is, malware that no anti-virus has yet declared malicious. In the absence of any prior history, the difficult work of determining whether a suspect file is malicious, and if so what it does, is left to you.

Our case study is a good demonstration of this use case. As mentioned earlier, US CYBERCOM shared the malware without any attribution. The samples were also completely new to the community: their “firstseen” date on VirusTotal was the date CYBERCOM uploaded them. There was no prior history associated with those samples. Nothing was known about them other than that they were uploaded by CYBERCOM. Nothing said that the malware in the collection were tools used by APT28.

Security analysts often face a similar scenario during threat hunting and incident response. They find a suspicious sample, and they need to rapidly determine if it belongs to an APT.

Our case study demonstrates the power of tracking malware variants through shared code. By tracing the code of the #CNMF samples, we found variants that led us to reports on attacks where some of the variants were used. From these reports we learned that the malware belonged to the APT28 arsenal, which further led us to a host of information on APT28, including a repository of IoCs maintained by ESET.

That we could trace the malware code through MAGIC’s repository and that of Hybrid Analysis, without any reverse engineering or malware analysis, shows the power of MAGIC.

In closing…

Indicators such as IP addresses, domain names, and malware hashes, currently used as indicators of attack, have a very short useful life because they are transient. However, because the use and reuse of code in malware is dictated by engineering and economic constraints, it follows that malware code can serve as a more stable indicator for connecting and tracing attacks over a longer period of time. Indeed, this is how threat researchers track APTs and identify perpetrators of high profile attacks, though they currently do it manually.

Our study shows that through its ability to automatically trace code connections quite accurately in very large malware repositories, MAGIC makes it feasible to routinely analyze each and every piece of malware, whether the malware successfully breached the defenses or not. By automatically tracing the history of every malware that is quarantined, security operations can track progression of attacks as they evolve, learn about previous attacks by the same threat actor, and harden the defenses to defend against the specific threat.

CONTACT: To receive the complete report on tracking APT28 variants, please contact info@cythereal.com.

Founder/CEO, Cythereal, Inc.; Director, Center for Critical Infrastructure Cybersecurity; Professor, Computer Science at University of Louisiana at Lafayette