Weaponizing Malware Code Sharing with Cythereal MAGIC

Arun Lakhotia
Jul 17, 2018


Weaponizing shared code means you use code from one malware family to catch not-yet-seen variants of not just the same family, but also of other malware families.

“Gimme a break,” you say, “how can this be possible?” Because malware authors reuse and share code. Writing malware, like writing any other software, is expensive, so reusing and sharing code is less a choice than a necessity driven by software economics. New malware families are most often spawned from the existing code of other families, which is often made public, intentionally or accidentally.

> Read: WannaCry shares code with Lazarus APT samples
> Read: A Look into 30 Years of Malware Development

Cythereal, a new cyber security startup, is leading the way in weaponizing malware code sharing with the launch of MAGIC (Malware Genomic Correlation), a web service that analyzes malware streams, clusters malware with “semantically similar” code, and transforms code shared between malware into Yara rules for detecting future variants of the same and other malware families.

Introducing Cythereal MAGIC

Aimed at improving malware defense and attribution, MAGIC is the outcome of a series of research efforts sponsored by the US Department of Defense and US Department of Homeland Security.

If you are a “reverser,” you may think of MAGIC as “BinDiff on Steroids.” While BinDiff tells you what code is shared between two given binaries, MAGIC searches through millions of malware samples to find those that share code.

Alternatively, you may view it as “Google Images for Malware.” You upload a malware binary to MAGIC and it finds a collection of malware originating from the same code base.

Cythereal MAGIC turns code shared between malware into threat intelligence and Yara rules

MAGIC is, however, significantly more than a search engine for malware. It doesn’t stop at finding similar malware. It uses the shared code it finds to create Yara rules that can detect other variants of that malware, along with any other malware with which it shares code.

And yes, MAGIC is able to find shared code in malware variants produced by packing, polymorphism, different compiler optimizations, and plain old code modifications.

How does MAGIC work?

MAGIC takes in a stream of malware, such as a global malware threat feed or the malware detected by anti-virus systems guarding an organization’s end-points, proxies, or gateways. It sifts through the stream to find active campaigns, that is, bursts of malware originating from the same code base. It measures a campaign’s evasiveness (its ability to evade antivirus systems) and identifies the campaigns that are likely targeted attacks. MAGIC also generates Yara rules that may be used to provide a second layer of defense against highly evasive zero-day variants and even not-yet-created malware families.

Malware campaigns found in a quarantine of 2,600+ malware
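The article does not say how MAGIC scores evasiveness, so the following is only a rough sketch of one plausible definition: one minus the average fraction of antivirus engines that flag each sample in a campaign. The `ScanResult` type, its fields, and the formula itself are assumptions made for illustration, not MAGIC's actual method.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    sample_id: str
    engines_total: int     # hypothetical: number of AV engines that scanned the sample
    engines_detected: int  # hypothetical: number of those engines that flagged it


def campaign_evasiveness(results):
    """Return 1 minus the mean detection ratio over a campaign's samples.

    Assumed definition: 0.0 means every engine caught every sample,
    1.0 means no engine caught anything.
    """
    ratios = [
        r.engines_detected / r.engines_total
        for r in results
        if r.engines_total > 0  # skip samples that were never scanned
    ]
    if not ratios:
        raise ValueError("no usable scan results")
    return 1.0 - sum(ratios) / len(ratios)
```

Under this assumed definition, a campaign whose samples are flagged by few engines scores close to 1.0 and would be a candidate for closer inspection.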

The power of MAGIC comes from its use of the malware genome, a normalized representation of the denotational semantics of malware code. The malware genome, developed at the University of Louisiana at Lafayette as part of research done in the DARPA Cyber Genome project, makes MAGIC resilient to polymorphism and variations in compiler optimization.

MAGIC also has a versatile unpacker that overcomes a variety of anti-VM and anti-debugging tricks to get to the malware payload.

MAGIC extracts the genome from both the original malware binary and the unpacked binary, and feeds it to a data mining system that clusters malware based on shared genome with similar semantics.
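To make the clustering idea concrete, here is a minimal sketch that assumes each binary has already been reduced to a set of procedure fingerprints (a hypothetical stand-in for the genome). Two binaries are linked when they share at least `min_shared` fingerprints, and clusters are the connected components of those links. The function name, the threshold, and the pairwise comparison are illustrative assumptions; a service working over millions of samples would need an index rather than all-pairs comparison.

```python
from itertools import combinations


def cluster_by_shared_features(features, min_shared=2):
    """Group binaries into clusters of transitively linked samples.

    features: dict mapping binary id -> set of procedure fingerprints
              (hypothetical stand-ins for genome features).
    Returns a list of clusters, each a sorted list of binary ids.
    """
    parent = {b: b for b in features}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of binaries that shares enough fingerprints.
    for a, b in combinations(features, 2):
        if len(features[a] & features[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for b in features:
        clusters.setdefault(find(b), set()).add(b)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])
```

Note that linking is transitive: two binaries with little direct overlap can land in the same cluster through an intermediate variant, which fits the idea of a family evolving step by step.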

The following table provides an example. It tabulates the procedures shared between 24 binaries of a campaign. Each row of the table represents a shared procedure. The value “100% coverage” in the first column indicates that the procedure exists in each of the 24 binaries. The annotations “variant-instr”, “variant-bytes”, and “variant-blocks” indicate that the procedures in that group are variants, that is, they have different code but perform a similar function.

Procedures shared in a campaign and their attributes
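The coverage column can be read as the percentage of a campaign's binaries that contain a given procedure. A small sketch of that computation follows, assuming each binary is represented as a set of procedure identifiers; the function name and the input layout are illustrative, not MAGIC's data model.

```python
from collections import Counter


def procedure_coverage(binaries):
    """Return the percentage of binaries that contain each procedure.

    binaries: dict mapping binary id -> set of procedure ids (assumed layout).
    """
    # Each procedure is counted once per binary because the values are sets.
    counts = Counter(proc for procs in binaries.values() for proc in procs)
    total = len(binaries)
    return {proc: 100.0 * seen / total for proc, seen in counts.items()}
```

With 24 binaries, a procedure present in all of them maps to 100.0, matching the “100% coverage” rows described above.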

Finally, having detected a cluster of malware with shared code, MAGIC computes an “abstract symbolic automaton” for each group of shared procedures. Each automaton captures the common computation of all procedures in the group. MAGIC then translates each automaton into a bytecode regular expression, and constructs Yara rules using a set of these expressions, such as that shown in the figure below.

Yara rule with bytecode regular expression generated by MAGIC
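The automaton construction is not described here, so the sketch below swaps in a much simpler technique: it merges equal-length byte sequences of procedure variants into a Yara hex string, putting a `??` wildcard wherever the variants disagree, and then wraps a set of such patterns in a rule. The function names, the rule name, and the `N of them` threshold are assumptions for illustration only.

```python
def wildcard_pattern(variants):
    """Merge equal-length byte strings into a Yara hex string.

    Positions where all variants agree keep their byte value;
    positions where they differ become the ?? wildcard.
    """
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must have equal length")
    tokens = []
    for column in zip(*variants):  # iterate position by position
        tokens.append(f"{column[0]:02X}" if len(set(column)) == 1 else "??")
    return "{ " + " ".join(tokens) + " }"


def build_yara_rule(name, patterns, min_matches):
    """Assemble rule text that fires when min_matches patterns are found."""
    lines = [f"rule {name}", "{", "    strings:"]
    for i, pattern in enumerate(patterns):
        lines.append(f"        $p{i} = {pattern}")
    lines += ["    condition:", f"        {min_matches} of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Two made-up variants of a function prologue that differ in one byte.
    pattern = wildcard_pattern([
        bytes.fromhex("558BEC83EC1053"),
        bytes.fromhex("558BEC83EC2053"),
    ])
    print(build_yara_rule("shared_procedures_demo", [pattern], 1))
```

Positional wildcarding only handles variants of identical length; tolerating inserted or reordered instructions is where a richer representation, such as the automata described above, would come in.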

MAGIC-generated Yara rules are of much higher quality than those generated using shared strings, API calls, or similar information. Using byte codes extracted from actual executable code, rather than from the PE header or data area, reduces the possibility of random matches with non-program files. MAGIC leverages its large corpus of malware and benign code to select procedures that are prevalent in malware and rarely, if ever, occur in benign code. Creating Yara rules from the byte codes of such procedures casts a wider net, catching malware across multiple families without increasing false positives.
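The selection step can be pictured as a filter over two frequency tables. The sketch below assumes per-procedure occurrence counts in a malware corpus and in a benign corpus; the thresholds and names are hypothetical, and the actual selection criteria used by MAGIC are not given here.

```python
def select_rule_candidates(malware_counts, benign_counts,
                           min_malware=5, max_benign=0):
    """Pick procedures common in malware and (almost) absent from benign code.

    malware_counts / benign_counts: dict mapping procedure id -> number of
    binaries in the respective corpus containing it (assumed inputs).
    Returns procedure ids, most prevalent in malware first.
    """
    selected = [
        proc
        for proc, seen in malware_counts.items()
        if seen >= min_malware and benign_counts.get(proc, 0) <= max_benign
    ]
    # Sort by descending malware prevalence, then by name for stable output.
    return sorted(selected, key=lambda p: (-malware_counts[p], p))
```

A library routine such as a statically linked `memcpy` would be widespread in malware but also in benign code, so the benign threshold is what keeps it out of the rules.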

> Want more details? Check out these research papers.

Performance

VirusBattle, the precursor to MAGIC, was developed and refined under the aegis of several US DoD and DHS sponsored projects, including the DARPA Cyber Genome project. In these projects VirusBattle was put through rigorous testing and use by experienced malware researchers. The success of VirusBattle in federal government projects was the driver for bringing it to the commercial world.

MAGIC, the highly scalable incarnation of VirusBattle, operates on the Google Cloud Platform. In its current configuration it processes around 30,000 malware samples per day, with 75% of them processed in under 4 minutes. MAGIC analyzes binaries for most of the common architectures, file formats, and operating systems, including Windows, Linux, and Android.

Case Study

MAGIC’s power is on display in our recent analysis of the VPNFilter malware. Yara rules generated by MAGIC caught all of the VPNFilter samples, and also caught samples of PNScan, Mirai (Gafgyt), Filecoder, and Tsunami. Of particular significance is that, although these Linux malware families have been known for a while, a code-level connection between them has not been reported before.

> For more details, read: MAGIC generated Yara rules for VPNFilter also catch other botnet families

Accessing MAGIC

To access MAGIC or to get more information, please send a note to arun@cythereal.com.


Arun Lakhotia

Founder/CEO, Cythereal, Inc.; Director, Center for Critical Infrastructure Cybersecurity; Professor, Computer Science at University of Louisiana at Lafayette