Introduction to Static Malware Analysis
If you follow mainstream news or have attended any cybersecurity awareness training provided by your organization, you have probably heard about malware. For those of us who work in the cybersecurity space, our relationship with malware goes a bit deeper. We have to learn what the malware intends to do (the harmful act) and how it does it (its behavior and techniques). Answering these two questions helps us defend organizations better.
What is Malware?
In its simplest form, malware is a program or piece of code written to perform a harmful act against a target. In many references it is also called a “malicious payload” or, more casually among non-technical people, a virus. People tend to refer to any malware as a virus because of their exposure to the anti-virus (AV) solutions installed on their work and personal PCs, although strictly speaking a virus is only one type of malware.
What is Malware Analysis?
It is the process by which cybersecurity analysts (whether malware reverse engineers, digital forensics specialists or any other relevant title) examine the malicious code in detail to understand more about it and how to defend against it. Here is a slight elaboration on both objectives:
- Understand more about it: cybersecurity experts want to know where the malware comes from, what drives the attacker (financial gain? operational disruption for political reasons?), how it differs from other malware, how it compromises the target and how it can spread. Answering these questions requires an in-depth, documented analysis of the code to extract the relevant information.
- How to defend against it: after finding the relevant information above, cybersecurity teams are in a better position to understand the entry points (also referred to as the attack surface) that could be leveraged or exploited to compromise the target and exfiltrate important data from the hosts (that is, copy it to a remote location). By identifying these, cybersecurity teams can develop signatures, deploy security tools to detect and eradicate the malicious code, and harden systems against such attacks.
For clarification, the process of malware analysis is also sometimes referred to as reverse engineering. Personally, I see the difference between the two as follows: reverse engineering is much more complex and requires a whole different level of expertise, because the analyst must be able to read machine code and binaries. Reverse engineering is mainly focused on seeing how the malware behaves dynamically and mapping this behavior to the instructions that can be examined through specialized software (disassemblers and decompilers).
Static Analysis vs. Dynamic Analysis
Before we go on to the actual static analysis in this article, it is useful to explain the difference between the two analysis approaches:
Static Analysis: This involves examining and analyzing the malware without actually running it (hence the static in the name). The goal here is to learn as much as possible about the malware by reviewing its metadata and any readable snippets of its code, such as the names of the functions it calls, which depend on the language and framework used to develop the malware.
Dynamic Analysis: This, on the other hand, involves actually detonating the malware (a term for running the malware in a secure environment and closely monitoring how it interacts with the host PC and how it changes the host's normal behavior). Analysis at this stage usually requires advanced skills in assembly code review (through disassemblers and decompilers) and in tracking the different instructions, calls and accesses to file systems and network resources.
It is common practice for cybersecurity experts to perform static analysis before proceeding with dynamic analysis. Static analysis can reveal a great deal of information about the malware, and it is sometimes enough to answer the questions required to detect and eradicate it.
Static Analysis Example Walkthrough
To perform static analysis (or dynamic analysis), we must first get our hands on a malware sample. Then we need a dedicated, isolated environment where the analysis can be completed. Even though static analysis does not require running the malware (so there is less chance of infecting the host), it is always best practice to perform any analysis in a standalone, isolated environment to protect our data from what-if scenarios (e.g. launching the malware by mistake!).
I obtained a malware sample for research purposes from an online repository. You can find many resources that offer malware samples for experimentation and analysis. First things first, I set up a virtual machine (the isolated environment I spoke about earlier) and moved the malware there. The environment I set up is an Ubuntu-based VM.
You can see here the file titled “malware-sample.exe”. The best way to start is to get the file hash and research that hash on different websites to see if others have encountered this malware and what they found. In real-life incident response this research is priceless, and it can save you days or weeks of discovery effort.
To get the hash, simply launch the Terminal and run the built-in Linux utility md5sum against the file (for example, md5sum malware-sample.exe):
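For readers who would rather script this step, here is a minimal Python sketch (not the method used in this walkthrough) that computes the same MD5 digest that md5sum reports, along with a SHA-256 digest, which VirusTotal also accepts as a search term. The function name file_hashes and the chunk size are illustrative choices.

```python
import hashlib
from pathlib import Path


def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-256 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large samples are never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


if __name__ == "__main__":
    sample = Path("malware-sample.exe")  # file name used in this walkthrough
    if sample.exists():
        for name, digest in file_hashes(sample).items():
            print(f"{name}: {digest}")
```

Hashing only reads the file's bytes and never executes it, so this stays within static analysis.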
We have successfully retrieved the MD5 hash for this file, and now we take it to one of the best websites for threat intelligence research (VirusTotal) to see if others have seen this specific malware and what they discovered:
Since this is a known malware sample, the VirusTotal result for the hash indicates that 57 security vendors recognize it as malicious. We can then click through the different tabs on the website, such as Summary, Detection and Details, and explore the information in each.
In the screenshot above, we chose to review what others found about the behavior of this malware sample, and we can see that the malware communicates with different destinations over the internet when launched. Anyone starting their journey in this field should become comfortable with most of the information displayed on these pages, because that is how they can improve their malware analysis skills.
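The same hash lookup can also be scripted. The sketch below assumes the public VirusTotal v3 REST API as I understand it: a GET request to /api/v3/files/{hash} with the API key in an x-apikey header, and a last_analysis_stats object in the response. Please verify those details against VirusTotal's own documentation. The function names and the sample report are made up for illustration.

```python
import json
import urllib.request

# Endpoint as documented for the VirusTotal v3 API (verify before relying on it).
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_lookup_request(file_hash, api_key):
    """Build (but do not send) the request for a file report by hash."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash),
        headers={"x-apikey": api_key},
    )


def detection_summary(report):
    """Return (engines flagging the file as malicious, total engine verdicts)."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0), sum(stats.values())


def lookup(file_hash, api_key):
    """Send the request; needs network access and a valid API key."""
    with urllib.request.urlopen(build_lookup_request(file_hash, api_key)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical, hand-written report fragment used only to show the parsing.
    fake_report = {"data": {"attributes": {"last_analysis_stats": {
        "malicious": 3, "suspicious": 0, "undetected": 7, "harmless": 0}}}}
    flagged, total = detection_summary(fake_report)
    print(f"{flagged} of {total} engines flagged the sample")
```

Looking up a hash, rather than uploading the file, also avoids sharing the sample itself with a third party, which can matter during a live incident.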
The next tool is strings. This utility is available on most Linux distributions and prints the sequences of printable characters found in files. It is primarily used to extract text from binary (non-text) files and get a sense of their contents. Here is a quick look at the man page for this utility:
This utility is useful for quickly determining which functions are being called, and in some cases it can also reveal very important information such as hardcoded passwords.
Now we can run the sample through the utility by typing strings malware-sample.exe in the terminal.
This will display the following information (only a few lines captured for illustration purposes):
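To make the idea concrete, here is a rough Python sketch of what such a utility does: scan the raw bytes and keep every run of printable ASCII characters that is at least four characters long (four is the default minimum in GNU strings). This is a simplified approximation, not a reimplementation of the real tool, and the function name extract_strings is my own.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII characters at least min_length long."""
    # \x20-\x7e covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Synthetic bytes standing in for a binary file.
    blob = b"\x00\x01hello world\x00ab\x00FromBase64String\xff"
    for text in extract_strings(blob):
        print(text)
```

Windows executables often store text as 16-bit characters, which this ASCII-only sketch misses; GNU strings offers the -e l option for little-endian 16-bit strings.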
A quick review of the lines above can help us understand some of the behavior of the malware, including the FromBase64String and ToBase64String functions, which are responsible for decoding and encoding respectively. Base64 encoding is often used to hide the original content by converting it into a string of characters that looks random but can easily be reversed. We can also observe a reference to get_Jpeg, which suggests the sample works with JPEG images (the get_ prefix is how .NET names property getters), possibly loading one from a remote source. With time, cybersecurity experts become highly efficient at understanding what most of these functions actually do, and this can speed up the analysis.
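Because Base64 is an encoding rather than encryption, anything hidden this way can be recovered without a key. The following sketch, with a helper name and length threshold of my own choosing, tests whether an extracted string looks like Base64 and decodes it if so.

```python
import base64
import binascii
import re

# At least 16 Base64 alphabet characters, followed by optional padding.
BASE64_CANDIDATE = re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")


def try_decode_base64(text):
    """Return the decoded bytes if text looks like valid Base64, else None."""
    if len(text) % 4 != 0 or not BASE64_CANDIDATE.match(text):
        return None
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None


if __name__ == "__main__":
    # Illustrative value only, not taken from the sample in this article.
    hidden = base64.b64encode(b"http://example.test/payload").decode("ascii")
    print(hidden)
    print(try_decode_base64(hidden))
```

Treat the results as candidates only: ordinary identifiers made of letters and digits can also pass this check and decode to meaningless bytes.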
Portable Executable (PE) Header Analysis
Another approach to analyzing the files we have at hand is to use a tool that focuses on PE header analysis. PE (Portable Executable) is the file format used by Windows executables (.exe) and DLLs. Every PE file starts with a set of headers that carry a lot of information about the program, such as the target architecture, the compile timestamp, the sections and the imported functions. Developers sometimes also “pack” the finished program, which compresses or obfuscates it to reduce its size and to make it harder for any reverse engineer to reverse the program’s code (useful not only for malicious payloads, but also for companies protecting their proprietary codebase). The header information can help cybersecurity experts understand more about the origin of the malware and get a rough idea of its capabilities.
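As a small illustration of what lives in those headers, the sketch below reads a few fields directly with Python's struct module, following the layout in Microsoft's PE format specification: the MZ signature, the offset stored at 0x3C, the PE signature and the 20-byte COFF file header. The function name and the returned dictionary are my own choices, and dedicated tools parse far more than this.

```python
import struct
from datetime import datetime, timezone

MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # characteristics flag set on DLLs


def parse_pe_header(data):
    """Parse the DOS header and COFF file header from raw PE bytes."""
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature, not a PE file")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header (20 bytes) follows the 4-byte signature.
    (machine, section_count, timestamp, _symbol_table, _symbol_count,
     _optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "sections": section_count,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


if __name__ == "__main__":
    try:
        with open("malware-sample.exe", "rb") as handle:  # name from this walkthrough
            print(parse_pe_header(handle.read(4096)))
    except FileNotFoundError:
        print("sample not found")
```

The compile timestamp is set by whoever built the file and can be forged or zeroed, so it is best treated as a hint rather than evidence.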
On Windows, analysts can use a tool called PE Studio to load the executable. The program provides a lot of insight into the executable, including automatically checking it against known malware and threat intelligence repositories such as VirusTotal for matching artifacts.
The screenshot above of PE Studio reveals a lot of information that is critical to understanding the malware. If we zoom in on the left side panel (shown below), we can find the different sections classifying the artifacts from the executable. For example, indicators highlights what is known to be, or could potentially be, a sign that this executable is bad. We also get a section for VirusTotal, as mentioned before, as well as a mapping to the MITRE framework (to be discussed in a separate article in the future).
At the end of this article, I would like to note that in most scenarios it is better to use services like VirusTotal or Hybrid Analysis to get information about the malware. These services also offer dynamic analysis (sandboxing) capabilities: you upload a file, the service analyzes it, and the extracted information is displayed for your review. Both have a free tier, but the paid tiers provide more information and correlation capabilities.
Malware analysis is not easy, and what was explained here barely scratches the surface. I will post additional articles in the future looking at more hands-on reverse engineering. Malware reverse engineers are highly regarded and respected in the cybersecurity domain because they have to know exactly what they are doing and make sense of a lot of assembly code and instructions that even expert developers often can’t read.
I hope this was a useful introduction to malware analysis. The walkthrough will continue in future articles.