MALWARE ANALYSIS (Part-1)

Dark_Emperor
Sep 12, 2022

Hey all, today’s article is going to be very interesting, as we are going to learn how to analyse malware.

So before starting, let’s learn what malware is :-

It is a piece of software used by cybercriminals to launch cyber attacks on a network or organisation for their own benefit. Because it is software written with malicious intent, it is known as malware (malicious software).

There are various types of malware like trojan horse, worm, spyware, keylogger, rootkit, adware, ransomware, fileless malware, backdoor, malvertising etc.

Now, let’s see what Malware Analysis is :-

It is the process of understanding the behaviour, purpose or, you could say, intention of a particular file or sample.

The definition looks simple, but trust me, it is not as simple as it looks.

So, how many types of analysis are there :-

There are various types of analysis, but these are the 3 main ones :-

1. STATIC ANALYSIS :-

In this analysis, the analyst does not execute the code. Instead, they use tools to dissect the file and extract the malicious code from it.

As you can see, the analyst did not execute the malware/file; that’s why this analysis is also known as “CODE ANALYSIS”.

Now, if static analysis cannot answer our questions, for example when adversaries use advanced malware or a ZERO DAY that no one has seen before and everyone is clueless about the type of malware, then we need a more advanced analysis to give answers, and that analysis is known as ……

2. DYNAMIC ANALYSIS :-

As the name suggests, here the malware analyst executes the malware in a “very” isolated environment.

Make sure that the environment in which you analyse the malware is truly isolated: it should have no internet connection and should contain no personal or professional data. Even a single mistake while running malware without proper precautions may backfire: the malware may escape the isolated environment and start spreading over the intranet, and this may land you in deep trouble.

Because the analyst observes what the malware does while it runs, this analysis is also known as “BEHAVIOURAL ANALYSIS”.

HOW DYNAMIC ANALYSIS IS PERFORMED :-

  1. Make sure you are in an isolated environment.
  2. Drop the malware in the sandbox.
  3. Execute “Fakenet”. ( Fakenet is a tool that simulates network services to deceive the malware into thinking that the machine is connected to the internet. Dynamic analysis is performed in an isolated environment without an internet connection, and if the malware notices that the machine is offline, it may not execute fully and you cannot analyse it properly. )
  4. With the help of a snapshot tool (Regshot, which records the state of the registry and file system) or any similar tool, take a snapshot before executing the malware.
  5. Execute the malware.
  6. Take a second snapshot after the execution is complete.
  7. Compare the two snapshots and look for any deviation or manipulation.
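
To make the snapshot-and-compare idea from steps 4 to 7 concrete, here is a minimal Python sketch. It is not how Regshot works internally: it only hashes the files under one folder (no registry, no processes), and the names take_snapshot, compare_snapshots and demo are hypothetical helpers made up for this illustration.

```python
import hashlib
import os
import tempfile


def take_snapshot(root):
    """Map every file path under root to the SHA-256 of its contents."""
    snapshot = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    snapshot[path] = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                # Unreadable files are still recorded so they appear in the diff.
                snapshot[path] = None
    return snapshot


def compare_snapshots(before, after):
    """Return the paths that were added, removed, or modified."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys() if before[path] != after[path]
    )
    return {"added": added, "removed": removed, "modified": modified}


def demo():
    """Simulate a run in a temp folder: one file is changed, one is dropped."""
    with tempfile.TemporaryDirectory() as root:
        config = os.path.join(root, "config.ini")
        with open(config, "w") as handle:
            handle.write("original")
        before = take_snapshot(root)  # snapshot before "execution"

        with open(config, "w") as handle:
            handle.write("tampered")
        with open(os.path.join(root, "dropped.exe"), "w") as handle:
            handle.write("payload")
        after = take_snapshot(root)  # snapshot after "execution"

        diff = compare_snapshots(before, after)
        return {kind: [os.path.basename(p) for p in paths] for kind, paths in diff.items()}
```

Calling demo() reports config.ini as modified and dropped.exe as added. In a real analysis you would point take_snapshot at the folders you care about inside the sandbox, once before and once after running the sample.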

NOTE :-

You do not have to perform dynamic analysis in exactly the way described above. It depends on your situation.

Now, if the comparison shows any manipulation/deviation, such as new or modified registry keys and files, legitimate processes being killed or suspicious processes being started, that is a strong sign that the file is malicious.

But if there is no deviation, you cannot directly conclude that the file is not malicious. Many zero-day vulnerabilities and techniques exist that nobody knows about, because blackhat hackers never reveal their secret weapons. A clean dynamic-analysis result may simply mean that you have run into a new technique or vulnerability, so investigate the file thoroughly before concluding your results.

This leads to our next analysis known as…

3. HYBRID ANALYSIS :-

Performing both analyses together is known as “Hybrid Analysis”.

This analysis gives you more information with higher accuracy, as you combine both approaches (static + dynamic).

So, for today’s article we are going to analyse a PDF.

But, WHY PDF ?????

Because PDF is used everywhere, and various APTs such as Cozy Bear, Sharpshooter and Dark Caracal use PDF files as a vector for initial access. You can look these groups up in the MITRE ATT&CK framework.

Now, before we dive deep into analysing a PDF, you should know how a PDF (or any file type) works behind the scenes.

It is very easy to understand, so let’s start.

PDF CONTAINS 4 MAIN PARTS :-

1. HEADER :-

At the top of the PDF you will see the PDF version number; the header contains this information. The header does not strictly have to be at the very top, but it should appear within the first 1024 bytes of the file.

EX :- %PDF-1.1, %PDF-1.3, %PDF-1.5
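
As a quick illustration of the 1024-byte rule, here is a small Python sketch that looks for the header marker in the first 1024 bytes of a file's contents. The function name find_pdf_header and the simple one-digit.one-digit version pattern are my own assumptions for this example.

```python
import re

# Matches markers such as %PDF-1.1, %PDF-1.3, %PDF-1.5
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def find_pdf_header(data, search_window=1024):
    """Return (offset, version) of the header inside the search window, or None."""
    match = HEADER_RE.search(data[:search_window])
    if match is None:
        return None
    return match.start(), match.group(1).decode("ascii")
```

A header that is not at offset 0 is not malicious by itself, but junk bytes placed before it are worth noting during triage.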

2. BODY :-

Whatever we see in the document (text, links, graphics, images) lives in this section.

The body contains different objects that reference each other. These objects have different types, like :

  • Names = /name, a forward slash followed by ASCII characters – used to set a unique name.
  • Strings = (text), text that is enclosed in parentheses.
  • Arrays = enclosed in square brackets ([...]); can contain other objects.
  • Dictionaries = table of key and value pairs. The key is a name object and the value can be any other object. Enclosed within double angle brackets (<<...>>)
  • Streams = contain embedded data like images (or code), which can be compressed. A stream is described by a dictionary that sets the stream’s length with the key /Length and its encoding with the key /Filter.
  • Indirect object = object that has a unique ID. The object starts with the keyword obj and ends with endobj (similar to brackets in programming languages), and other objects can reference it using its ID. For example, a reference to the object with ID 3 would look like this: 3 0 R
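
To see how indirect objects and references fit together, here is a rough Python sketch that maps each object to the objects it references. It uses plain regular expressions on a tiny hand-written sample, so it is only an approximation of what a real PDF parser does; SAMPLE and map_references are names invented for this example.

```python
import re

# Hand-written sample body: object 1 points at objects 2 and 3, object 3 at object 4.
SAMPLE = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /S /JavaScript /JS 4 0 R >>\nendobj\n"
)

OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
REFERENCE_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def map_references(data):
    """Return {object id: sorted list of object ids it references}."""
    graph = {}
    for match in OBJECT_RE.finditer(data):
        obj_id = int(match.group(1))
        body = match.group(3)
        graph[obj_id] = sorted({int(ref.group(1)) for ref in REFERENCE_RE.finditer(body)})
    return graph
```

For the sample, map_references(SAMPLE) shows that object 1 references objects 2 and 3, and object 3 references object 4. Following such chains is how you get from /OpenAction to the object that actually holds the script.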

3. CROSS-REFERENCE TABLE :-

This table allows the PDF parser to quickly access every object inside the body; it begins with the keyword xref. You can think of it like an Excel sheet: just as a person can use a sheet to quickly locate the information they are looking for, the parser uses this table to see where a particular object is located.

This table is especially useful when opening big files.

4. TRAILER :-

The trailer contains overall information about the PDF and points to the start of the Cross-Reference Table (XREF).
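
Here is a small sketch of how that pointer can be read: the byte offset of the cross-reference table is stored after the startxref keyword near the end of the file. The helper name find_startxref is hypothetical, and the sketch ignores newer features such as cross-reference streams.

```python
import re


def find_startxref(data):
    """Return the byte offset stored after the last 'startxref' keyword, or None."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    # A PDF that was updated incrementally can contain several trailers;
    # the last one is the one a reader starts from.
    return int(matches[-1]) if matches else None
```

If the returned offset does not land on the xref keyword, the file is damaged or was tampered with, and tools usually fall back to scanning the whole file for objects.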

See the image below for a better understanding.

Next we are going to look at KEYWORDS. As a malware analyst you should know them, because the keywords tell you how the PDF is going to behave.

PDF Actions :-

  • /OpenAction /AA = specify an action that is carried out automatically, e.g. executing a script (/OpenAction when the PDF is opened, /AA on additional trigger events)
  • /JavaScript /JS = indicate JavaScript embedded in the PDF, which can run when the PDF is opened
  • /Names = names of files that will likely be referred to by the PDF itself
  • /EmbeddedFile = shows other files embedded within the PDF file itself, e.g. scripts
  • /URI /SubmitForm = links to other URLs on the internet, e.g. a possible link to a 2nd stage payload/additional tools for the malware to run
  • /Launch = similar to OpenAction; can be used to run embedded scripts within the PDF file itself or run additional files that have been downloaded by the PDF.
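
The keyword list above can be turned into a very rough triage script. The sketch below counts exact name matches in the raw bytes and also undoes #xx hex escapes inside names (so /J#61vaScript is counted as /JavaScript). It is only an approximation of what a dedicated tool does, and RISKY_NAMES, normalise_name and count_risky_names are names I made up for this example.

```python
import re

RISKY_NAMES = [
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/Names",
    "/EmbeddedFile", "/URI", "/SubmitForm", "/Launch",
]

# A name is a slash followed by characters that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\s/<>\[\]()%{}]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalise_name(raw):
    """Undo #xx hex escapes, so /J#61vaScript becomes /JavaScript."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_risky_names(data):
    """Count how often each risky name appears in the raw PDF bytes."""
    counts = {name: 0 for name in RISKY_NAMES}
    for match in NAME_RE.finditer(data):
        name = normalise_name(match.group(0))
        if name in counts:
            counts[name] += 1
    return counts
```

Because the sketch matches whole names, /JS is not counted again inside /JavaScript. It does not look inside compressed streams, so a zero count does not prove that a keyword is absent.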

Now, an adversary is not going to embed malicious code in plain sight. To remain stealthy, they will encrypt or encode the code.

In a PDF, the mechanism that encodes and decodes stream data is known as a filter.

PDF Strings, Encoding & Decoding :-

PDF can encode strings in multiple ways to obfuscate data. For example, the string “Hello World” can be written as plain text in parentheses or in hex encoding between angle brackets.
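
To make the hex form concrete, the following sketch converts a string to a PDF-style hex string and back. The function names are hypothetical; the angle-bracket syntax and the rule that a missing final digit counts as 0 come from the PDF format itself.

```python
def to_pdf_hex_string(text):
    """Encode text as a PDF hex string, e.g. (Hello World) -> <4865...>."""
    return "<" + text.encode("latin-1").hex().upper() + ">"


def from_pdf_hex_string(hex_string):
    """Decode a PDF hex string; whitespace between the digits is ignored."""
    digits = "".join(hex_string.strip("<>").split())
    if len(digits) % 2:
        digits += "0"  # an odd number of digits is padded with a trailing 0
    return bytes.fromhex(digits).decode("latin-1")
```

With these helpers, to_pdf_hex_string("Hello World") gives <48656C6C6F20576F726C64>, and from_pdf_hex_string turns it back into Hello World.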

To make things more complicated, an adversary may apply multiple layers of encoding, and this is known as “STACKED FILTERING”.

To decode this, undo the layers in reverse order: first the encoding that was applied last, then the one before it. The /Filter array of the stream lists the decode filters in the order in which they have to be applied.

There are various encoding techniques like :-

  • /ASCIIHexDecode = hex encoding of characters
  • /LZWDecode = LZW compression algorithm
  • /FlateDecode = Zlib compression
  • /ASCII85Decode = ASCII base-85 representation
  • /Crypt = Various encryption algorithms
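
Here is a minimal sketch of decoding a stacked filter chain in Python, covering only /ASCIIHexDecode and /FlateDecode from the list above. The DECODERS table and decode_stream are hypothetical names; the filters are applied in the order in which they would appear in the stream's /Filter array.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    """Decode hex digits up to the '>' end marker, ignoring whitespace."""
    hex_digits = b"".join(data.split(b">")[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # odd digit count: a trailing 0 is assumed
    return binascii.unhexlify(hex_digits)


# Only two filters are sketched here; others would be added to this table.
DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": zlib.decompress,
}


def decode_stream(data, filters):
    """Apply each decode filter in the order listed in /Filter."""
    for name in filters:
        data = DECODERS[name](data)
    return data
```

For a stream with /Filter [/ASCIIHexDecode /FlateDecode], the creator compressed the data first and hex-encoded it second, so decoding undoes the hex layer first and the compression second.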

Now we are good to go to analyse a malicious PDF :-

Before analysing the PDF, make sure you are working in a VM or another isolated environment.

I am going to use REMnux, a Linux distro specially designed for malware analysis and reverse engineering, but you can use any Linux distro. You can download REMnux from here.

To analyse malware you need a malware sample, and you can download one from this website. Search for pdf and download a sample. Be very cautious, because these samples are live malware.

Now we have malware sample and we are good to go.

For PDF analysis there are various tools, like pdfid, pdf-parser, peepdf, pdfextract and so on.

You can find some more information here.

With the help of pdfid we can get the number of occurrences of each keyword present inside the PDF, as shown in the image below.

As you can see, we have some risky keywords like /JavaScript, /AcroForm and /OpenAction, so let’s investigate further. The disadvantage of this tool is that it only counts keywords; we need more information for further analysis.

To solve this we have another tool, pdf-parser.

You can see its documentation as well. With the help of the risky keywords and this tool, let’s extract information, as shown in the image below.

Let’s search for JavaScript as well, as shown in the images below.

We see that objects 10 and 13 contain JavaScript. Having JavaScript inside a PDF does not by itself mean that the PDF is malicious, because there are legitimate use cases, such as restricting some user actions or pre-filling form fields. The thing that raises a red flag is that the JavaScript is not alone: we also have /OpenAction, which automatically triggers an action when the PDF is opened. Since this PDF contains JavaScript, it is quite possible that /OpenAction executes it. This tells us that we have to investigate this PDF deeply.

Let’s extract information from a particular object with the option (-o), as shown in the image below.

This object is not giving me much information. Let’s look at another object, as shown in the image below.

As we see, object 13 contains a stream of length 1183, and that stream is encoded.

Let’s extract information from object 13 by supplying some options, as shown in the image below. (You can see the help menu of pdf-parser with the --help option.)

The above command says: extract the raw data (-w) from object number 13 (-o 13) and also decode it by passing the stream through its filters (-f).
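
If you are curious what such an extraction involves, here is a rough Python equivalent for the simplest case: find one object by its ID, cut out its stream, and inflate it. This is a sketch of the idea, not pdf-parser's actual implementation. It assumes the stream uses only /FlateDecode, and extract_stream is a hypothetical helper name.

```python
import re
import zlib


def extract_stream(pdf_bytes, obj_id, inflate=True):
    """Return the stream of the given indirect object, or None if not found."""
    # (?<!\d) stops "3 0 obj" from matching inside "13 0 obj".
    pattern = rb"(?<!\d)" + str(obj_id).encode() + rb"\s+\d+\s+obj\b(.*?)\bendobj"
    obj_match = re.search(pattern, pdf_bytes, re.DOTALL)
    if obj_match is None:
        return None
    stream_match = re.search(rb"stream\r?\n(.*?)endstream", obj_match.group(1), re.DOTALL)
    if stream_match is None:
        return None
    raw = stream_match.group(1)
    if not inflate:
        return raw
    # decompressobj ignores the end-of-line bytes that follow the compressed data.
    return zlib.decompressobj().decompress(raw)
```

In practice you would read the sample with open(path, "rb").read() and call extract_stream(data, 13). Real PDFs can contain object streams, encryption and several filters, which is why dedicated tools are the better choice for actual cases.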

Let’s see what we get.

We get something, but not in a readable format. Let’s dump this to a file.

And let’s see what the code looks like.

Whenever you see this type of code, you will usually notice 4 things going on :-

  1. OBFUSCATION.
  2. ONELINERS.
  3. CODE WILL BE POORLY FORMATTED.
  4. CODE WILL NOT LOOK SO CONVINCING.

You will see various unnecessary encodings, unexpected one-liners and, especially, random variable names, function names and function arguments.
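
Some of these signs can be measured. The sketch below collects a few simple indicators from a script: the length of the longest line, how often a handful of commonly abused JavaScript functions are called, and how many %uXXXX escape sequences appear. The choice of indicators and the name obfuscation_indicators are my own assumptions; treat the numbers as hints, not as a verdict.

```python
import re

# Functions that are often seen in obfuscated JavaScript (not an exhaustive list).
SUSPICIOUS_CALLS = ("eval", "unescape", "String.fromCharCode")


def obfuscation_indicators(script):
    """Collect simple numeric hints that a script may be obfuscated."""
    lines = script.splitlines() or [""]
    return {
        "longest_line": max(len(line) for line in lines),
        "suspicious_calls": {name: script.count(name + "(") for name in SUSPICIOUS_CALLS},
        "unicode_escapes": len(re.findall(r"%u[0-9A-Fa-f]{4}", script)),
    }
```

A very long single line combined with eval or unescape calls is a typical picture for the kind of code described above, but legitimate minified scripts can look similar.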

To decode such code, you can either do it manually, which is a great skill known as “Reverse Engineering”, or take the help of a tool.

Let’s take the help of another tool known as peepdf, which is a great tool and should be in your arsenal whenever you analyse a PDF.

Let’s see….

The good thing about this tool is that it gives all the information in one shot, as shown in the images below.

But we want to analyse the JavaScript that we extracted, so open the tool in interactive mode with the option (-i), as shown in the image below.

By typing help, we can see which commands are available.

Now we want to analyse the extracted JavaScript. To do this, see the image below.

And after hitting enter, we see a suspicious URL.

It looks like the script uses this URL to download some shellcode from this IP address.
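
Once a script is decoded, pulling out such network indicators can be automated. The sketch below extracts http/https URLs and IPv4-looking strings with regular expressions and defangs them so that nobody clicks them by accident. The function names are hypothetical, and the address used in the usage note is a documentation placeholder, not the one from this sample.

```python
import re

URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def defang(indicator):
    """Make an indicator non-clickable for reports."""
    return indicator.replace("http", "hxxp").replace(".", "[.]")


def extract_indicators(script):
    """Return the defanged URLs and IPv4-looking strings found in a script."""
    urls = sorted(set(URL_RE.findall(script)))
    ips = sorted(set(IPV4_RE.findall(script)))
    return {"urls": [defang(u) for u in urls], "ips": [defang(i) for i in ips]}
```

For example, a script containing "http://192.0.2.10/payload.bin" yields the URL hxxp://192[.]0[.]2[.]10/payload[.]bin and the IP 192[.]0[.]2[.]10. The IPv4 pattern is loose, so it can also match things like version numbers.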

To confirm this, you can run the extracted code in an isolated environment. But I don’t have an isolated environment set up for executing it.

So we have successfully analysed the PDF, and indeed it was malicious.

This field needs consistency, patience and practice, so keep practising. All the best, and I hope you liked this article and learned something from it.

I will be back soon with the next awesome article. Till then, good luck.

Dark_Emperor

Bug Bounty Hunter /|\ Cybersecurity Associate /|\ Mad about Cybersecurity.