I meant to write this post a long time ago, but life got crazy and my job became far less technical. Now that I am transitioning back into a technical role, it seemed like a good time to come back around. It is also a great opportunity to work with the updated version of PeStudio, as there have been some amazing changes to the program in the past 12 months. This tool is a must-have in every Incident Responder’s toolbox.
Windows binaries are packaged in a format called Portable Executable, or PE. I wrote a short blog post about PE files in my first blog post last year. Sometime in the future I will go through how I perform some basic surface and runtime analysis of PE files. Incident responders are often brought into situations where some sort of portable executable was downloaded onto a host and executed.
Some of the questions asked by incident responders are “What does it do?”, “Is it malicious?”, and lastly “Did it execute?”. There are two basic types of analysis an incident responder can do quickly to answer those questions. The first is basic surface analysis of the file, also called property analysis. This is where we statically identify the properties of a file, such as its hash values, strings, and library imports, and draw conclusions based on our past experience and research. If a file contains the IP address “18.104.22.168” and imports from ws2_32.dll, then we can reasonably assume that the sample is going to create a socket to that IP address. No reverse engineering is necessary to identify this. We can also search for the hash on popular malware sites, such as virustotal.com, to see if the sample has been analyzed before. Or we can use the library imports to identify problems the PE file may give us during runtime analysis. If it imports IsDebuggerPresent(), then we can safely assume that the file is aware of whether it is being debugged. There are a thousand examples of this type of logic, which just takes hands-on work and liberal use of msdn.microsoft.com to learn.
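The hash lookups described above are easy to script. As a minimal sketch (the sample bytes here are just a stand-in for a real file read from disk):

```python
import hashlib

def file_hashes(data: bytes) -> dict:
    """Return the common hash values used for malware-database lookups."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# Example: in practice you would pass open(path, "rb").read();
# this buffer just stands in for a sample.
sample = b"MZ\x90\x00"  # the first bytes of a typical PE file
hashes = file_hashes(sample)
print(hashes["md5"])
```

Any of these digests can then be pasted into virustotal.com or another malware database to check whether the sample has been seen before.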
PeStudio is an amazing all-in-one surface analysis tool for Portable Executables. PeStudio is a Windows application; I have never tried to run it in Wine. I typically use it in a Windows virtual machine dedicated to malware analysis. The application does not require installation, so it can run on its own from a thumb drive without impacting the integrity of the host being investigated. After downloading the PK-ZIP from the winitor downloads page, unpack the file into its own directory. Six files and a sub-directory are unpacked into the folder, as seen in Figure 1. The “pestudio.exe” file is the application.
Figure 1. PeStudio Unpacked.
Inside the “xml” directory are eleven XML files. These are configuration files which configure the application and provide the blacklisted items PeStudio matches against. You can edit these files to add additional indicators or signatures if you need to. One useful feature of PeStudio is its ability to look up a file by hash on VirusTotal. In older versions you could change the VirusTotal API key or disable VirusTotal altogether; as of this writing I could not find that option in the configuration files of the newer version. I will keep looking and update this post when I find it.
Figure 2. XML files.
When you first open PeStudio version 8.54, you are presented with a simple screen asking you to drag and drop a file for analysis. To analyze a file, we can simply drag it to the window, go to File > Open File, or click the Open File icon on the toolbar. All real simple.
Figure 3. PeStudio 8.54 Initial Window.
We ask PeStudio to analyze a file, then wait a few seconds while it fetches the VirusTotal lookup data and parses the strings and imports. If there is no Internet access, the VirusTotal lookup will obviously not succeed. After analysis is complete, we are presented with a window showing some basic descriptive attributes of the file, including its MD5 and SHA1 hashes. It also provides a hash of the imports, called the imphash. This is interesting because similar pieces of malware will have the same imports but may differ in other attributes, which causes the MD5 and SHA1 hashes to change. Since the imports are the same, the imphash will be the same even if the full file hashes are different.
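To make the imphash idea concrete, here is a minimal sketch of the commonly documented recipe (lowercase the library name, strip its extension, pair it with the lowercased function name, comma-join, MD5). PeStudio’s exact implementation may differ in details; the import lists below are made up for illustration:

```python
import hashlib

def imphash(imports) -> str:
    """Compute an import hash from (library, function) pairs using the
    commonly documented recipe: 'lib.func' entries, lowercased, with the
    .dll/.ocx/.sys extension stripped, comma-joined, then MD5'd."""
    parts = []
    for lib, func in imports:
        lib = lib.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()

# Two hypothetical samples with identical imports share an imphash
# even though their full-file hashes would differ.
a = imphash([("KERNEL32.dll", "CreateFileA"), ("WS2_32.dll", "connect")])
b = imphash([("kernel32.DLL", "createfilea"), ("ws2_32.dll", "CONNECT")])
print(a == b)  # True
```

This is why the imphash is such a handy pivot: recompiled or lightly modified variants of the same family often keep the same import table.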
The CPU width is important if we are going to perform runtime analysis of the file. If the file is a 64-bit application, then we will need a 64-bit version of Windows in our sandbox. File size can also be an indicator of similarity between two files. Again, the file hashes may change, but if the sizes of two files are close, that can serve as a piece of evidence that the two files are related. The Date is simply the date that PeStudio analyzed the file. Later we will talk about compile dates, which are a far more useful piece of information. Signature is PeStudio’s educated guess at identifying the compiler/linker or packer. When you get into reversing an application, knowing the compiler/linker can help separate user code from compiler code. If the file is packed, knowing the packer can help identify how to unpack it. Packing is simply a method of compressing a file, but in the process it also obfuscates the file.
Figure 4. Analyzed file.
Next down the menu we have the ‘indicators’ window. I mentioned earlier that PeStudio has a list of indicators it uses to identify whether a file is worthy of suspicion beyond simply doing a VirusTotal lookup. This ‘indicators’ page summarizes the indicators found further down in the menu tree. For this example, we can see that the file makes modifications to the registry, references the File Transfer Protocol, and has a VirusTotal score of 49/55. Two items stand out to me as a malware analyst.
This file ignores Address Space Layout Randomization (ASLR). ASLR is a feature which loads an application into memory at a somewhat randomized base address, making memory-corruption attacks such as buffer overflows much harder to exploit reliably. The file also ignores Data Execution Prevention (DEP), which would otherwise block code execution from the data sections in memory.
Figure 5. Indicators.
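These two opt-ins live in the DllCharacteristics field of the optional header, so checking them is a simple bit test. A minimal sketch, using the flag values from the PE/COFF specification:

```python
# Flag values from the PE/COFF specification.
DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE: ASLR opt-in
NX_COMPAT    = 0x0100  # IMAGE_DLLCHARACTERISTICS_NX_COMPAT: DEP opt-in

def mitigation_report(dll_characteristics: int) -> dict:
    """Report which exploit mitigations a PE opts into, given the
    DllCharacteristics value from its optional header."""
    return {
        "aslr": bool(dll_characteristics & DYNAMIC_BASE),
        "dep": bool(dll_characteristics & NX_COMPAT),
    }

# A hypothetical sample that opts into neither mitigation,
# like the one flagged by PeStudio above:
print(mitigation_report(0x0000))  # {'aslr': False, 'dep': False}
```

A benign, modern compiler enables both flags by default, so a sample that ignores them is worth a second look.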
The ‘virustotal’ window is simply a summary of the VirusTotal lookup. It lists all of the anti-virus products the sample was tested against. If a product flagged the sample, the signature name will be displayed in the ‘positives’ column. This is the same information you would get by searching for the hash on virustotal.com.
Figure 6. VirusTotal window.
The ‘dos-stub’ is next. This window displays information about the DOS header and stub which come before the PE header information. It is very rare for an application to have much of interest in the dos-stub. PeStudio displays the MD5 hash of the dos-stub, its size, and its entropy.
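Entropy shows up in several PeStudio windows, so it is worth knowing what the number means: it is Shannon entropy in bits per byte, near 0 for uniform data and approaching 8.0 for compressed or encrypted data. A minimal sketch of the calculation:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: ~0 for constant data,
    approaching 8.0 for compressed or encrypted data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(shannon_entropy(b"\x00" * 64))                 # 0.0 -- a boring, zeroed stub
print(round(shannon_entropy(bytes(range(256))), 1))  # 8.0 -- maximally mixed bytes
```

High entropy in a section is one of the classic hints that its contents are packed or encrypted.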
‘file-header’ is interesting simply because it contains some useful information to accurately describe a sample. This window provides information that would be in the PE header if you were analyzing it in another application. In fact, the ‘signature’ field should read 0x00004550, which converted to ASCII is ‘EP’, or flipped the other way for the sake of endianness (I will explain that another time), ‘PE’. The ‘stamp’ field contains the timestamp of when the application was compiled. Assuming the author of the application did not deliberately modify this timestamp, it can tell you how old the sample is. Reasonably, the older the sample, the more likely it will be detected by signature-based anti-virus if it is malicious.
Some compilers can be identified simply by the compile timestamp. Delphi compilers will leave a timestamp of 0x2A425E19 (6/19/1992 10:22:17 PM) no matter when the application was linked. Some malware authors will zero out this timestamp or put in a fake timestamp to throw off analysts.
Figure 7. ‘file-header’.
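The ‘stamp’ field is just a count of seconds since the Unix epoch, so converting it to a human-readable date is a one-liner. As a quick sketch, using the fixed Delphi timestamp mentioned above:

```python
from datetime import datetime, timezone

def compile_time(stamp: int) -> datetime:
    """The PE 'stamp' field is seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)

# The fixed Delphi timestamp mentioned above:
print(compile_time(0x2A425E19))  # 1992-06-19 22:22:17+00:00
```

If a sample’s stamp decodes to that exact 1992 date, you are likely looking at a Delphi binary rather than genuinely ancient malware; a stamp of zero or a date in the future suggests deliberate tampering.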
The ‘optional-header’ contains information that was at one time completely optional but is now mostly required for an application to execute in a modern Windows environment. Most of it is not immediately useful to the incident responder. At the bottom of the window, though, we have information about ASLR, DEP, and Structured Exception Handling (SEH).
SEH is the ability of an application to handle exceptions on its own. Applications crash from time to time; such a crash is called an exception. We can write our applications to execute another sub-routine if an exception occurs during runtime. Malware authors, though, can use SEH code as a mechanism to obfuscate their malicious code.
Imagine you have a program which appears to do nothing special. It just runs through, maybe checks some locations on the file-system, then exits. This application, though, has some SEH code. As an analyst, you might ignore the SEH, because why would you care about exception handling? The malware author uses this complacency to hide malicious code in the SEH routine. The application is designed to generate an exception on its own, causing the SEH code to execute. The point being: if you get into reversing malware and see an SEH routine, be sure to analyze that as well.
Figure 8. ‘optional-header’.
Directories and Sections
‘directories’ lists the different groups of data in the PE file. The size and address would be important if you needed this information while reversing, but it is probably not too useful to incident responders.
‘sections’ is a useful piece of information when trying to determine if a file is malicious. The .text section contains the executable code. Each section has read, write, and/or execute permissions, denoted by an x in the appropriate field. We should expect the .text section to have read and execute permissions: it holds code we need to execute, and we must read that code in order to execute it. The .text section should never have write permissions; if it did, the application could actively modify itself. Equally, the .text section should be the only section with execute permissions. We will also see that the .text section has the ‘entry-point’, the first line of code executed when the application is loaded into memory.
The other sections contain data. Most of this data relates to function imports, variables, and large data structures, but occasionally a section will actually contain another PE file. Malware that “drops” another application is called a “dropper”: the malware simply writes another application to disk and executes it. The dropped application will often be stored in one of the other sections, such as .rsrc. If you see a .rsrc section that is abnormally large, it might be a good idea to analyze it, or to keep an eye out for disk writes during runtime analysis.
One more thing before we move on: section names can be anything. Windows only cares about the entry-point and the permissions on the sections. There is software called a ‘packer’ which compresses PE files in such a way that they can decompress themselves during runtime. Packers will often change the section names; for example, the popular UPX packer renames the sections to UPX0 and UPX1.
Figure 9. ‘sections’.
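The permission rules above reduce to a simple bit test on each section’s characteristics field. A minimal sketch (the section layout below is a made-up example of a packed sample):

```python
# Section characteristic bits from the PE/COFF specification.
MEM_EXECUTE = 0x20000000
MEM_READ    = 0x40000000
MEM_WRITE   = 0x80000000

def suspicious_sections(sections):
    """Flag sections that are both writable and executable -- a
    combination a well-behaved .text section should never have.
    `sections` is a list of (name, characteristics) pairs."""
    return [name for name, flags in sections
            if flags & MEM_WRITE and flags & MEM_EXECUTE]

# A hypothetical packed sample: UPX0 is writable *and* executable,
# because the unpacking stub writes decompressed code there and runs it.
layout = [
    (".text", MEM_READ | MEM_EXECUTE),
    (".data", MEM_READ | MEM_WRITE),
    ("UPX0",  MEM_READ | MEM_WRITE | MEM_EXECUTE),
]
print(suspicious_sections(layout))  # ['UPX0']
```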
Libraries, Imports, and Exports
‘libraries’ and ‘imports’ help us identify what capabilities the application has. During compilation and linking, the compiler/linker looks up the Windows API libraries and functions used by the application and records them in an import table. A library exports functions, and other applications can import them. Make sense? I am more concerned with the imports than with the actual libraries the imports come from.
‘imports’ contains the actual imported function names. You can look up the function names on msdn.microsoft.com to identify what they specifically do; many have somewhat self-explanatory names. PeStudio has a list of ‘blacklisted’ imports. These are Windows API functions which are not malicious in their own right, but can be used to perform actions which may be considered malicious. If you do not know what a function does, and at first there will be plenty you do not, right-click the function name and select ‘Query MSDN’. This will open a browser window and direct you to the MSDN article containing all kinds of great information about the function. You can do the same in the ‘libraries’ window as well.
Function imports can also be referenced by ordinal number. A library assigns a number to each of its exports, and the author can choose to use the number rather than the name of the import. Often this is done to obfuscate what the application is importing. PeStudio is pretty good at finding the actual names of imports referenced by ordinal. At a minimum we know what the library is and we know the ordinal, so we can go to that library and look up the ordinal ourselves, or do some research to find what the import actually does.
We should also be concerned if there are only a couple of imports. In Windows, all applications have to go through API calls: user-mode code interacts with the Windows API, which in turn proxies interactions with kernel-mode functionality. Any time an application needs access to the operating system, say to create a file or open a socket, it must make an API call. If you see only one library being imported, ‘kernel32.dll’, be suspicious that the file you are analyzing is actually packed. The other imports may become visible only after the application unpacks itself in memory.
Figure 10. ‘libraries’.
Figure 11. ‘imports’.
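That sparse-import heuristic is easy to express in code. This is a crude sketch, not how PeStudio itself decides, and the threshold is an arbitrary assumption:

```python
def looks_packed(imported_libraries, threshold=3):
    """Crude heuristic: a PE importing only kernel32.dll, or very few
    libraries overall, may be packed, with the real imports resolved
    at runtime after the unpacking stub runs.

    The threshold of 3 is an arbitrary illustrative choice."""
    libs = {lib.lower() for lib in imported_libraries}
    if libs == {"kernel32.dll"}:
        return True
    return len(libs) < threshold

print(looks_packed(["KERNEL32.dll"]))                # True  -- classic packed look
print(looks_packed(["kernel32.dll", "user32.dll",
                    "ws2_32.dll", "advapi32.dll"]))  # False -- a fuller import table
```

A hit here is a prompt to look at section permissions and entropy, not proof on its own.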
‘exports’ is very similar to the ‘imports’ window, but it lists the functions that the PE file you are analyzing exports for other PE files to use. The sample I am using has no exports, so I analyzed ‘kernel32.dll’ just to demonstrate what exports look like.
Figure 12. ‘exports’.
TLS-Callback and Resources
‘tls-callback’ is an interesting piece of information. In Windows, a software author can use a mechanism called Thread Local Storage (TLS). A TLS callback is a piece of code which executes before the entry-point, typically to set up the environment for the application to execute. At least, that is the intended purpose; malware authors can use it to their advantage. By placing malicious code in a TLS callback, an author ensures that code runs before the entry-point is ever reached, so an analyst who only watches from the entry-point onward can miss it entirely. My sample did not have any tls-callbacks, so no screenshot, but keep an eye out for applications that do. You will have to change up your runtime analysis strategy if the sample does contain a TLS callback.
‘resources’ are typically stored in the .rsrc section. Unique UI information, such as the application icon or custom window elements, is stored here. Often dropped files will be stored in the .rsrc section as well.
Figure 13. ‘resources’.
‘strings’ is a great source for writing Yara rules, not that we are going to get into that now. Any run of bytes which can be read as ASCII characters is parsed and placed in this table. Most of them will be code which, when decoded as ASCII, spells gibberish. There are two things we are looking for here. First, we want to find readable strings such as URLs or filenames; this type of information can point to indicators we should see after the application executes. Second, we should be concerned if there are very few readable strings. A minimal number of readable strings indicates the application is obfuscated, similar to seeing a small number of imports.
Figure 14. ‘strings’.
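String extraction of this kind is just a regex over the raw bytes, similar in spirit to the Unix `strings` tool. A minimal sketch (the blob below is a made-up stand-in for a sample’s bytes, with example.com as a placeholder URL):

```python
import re

def ascii_strings(data: bytes, min_len: int = 5):
    """Pull runs of printable ASCII of at least `min_len` characters,
    similar to the Unix `strings` tool and PeStudio's 'strings' view."""
    return [m.decode() for m in re.findall(rb"[ -~]{%d,}" % min_len, data)]

# A made-up blob: binary noise with a URL and a file path embedded.
blob = b"\x00\x01http://example.com/drop.exe\xff\x02C:\\temp\\a.exe\x00"
print(ascii_strings(blob))  # ['http://example.com/drop.exe', 'C:\\temp\\a.exe']
```

The `min_len` floor is what filters out most of the gibberish: short printable runs occur constantly by accident inside machine code.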
I am going to skip the ‘debug’ and ‘manifest’ windows. ‘version’, though, contains some more information we can use to uniquely identify the file. It contains the information you see when you right-click on an executable and go to Properties. One interesting item here is that the file has a copyright date of 2014, but the compile date we looked at earlier has a year of 2006. We also have a company name, here Caere Corporation. We may want to look up whether this is a real company and what they do.
Figure 15. ‘version’.
Lastly, I will talk about the ‘certificate’ window, skipping ‘overlay’. If the application is digitally signed, the certificate it is signed with will be here. Most malware is not digitally signed, or it is signed with an illegitimate certificate, such as one from a certificate authority that has been compromised or a certificate for which a collision was found. Look up the Flame malware for some interesting reading on digital signature collisions. If the sample is digitally signed, validate the signature. Nine times out of ten, if the sample is digitally signed and the signature is valid, the sample is legitimate. Do your own research and draw your own conclusions though.
Figure 16. ‘certificate’.
Hopefully you enjoyed this overview of PeStudio. I think this is my longest post so far. My next technical post will cover what you can do during runtime analysis after you find a file that requires further investigation. Remember, we are trying to find suspicious files, conclude whether they are malicious, and then develop indicators so we can determine if they executed. PeStudio is a powerful tool for performing static surface analysis of a file, and one which I think is a must-have in every incident responder’s toolbox.
I also have plans to do a comparison of the standard and pro versions of PeStudio once I get my pro license. I will do a follow-on post about that later.
Question everything and keep moving forward!
Originally published at aubsec.github.io on September 1, 2016.