Web Shell Hunting: Meet the Web Shell Analyzer

Tstillz
Oct 1, 2020 · 8 min read

Continuing my prior work on web shells (Medium/Blog), I want to introduce a new tool that goes beyond my legacy webshell-scan tool. The “webshell-scan” tool was written in Go and gave threat hunters and analysts a quick, cross-platform way to scan a target system for web shells. That said, I found it lacking in several areas. Allow me to elaborate below.

Requirements of web shell analysis

In order to perform proper web shell analysis, we need to define some of the key requirements that a web shell analyzer would need to include. This isn’t a definitive list but more of a guide on key requirements based on my experience working on the front lines:

Static executable: Tooling must include all dependencies when being deployed. This ensures the execution is consistent and expected.

Simple and easy to use: A tool must be simple and straightforward to deploy and execute. Nothing is more frustrating than trying to get a tool to work during a live incident response engagement at 2 a.m.

Cross platform: A majority of web servers are running on either Windows or Linux. A tool must be able to run natively on these operating systems and the tooling must be able to cross compile with ease for rapid development.

Concurrency: Tooling must be able to run across multiple CPUs and take advantage of multiple threads/channels to quickly scan a file system.

Optimized: While this is closely tied to concurrency, the tooling must take into account what system resources are available and throttle analysis to ensure system performance is not degraded.
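To make the concurrency and throttling requirements concrete, here is a minimal Go sketch of a bounded worker pool walking a directory tree. It is not taken from the analyzer’s source: scanFile is a hypothetical stand-in for the real per-file analysis, and leaving one CPU free is just one possible throttling policy.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// scanFile is a hypothetical placeholder for the real per-file analysis.
// In this sketch it only reports the file size.
func scanFile(path string) (int64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// scanTree walks root and scans regular files with at most `workers`
// goroutines. It returns how many files were scanned successfully.
func scanTree(root string, workers int) (int, error) {
	if workers < 1 {
		workers = 1
	}
	paths := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	scanned := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				if _, err := scanFile(p); err == nil {
					mu.Lock()
					scanned++
					mu.Unlock()
				}
			}
		}()
	}

	walkErr := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // skip unreadable entries instead of aborting the scan
		}
		if d.Type().IsRegular() {
			paths <- p
		}
		return nil
	})
	close(paths)
	wg.Wait()
	return scanned, walkErr
}

func main() {
	// Leave one CPU free so the web server stays responsive (an assumed policy).
	workers := runtime.NumCPU() - 1
	n, err := scanTree(".", workers)
	fmt.Println("files scanned:", n, "error:", err)
}
```

The unbuffered channel gives natural back-pressure: the directory walk can never get ahead of the workers, so memory use stays flat even on very large web roots.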

Self-Discovering Configuration: In Live IR mode (running the analyzer on a compromised web server), a web shell analyzer should automatically determine the type of web server that is running and automatically identify and parse the web server’s configuration file. Using this data, the tool could automatically determine where the web root is located on disk, loaded handlers/filters/modules (think Windows ISAPI/HTTP filters/handlers) and other important configuration options that could enable/disable specific analyzer features.
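As a rough illustration of self-discovery, the sketch below pulls web root candidates out of an Apache-style configuration by looking for the DocumentRoot directive. It assumes the config text has already been located and read; finding the right file per web server, and handling IIS or nginx, is left out. The function name webRoots is made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// documentRoot matches uncommented Apache "DocumentRoot" directives, with or
// without surrounding double quotes.
var documentRoot = regexp.MustCompile(`(?mi)^[ \t]*DocumentRoot[ \t]+"?([^"\r\n]+?)"?[ \t\r]*$`)

// webRoots returns every DocumentRoot path found in an Apache-style config.
func webRoots(conf string) []string {
	var roots []string
	for _, m := range documentRoot.FindAllStringSubmatch(conf, -1) {
		roots = append(roots, m[1])
	}
	return roots
}

func main() {
	conf := "ServerName example.test\n" +
		"# DocumentRoot /old/site\n" +
		"DocumentRoot \"/var/www/html\"\n"
	fmt.Println(webRoots(conf))
}
```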

Context: Beyond analyzing web shells, the tooling must provide context. The number one question an analyst will ask after identifying a web shell is “how did the web shell get here?” This is why any tooling should not only identify and analyze web shells, but also provide context such as:

  • Log file analysis: When a web shell is identified, an analyst would normally spend the next few cycles digging through logs to see which IP(s) interacted with the web shell. Once identified, an analyst would then pivot on the IP address(es) interacting with the web shell to determine what other files/resources were accessed, along with GeoIP inspection and perhaps some user agent analysis. Each of these fields is a pivot point a tool should follow as part of any web shell analysis.
  • File timeline analysis: In addition to reviewing logs, the tool should quickly determine two other things. First, what are the file timestamps of the web shell, such as created or modified? Timestamps may vary based on the operating system and platform. Second, what happened ~10–15 minutes before/after the web shell was created? In some cases, this can lead to the identification of other web shells, initial ingress, harvested files or even new malware uploaded to the server.
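The log pivot described in the first bullet can be sketched in a few lines of Go. This is only an illustration, assuming access logs in Common Log Format; the function names parseLine and pivot, the sample IPs and the sample paths are all invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine extracts the client IP and request path from one access log
// line in Common Log Format. The bool is false when the line does not parse.
func parseLine(line string) (string, string, bool) {
	ip, rest, found := strings.Cut(line, " ")
	if !found {
		return "", "", false
	}
	_, rest, found = strings.Cut(rest, `"`)
	if !found {
		return "", "", false
	}
	request, _, found := strings.Cut(rest, `"`)
	if !found {
		return "", "", false
	}
	fields := strings.Fields(request) // method, URI, protocol
	if len(fields) < 2 {
		return "", "", false
	}
	path, _, _ := strings.Cut(fields[1], "?") // drop the query string
	return ip, path, true
}

// pivot returns, for every IP that requested shellPath, all paths that IP
// requested along with a hit count per path.
func pivot(lines []string, shellPath string) map[string]map[string]int {
	byIP := map[string]map[string]int{}
	for _, line := range lines {
		ip, path, ok := parseLine(line)
		if !ok {
			continue
		}
		if byIP[ip] == nil {
			byIP[ip] = map[string]int{}
		}
		byIP[ip][path]++
	}
	for ip, paths := range byIP {
		if paths[shellPath] == 0 {
			delete(byIP, ip) // this IP never touched the web shell
		}
	}
	return byIP
}

func main() {
	lines := []string{
		`203.0.113.7 - - [01/Oct/2020:02:00:01 +0000] "POST /upload.php HTTP/1.1" 200 120`,
		`203.0.113.7 - - [01/Oct/2020:02:00:09 +0000] "GET /images/shell.php?cmd=id HTTP/1.1" 200 64`,
		`198.51.100.4 - - [01/Oct/2020:02:01:00 +0000] "GET /index.php HTTP/1.1" 200 900`,
	}
	fmt.Println(pivot(lines, "/images/shell.php"))
}
```

In this made-up log, the pivot surfaces the earlier POST to /upload.php from the same IP, which is the kind of lead that answers “how did the web shell get here?”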

Deobfuscation: A majority of web shells have at least some layers of obfuscation, commonly base64. However, some web shells take obfuscation to the extreme and contain multiple layers of mixed obfuscation. A tool should be able to handle the most common types of obfuscation techniques.

Layered searches: Because web shell authors commonly stack several layers of obfuscation to mask the source code, a tool must be able to peel back each layer and run its detection checks against every decoded layer.

Attribute analysis: Simply telling an analyst that a web shell was identified and which code matched isn’t enough. After identification, the tool should provide an analyst with key attributes of the web shell; these attributes help the analyst determine the “capability” of the web shell, such as “can interface with MySQL” or “can start threads/processes”.

Detection by attributes: To expand further on attributes, a tool should be able to detect web shells based solely on attributes. This helps in cases where the regex detection logic misses a file but its attributes still point to a web shell.

Modular and scalable: A tool should be able to be updated with ease without frequent recompilation. In addition, new detection/attribute logic should be seamless to update. The tool should also support the ability to be scaled up/down depending on the resources available or based on the demand of the analysis. Performing real time daily web shell hunting/monitoring vs performing incident response would require two different levels of operation.

Real Time / On-Demand: A tool should be able to interface with the underlying operating system to support near real time web shell scanning against specific directories along with on-demand scanning for hunting and incident response. Most FIM (File Integrity Monitoring) tools would only provide context into file changes, but not that the changed file or content is a web shell ;).

Output: The tool should provide the analysis results in a consistent documented schema, formatted in JSON.

Transport: As multiple web servers are scanned, it makes more sense to send the analysis output to a centralized server for review. This means a tool should be able to send analysis output in a chunked, compressed and lossless fashion.
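One way to picture “chunked, compressed and lossless” is the Go sketch below: gzip the JSON output, cut it into fixed-size chunks, then reassemble and decompress on the receiving side. The chunk size and function names are assumptions for illustration, and the actual network transfer is deliberately left out.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses data with gzip.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes reverses gzipBytes.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// split cuts data into chunks of at most size bytes.
func split(data []byte, size int) [][]byte {
	if size < 1 {
		size = 1
	}
	var chunks [][]byte
	for len(data) > size {
		chunks = append(chunks, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		chunks = append(chunks, data)
	}
	return chunks
}

// roundTrip compresses payload, chunks it, reassembles the chunks and
// decompresses them. It returns the restored payload and the chunk count.
func roundTrip(payload string, chunkSize int) (string, int, error) {
	packed, err := gzipBytes([]byte(payload))
	if err != nil {
		return "", 0, err
	}
	chunks := split(packed, chunkSize)
	restored, err := gunzipBytes(bytes.Join(chunks, nil))
	if err != nil {
		return "", len(chunks), err
	}
	return string(restored), len(chunks), nil
}

func main() {
	restored, n, err := roundTrip(`{"filePath":"/var/www/html/shell.php","size":44353}`, 16)
	fmt.Println(restored, n, err)
}
```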

Analysis Interface: Since no one wants to stare at JSON all day, a tool should include a user interface in which an analyst can review the output with a simple workflow that supports tagging, comments and other automated actions. The UI should be lightweight, robust and support multiple users. The UI should be backed by a documented API for further extensibility.

Webshell Analyzer

Now that we’ve reviewed some of the key requirements a web shell analyzer should include, let’s dive into my newest tool, https://github.com/tstillz/webshell-analyzer, and review some of the key features included in v1.

One of the first improvements that was made in the web shell analyzer was to break down the regex into groups. Not only did this allow for more granular control over the regex but it also enabled the use of names/descriptions to classify our matching regex blocks. Detection groups are checked at each layer of decoding and include a frequency counter to show how many times the detection logic was found.
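To show what grouped regex with a frequency counter might look like, here is a small Go sketch. The group names echo the ones in the output later in this post, but the patterns and descriptions are my own simplified assumptions and are not the analyzer’s actual detection logic.

```go
package main

import (
	"fmt"
	"regexp"
)

// detectionGroup ties a name and description to one regex block.
type detectionGroup struct {
	Name        string
	Description string
	Pattern     *regexp.Regexp
}

// groups holds illustrative patterns only; real detection logic would be
// far broader.
var groups = []detectionGroup{
	{"PHP_Execution", "PHP code or command execution", regexp.MustCompile(`\b(?:eval|passthru|system)\(`)},
	{"Generic_Windows_Reconnaissance", "Windows recon commands", regexp.MustCompile(`\b(?:netstat|tasklist)\b`)},
}

// scan returns group name -> matched text -> number of occurrences.
func scan(content string) map[string]map[string]int {
	out := map[string]map[string]int{}
	for _, g := range groups {
		for _, m := range g.Pattern.FindAllString(content, -1) {
			if out[g.Name] == nil {
				out[g.Name] = map[string]int{}
			}
			out[g.Name][m]++
		}
	}
	return out
}

func main() {
	fmt.Println(scan(`<?php eval($_POST['x']); system("netstat -an"); ?>`))
}
```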

Since many web shells have nested layers of obfuscation, the analyzer is able to iterate over most layers and feed newly deobfuscated blobs back into the pipeline for processing.
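The feed-it-back-in idea can be sketched as a simple queue. The Go example below handles base64 only and is an assumption-laden illustration, not the analyzer’s pipeline: the 16-character threshold, the UTF-8 check and the names decodeLayers and wrap are all choices made for this sketch.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// b64Run matches runs that look like base64. The 16-character minimum is an
// arbitrary threshold chosen for this sketch.
var b64Run = regexp.MustCompile(`[A-Za-z0-9+/]{16,}={0,2}`)

// layer is one blob seen by the pipeline and how many decodes produced it.
type layer struct {
	Depth int
	Text  string
}

// decodeLayers processes the input breadth-first: every base64-looking run
// that decodes to valid UTF-8 is queued as a new layer, up to maxDepth.
func decodeLayers(input string, maxDepth int) []layer {
	queue := []layer{{0, input}}
	var seen []layer
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		seen = append(seen, cur)
		if cur.Depth >= maxDepth {
			continue
		}
		for _, run := range b64Run.FindAllString(cur.Text, -1) {
			decoded, err := base64.StdEncoding.DecodeString(run)
			if err != nil || !utf8.Valid(decoded) {
				continue // attempted, but not a usable layer
			}
			queue = append(queue, layer{cur.Depth + 1, string(decoded)})
		}
	}
	return seen
}

// wrap mimics one PHP obfuscation layer; it exists only to build test input.
func wrap(code string) string {
	return "eval(base64_decode('" + base64.StdEncoding.EncodeToString([]byte(code)) + "'));"
}

func main() {
	sample := wrap(wrap("<?php system($_GET['cmd']); ?>"))
	for _, l := range decodeLayers(sample, 10) {
		fmt.Printf("depth %d: contains system( = %v\n", l.Depth, strings.Contains(l.Text, "system("))
	}
}
```

Running detections on every element of the returned slice, not just the last one, is what lets a scanner catch keywords that only appear in an intermediate layer.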

As a side effect of using regex detection groups, this also enabled the tool to include “attributes”. The logic that powers these attributes tells the analyzer to “tag” a file that contains specific matching logic. These attributes tell an analyst what a detected web shell is capable of without having to perform any manual code inspection.
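A stripped-down way to think about attribute tagging is a table of attribute names mapped to literal strings, counted per file. The Go sketch below uses a few attribute names that appear in the output later in this post; the lists themselves are a tiny assumed subset, and tag and capabilities are names invented for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attributes maps an attribute name to the literal strings that imply it.
// This is a tiny illustrative subset, not the analyzer's real list.
var attributes = map[string][]string{
	"PHP_Database_Operations": {"mysql_query(", "mysql_drop_db("},
	"PHP_Network_Operations":  {"fsockopen("},
	"PHP_Disk_Operations":     {"fopen("},
}

// tag returns attribute name -> matched string -> occurrence count.
func tag(content string) map[string]map[string]int {
	out := map[string]map[string]int{}
	for name, needles := range attributes {
		for _, n := range needles {
			if c := strings.Count(content, n); c > 0 {
				if out[name] == nil {
					out[name] = map[string]int{}
				}
				out[name][n] = c
			}
		}
	}
	return out
}

// capabilities lists the matched attribute names in sorted order.
func capabilities(tags map[string]map[string]int) []string {
	names := make([]string, 0, len(tags))
	for name := range tags {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	tags := tag(`$c = fsockopen($h, 80); mysql_query("select 1");`)
	fmt.Println(capabilities(tags), tags)
}
```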

As with all my projects, the output is structured in JSON to make the analysis results readable and open to future structure changes as needed.

Example

In the example below, we have a simple web shell that’s been obfuscated in three layers: layer 1 is base64, layer 2 is base64 again, and layer 3 is gzinflate. While my legacy scanner wouldn’t flag this as a web shell, the newer web shell analyzer decodes and scans each layer for detections and attributes until no more decoding can be performed. While this is a very simple example, it highlights the importance of handling layers of obfuscation.

Looking at the sample at https://github.com/tennc/webshell/blob/master/php/PHPshell/%E3%80%90c99_madnet%E3%80%91/smowu.php, we can see that it begins with “eval(gzinflate(base64_decode(”. To properly process this web shell, we must first remove all the layers of “gzinflate(base64_decode(”. Normally, this is a pretty simple effort using tools like CyberChef, but this web shell has 11 layers of “gzinflate(base64_decode(”. That’s still doable by hand, but if you have to analyze dozens of web shells, it’s best to use a tool like this web shell analyzer to deal with it. The debug output below shows the web shell analyzer dealing with this web shell’s layers of “gzinflate(base64_decode(”:

After the 11 iterations of decoding, we finally see some PHP code. From here, the analyzer can begin processing detections and attributes on the raw PHP code. After processing, if a detection is found, the analyzer will spit out a JSON object that can be divided into three sections: Core, Matches and Attributes. Let’s take a look at these sections below.
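For readers who want to see the mechanics of peeling “gzinflate(base64_decode(” layers, here is a minimal Go sketch. PHP’s gzinflate reads raw DEFLATE data, which Go’s compress/flate package can decode. The regex, the layer cap and the names peel, inflateB64 and buildSample are assumptions for this example; the sketch also keeps only the decoded payload and ignores any code around the wrapper.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/base64"
	"fmt"
	"io"
	"regexp"
)

// wrapper captures the base64 argument of gzinflate(base64_decode('...')).
// It assumes the string literal sits on one line.
var wrapper = regexp.MustCompile(`gzinflate\(base64_decode\(\s*['"]([A-Za-z0-9+/=]+)['"]\s*\)\)`)

// inflateB64 reverses PHP's gzinflate(base64_decode($s)).
func inflateB64(s string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	r := flate.NewReader(bytes.NewReader(raw))
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

// peel strips nested layers, replacing the code with each decoded payload.
// It returns the final code and the number of layers removed.
func peel(code string, maxLayers int) (string, int) {
	n := 0
	for n < maxLayers {
		m := wrapper.FindStringSubmatch(code)
		if m == nil {
			break
		}
		decoded, err := inflateB64(m[1])
		if err != nil {
			break
		}
		code = decoded
		n++
	}
	return code, n
}

// buildSample wraps inner in the given number of layers, the way PHP's
// base64_encode(gzdeflate($s)) would. It exists only to create test input.
func buildSample(inner string, layers int) (string, error) {
	code := inner
	for i := 0; i < layers; i++ {
		var buf bytes.Buffer
		w, err := flate.NewWriter(&buf, flate.BestCompression)
		if err != nil {
			return "", err
		}
		if _, err := w.Write([]byte(code)); err != nil {
			return "", err
		}
		if err := w.Close(); err != nil {
			return "", err
		}
		code = "eval(gzinflate(base64_decode('" + base64.StdEncoding.EncodeToString(buf.Bytes()) + "')));"
	}
	return code, nil
}

func main() {
	sample, err := buildSample("<?php echo 'hi'; ?>", 11)
	if err != nil {
		fmt.Println("could not build sample:", err)
		return
	}
	code, n := peel(sample, 50)
	fmt.Printf("removed %d layers, final code: %s\n", n, code)
}
```

The maxLayers cap matters in practice: without it, a malformed or self-referencing sample could keep a scanner busy far longer than intended.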

The core section consists of JSON key/value pairs that contain basic information about the file, its hashes, timestamps and decoders. The “decodes” item outlines which decoding routines were checked and how often. Just because a decoding routine was used doesn’t always mean it worked, only that it was attempted.

"filePath": "/tests/webshells/testers/php/PHPShell/c99_madnet/smowu.php",
"size": 44353,
"hashes": {
  "md5": "3aaa8cad47055ba53190020311b0fb83",
  "sha1": "ed2b47c37b9bb33bb420d33ad7258c68dec4c40c",
  "sha256": "1850ac82877931f525b70421b8a9ca266e204e5065625efd5b1ab500ca87478d"
},
"timestamps": {
  "birth": "2019-02-03 02:02:22",
  "created": "2020-08-01 03:23:25",
  "modified": "2019-02-03 02:02:22",
  "accessed": "2020-08-16 17:40:53"
},
"decodes": {
  "Generic_Base64Decode": 47,
  "Generic_Multiline_Base64Decode": 39,
  "PHP_Dot_Concatenation": 1,
  "PHP_GzInflate_Base64Decode": 11
},
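As a rough sketch of how part of this core section could be produced in Go, the example below hashes a file’s content and marshals the result with the same JSON keys shown above. The struct layout and the buildCore helper are my assumptions for illustration; timestamps are left out because birth and creation times need platform-specific calls.

```go
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Hashes mirrors the "hashes" object in the output above.
type Hashes struct {
	MD5    string `json:"md5"`
	SHA1   string `json:"sha1"`
	SHA256 string `json:"sha256"`
}

// Core mirrors part of the core section; timestamps are omitted here.
type Core struct {
	FilePath string         `json:"filePath"`
	Size     int64          `json:"size"`
	Hashes   Hashes         `json:"hashes"`
	Decodes  map[string]int `json:"decodes"`
}

// buildCore fills the core fields from a file's path and content.
func buildCore(path string, content []byte, decodes map[string]int) Core {
	m := md5.Sum(content)
	s1 := sha1.Sum(content)
	s256 := sha256.Sum256(content)
	return Core{
		FilePath: path,
		Size:     int64(len(content)),
		Hashes: Hashes{
			MD5:    hex.EncodeToString(m[:]),
			SHA1:   hex.EncodeToString(s1[:]),
			SHA256: hex.EncodeToString(s256[:]),
		},
		Decodes: decodes,
	}
}

func main() {
	core := buildCore("/var/www/html/shell.php",
		[]byte("<?php eval($_POST['x']); ?>"),
		map[string]int{"Generic_Base64Decode": 1})
	out, err := json.MarshalIndent(core, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```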

The matches section outlines the matches found after each level of decoding. We could also call these “detections”, as the analyzer associates these types of matches with a potential web shell. Each key in the JSON output below outlines the exact keyword that triggered a detection and how many times each keyword was found in the web shell.

"matches": {
  " eval(": 4,
  " passthru(": 2,
  "CMD": 3,
  "cmd": 86,
  "fsockopen(": 8,
  "netstat": 4,
  "tasklist": 4
}

Unlike the matches from the section above, the attributes section is only included for grouping and contextual purposes. When a web shell is identified, the analyzer will include these attributes to highlight what “potential capabilities” the web shell possesses. In v1 of the analyzer, these attributes are not enough to trigger a detection on their own.

"attributes": {
  "Generic_IP/DomainName": {
    "hxxp:\\ccteam[.]ru/files/c99sh_sources/": 2
  },
  "Generic_Windows_Commands": {
    "CMD": 3
  },
  "Generic_Windows_Reconnaissance": {
    "netstat": 4,
    "tasklist": 4
  },
  "PHP_Banned_Function": {
    "exec(": 66,
    "fsockopen(": 8,
    "get_current_user(": 4,
    "link(": 34,
    "passthru(": 2,
    "realpath(": 30,
    "set_time_limit(": 2
  },
  "PHP_Database_Operations": {
    "mysql_create_db(": 4,
    "mysql_drop_db(": 2,
    "mysql_query(": 44
  },
  "PHP_Defense_Evasion": {
    "gzinflate(base64_decode(": 21,
    "preg_replace(": 2
  },
  "PHP_Disk_Operations": {
    "fopen(": 22
  },
  "PHP_Execution": {
    " eval(": 4,
    " passthru(": 2
  },
  "PHP_Network_Operations": {
    "fsockopen(": 8
  },
  "PHP_Reconnaissance": {
    "@ini_get(\"disable_functions\")": 2,
    "disk_total_space(": 2,
    "phpinfo(": 2,
    "phpversion(": 8,
    "posix_getgrgid(": 8,
    "posix_getpwuid(": 10
  }
}

I hope this tool is helpful. Stay tuned for more updates to the web shell analyzer in upcoming posts. As always, Happy Hunting!
