Malware Detection in Web scripts with Regex, md5 checksum and PHP

Camilo Herrera
Published in winkhosting
16 min read · Jan 24, 2023

Hello, in this article we are going to create a proof of concept to detect malware in scripts loaded on a web site. Keep in mind that there are several specialized tools for this kind of task, but if you want to know how they work in general terms, you are welcome.

Malicious scripts and malware signatures

Malicious scripts are all those that can be injected (mixed with a trusted script), uploaded to a server or web site using a known vulnerability, phishing or through poorly audited functionality on a site or system exposed to the Internet.

These scripts are as varied as you can imagine and use different mechanisms to operate and avoid detection once they are inside a server.

For this particular case we will talk about the most common web file types (html, php, js, css, gif, jpg, jpeg and txt) in shared hosting environments and content management systems such as WordPress (technically it is not a CMS but it is used in the same way, don’t tell anyone, wink).

The vast majority of scripts of this type have something in common: content, text strings or source code structures that can be used to generate a signature. A signature is an element that lets us identify a file of interest and determine whether it is malicious or not. Signatures can be built from regular expressions, from a checksum of the file (md5, for example) or even from plain text keywords that indicate an infection.

Below we will see an example of each of the types of signatures mentioned:

Regular Expression

$regexSample = '#\$.*\[\$.*\] \= chr\(ord\(\$.*\[\$.*\]\) \^ ord\(\$.*\[\$.* \% \$.*\]\)\)#';

MD5 Checksum

$md5FileChecksum = "da4b6ccd2702858d185e3ef600eeaeef";

Plain text keywords

$plainText = "Hello I am a malicious Script";
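
As a rough illustration of how each signature type can be evaluated, the sketch below checks a string against the three kinds of signatures. The function name matchesSignature() and the sample strings are assumptions made for this example only; the classes built later in the article organize the same checks differently.

```php
<?php

/**
 * Hypothetical helper: returns true when $content matches the signature.
 * $type is one of "regex", "md5" or "plaintext".
 */
function matchesSignature(string $type, string $signature, string $content): bool
{
    return match ($type) {
        // The signature is a PCRE pattern written without delimiters.
        "regex" => preg_match("#" . $signature . "#", $content) === 1,
        // The signature is the md5 checksum of the whole content.
        "md5" => md5($content) === strtolower($signature),
        // The signature is a literal substring.
        "plaintext" => str_contains($content, $signature),
        default => false,
    };
}

// Single quotes keep the backslashes and dollar signs exactly as written.
$code = '$a[$i] = chr(ord($b[$i]) ^ ord($k[$i % $n]));';
$regex = '\$.*\[\$.*\] \= chr\(ord\(\$.*\[\$.*\]\) \^ ord\(\$.*\[\$.* \% \$.*\]\)\)';

var_dump(matchesSignature("regex", $regex, $code));
var_dump(matchesSignature("plaintext", "malicious Script", "Hello I am a malicious Script"));
var_dump(matchesSignature("md5", md5("abc"), "abc"));
```

Note the single-quoted strings: inside double quotes PHP would turn \$ into a plain $, which changes the meaning of the regular expression.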

Solution structure and PHP scripts to be created

Based on these notions, we are going to build a proof of concept with PHP (obviously because it is my favorite programming language) that will go through a list of files, look for matches against the signatures defined in a file and determine whether any of them are malicious.

Important: This solution is used from the command line. It does not have a web interface.

Structure

The general components of our proof of concept will be:

  • Signature management (regular expressions, md5 checksum, plain text)

This component will be responsible for loading the signatures previously defined in a text file and making them available to the detection engine.

  • Match results reporting

It will display the necessary messages on the screen and create a log containing the results of the file review process, which will be generated in plain text.

  • Action management based on detection results

This component will allow the engine to take action upon detection of content that matches existing signatures, either move to quarantine, create a copy in a safe place, generate a warning or delete the file.

  • Match detection engine

It will be responsible for traversing a directory structure, reading the contents of each file in it and looking for matches against the available signatures.

Files and directories you will need before we get started

To get started, create the following directories and files in a path of your choice where you can run PHP from the command line. For this POC we will use PHP 8.1.6:

/malware-scan
    /quarantine
    /copy
    /scan-folder
        sample-md5.js
        sample-plain.txt
        sample-regex.php
    signatures.txt

File samples

In the /scan-folder directory we will have three sample files that will be used to perform the tests. The name of each file indicates the type of signature it is meant to trigger.

sample-md5.js

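In the sample-md5.js file save the following content without spaces or empty lines at the beginning and end (the md5 checksum is computed over the exact bytes of the file, so any extra whitespace changes it):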

// Function to compute the product of p1 and p2
function myFunction(p1, p2) {
return p1 * p2;
}
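
Since the md5 signature has to match the file byte for byte, it can help to compute the checksum of your own copy of the sample and use that value in the signatures file. The following is a minimal sketch, assuming a hypothetical helper named md5Signature(); it works on a temporary file to show that a single trailing newline is enough to change the checksum.

```php
<?php

/**
 * Hypothetical helper: returns the md5 checksum of a file, or null when
 * the file does not exist or cannot be read.
 */
function md5Signature(string $filePath): ?string
{
    if (!is_file($filePath) || !is_readable($filePath)) {
        return null;
    }

    $hash = md5_file($filePath);

    return $hash === false ? null : $hash;
}

// Demonstration with a temporary file instead of the real sample.
$tmp = tempnam(sys_get_temp_dir(), "sig");

file_put_contents($tmp, "return p1 * p2;");
$withoutNewline = md5Signature($tmp);

file_put_contents($tmp, "return p1 * p2;\n");
$withNewline = md5Signature($tmp);

unlink($tmp);

// The two checksums differ because the file contents differ by one byte.
var_dump($withoutNewline !== $withNewline);
```

To get the value for the real sample you could call md5Signature() with the path to scan-folder/sample-md5.js and paste the result into the md5 rule.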

sample-plain.txt

In the file sample-plain.txt save the following content:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis sit amet tempus tellus. 
Praesent malesuada lobortis erat at pharetra. Sed nec semper sapien.
Vivamus consequat dapibus finibus. Pellentesque vel est ac nisi lobortis efficitur.
Ut magna lacus, ultricies vitae vulputate in, My Signature Text accumsan vitae felis.
In mollis mollis velit aliquam consectetur. Quisque eu luctus enim.
Maecenas mollis ultricies nulla. Suspendisse quis ipsum et dolor consectetur pulvinar vel sed massa.

sample-regex.php

And in the sample-regex.php file the following content:

<?php

$var[$key] = chr(ord($anothervar[$anotherkey]) ^ ord($secondvar[$thirdvar % $d]));

Signatures file

The signatures.txt file will contain the signatures, along with the type and the action to be performed for each of them.

In the file save the following content without spaces or empty lines at the beginning and end:

plaintext:quarantine:My Signature Text
md5:warning:da4b6ccd2702858d185e3ef600eeaeef
regex:delete:\$.*\[\$.*\] \= chr\(ord\(\$.*\[\$.*\]\) \^ ord\(\$.*\[\$.* \% \$.*\]\)\)

Now that we are all set, let’s implement each of the components of our proof of concept.

Pro tip: Remember that the strength in everything you do is acquired by advancing in what you aim for. Even programming and software development.

Signature management (regular expressions, md5 checksum, plain text)

To store our signatures we must define a general format for them. We will use a custom plain text format in which each signature has the following fields:

  • Type

This field will contain the type which can be “regex” used for regular expressions, “plaintext” for plain text words and “md5” used to compare the md5 checksum of the file with the one found in the rule.

  • Action

This field will indicate the action to take if content matching the rule is detected and can be “quarantine”, “copy”, “warning”, “delete”.

“quarantine” will move the file to a predefined directory for quarantine, “copy” will create a copy in a predefined directory, “warning” will generate a warning message on the screen and a corresponding record in the scan report, “delete” will delete the file but will not leave a copy or move to quarantine.

  • Signature

This field will contain the regular expression, plain text or md5 checksum used to detect the content.

Below is an example of a signature in the proposed format:

#<type>:<action>:<signature>
plaintext:quarantine:My Signature Text

We write the type of signature first, then “:”, the action to perform, another “:” and finally the text of the signature.
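
One possible way to split a line in this format is sketched below. The function parseSignatureLine() is hypothetical and is not the approach used by the SignatureManager class in this article; it only illustrates that limiting the split to three parts keeps any “:” inside the signature text intact.

```php
<?php

/**
 * Hypothetical parser for one "<type>:<action>:<signature>" line.
 * Returns null when the line is malformed or uses an unknown type or action.
 */
function parseSignatureLine(string $line, string $separator = ":"): ?array
{
    // A limit of 3 leaves separators inside the signature text untouched.
    $parts = explode($separator, $line, 3);
    if (count($parts) !== 3) {
        return null;
    }

    [$type, $action, $signature] = $parts;

    $allowedTypes = ["regex", "plaintext", "md5"];
    $allowedActions = ["quarantine", "copy", "warning", "delete"];

    if (
        !in_array($type, $allowedTypes, true)
        || !in_array($action, $allowedActions, true)
        || $signature === ""
    ) {
        return null;
    }

    return ["type" => $type, "action" => $action, "signature" => $signature];
}

print_r(parseSignatureLine("plaintext:quarantine:My Signature Text"));
```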

All the signatures will be saved in a file that is loaded at the beginning of the execution of the script. Remember that we will call it signatures.txt, so do not forget to create the file in the directory. The file holds one signature per line, like this:

signatures.txt

plaintext:quarantine:My Signature Text
md5:warning:da4b6ccd2702858d185e3ef600eeaeef
regex:delete:\$.*\[\$.*\] \= chr\(ord\(\$.*\[\$.*\]\) \^ ord\(\$.*\[\$.* \% \$.*\]\)\)

Now we are going to create a class called SignatureManager to read the signature file, format the signatures and prepare them to be used by the detection engine.

The same file defines two backed enums, SigType and ActType, which list the allowed types of signatures (SigType) and the allowed types of actions (ActType) as follows:

SignatureManager.php

<?php

/**
 * Allowed signature types
 */
enum SigType: string
{
    case REGEX = "regex";
    case PLAINTEXT = "plaintext";
    case MD5 = "md5";
}

/**
 * Allowed action types
 */
enum ActType: string
{
    case QUARANTINE = "quarantine";
    case COPY = "copy";
    case WARNING = "warning";
    case DELETE = "delete";
}

Now comes the declaration of the SignatureManager class. Its constructor declares three properties with their default values, and the class has two functions and a getter, like this:


/**
 * This class loads the rules stored in the file at the path given by
 * $signatureFilePath and prepares them to be used by the file scan engine.
 *
 * TODO: validating each line of the file (unknown types, unknown actions,
 * empty signatures) is not part of this article, so proceed with caution.
 */
class SignatureManager
{

    /**
     * Here we define the default path of the signature file ($signatureFilePath),
     * initialize the $signatures array and set the default field
     * separator ($fieldSeparator).
     */
    function __construct(
        private string $signatureFilePath = __DIR__ . "/signatures.txt",
        private array $signatures = array(),
        private string $fieldSeparator = ":"
    ) {
    }

    /**
     * This function is in charge of reading the signature file and loading it into the $signatures array.
     *
     * @return void
     */
    public function loadSignatures(): void
    {
        $sigFileContents = file($this->signatureFilePath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

        // file() returns false when the file cannot be read.
        if ($sigFileContents === false) {
            throw new RuntimeException("Unable to read signature file " . $this->signatureFilePath);
        }

        foreach ($sigFileContents as $sigLine) {

            // loadField() removes each captured field from $sigLine,
            // so what remains at the end is the signature text.
            $loadedType = $this->loadField(SigType::cases(), $sigLine);
            $loadedAction = $this->loadField(ActType::cases(), $sigLine);
            $loadedSignature = $sigLine;

            $this->signatures[$loadedType] = array("actType" => $loadedAction, "signature" => $loadedSignature);
        }
    }

    /**
     * This function processes a line of text extracted from the signature file and captures
     * the value of the field found at the beginning of the line.
     *
     * @param array $cases This array contains a Backed Enum cases list that may be
     * SigType or ActType.
     * @param string $textLine This parameter passed by reference contains the text
     * of the line to be processed. The captured field is removed from it.
     * @return string The captured field is returned as text.
     */
    private function loadField(array $cases, string &$textLine): string
    {
        $retValue = "";

        foreach ($cases as $enumItem) {
            $prefix = $enumItem->value . $this->fieldSeparator;

            // Match only at the start of the line, so that text such as "copy:"
            // inside the signature itself is never removed.
            if (str_starts_with($textLine, $prefix)) {
                $retValue = $enumItem->value;
                $textLine = substr($textLine, strlen($prefix));
                break;
            }
        }

        return $retValue;
    }

    /**
     * This getter returns the signatures array.
     * @return array $signatures array Contents.
     */
    public function getSignatures(): array
    {
        return $this->signatures;
    }
}

In the constructor we define three things: the signature file path $signatureFilePath, which points to the signatures.txt file in the same directory as the class file; the $signatures array, where the rules are stored once they have been read and processed; and the field separator $fieldSeparator, which is “:” in the format we defined at the beginning.

The loadSignatures() function reads the contents of the file, traverses each line and uses loadField() to detect the signature type and the action to be performed; the text that remains is the signature itself. With the three elements (signature, type and action), the rule is saved in the $signatures array in the following format:

$this->signatures[<signature type>] = array("actType" => <action>, "signature" => <signature>);

Note that the array is indexed by signature type, so this proof of concept keeps a single rule per type: if the file contains two rules of the same type, the last one read wins.

Finally, the getter getSignatures() lets instances of other classes get the list of signatures ready to be used.
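
Because $signatures is keyed by signature type, a second rule of the same type replaces the first. One possible variation, which is not part of the classes in this article, is to append every rule to a list instead. The sketch below assumes a hypothetical function loadSignatureList() that receives the lines already read from the file; the detection engine shown later expects the keyed format, so it would have to be adapted to iterate over a list like this one.

```php
<?php

/**
 * Hypothetical loader: turns signature lines into a list of rules so that
 * several rules of the same type can coexist.
 *
 * @param string[] $lines Lines in "<type>:<action>:<signature>" format.
 * @return array<int, array{type: string, actType: string, signature: string}>
 */
function loadSignatureList(array $lines): array
{
    $rules = [];

    foreach ($lines as $line) {
        // Strip only line endings; spaces may be part of the signature.
        $parts = explode(":", rtrim($line, "\r\n"), 3);
        if (count($parts) !== 3) {
            continue; // Skip malformed lines instead of failing.
        }

        // Appending (instead of indexing by type) keeps every rule.
        $rules[] = [
            "type" => $parts[0],
            "actType" => $parts[1],
            "signature" => $parts[2],
        ];
    }

    return $rules;
}

$rules = loadSignatureList([
    "plaintext:quarantine:My Signature Text",
    "plaintext:warning:Another Text",
    "md5:warning:da4b6ccd2702858d185e3ef600eeaeef",
    "this line is malformed",
]);

echo count($rules) . " rules loaded" . PHP_EOL;
```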

Match results reporting

This component is quite simple and has two responsibilities: the first is to display messages on screen for the actions performed by the detection engine, and the second is to save each of these messages in an event log file.

Let’s start with the creation of the file for the class in charge of these tasks. We will call it Reporter.php. You must create it in the root of the directory where you are working in your project, in this case “/malware-scan”.

Once we have the file, we are going to create the class with the same name inside it, as follows:

Reporter.php

<?php

class Reporter
{
    // Path where the log file will be saved.
    private string $reportFilePath;

    /**
     * Here we define the default value for the save path $reportFilePath
     * or we can customize it by sending a new path as the constructor parameter.
     */
    function __construct(string $reportFilePath = "")
    {
        if (empty($reportFilePath)) {
            $reportFilePath = __DIR__ . "/report_" . date("Y-m-d_H-i-s") . ".log";
        }

        // Assign the property in both cases, so a custom path is not ignored.
        $this->reportFilePath = $reportFilePath;
    }

    /**
     * This function will present on screen the text stored in $messageText
     * and also will save it to the file specified by $reportFilePath.
     * @param string $messageText Message to be displayed and saved in the log.
     * @param bool $noDate Allows to enable/disable the date and time
     * at the beginning of the message to be displayed,
     * it will also be added to the log, by default false which
     * indicates that it is always added, or true to disable it.
     * Note that date() always prints the microseconds field "u" as 000000.
     * @return void
     */
    public function echoMessage(string $messageText, bool $noDate = false): void
    {
        $echotext = $messageText . PHP_EOL;

        if (!$noDate) {
            $echotext = "[" . date("Y-m-d H:i:s.u") . "] " . $echotext;
        }

        echo $echotext;
        file_put_contents($this->reportFilePath, $echotext, FILE_APPEND | LOCK_EX);
    }
}

As you can see in the code, the behavior of this class is pretty basic: it displays text and saves it in a log file. The log files are created in the directory where Reporter.php is located, and each file name contains the date and time of execution.

We will see its use when implementing the detection engine.
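
The timestamps produced with date() always show 000000 in the microseconds field, because date() works with a whole-second integer timestamp. If real sub-second precision is wanted in the log, one option is to format a DateTimeImmutable object instead. This is a minimal sketch; timestampedLine() is a hypothetical helper and not part of the Reporter class.

```php
<?php

/**
 * Hypothetical helper: prefixes a message with a timestamp that includes
 * real microseconds and appends a line ending.
 */
function timestampedLine(string $message): string
{
    // DateTimeImmutable captures the current time with microsecond precision.
    $now = new DateTimeImmutable();

    return "[" . $now->format("Y-m-d H:i:s.u") . "] " . $message . PHP_EOL;
}

echo timestampedLine("Malware scan started!");
```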

Action management based on detection results

This component will be responsible for performing operations on files based on the action indicated by the signature or rule detected.

Let’s start as in the previous sections: we will create the file for the class and call it Actuator.php, and in it we will create the Actuator class.

Important: This class requires two directories inside the directory where the scripts are located, “quarantine” and “copy”, which store the files for the quarantine and copy actions respectively.

Our class will look like this:

Actuator.php

<?php

class Actuator
{
    private string $quarantinePath;
    private string $copyPath;

    /**
     * In the constructor, we define the default values for the quarantine and copy paths.
     */
    function __construct()
    {
        $this->quarantinePath = __DIR__ . "/quarantine";
        $this->copyPath = __DIR__ . "/copy";
    }

    /**
     * This function is responsible for performing the action requested in the $action parameter.
     * This parameter is of type ActType, which is the backed enum that contains the available actions to be performed.
     *
     * @param ActType $action Case inside the backed enum ActType that indicates the action to be performed. E.g. ActType::QUARANTINE
     * @param string $filePath Path of the file to which the action will be applied.
     * @return string Action result message to be displayed.
     */
    public function performaAction(ActType $action, string $filePath): string
    {
        /*
        Warning: in this line we apply a bit of PHP magic and call the action
        using the value of the enum case. In other words, we call the method
        of this class whose name is the text string of the action.
        E.g. $this->{"quarantine"}($filePath);
        In this way we avoid a switch-case or an endless if-else if we implement
        more actions in the future.

        We sacrifice some readability for less code and time spent. I think that
        in these cases it is worth the risk.
        */
        return $this->{$action->value}($filePath);
    }

    /**
     * Quarantine action.
     *
     * @param string $filePath The path to the file to which the action will be applied.
     * @return string Result message of the action, it will be used to
     * display on screen and save to the log.
     */
    private function quarantine(string $filePath): string
    {
        rename($filePath, $this->quarantinePath . "/" . basename($filePath));
        return $filePath . " moved to quarantine folder " . $this->quarantinePath;
    }

    /**
     * Copy action.
     *
     * @param string $filePath The path to the file to which the action will be applied.
     * @return string Result message of the action, it will be used to
     * display on screen and save to the log.
     */
    private function copy(string $filePath): string
    {
        copy($filePath, $this->copyPath . "/" . basename($filePath));
        return $filePath . " copied to folder " . $this->copyPath;
    }

    /**
     * Warning action
     *
     * @param string $filePath The path to the file to which the action will be applied.
     * @return string Result message of the action, it will be used to
     * display on screen and save to the log.
     */
    private function warning(string $filePath): string
    {
        // No PHP_EOL here: the reporter already appends one to every message.
        return "WARNING!: Detected suspicious content on " . $filePath;
    }

    /**
     * Delete action
     *
     * @param string $filePath The path to the file to which the action will be applied.
     * @return string Result message of the action, it will be used to
     * display on screen and save to the log.
     */
    private function delete(string $filePath): string
    {
        unlink($filePath);
        return $filePath . " Deleted.";
    }
}

As you can see, it contains a function for each available action, these functions return a message that will be used by our reporting component to display it on screen and save it.

The class also defines in the constructor the default paths to use for copying and quarantining files.
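
The actions above do not check whether rename(), copy() or unlink() succeeded, and two infected files with the same name in different directories would overwrite each other in the quarantine folder. The following sketch shows one way to handle both points. The function quarantineFile() and the hash prefix in the target name are assumptions made for this example, not part of the Actuator class.

```php
<?php

/**
 * Hypothetical variant of the quarantine action that reports failures and
 * avoids overwriting files that share a base name.
 *
 * @return string|null Destination path, or null when the move failed.
 */
function quarantineFile(string $filePath, string $quarantineDir): ?string
{
    if (!is_file($filePath) || !is_dir($quarantineDir)) {
        return null;
    }

    // Prefix the name with a short hash of the original path so that, for
    // example, a/index.php and b/index.php do not collide in quarantine.
    $target = $quarantineDir . "/" . substr(md5($filePath), 0, 8) . "_" . basename($filePath);

    return rename($filePath, $target) ? $target : null;
}

// Demonstration in a temporary directory.
$demoDir = sys_get_temp_dir() . "/quarantine_demo_" . uniqid();
mkdir($demoDir);
$demoFile = tempnam(sys_get_temp_dir(), "mal");

$moved = quarantineFile($demoFile, $demoDir);
var_dump($moved !== null && is_file($moved) && !file_exists($demoFile));

// Clean up the demonstration files.
if ($moved !== null) {
    unlink($moved);
}
rmdir($demoDir);
```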

Match detection engine

The detection engine has only one responsibility and that is to iterate the content within the directory that is designated to be checked and search the contents of each file for each of the loaded signatures.
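
The traversal part of that responsibility can be sketched in isolation as follows. The function findFilesByExtension() is a hypothetical helper, not the ScanEngine class built in this section; it walks a directory tree recursively and keeps only the files whose extension is in a given list, comparing extensions in lowercase.

```php
<?php

/**
 * Hypothetical helper: returns the paths of all files under $root whose
 * extension is in $extensions (compared case-insensitively).
 *
 * @param string[] $extensions Lowercase extensions without the dot.
 * @return string[]
 */
function findFilesByExtension(string $root, array $extensions): array
{
    $found = [];

    // SKIP_DOTS omits the "." and ".." entries during traversal.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        /** @var SplFileInfo $file */
        if ($file->isFile() && in_array(strtolower($file->getExtension()), $extensions, true)) {
            $found[] = $file->getPathname();
        }
    }

    sort($found);

    return $found;
}

// Demonstration on a temporary directory tree.
$demoRoot = sys_get_temp_dir() . "/scan_demo_" . uniqid();
mkdir($demoRoot . "/sub", 0777, true);
file_put_contents($demoRoot . "/index.PHP", "sample");
file_put_contents($demoRoot . "/sub/notes.txt", "sample");
file_put_contents($demoRoot . "/sub/archive.bin", "sample");

print_r(findFilesByExtension($demoRoot, ["php", "txt"]));

// Clean up the demonstration files.
unlink($demoRoot . "/index.PHP");
unlink($demoRoot . "/sub/notes.txt");
unlink($demoRoot . "/sub/archive.bin");
rmdir($demoRoot . "/sub");
rmdir($demoRoot);
```

Lowercasing the extension before comparing it means a file named index.PHP is still picked up, which matters because upload filters and attackers do not always follow lowercase naming.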

To do this we are going to create a class called ScanEngine in its corresponding ScanEngine.php file.

Our class will have two properties. The first is the path of the directory to be scanned, $scanPath, which by default points to a directory called “/scan-folder” in the same directory as our scripts; this is where the three sample files matching our signatures are stored. The second property, $allowedFileTypes, is an array containing the extensions of the file types of interest.

The extensions we will use in our proof of concept will be:

  • html
  • php
  • js
  • css
  • gif
  • jpg
  • jpeg
  • txt

These are among the web file extensions most commonly affected by malware in shared hosting environments.

Our class will also have four functions: run() in charge of executing the scan and three more functions, one for each type of signature (regex, plain text and md5 checksum). These three remaining functions return true/false depending on whether the searched content is detected.
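
For the regex check in particular, it is worth remembering that preg_match() returns false (not 0) when the pattern cannot be compiled, for instance when a signature ends in a stray backslash that swallows the closing delimiter. The sketch below, which assumes a hypothetical function safeRegexMatch(), separates "no match" from "invalid signature" so a broken rule is not silently treated as a clean file.

```php
<?php

/**
 * Hypothetical regex check that distinguishes "no match" from "invalid
 * pattern". Returns true or false for match or no match, and null when
 * the signature is not a valid pattern.
 */
function safeRegexMatch(string $signature, string $content): ?bool
{
    $pattern = "#" . $signature . "#";

    // Suppress the warning; the return value is inspected instead.
    $result = @preg_match($pattern, $content);

    if ($result === false) {
        return null;
    }

    return $result === 1;
}

var_dump(safeRegexMatch('chr\(ord\(', 'x = chr(ord($a));'));
var_dump(safeRegexMatch('chr\(ord\(', 'nothing to see here'));
var_dump(safeRegexMatch('broken\\', 'broken'));
```

A caller could log the null case as a configuration problem in the signatures file rather than as a scan result.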

The final script for our detection engine will look like this:

ScanEngine.php

<?php

class ScanEngine
{
    private string $scanPath;
    private array $allowedFileTypes;

    /**
     * In the constructor we define the default directory to be scanned, which can be
     * customized by sending a new path in the $scanPath parameter when instantiating the object.
     *
     * Additionally we define the file types of interest to scan.
     * @param string $scanPath Path to scan.
     */
    function __construct(
        string $scanPath = __DIR__ . "/scan-folder"
    ) {
        $this->scanPath = $scanPath;
        $this->allowedFileTypes = array("html", "php", "js", "css", "gif", "jpg", "jpeg", "txt");
    }

    /**
     * Perform the scan on the path defined by $scanPath.
     * @return void
     */
    public function run(): void
    {
        $reportMan = new Reporter();

        $reportMan->echoMessage("Malware scan started!");
        $reportMan->echoMessage("", true);
        $reportMan->echoMessage("Scanning Directory " . $this->scanPath);

        $sigMan = new SignatureManager();
        $sigMan->loadSignatures();

        $signatures = $sigMan->getSignatures();

        $actuator = new Actuator();

        $dirIterator = new \RecursiveDirectoryIterator($this->scanPath);
        /** @var \RecursiveDirectoryIterator | \RecursiveIteratorIterator $it */
        $it = new \RecursiveIteratorIterator($dirIterator);

        while ($it->valid()) {
            if (!$it->isDot() && $it->isFile() && $it->isReadable()) {

                // $file is a SplFileInfo instance
                $file = $it->current();
                $filePath = $it->key();

                // Compare extensions in lowercase so that files such as "SHELL.PHP" are not skipped.
                if (in_array(strtolower($file->getExtension()), $this->allowedFileTypes, true)) {

                    $reportMan->echoMessage("", true);
                    $reportMan->echoMessage($filePath);

                    foreach ($signatures as $sigType => $sigInfo) {

                        $flagDetected = false;

                        // A previous action may have moved or deleted the file.
                        if (file_exists($filePath)) {
                            if ($sigType == SigType::PLAINTEXT->value) {
                                $fileContent = file_get_contents($filePath);
                                $flagDetected = $this->checkPlainText($fileContent, $sigInfo["signature"]);
                            }

                            if ($sigType == SigType::REGEX->value) {
                                $fileContent = file_get_contents($filePath);
                                $flagDetected = $this->checkRegex($fileContent, $sigInfo["signature"]);
                            }

                            if ($sigType == SigType::MD5->value) {
                                $flagDetected = $this->checkMd5($filePath, $sigInfo["signature"]);
                            }
                        }

                        if ($flagDetected) {
                            $reportMan->echoMessage("(" . $sigInfo["actType"] . ") " . $sigType . " Signature [" . $sigInfo["signature"] . "] detected! ");

                            $actionResult = $actuator->performaAction(ActType::from($sigInfo["actType"]), $filePath);
                            $reportMan->echoMessage($actionResult);
                        }
                    }
                }
            }

            $it->next();
        }

        $reportMan->echoMessage("", true);
        $reportMan->echoMessage("Malware scan ended!");
    }

    /**
     * Checks if the plaintext signature is found in the content of the string $fileContent
     * @param string $fileContent Contents to be searched for signature matches
     * @param string $signature Plain text of the signature to be searched for in the string $fileContent
     * @return bool True=found, False=not found
     */
    private function checkPlainText(string $fileContent, string $signature): bool
    {
        return str_contains($fileContent, $signature);
    }

    /**
     * Checks if the regex signature is found in the content of the string $fileContent
     * @param string $fileContent Contents to be searched for signature matches
     * @param string $signature Regular expression (without delimiters) to be searched for in the string $fileContent
     * @return bool True=found, False=not found
     */
    private function checkRegex(string $fileContent, string $signature): bool
    {
        // preg_match() returns 1 on a match, 0 on no match and false on error.
        return preg_match("#" . $signature . "#", $fileContent) === 1;
    }

    /**
     * Checks if the file's md5 checksum matches the checksum in the signature
     * @param string $filePath Path of the file
     * @param string $signature md5 checksum to compare with the one generated for the file
     * @return bool True=found, False=not found
     */
    private function checkMd5(string $filePath, string $signature): bool
    {
        // Strict comparison: with ==, two different checksums that look like
        // numbers (e.g. "0e1234..." and "0e5678...") would be considered equal.
        return $signature === md5_file($filePath);
    }
}

Those are all our component scripts. Now let’s create a file that will be the entry point to start a detection process.

Scan execution input script

Our input script will be as follows:

scan.php

<?php
// This variable will store the "path=" argument received from the command line.
$pathToScan = "";

foreach ($argv as $argument) {
    if (str_starts_with($argument, "path=")) {
        // Remove only the leading "path=" from the argument.
        $pathToScan = substr($argument, strlen("path="));
        // Windows fix
        $pathToScan = str_replace("\\", "/", $pathToScan);
    }
}

include("SignatureManager.php");
include("ScanEngine.php");
include("Reporter.php");
include("Actuator.php");

// Without a "path=" argument, fall back to the default /scan-folder directory.
$scanEng = $pathToScan === "" ? new ScanEngine() : new ScanEngine($pathToScan);
$scanEng->run();

As designed, our example runs a default scan on the /scan-folder directory located in the same directory as our scripts. If you send the “path=” argument on the command line, the scan uses that path instead.

The script also includes our classes, creates an instance of ScanEngine and starts the process by invoking the run() function.
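
The argument handling described above can also be isolated in a small function, which makes the fallback to the default directory explicit and easy to test. This is only a sketch: resolveScanPath() is a hypothetical helper and the paths used in the calls are examples.

```php
<?php

/**
 * Hypothetical helper: extracts the value of a "path=" argument, falling
 * back to $default when the argument is missing or empty.
 *
 * @param string[] $arguments Command line arguments, e.g. $argv.
 */
function resolveScanPath(array $arguments, string $default): string
{
    foreach ($arguments as $argument) {
        if (str_starts_with($argument, "path=")) {
            // Remove only the leading "path=" and normalize Windows separators.
            $path = str_replace("\\", "/", substr($argument, strlen("path=")));
            if ($path !== "") {
                return $path;
            }
        }
    }

    return $default;
}

echo resolveScanPath(["scan.php", "path=C:\\sites\\demo"], "/malware-scan/scan-folder") . PHP_EOL;
echo resolveScanPath(["scan.php"], "/malware-scan/scan-folder") . PHP_EOL;
```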

Final file and directory structure

For our proof of concept to work, the final file and directory structure of your project should look like this:

/malware-scan
    /quarantine
    /copy
    /scan-folder
        sample-md5.js
        sample-plain.txt
        sample-regex.php
    Actuator.php
    SignatureManager.php
    ScanEngine.php
    Reporter.php
    scan.php
    signatures.txt

Now that we have all our files, all that remains is to run a test. Let’s see the result.

Scan test

To perform a test, start a command console and run the scan.php script as follows:

# Default directory scan /scan-folder
php scan.php

# Custom directory scan
php scan.php path=/home/my_malware_folder

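The result of the scan will be displayed on the console and saved in the corresponding log file. The output will look similar to the following sample. In this sample run every signature was configured with the warning action and the md5 value differs from the one listed earlier, so your own output will vary with the contents of signatures.txt: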

[2023-01-23 14:57:01.000000] Malware scan started!

[2023-01-23 14:57:01.000000] Scanning Directory /malware-scan/scan-folder

[2023-01-23 14:57:01.000000] /malware-scan/scan-folder\sample-md5.js
[2023-01-23 14:57:01.000000] (warning) md5 Signature [1baa5c73942b260cbe22c4effa8b5cda] detected!
[2023-01-23 14:57:01.000000] WARNING!: Detected suspicious content on /malware-scan/scan-folder\sample-md5.js

[2023-01-23 14:57:01.000000] /malware-scan/scan-folder\sample-plain.txt
[2023-01-23 14:57:01.000000] (warning) plaintext Signature [My Signature Text] detected!
[2023-01-23 14:57:01.000000] WARNING!: Detected suspicious content on /malware-scan/scan-folder\sample-plain.txt

[2023-01-23 14:57:01.000000] /malware-scan/scan-folder\sample-regex.php
[2023-01-23 14:57:02.000000] (warning) regex Signature [\$.*\[\$.*\] \= chr\(ord\(\$.*\[\$.*\]\) \^ ord\(\$.*\[\$.* \% \$.*\]\)\)] detected!
[2023-01-23 14:57:02.000000] WARNING!: Detected suspicious content on /malware-scan/scan-folder\sample-regex.php

[2023-01-23 14:57:02.000000] Malware scan ended!

On screen you can see the result of the scan and the files that match the existing signatures. You will also find a new .log file containing the scan report in the directory where the scripts are located.

And this would be the end of the project, thank you for reaching this point. Remember that in Winkhosting.co we are much more than hosting.
