Accelerating Forensic Triage with Splunk
Responding to incidents across an enterprise raises several issues, and one of the biggest challenges response teams face today is data acquisition. Delays here hinder the progress of the investigation, ultimately widening the gap between the time of detection and the time of containment.
As defined by NIST, the digital forensics process consists of four main process areas: the acquisition stage, where artefacts are collected from the affected systems; the examination and analysis stages, where the evidence gathered during acquisition is parsed and interpreted; and finally the reporting stage, where the findings are documented.
As seen throughout the industry, data acquisition can hinder investigations for a number of reasons, including a lack of physical access and insufficient access rights. In addition, individual responders may have their own tools and techniques for acquiring and triaging data, preventing forensic rigour from being followed consistently during acquisition and potentially contaminating volatile data sources.
Based on research and development work, this post provides an insight into how Splunk can be utilised to automate the collection of key endpoint artefacts and present them for analysis to aid forensic investigations.
Project Aims
This project has been designed to address the following issues:
- Speed up the acquisition of forensic artefacts on Windows devices
- Ensure forensic rigour is maintained during the acquisition phase
- Maintain consistency when parsing artefacts/evidence and enforce the use of only validated acquisition and parsing tools
- Provide a single platform to correlate and analyse data collected during the investigation
- Provide a means of configuring pre-defined alerts that can trigger based on intelligence gathered from previous investigations or threat intelligence
Why Splunk?
During a forensic investigation, a number of different data sources are usually interrogated to understand the issue at hand, ranging from OS, application and appliance logs to the output of forensic tooling, and each data source generally comes in a different format. Taking advantage of Splunk’s schema-on-the-fly architecture, all of the event logs and the output from the forensic tooling can be ingested into Splunk and searched across with ease, making it a suitable platform for this project.
Alerts set up in Splunk provide the option to carry out a series of actions upon triggering, one of which is running a script. By utilising this script function, we can automate the deployment of an app, containing the scripts required to carry out a full forensic triage, to the endpoints flagged by the alert. Automating the collection of artefacts this way ensures that forensic rigour is followed each time a triage is carried out during the acquisition phase.
Indicators of compromise identified in previous investigations or acquired from threat intelligence can be used to power reports and alerts that highlight specific indicators within the forensic data, for example, identifying base64-encoded PowerShell commands in the recent command-line history, or generic Windows processes which wouldn’t normally reach out to the internet spawning network connections.
Data sources can also be correlated to build a better picture of the incident. For example, using the data collected during the Live Forensics part of the acquisition, parent-child relationships between processes can be mapped to show how certain processes were executed; this data can then be joined to network connection data, showing which processes opened particular connections or ports.
How does it work?
The project’s main goal is to automate the collection and parsing of key forensic artefacts to aid forensic investigations. The automation relies on the Splunk infrastructure using a Deployment Server for forwarder management.
A Deployment Server (DS) is used to distribute content or configurations to deployment clients, which are grouped into server classes. In the context of this project, the deployment clients are the Universal Forwarders deployed across the Windows estate collecting endpoint logs. Server classes are configured on the deployment server to group configuration files together and to specify which deployment clients they are rolled out to.
In order to automate the deployment of the forensic triage app, a server class on the deployment server needs to be updated with the hostname of the device flagged by the alert. As shown in the code snippet below, a script has been developed so that, when applied to a saved search as an alert action, it parses the dest field for the hostname and then updates the server class on the deployment server.
def process_event(helper, *args, **kwargs):
    helper.log_info("Alert action start_forensic_triage started")
    # Get the dest field (the hostname flagged by the alert) from each result
    events = helper.get_events()
    for event in events:
        host = event.get("dest")
        helper.log_info('event.get("dest")={}'.format(host))
        check_serverclass(host)
The script then checks whether the server class already exists on the DS. If it doesn’t, a new one is created and the hostname acquired earlier is added to the server class whitelist, as shown in the code snippet below.
import logging
import os

def add_serverclass(host):
    # Append the serverclass stanza template to the bottom of the file;
    # $SPLUNK_HOME is not expanded by Python, so resolve it from the environment
    serverclass_conf = os.path.join(
        os.environ["SPLUNK_HOME"], "etc", "system", "local", "serverclass.conf")
    with open(serverclass_conf, "a") as conf_file:
        conf_file.write("\n[serverClass:ta-digitalforensics]\n")
    logging.info("Creating serverclass")
    add_whitelist(host)
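The snippets above reference check_serverclass() and add_whitelist(), which aren’t shown in the post. A minimal sketch of how they might look is given below; it assumes the same serverclass.conf layout as the snippet above and that whitelist entries are numbered sequentially, so the actual implementation in the app may well differ.

import logging
import os
import re

SERVERCLASS_CONF = os.path.join(
    os.environ["SPLUNK_HOME"], "etc", "system", "local", "serverclass.conf")

def check_serverclass(host):
    # Create the server class if it is missing, otherwise just whitelist the host
    with open(SERVERCLASS_CONF, "r") as conf_file:
        contents = conf_file.read()
    if "[serverClass:ta-digitalforensics]" not in contents:
        add_serverclass(host)
    else:
        add_whitelist(host)

def add_whitelist(host):
    # Whitelist entries in serverclass.conf are numbered (whitelist.0, .1, ...);
    # append the flagged host as the next entry. Simplified: assumes the
    # ta-digitalforensics stanza is the last one in the file.
    with open(SERVERCLASS_CONF, "r") as conf_file:
        contents = conf_file.read()
    existing = re.findall(r"whitelist\.(\d+)", contents)
    next_index = max([int(i) for i in existing], default=-1) + 1
    with open(SERVERCLASS_CONF, "a") as conf_file:
        conf_file.write("whitelist.{} = {}\n".format(next_index, host))
    logging.info("Whitelisted %s in server class", host)

In practice the server class template would also need an app stanza assigning TA-DigitalForensics to the class, and the deployment server has to reload its configuration (for example via splunk reload deploy-server) before the change is pushed out.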
TA-DigitalForensics is an app comprising a series of forensic triage scripts, which are invoked via a scripted input upon installation on an endpoint. The triage scripts have been developed so that each artefact type is written to a separate JSON file to make ingestion into Splunk easier. Inside the app, an inputs.conf has been created to instruct the UF to monitor specific file paths so that the JSON data is forwarded to the indexing tier under the correct sourcetype and index.
Below is a configuration sample taken from inputs.conf showing triage.py enabled as a scripted input; once the app has been installed on an endpoint this script is launched, in turn invoking the entire collection of triage scripts (a sketch of such an orchestrator follows the configuration sample). As previously mentioned, each artefact is written to a separate JSON file, so there is a monitor stanza for each artefact collecting the data into the digital_forensics index across a series of sourcetypes.
[script://$SPLUNK_HOME/etc/apps/TA-digitalforensics/bin/triage.py]
sourcetype = scripted_input
disabled = 0
interval = 30

[monitor://$SPLUNK_HOME/etc/apps/TA-digitalforensics/bin/browser/chromeDownloads.json]
disabled = 0
sourcetype = browser
source = chromeDownloads.json
index = digital_forensics
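As a rough illustration of how triage.py might orchestrate the individual collectors and write each artefact type to its own monitored JSON file, a minimal sketch is shown below. The collector module names and output paths are assumptions for illustration only; the layout of the real app may differ.

import json
import os

# Hypothetical collector modules bundled with the app; each returns a list of
# dictionaries describing one artefact type (module names are illustrative only)
from collectors import live_forensics, browser, program_execution, recycle_bin

APP_BIN = os.path.dirname(os.path.abspath(__file__))

# Map each collector to the JSON file monitored by inputs.conf, most volatile first
COLLECTORS = [
    (live_forensics.collect, os.path.join(APP_BIN, "live", "liveForensics.json")),
    (browser.collect_chrome_downloads, os.path.join(APP_BIN, "browser", "chromeDownloads.json")),
    (program_execution.collect_prefetch, os.path.join(APP_BIN, "execution", "prefetch.json")),
    (recycle_bin.collect, os.path.join(APP_BIN, "deleted", "recycleBin.json")),
]

def main():
    for collect, output_path in COLLECTORS:
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        # Write one JSON object per line so Splunk can break events cleanly
        with open(output_path, "w") as output_file:
            for record in collect():
                output_file.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    main()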
Data Sources
The data sources chosen for this project cover a broad range of areas in an attempt to collect sufficient data for an analyst to verify and investigate whether an event was malicious. These data sources provide coverage of and insight into active network connections, running processes, program execution, file access and deletion, and browser activity. To begin with, the live forensics process is followed to collect key volatile information before it is lost from the system, prior to moving on to collect additional artefacts stored on the hard disk.
Live Forensics Process
This is the process of acquiring volatile information stored in RAM, which would otherwise be lost if the system were powered off or could potentially be overwritten if the system is left running.
The data gathered during the live forensics process is not only quick to acquire but also very valuable; it is commonly used to identify indicators of installation, persistence and established C&C communications.
Again, since the process is fully automated, the collection mechanisms can be validated to make sure that the evidence collected is accurate and that the collection method doesn’t contaminate any other evidence. In addition, the order of volatility is enforced, with the most volatile data sources collected first.
Data collected via the Live Forensics process includes the following:
- Currently logged on users
- Established network connections
- Open and listening port information
- Recent DNS queries made
- Information about all running processes
- Information about all open files
- Command-line history
- List of scheduled tasks
- Contents of the clipboard
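To illustrate how a few of the items above (established connections, listening ports and running-process details) might be gathered, below is a minimal sketch that assumes the psutil library is bundled with the app; the app’s actual collectors may equally rely on native Windows commands or APIs.

import datetime
import psutil

def collect_live_forensics():
    # Snapshot network connections and running processes into JSON-friendly dicts
    records = []
    for conn in psutil.net_connections(kind="inet"):
        records.append({
            "artefact": "network_connection",
            "pid": conn.pid,
            "status": conn.status,
            "local_address": "{}:{}".format(*conn.laddr) if conn.laddr else None,
            "remote_address": "{}:{}".format(*conn.raddr) if conn.raddr else None,
        })
    attrs = ["pid", "ppid", "name", "exe", "cmdline", "username", "create_time"]
    for proc in psutil.process_iter(attrs=attrs):
        info = dict(proc.info)
        info["artefact"] = "process"
        if info.get("create_time") is not None:
            info["create_time"] = datetime.datetime.fromtimestamp(
                info["create_time"]).isoformat()
        records.append(info)
    return records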
Browser Usage
An attempt is made to collect browser data from three of the most popular web browsers: Internet Explorer/Edge, Google Chrome and Firefox. From each browser the browsing history is acquired to provide a list of all websites visited by date and time, along with a count of visits. The download history is also collected, showing all files downloaded by date and time, together with the size, type and name of each file and where it was downloaded from. Firefox and IE additionally record the search terms entered into the browser, so these are collected by date and time as well.
In addition to the above, Internet Explorer records information not related to web browsing activity, such as access to files and applications residing on network shares.
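As an example of how one of these artefacts might be collected, the sketch below reads the download history from a copy of Chrome’s History SQLite database and emits the fields consumed by the chromeDownloads.json input shown earlier. The column names are based on a recent Chrome schema and vary between versions, so treat the query as an assumption rather than a definitive implementation; the copy is taken because the live database is locked while Chrome is running.

import datetime
import os
import shutil
import sqlite3
import tempfile

CHROME_HISTORY = os.path.expandvars(
    r"%LOCALAPPDATA%\Google\Chrome\User Data\Default\History")

def chrome_time(microseconds):
    # Chrome stores timestamps as microseconds since 1601-01-01 (WebKit epoch)
    epoch = datetime.datetime(1601, 1, 1)
    return (epoch + datetime.timedelta(microseconds=microseconds)).isoformat()

def collect_chrome_downloads():
    # Copy the database first as the live file is locked while Chrome is open
    tmp_copy = os.path.join(tempfile.gettempdir(), "chrome_history_copy")
    shutil.copy2(CHROME_HISTORY, tmp_copy)
    records = []
    db = sqlite3.connect(tmp_copy)
    try:
        rows = db.execute(
            "SELECT target_path, tab_url, total_bytes, mime_type, start_time "
            "FROM downloads")
        for target_path, tab_url, total_bytes, mime_type, start_time in rows:
            records.append({
                "artefact": "chrome_download",
                "file_name": os.path.basename(target_path),
                "downloaded_from": tab_url,
                "size_bytes": total_bytes,
                "file_type": mime_type,
                "download_time": chrome_time(start_time),
            })
    finally:
        db.close()
    os.remove(tmp_copy)
    return records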
Program Execution
Information regarding commonly and recently used executables is collected from a number of different data sources, including Prefetch files, the Shimcache and Jump Lists. Artefacts relating to program execution can provide a good indication of when the initial compromise/infection took place by tracking the first launch time of an executable, providing a key pivot point to begin uncovering where the executable came from.
From prefetch files the first and last execution times, the number of times the file was executed, the location it was executed from and any file handles related to the executable are collected. Jump Lists provide users with a graphical indication of recent items accessed by each application in the taskbar, so parsing this data provides the list of recently accessed files by each application by date and time. The shimcache is used by Windows to identify application compatibility issues by recording details about program execution, so from a forensic point of view the file path, size, last modified time and any execution flags are acquired from this cache.
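As a lightweight illustration, the sketch below walks the Prefetch directory and records each executable name along with the .pf file’s creation and modification times, which approximate first and last execution. Extracting run counts, file handles and the executed-from path requires a full prefetch parser (the format is compressed on Windows 10 and later), so this simplified pass is an assumption about one possible triage step rather than the app’s actual parser.

import datetime
import glob
import os

PREFETCH_DIR = r"C:\Windows\Prefetch"

def collect_prefetch_summary():
    # Each .pf file is named EXECUTABLE.EXE-<HASH>.pf; its creation and
    # modification times roughly correspond to first and last execution.
    # Reading C:\Windows\Prefetch typically requires administrative privileges.
    records = []
    for pf_path in glob.glob(os.path.join(PREFETCH_DIR, "*.pf")):
        stat = os.stat(pf_path)
        file_name = os.path.basename(pf_path)
        records.append({
            "artefact": "prefetch",
            "executable": file_name.rsplit("-", 1)[0],
            "prefetch_file": file_name,
            "first_executed": datetime.datetime.fromtimestamp(stat.st_ctime).isoformat(),
            "last_executed": datetime.datetime.fromtimestamp(stat.st_mtime).isoformat(),
        })
    return records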
Deleted Files
Throughout its existence, the role of the Recycle Bin is to track files that have been deleted by the user, whether through interaction in Explorer or another programme on the endpoint. By analysing the contents of the Recycle Bin, we can identify all files deleted, when they were deleted and where they were deleted from.
This information is especially useful because an attacker attempting to clean up after themselves may delete binaries, configuration files and other artefacts once they have gained an initial foothold inside the network. For example, many multi-stage malware variants are configured to delete the dropper and any other associated files upon execution.
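On Windows Vista and later, each deletion creates an $I metadata file in the Recycle Bin alongside the $R file holding the content. The sketch below parses these $I files, assuming the documented layout (version, original size, FILETIME deletion timestamp, then the original path, which is length-prefixed from Windows 10 onwards); it is illustrative rather than the app’s actual parser.

import datetime
import glob
import struct

def filetime_to_iso(filetime):
    # FILETIME counts 100-nanosecond intervals since 1601-01-01
    epoch = datetime.datetime(1601, 1, 1)
    return (epoch + datetime.timedelta(microseconds=filetime // 10)).isoformat()

def collect_recycle_bin(recycle_dir=r"C:\$Recycle.Bin"):
    # One SID sub-folder per user; each $I file holds metadata for a deleted $R file.
    # Reading other users' bins typically requires administrative privileges.
    records = []
    for i_path in glob.glob(recycle_dir + r"\*\$I*"):
        with open(i_path, "rb") as i_file:
            data = i_file.read()
        version, original_size, deleted_filetime = struct.unpack_from("<QQQ", data, 0)
        if version == 2:  # Windows 10+: 4-byte character count, then UTF-16 path
            name_length = struct.unpack_from("<I", data, 24)[0]
            original_path = data[28:28 + name_length * 2].decode("utf-16-le").rstrip("\x00")
        else:             # Vista to 8.1: fixed 520-byte UTF-16 path field
            original_path = data[24:24 + 520].decode("utf-16-le").split("\x00")[0]
        records.append({
            "artefact": "recycle_bin",
            "deleted_file": original_path,
            "original_size_bytes": original_size,
            "deleted_time": filetime_to_iso(deleted_filetime),
        })
    return records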
File/Folder Access
LNK files are created by the Windows OS to provide shortcuts to applications or files throughout a system. These files are also parsed to show the first and last time a particular file was accessed, along with the file's original location (on a network share, external device or the local system). They can be useful for tracking files which may have been deleted or which were stored on a USB device or network share: although the file might no longer be there, the LNK files associated with the original file will still exist.
Conclusions
In this post, an approach to automatically acquiring and triaging Windows forensic artefacts utilising the Splunk platform has been presented, where alerts generated in Splunk cause a script to update a stanza on the deployment server, which in turn pushes out and initiates a forensic triage app on Windows endpoints.
As with any security investigation, time is of the essence; having the ability to start collecting forensic evidence for high-fidelity alerts before an analyst has had a chance to carry out initial triage can be a huge time-saver. Doing so also gives traditional tier-one analysts more information to work with while carrying out their investigation before escalating the event further. Even in a managed security operations centre, where analysts rely entirely on the data in their SIEM solution, the ability to enrich an investigation with endpoint forensic data means they can do more to verify that an event is a true positive and provide a more detailed explanation of their findings when escalating for further analysis.
The acquisition and analysis aspects have been developed in such a way that they are completely extendable, meaning that new features to collect additional artefacts can easily be added, or existing features can be customised for use in different environments. For example, common triage tools used throughout the industry, such as Hoarder or KAPE, can be deployed within the app and their output ingested into Splunk for analysis.