Sysmon Threat Hunting

System Monitor (Sysmon) is a Windows system service and device driver that monitors system activity and logs it to the Windows event log. The information it collects includes process creation, network connections, and changes to file creation time. Some of Sysmon's capabilities include:

  • Logs process creation with full command line for both current and parent processes.
  • Records the hash of process image files using SHA1 (the default), MD5, SHA256 or IMPHASH.
  • Includes a process GUID in process create events to allow for correlation of events even when Windows reuses process IDs.
  • Generates events from early in the boot process to capture activity made by even sophisticated kernel-mode malware.
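
To illustrate why the process GUID matters, the sketch below uses a few hypothetical process creation records in which Windows has reused a process ID. ProcessGuid, ParentProcessGuid, ProcessId and Image are field names from Sysmon's process creation event; the values are invented for illustration. Looking at the PID alone would mix two unrelated processes, while the GUID keeps them apart.

```python
import pandas as pd

# Hypothetical process-creation records; the GUIDs and PIDs are made up for illustration.
records = [
    {"ProcessGuid": "{A-1}", "ParentProcessGuid": "{A-0}", "ProcessId": 4120,
     "Image": r"C:\Windows\System32\cmd.exe"},
    {"ProcessGuid": "{A-2}", "ParentProcessGuid": "{A-1}", "ProcessId": 5300,
     "Image": r"C:\Windows\System32\whoami.exe"},
    # PID 4120 is reused later by an unrelated process, but the GUID differs.
    {"ProcessGuid": "{A-3}", "ParentProcessGuid": "{A-0}", "ProcessId": 4120,
     "Image": r"C:\Windows\System32\notepad.exe"},
]
procs = pd.DataFrame(records)

# Join each process to its parent through the GUID rather than the PID.
parents = procs[["ProcessGuid", "Image"]].rename(
    columns={"ProcessGuid": "ParentProcessGuid", "Image": "ParentImage"})
tree = procs.merge(parents, on="ParentProcessGuid", how="left")

# PIDs that appear under more than one GUID would be ambiguous on their own.
guids_per_pid = procs.groupby("ProcessId")["ProcessGuid"].nunique()
reused_pids = guids_per_pid[guids_per_pid > 1].index.tolist()
```

Here whoami.exe is linked to the first cmd.exe through its parent GUID, and reused_pids lists the PID that was shared by two different processes.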

Before starting the analysis, the environment must be configured. The Jupyter notebook used to analyze the log doesn't have to run on the monitored device, as long as the Sysmon evtx file can be exported from it later.

Requirements

  • 16 GB of RAM and a 64-bit x86 Intel or AMD processor from 2011 or later. (At least 8 GB will be allocated to the monitored device's virtual machine.)
  • VMware Workstation or VirtualBox.
  • Jupyter Notebook (make sure the pandas and evtx libraries used by the code below are installed; json is part of the Python standard library)

If multiple devices are used, Logstash may be used to consolidate the logs in one place. However, collecting the evtx file from each device also works.

Setting up Environment

Before we can start exploring, we first need to set up the Sysmon environment. This step includes installing Sysmon and generating data for us to analyze. To install Sysmon, get the installation file from here. We also use the SwiftOnSecurity Sysmon config file from their GitHub (link here). Then run the following command from an elevated command prompt:

sysmon64.exe -accepteula -i sysmonconfig-export.xml

To check whether Sysmon is installed, open Windows Event Viewer and navigate to “Applications and Services Logs > Microsoft > Windows > Sysmon > Operational”.

After installing Sysmon, we can start generating some events to analyze later. The basic Sysmon events can be generated using SysmonSimulator.

To use SysmonSimulator, here are the steps:

  1. Download the project from GitHub here.
  2. Get the newest executable file here.
  3. Put the downloaded SysmonSimulator.exe in the following directory: /SysmonSimulator/x64/Release
  4. Using a command prompt opened in that directory, run the following command: SysmonSimulator.exe -all

After the process finishes, event data for analysis should have been generated in the event log. You can add more data by running other commands in the environment. Red Canary's Atomic Red Team provides a list of tests that have been mapped to the MITRE ATT&CK® framework. We can check their Windows tests here and run the commands ourselves. An example of a simple test is the LSASS dump, T1003.001.

Populating dataframe

After the Sysmon evtx file has been filled with logs from running SysmonSimulator and the Atomic Red Team tests, the work in the Jupyter notebook can finally begin. Analyzing data in a Jupyter notebook is generally done by putting the data into a table, which pandas calls a dataframe, and then manipulating it statistically to get the desired information.

Populating the dataframe can be done by following these steps:

Read and parse the evtx log:

import json

import evtx

# Placeholder: replace with the path to the exported Sysmon evtx file
evtx_file = "#{Sysmon evtx file}"
parser = evtx.PyEvtxParser(evtx_file)

# Each record's 'data' field holds the event as a JSON string
events = []
for pj in parser.records_json():
    event = json.loads(pj['data'].strip())
    events.append(event)

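You can inspect the content of a parsed event by printing one of them, for example print(events[0]). Each event is a nested dictionary with an Event key that contains a System section and an EventData section.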
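
As a rough guide to what a parsed record looks like, the sketch below builds a heavily trimmed, hypothetical process creation event with the same nesting that the code in this article relies on (Event, then System and EventData) and pretty-prints it. The values are invented; real events carry many more fields.

```python
import json

# Hypothetical, trimmed example of one parsed record; real events have more fields.
sample_event = {
    "Event": {
        "System": {"EventID": 1, "Computer": "WS-01"},
        "EventData": {
            "UtcTime": "2023-01-10 08:00:01.123",
            "Image": r"C:\Windows\System32\cmd.exe",
            "CommandLine": "cmd.exe /c whoami",
            "User": r"WS-01\analyst",
        },
    }
}

# Pretty-print the nested structure to see which keys are available.
print(json.dumps(sample_event, indent=2))

# List the EventData field names that could become dataframe columns.
available_fields = sorted(sample_event["Event"]["EventData"])
```

Listing the keys of EventData in this way is a quick check of which fields a given event type offers before deciding on columns.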

After parsing the event log, we choose which parts of each event we want to look at.

header = ['timestamp', 'computer_name',
          'process_path', 'parent_path',
          'command_line', 'parent_command_line',
          'user', 'sha1', 'md5',
          'sha256', 'company', 'description']

events_list = []
for evt in events:
    try:
        data = evt['Event']['EventData']
        # Hashes is a comma-separated string such as "SHA1=...,MD5=...,SHA256=..."
        hashes = dict(h.split('=', 1) for h in data['Hashes'].split(',') if '=' in h)
        new_evt = [
            data['UtcTime'],
            evt['Event']['System']['Computer'],
            data['Image'],
            data['ParentImage'],
            data['CommandLine'],
            data['ParentCommandLine'],
            data['User'],
            # Algorithms that are not present in the log become ''
            hashes.get('SHA1', ''),
            hashes.get('MD5', ''),
            hashes.get('SHA256', ''),
            data['Company'],
            data['Description'],
        ]
        events_list.append(new_evt)
    except (KeyError, TypeError):
        # Skip events that lack any of these fields (or have no EventData);
        # in practice this keeps process creation events, which carry parent details
        pass

If you're not sure what data can be read, scroll through some of the events to look for fields to use. For this example, the fields chosen are:

  • Timestamp
  • Computer name
  • Process path
  • Parent path
  • Command line
  • Parent command line
  • User
  • Hashes (SHA1, MD5, SHA256)
  • Company
  • Description
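
The hash handling in the loop above can also be factored into a small helper. This is only a sketch: parse_hashes is a hypothetical name, and it assumes the Hashes field is a comma-separated list of ALGORITHM=value pairs, as the code above does. The hash values below are placeholders, not real hashes.

```python
def parse_hashes(hashes_field):
    """Turn 'SHA1=..,MD5=..' into a dict keyed by lower-cased algorithm name."""
    parsed = {}
    for item in hashes_field.split(","):
        name, sep, value = item.partition("=")
        if sep:  # skip fragments without an '=' sign
            parsed[name.strip().lower()] = value.strip()
    return parsed


# Placeholder values, not real hashes.
example = parse_hashes("SHA1=AAAA,MD5=BBBB,SHA256=CCCC,IMPHASH=DDDD")

# Missing algorithms fall back to an empty string, as in the loop above.
row = [example.get(algo, "") for algo in ("sha1", "md5", "sha256")]
```

Keeping the parsing in one function makes it easier to add another algorithm, such as IMPHASH, as an extra column later.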

After the event data has been chosen, we can wrap it up in a dataframe.

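import pandas as pd

df = pd.DataFrame(events_list, columns=header)
# UtcTime is a string such as "2023-01-10 08:00:01.123"; convert it to a datetime
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S.%f')
df.head(5)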
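
If evtx files were collected from several devices, as mentioned in the requirements, one simple way to consolidate them inside the notebook is to build one dataframe per device and concatenate them. The sketch below assumes each device's events have already been parsed into rows; the host names and rows are hypothetical.

```python
import pandas as pd

columns = ["timestamp", "computer_name", "process_path"]

# Hypothetical rows standing in for events parsed from two devices' evtx files.
per_device_rows = {
    "WS-01": [["2023-01-10 08:00:01.123", "WS-01", r"C:\Windows\System32\cmd.exe"]],
    "WS-02": [["2023-01-10 08:00:05.456", "WS-02", r"C:\Windows\explorer.exe"],
              ["2023-01-10 07:59:59.000", "WS-02", r"C:\Windows\System32\svchost.exe"]],
}

frames = [pd.DataFrame(rows, columns=columns) for rows in per_device_rows.values()]

# One combined table, ordered by time so activity across hosts can be read together.
combined = pd.concat(frames, ignore_index=True)
combined["timestamp"] = pd.to_datetime(combined["timestamp"],
                                       format="%Y-%m-%d %H:%M:%S.%f")
combined = combined.sort_values("timestamp").reset_index(drop=True)
```

Because every row keeps its computer_name, the combined table can still be filtered or grouped per device.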

Baseline Hunting

It is important to set up a baseline in an active environment. By creating a baseline, we can review things that our computer usage policy does not allow, such as games or unapproved software.

top_procs = df.groupby(['company', 'description'])\
    .size()\
    .sort_values(ascending=False)\
    .reset_index(name='counts')
top_procs.head(10)

Aside from listing software, we can also find the top running processes in the environment.

top_procs = df.groupby(['process_path', 'md5'])\
    .size()\
    .sort_values(ascending=False)\
    .reset_index(name='counts')
top_procs.head(10)

Baseline hunting helps us become familiar with the environment we are working in. From those baselines, we can then determine where to hunt and what kind of threat we want to hunt for.
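
A baseline is also useful in the other direction: entries that appear only rarely can be worth a closer look. The following is a minimal sketch, assuming a dataframe with the same process_path and md5 columns as above. The rows and the threshold of one occurrence are hypothetical choices, not recommended values.

```python
import pandas as pd

# Hypothetical rows with the same columns as the dataframe built earlier.
df_demo = pd.DataFrame({
    "process_path": [r"C:\Windows\System32\svchost.exe"] * 5
                    + [r"C:\Windows\System32\cmd.exe"] * 3
                    + [r"C:\Users\Public\procdump.exe"],
    "md5": ["AAAA"] * 5 + ["BBBB"] * 3 + ["CCCC"],
})

# Same grouping as the baseline, but sorted so the least common entries come first.
counts = (df_demo.groupby(["process_path", "md5"])
          .size()
          .sort_values(ascending=True)
          .reset_index(name="counts"))

# Keep only the combinations seen at most `threshold` times.
threshold = 1
rare = counts[counts["counts"] <= threshold]
```

Rare does not mean malicious; the result is only a shortlist of processes to review against what is expected in the environment.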

Structured Hunting

Structured hunting aims to identify threats based on indicators of attack (IoA) and the attacker's tactics, techniques, and procedures (TTPs). By utilizing the MITRE ATT&CK framework, threat hunters can identify threat actors during the early stages of an attack, before they do harm to the environment.

Example 1: procdump

ProcDump is a command-line utility whose primary purpose is monitoring an application for CPU spikes and generating crash dumps during a spike, which an administrator or developer can use to determine the cause of the spike. Depending on the process, a dump file may contain sensitive information. Using the previously generated dataframe, we can check whether there are any procdump events.

df[df['command_line'].str.find("procdump")>=0]

It turns out that there are two procdump events in the log. Checking the command_line column shows that the process being dumped is lsass.exe, which handles local credential material.
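
Note that str.find is case-sensitive, so a command line that spells the tool as ProcDump64.exe would be missed. A slightly more forgiving variant is sketched below: it uses pandas' str.contains with case=False and na=False, and narrows the result to commands that also mention lsass. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical command lines for illustration.
df_demo = pd.DataFrame({
    "command_line": [
        r"C:\Tools\ProcDump64.exe -accepteula -ma lsass.exe C:\Temp\lsass_dump.dmp",
        r"procdump.exe -ma notepad.exe",
        "cmd.exe /c whoami",
        None,  # some rows may have no command line
    ]
})

cmd = df_demo["command_line"]

# Case-insensitive substring match; rows without a value count as non-matches.
is_procdump = cmd.str.contains("procdump", case=False, na=False, regex=False)
targets_lsass = cmd.str.contains("lsass", case=False, na=False, regex=False)

procdump_hits = df_demo[is_procdump]
lsass_dumps = df_demo[is_procdump & targets_lsass]
```

Combining the two masks separates ordinary ProcDump use from the dumps that touch credential material.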

Example 2: File creation

Aside from checking on processes, we can also look for suspicious file creation. We can do so by creating another dataframe that specifically lists file creation events.

header = ['timestamp', 'computer_name',
          'process_path', 'file_name',
          'user']

file_creation_event = []
for evt in events:
    # Event ID 11 is Sysmon's file creation event
    if evt['Event']['System']['EventID'] == 11:
        try:
            data = evt['Event']['EventData']
            file_creation_event.append([
                data['UtcTime'],
                evt['Event']['System']['Computer'],
                data['Image'],
                data['TargetFilename'],
                data['User'],
            ])
        except KeyError:
            # Skip events that lack any of the chosen fields
            pass

# Note: this replaces the process dataframe from the previous example
df = pd.DataFrame(file_creation_event, columns=header)
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S.%f')
df.tail(10)

In the result, we can see the creation of lsass_dump.dmp, which matches the process we found in Example 1.
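
To make that match explicit rather than visual, the two tables can be joined. The sketch below assumes a process dataframe and a file creation dataframe with the columns used in this article, and pairs each created file with process creation events from the same computer and process path. The names proc_df and file_df and all rows are hypothetical.

```python
import pandas as pd

# Hypothetical process-creation rows (Example 1 style).
proc_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-10 08:00:01.000", "2023-01-10 08:05:00.000"]),
    "computer_name": ["WS-01", "WS-01"],
    "process_path": [r"C:\Tools\procdump.exe", r"C:\Windows\System32\cmd.exe"],
    "command_line": [r"procdump.exe -ma lsass.exe C:\Temp\lsass_dump.dmp",
                     "cmd.exe /c whoami"],
})

# Hypothetical file-creation rows (Example 2 style).
file_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-01-10 08:00:02.500"]),
    "computer_name": ["WS-01"],
    "process_path": [r"C:\Tools\procdump.exe"],
    "file_name": [r"C:\Temp\lsass_dump.dmp"],
})

# Pair each created file with process events from the same host and image path.
matched = file_df.merge(proc_df, on=["computer_name", "process_path"],
                        suffixes=("_file", "_proc"))

# Keep only files written at or after the time the process started.
matched = matched[matched["timestamp_file"] >= matched["timestamp_proc"]]
```

Joining on the image path is only an approximation. If the ProcessGuid field is also extracted from both event types, joining on it would tie a file to the exact process instance.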

As these examples show, Sysmon logs store a lot of data, and each type of event may have a different structure. To conduct threat hunting effectively with a Jupyter notebook, we should understand these structures to get a better grasp of what we can look for in Sysmon logs and how to look for a particular event.
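
One way to cope with the differing structures is a small generic extractor that takes an event ID and the list of EventData fields to pull. This is a sketch under assumptions: extract_rows is a hypothetical helper, it expects events in the nested dictionary form used throughout this article, and the sample events are invented.

```python
def extract_rows(events, event_id, fields):
    """Return one row per event with the given ID, using '' for missing fields."""
    rows = []
    for evt in events:
        system = evt.get("Event", {}).get("System", {})
        data = evt.get("Event", {}).get("EventData") or {}
        if system.get("EventID") != event_id:
            continue
        # The computer name comes first, followed by the requested fields in order.
        rows.append([system.get("Computer", "")] + [data.get(f, "") for f in fields])
    return rows


# Hypothetical, trimmed events: one process creation (1) and one file creation (11).
sample_events = [
    {"Event": {"System": {"EventID": 1, "Computer": "WS-01"},
               "EventData": {"Image": r"C:\Tools\procdump.exe",
                             "CommandLine": "procdump.exe -ma lsass.exe"}}},
    {"Event": {"System": {"EventID": 11, "Computer": "WS-01"},
               "EventData": {"Image": r"C:\Tools\procdump.exe",
                             "TargetFilename": r"C:\Temp\lsass_dump.dmp"}}},
]

file_rows = extract_rows(sample_events, 11, ["Image", "TargetFilename", "User"])
```

Unlike the try/except approach used earlier, a missing field becomes an empty string here instead of dropping the whole event, which may or may not be what a given hunt needs.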
