Splunk Boss of the SOC: Hunting an APT with Splunk & MITRE ATT&CK Framework (Part 1)

Nov 5, 2023

Disclaimer: This is a summary of the Splunk Boss of the SOC learning workshop series. The purpose of this write-up is twofold: to create detailed notes for quick reference, and to help other cybersecurity readers learn useful threat hunting techniques without watching every video on the platform.

Introduction

The goal of this workshop series is to expose you to different adversary techniques and demonstrate how a threat hunter can develop hypotheses to hunt for specific adversary capabilities using Splunk.

The assumption we start with when threat hunting is that the adversary is already inside the organization, so the threat hunter is looking for activity that the SIEM (Security Information and Event Management), the IDS (Intrusion Detection System), or other tools have not identified. In other words, we assume breach and look for signs of it.

One way to shape our hunt is to apply a cybersecurity framework to it. In this case, MITRE ATT&CK provides a level of abstraction from Lockheed Martin’s Kill Chain, focusing on the tactics and techniques that occur during exploitation and continue post-exploitation. In the Enterprise matrix, 12 tactics are defined from the delivery stage (Initial Access) forward, and under each tactic there are numerous techniques. Additionally, two tactics (Reconnaissance and Resource Development) cover a wide range of activities that occur prior to initial access to the victim’s environment. While these elements are harder to hunt for, it is possible to gain some insight here, particularly when the attacker is employing sloppy tradecraft.

Keep in mind that there are over 200 techniques and that hunting does not happen in silos: you will likely run into other techniques as you conduct a hunt. Stay focused on one specific technique so that you can prove or disprove the hypothesis. Also start broadly and then narrow down so that you don’t miss events. Because our data is time series, we can use the time picker in Splunk to narrow the search as we home in on specific time ranges; you may not want to hunt over “All time” given the amount of data available. Finally, context matters. When hunting, it is important to understand the unique conditions and circumstances of the specific enterprise, which typically include both systems and users.

Scenario

We are responsible for defending Frothly. Frothly makes knock-off versions of many classic beers like Heady Popper. Sadly, not everything is perfect at Frothly. Among other issues, the FBI has notified Grace Hoppy, the CEO of Frothly, of an intrusion by the Taedonggang APT, an East Asian APT that targets innovative western brewery supply companies. You are assuming the role of Alice Bluebird, Frothly’s security analyst. Grace is concerned about this law enforcement notification, its impact on Frothly, and how we can understand the adversary’s actions.

Before we start diving into the Splunk database, let’s get the house-keeping laundry list out of the way. First, let’s start with data exploration. We would want to know the types of data we are working with. There are 3 ways of going about it.

(1) Available Indexes and Their Counts

| tstats count WHERE index=* by index

This search command counts the number of events for each index. This preliminary search helps you understand the volume of data in each index and provides a broad view of your environment. For this hunt, however, we will be focusing on the BOTS 2.0 dataset.

(2) Available Sourcetypes and Their Counts — Data Summary

(3) Available Sourcetypes and Their Counts — Metadata Command

| metadata type=sourcetypes index=botsv2
| eval firstTime=strftime(firstTime, "%Y-%m-%d %H:%M:%S")
| eval lastTime=strftime(lastTime, "%Y-%m-%d %H:%M:%S")
| eval recentTime=strftime(recentTime, "%Y-%m-%d %H:%M:%S")
| sort - totalCount

As mentioned earlier, context is important. Enterprise Security has the assets and identities enumerated within the Frothly environment. The asset center shows all systems that are part of the asset and identity framework. Similarly, in the identity center you can filter to reveal the identities and user accounts of specific users.

Another useful tool when hunting is a network diagram. This could be very elaborate or something simple. Anything that shows connections and associations between systems can be helpful. Companies/Organizations can use the Splunk dashboarding system to upload their network diagram and create a link to it on the Enterprise Security navigational bar.

Reconnaissance Hunt

Step 1: Leverage User Agent Strings

In this phase of the hunt, we are looking for adversaries who have gotten sloppy with their tradecraft and using that sloppiness to learn more about how they might be targeting us. The recon phase of the kill chain tends to be more over the horizon or before the attack happens so it could be difficult to identify until after something has already occurred. However, if an adversary slips up, there are indicators that might be identified that could be useful in the future. Let’s hypothesize that the user agent strings may provide insight into an adversary that they may not have intended to show. User agent strings are a part of HTTP headers sent by a web browser or user agent to a web server when making an HTTP request. These strings provide information about the user agent (typically a web browser), the device, and its capabilities.

Potential Questions to Ask

What data sources (sourcetypes) are needed to view user agent strings?
When were specific user agent strings seen?
What IP addresses were the user agent strings seen from?
Are any of the user agent strings anomalous? That is, are any of them excessively short or long, or from systems that would be unexpected?

Known Facts

Index: botsv2

Estimated Time of Attack: Aug 2017

Company Website: www.froth.ly

index="botsv2" sourcetype="stream:http" www.froth.ly
| stats count by http_user_agent 
| sort - count
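
One of the questions above asks whether any user agent strings are excessively short or long. A minimal sketch of how that could be checked is shown here. The field names ua_length and length_zscore are introduced purely for illustration, and the z-score is only a rough heuristic for ranking the results, not part of the original workshop search.

```spl
index="botsv2" sourcetype="stream:http" www.froth.ly
| stats count dc(src_ip) AS distinct_sources by http_user_agent
| eval ua_length=len(http_user_agent)
| eventstats avg(ua_length) AS avg_length stdev(ua_length) AS stdev_length
| eval length_zscore=round(abs(ua_length - avg_length) / stdev_length, 2)
| fields http_user_agent count distinct_sources ua_length length_zscore
| sort - length_zscore
```

Strings at the top of this list are the furthest from the average length in either direction. Rare strings (low count, few distinct sources) are also worth a second look, so sorting by count ascending is a reasonable alternative view.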

Step 2: OSINT Suspicious User Agent Strings

After scanning through these 20 search results, there’s one that jumps out as abnormal. Let’s plug it into Google and see what we can find. Naenara Browser is the DPRK’s version of Firefox that comes built into Red Star OS, the official operating system of North Korea.

Step 3: Drill Down into Suspicious UA and Identify Associated IPs

With this information in mind, we may want to dig deeper into this user agent string in Splunk and see what we can find.

index="botsv2" sourcetype="stream:http" www.froth.ly http_user_agent="*NaenaraBrowser*"
| stats count by src_ip dest_ip

According to the search results, we have 2 internal IPs and 3 external IPs. If we cross-reference the internal IPs with our asset information, we can see that one of them (172.31.4.249) belongs to a user named “Kevin Lagerfield”, with the host name “gacrux”. Kevin has access to aws, brewertalk, linux, mysql, and web. The other internal IP (172.31.6.251) can’t be found in the asset sheet, but we can reasonably assume that it’s the IP of our corporate website. Ideally, we should capture this information in the asset sheet for future reference.

As for the external IP addresses, we can use OSINT tools (https://whois.domaintools.com/) to see if we can find anything suspicious.

Based on our hunt, we can confirm our hypothesis that user agent strings can tell us something about our adversary. This alone might not be sufficient, but it may corroborate nicely with additional information that comes forth later. Below is a graphical representation of our findings during this hunt.

Step 4: Scrutinize Public Web Visibility

Continuing the theme of recon, adversaries will often gather information about their targets ahead of the attack and attempt to learn as much about them as possible. This MITRE ATT&CK technique (T1593: Search Open Websites/Domains) falls on the pre-attack side, but hunting for it might still be helpful.

Why don’t we hunt our own website for adversaries accessing publicly available information? If we find anything, we may gain greater insight into the sensitive information an adversary could use against us.

index="botsv2" sourcetype="stream:http" http_user_agent="*NaenaraBrowser*" 
www.froth.ly
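
Before settling on a single content type, it can help to see what kinds of content this user agent actually requested. The following is a sketch that assumes the http_content_type field used in the next search is populated for these events:

```spl
index="botsv2" sourcetype="stream:http" http_user_agent="*NaenaraBrowser*" www.froth.ly
| stats count by http_content_type
| sort - count
```

Document-style content types (spreadsheets, PDFs, word processor files) are usually more interesting for a recon hunt than images, stylesheets, or scripts, which is why the spreadsheet type is singled out next.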

index="botsv2" sourcetype="stream:http" http_user_agent="*NaenaraBrowser*" 
www.froth.ly http_content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

index="botsv2" sourcetype="stream:http" http_user_agent="*NaenaraBrowser*" www.froth.ly http_content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
| table _time, src_ip, dest_ip, uri_path url

Based on our hunt, we can confirm that the user agent string we uncovered earlier has been browsing our corporate web assets and accessed a spreadsheet with company contacts (company_contacts.xlsx). To recap: we previously found a browser of North Korean origin performing reconnaissance. This does not mean the activity can be attributed to North Korea, only that the browser comes from there. That is an important distinction. We also found the IP address associated with that browser, and saw that this IP address and user agent grabbed the company contacts from the Frothly website. Depending on the content of the file, we may be able to monitor the contacts listed in it more closely. On the other hand, if it contains the entire company directory, our adversary may be able to target any of our employees. We also know when this occurred (08/05/2017 1:15:49 AM). It makes sense that additional attacks would follow this timestamp, because this certainly looks like the recon activity an adversary would undertake to identify potential targets within the organization.

Based on our findings, what should we be operationalizing within Frothly?

Understand the organization’s publicly facing footprint and visibility to the rest of the world (monitor for key execs who are well known and visible);
Determine whether all employees need that same level of visibility and, if not, monitor for their information appearing publicly;
Pay special attention to your website and other company assets and minimize OSINT information around employees and computing assets where possible;
Seed objects in the organization including website with erroneous information that can then be monitored for (anytime we can deny or deceive the adversary, we are causing them to invest more effort into their attack against us).

Initial Access Hunt

In the initial access phase, the adversary is trying to get into your network. Initial Access consists of techniques that use various entry vectors to gain their initial foothold within a network. Techniques used to gain a foothold include targeted spearphishing and exploiting weaknesses on public-facing web servers. Footholds gained through initial access may allow for continued access, like valid accounts and use of external remote services, or may be limited-use due to changing passwords. In the following hunt, we will hypothesize that our adversary has gained their initial access through a spearphishing email attachment.

T1566.001 Phishing: Spearphishing Attachment

Possible Questions to be Asked

What data sources (sourcetypes) should we look for mail traffic in?
If we are hypothesizing about email attachments, do we have visibility into what email attachments are being received?
Are there specific kinds of attachments that we should be hunting for?
If we find attachments of interest, what attributes are associated with them? (e.g. sender, recipient, subject, message and more)
Do we see those attributes in other emails?
Are there prior spearphishing attempts that were unsuccessful that can be leveraged?

When it comes to investigating email logs, we can look to “stream:smtp” as our sourcetype. SMTP stands for Simple Mail Transfer Protocol, which is the standard protocol for sending email messages on the internet. Then we want to see the types of files received as email attachments.

index="botsv2" sourcetype="stream:smtp"
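
To list the attachment names directly instead of reading them off the field sidebar, a search along these lines could be used. It is a sketch: the attachment and senders names are introduced here for illustration, and it relies on the attach_filename{} and sender_email fields that appear in the searches later in this section.

```spl
index="botsv2" sourcetype="stream:smtp" "attach_filename{}"=*
| rename "attach_filename{}" AS attachment
| stats count dc(sender_email) AS senders by attachment
| sort - count
```

Renaming the field first avoids having to quote the curly braces in the later commands.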

Looking across these 6 files, the one that jumps out the most is the invoice.zip file. Of course, this comes from experience. If you are unsure, you can ask ChatGPT to help narrow it down for you as well.

index="botsv2" sourcetype="stream:smtp" "attach_filename{}"="invoice.zip"
| table sender sender_email src_ip receiver dest_ip

index="botsv2" sourcetype="stream:smtp" "attach_filename{}"="invoice.zip"
| table content

We can identify the sender’s IP address as 185.83.51.21. The next step would be to OSINT the source IPs and sender’s IP and see if we can find anything to confirm our hypothesis.

Nothing stands out from here. At this point, OSINT has been a bit of a dry hole for us. Rather than focusing on the files specifically, let’s look at our indicators in a slightly different manner. Let’s see if our sender, Jim Smith, has sent other emails to Frothly employees.

index="botsv2" sourcetype="stream:smtp" sender_email=jsmith@urinalysis.com
| table _time recipient subject attach_filename{} attach_size{} attach_content_decoded_md5_hash{}

index="botsv2" sourcetype="stream:smtp" sender_email=jsmith@urinalysis.com  
"attach_filename{}"="Malware Alert Text.txt"

With the work that we put in, we can say that Frothly was likely phished via an email attachment. We saw the same four recipients receive the same email from the same sender with two different attachments. The first was detected as a trojan; the second did not trigger an alert, which may indicate that it was successful.

What did we learn

Phishing was attempted twice; the first attempt was unsuccessful and the second attempt succeeded.
Sender IP is 185.83.51.21
Sender name is Jim Smith at jsmith@urinalysis.com
Phishing targeted the same four recipients both times
Subject in the phishing email was Invoice
Body was identical across all four emails
Emails were sent in close proximity but individually
Attachment was the same size for each recipient

T1204.002 User Execution: Malicious File

With the discovery of that spearphishing attachment, a logical next step would be to hunt for the execution of the malicious file. Potential questions to ask include:

What data sources (sourcetypes) should we look for execution of files in?
Should we be looking for file executions before or after spearphishing attachments may have been received?
What kind of supporting information is found in events when a file execution occurs?
What other indicators do we have to start looking for user execution? (in this case, we know that a spearphishing attachment called invoice.zip was received)
What system did the execution occur on?
What was the user name that executed the file?
What happens upon execution of a file?

index="botsv2" invoice.zip
| stats count by sourcetype
| sort - count

As you can see, there are 5 sourcetypes referencing our spearphishing attachment (invoice.zip). We’ve already looked at “stream:smtp” and we can perhaps look into the Windows Sysmon logs as our logical next step.

index="botsv2" invoice.zip  
sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"

From this data, we can determine that the host executing the file is Billy Tun’s workstation (wrk-btun). After he downloaded the zip file, he decompressed it into invoice.doc. What is inside the invoice.doc file, and what would be the adversary’s next move? Let’s inspect the Sysmon logs chronologically (from the unzipping of the spearphishing attachment onward) and see what we can find.

index="botsv2" sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" wrk-btun 
| reverse
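
Reading raw Sysmon events in order works, but the timeline is easier to follow when only process creation events are tabulated. The sketch below assumes that the Sysmon add-on extracts EventCode, Image, CommandLine and ParentCommandLine for this sourcetype; if your extractions use different names (for example EventID), adjust accordingly.

```spl
index="botsv2" sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" host="wrk-btun" EventCode=1
| table _time Image CommandLine ParentCommandLine
| sort _time
```

Sysmon Event ID 1 is the process creation event, so each row shows what ran and which parent command launched it. Narrowing the time picker to start just before invoice.zip was opened keeps the table manageable.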

We were able to tie the execution of the zip file to a specific user (Billy Tun) and to identify that encoded PowerShell was run on the victim’s system once the .doc was extracted from the .zip. In this case, we performed two complementary hunts and based the second on the findings of the first. Below is a graphical representation of the email being delivered to the user and the document being opened, as well as what’s inside the document.

Based on our findings, there are a few things we can operationalize on our Frothly network.

Prohibit use of macro enabled files (this may be an impractical one because it might impact business operations)
Monitor for the execution of macro enabled files
Apply EDR solutions that analyze, log and potentially block their execution
Alert when Sysmon process creation events (Event ID 1) or Windows Security Event ID 4688 show PowerShell running
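
As a starting point for the last item, a saved search could look roughly like the sketch below. It assumes the EventCode and CommandLine fields are extracted for the Sysmon sourcetype, and the regular expression is a heuristic of our own: it flags a switch beginning with -e followed by a long base64-looking token, which is how encoded commands are commonly passed to PowerShell. Expect to tune it against legitimate administrative scripts.

```spl
index="botsv2" sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1 CommandLine="*powershell*"
| eval encoded=if(match(CommandLine, "(?i)\s-e[a-z]*\s+[a-z0-9+/=]{20,}"), "yes", "no")
| stats count min(_time) AS first_seen max(_time) AS last_seen by host encoded
| convert ctime(first_seen) ctime(last_seen)
| sort - count
```

Hosts with encoded="yes" rows are the ones to triage first; the first_seen and last_seen columns give a quick window for pivoting into other sourcetypes.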

Data Staging

T1074.002 Data Staged: Remote Data Staging

Adversaries may stage data collected from multiple systems in a central location or directory on one system prior to Exfiltration. Data may be kept in separate files or combined into one file. Here are some potential questions to be answered as we conduct our hunt.

What data sets (sourcetypes) might help us identify data being staged?
What would traffic flows look like that might indicate staging of data?
What kind of data might be found where data is being staged?
Where might be some likely places data would be staged?
Is there subsequent activity suggesting that staged data is being exfiltrated?
What user accounts might be used to stage data?

Step 1: Investigate Office Files

In this first step, we need to start identifying data sets of interest. When discussing important office files, we think about .pdf, .doc, .xls, and potentially .tgz. “.tgz” is a file extension commonly used to indicate a compressed archive file (with the contents of multiple files and directories combined into one). Let’s take a look at the sourcetypes that reference these file types of interest.

index="botsv2" (.pdf OR .tgz OR .doc OR .xls) 
| stats count by sourcetype
| sort - count

Stream:smb is the largest sourcetype based on our search. Server Message Block (SMB) is a network protocol used for sharing files, printers, and other resources on a network. It is commonly used in Windows environments and allows for the sharing and communication between computers on a network. The idea of data files being moved across the network makes “stream:smb” an appealing place to start despite the large event count.

index="botsv2" (.pdf OR .tgz OR .doc OR .xls) sourcetype="stream:smb"
| stats count by src_ip dest_ip
| sort - count

The file sharing destination IP (10.0.1.101) indicates that all workstations are communicating with the server named “venus”. We can also use information in the asset center to identify the workstations behind the 3 unique source IPs.

10.0.4.4 — host machine: maclory-air13, owner: mallory kraeusen
10.0.2.107 — host machine: wrk-btun, owner: billy tun
172.16.0.1 — Jupiter server??

Let’s also look at when these file sharing activities took place across the network.

index="botsv2" (.pdf OR .tgz OR .doc OR .xls) sourcetype="stream:smb"
| eval uniq=src_ip." ".dest_ip
| timechart count by uniq

Step 2: Investigate SMB Dataset

Let’s look at the first file (31564-pdf.pdf) from the search results.

index="botsv2" 31564-pdf.pdf sourcetype="stream:smb"

If you look at the flow_id field, there’s only one flow_id (d7370639-8ca9-40d3-a5f8-dd6547d4ff99), which indicates that all three events occurred within the same transaction. We can take this flow_id and search for all the commands within that flow.

index="botsv2" sourcetype="stream:smb" flow_id=d7370639-8ca9-40d3-a5f8-dd6547d4ff99
| stats count by command
| sort - count

Since the “smb2 read” command has the highest count, we can look into it first and then move down the list. We should get a sense of who is communicating with whom (source and destination IPs) and the amount of data being exchanged.

index="botsv2" sourcetype="stream:smb" flow_id=d7370639-8ca9-40d3-a5f8-dd6547d4ff99  command="smb2 read"
| stats count sum(bytes_in) AS b_in sum(bytes_out) AS b_out by src_ip dest_ip
| eval mb_in=round((b_in/1024/1024),2) 
| eval mb_out=round((b_out/1024/1024),2)
| fields - b_in b_out

We know that 10.0.2.107 is Billy Tun’s workstation based on information in the asset center, and it’s communicating with 10.0.1.101, which is the server venus. About 2 MB of data has been sent from Billy’s machine to the server, whereas about 1.2 GB has been sent from the server to Billy’s workstation. That is a good chunk of data being placed onto the workstation. Let’s look at other commands to see if similar patterns can be spotted.

index="botsv2" sourcetype="stream:smb" flow_id=d7370639-8ca9-40d3-a5f8-dd6547d4ff99  command="smb2 create"
| stats count sum(bytes_in) AS b_in sum(bytes_out) AS b_out by src_ip dest_ip
| eval mb_in=round((b_in/1024/1024),2) 
| eval mb_out=round((b_out/1024/1024),2)
| fields - b_in b_out

The sizes associated with the files being created are pretty small, so we can assume that only the containers are being created. Let’s look at “smb2 close” as well.

index="botsv2" sourcetype="stream:smb" flow_id=d7370639-8ca9-40d3-a5f8-dd6547d4ff99  command="smb2 close"
| stats count sum(bytes_in) AS b_in sum(bytes_out) AS b_out by src_ip dest_ip
| eval mb_in=round((b_in/1024/1024),2) 
| eval mb_out=round((b_out/1024/1024),2)
| fields - b_in b_out

It is clear that data is leaving the server and moving to Billy’s workstation.
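
Rather than repeating the same search for each command, the three searches above can be folded into one by adding command to the by clause. This is a sketch rather than the workshop’s own search; out_in_ratio is a hypothetical field added here to make lopsided transfers easier to spot.

```spl
index="botsv2" sourcetype="stream:smb" flow_id=d7370639-8ca9-40d3-a5f8-dd6547d4ff99
| stats count sum(bytes_in) AS b_in sum(bytes_out) AS b_out by command src_ip dest_ip
| eval mb_in=round(b_in/1024/1024, 2), mb_out=round(b_out/1024/1024, 2)
| eval out_in_ratio=round(b_out/b_in, 1)
| fields - b_in b_out
| sort - mb_out
```

A command whose mb_out dwarfs its mb_in, as with the read command here, points to bulk data flowing from the server back to the requesting workstation.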

Step 3: Investigate the FTP Dataset

Circling back to our earlier search, we can pivot to the FTP data.

index="botsv2" 31564-pdf.pdf source="stream:ftp"

We can see that Billy’s workstation (10.0.2.107) successfully uploaded the .pdf file to 160.153.91.7. Now we are seeing exfiltration of the files via FTP. If we had done other hunts, we may have already seen these activities. If not, this might be a good trigger to start hunting for other files that may have been exfiltrated through alternative protocols such as ftp.

Step 4: Summarizing What We’ve Learned

Looking at the SMB data, we uncovered a large number of office-type documents (mostly PDF files) moving to a workstation (10.0.2.107) from a server (10.0.1.101). Following this, we saw FTP activity on the workstation transferring data to an external server (160.153.91.7) as well. Based on this, we can confirm our hypothesis that our adversary is staging data prior to exfiltration.

Based on these findings, here is what we should be operationalizing:

Workstation to Server communication is perfectly normal, so outlier analysis may be advisable (stats command and observe bytes_in/bytes_out, event count, filenames compared to baseline behavior)
Endpoint logging to see data written to the file system could be helpful
Network logging of transfer between enclaves would be helpful as well
Wire data was the only way to see this based on our logging and SMB does not provide all the fidelity you may wish you had
Look for subsequent activity like file transfers to external addresses or writes to USB
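
The outlier analysis suggested in the first item could be sketched as follows. The one-hour buckets and the two-standard-deviation threshold are arbitrary assumptions chosen for illustration; a real baseline would be tuned to the environment and built over a longer period than this dataset covers.

```spl
index="botsv2" sourcetype="stream:smb"
| bin _time span=1h
| stats sum(bytes_out) AS bytes_out by _time src_ip dest_ip
| eventstats avg(bytes_out) AS avg_out stdev(bytes_out) AS stdev_out by src_ip dest_ip
| where bytes_out > avg_out + 2 * stdev_out
| eval mb_out=round(bytes_out/1024/1024, 2)
| table _time src_ip dest_ip mb_out
| sort - mb_out
```

Each remaining row is an hour in which a workstation and server pair moved noticeably more data than is typical for that same pair, which is the kind of deviation the staging activity above would produce.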

Exfiltration via FTP

T1048.003 Exfiltration Over Alternative Protocol: Exfiltration Over Unencrypted/Obfuscated Non-C2 Protocol — FTP

In most cases, adversaries want to gain access to information extracted from an organization. Hypothesizing that exfiltration may have occurred using common network protocols such as FTP is quite reasonable. Let’s look at some potential questions that we may want to answer as we conduct the hunt:

What data sources (sourcetype) see or at least reference FTP?
What do data flows look like between sources and destinations?
Can we see commands being issued that are associated with FTP communications?
During what times do these events occur?
Are specific files being moved (up or down) with FTP?

Step 1: Examine Sourcetypes Referencing FTP

index="botsv2" ftp
| stats count by sourcetype
| sort - count

To monitor and investigate FTP-related activity, focus on log files and network traffic data. Therefore, let’s examine the suricata, stream:ftp, pan:traffic and pan:threat sourcetypes. Pan:traffic holds the traffic logs generated by Palo Alto Networks firewalls. It contains information about network traffic flows, including source and destination IP addresses, port numbers, protocols, and various metadata. Pan:threat, on the other hand, holds the firewalls’ threat logs. This includes information on detected malware, intrusion attempts, and other security-related events, with details such as threat names, severity, and source and destination IP addresses.

Note that even though WinRegistry has the second-largest event count, we won’t examine it here. The Windows Registry stores configuration settings and options for the operating system and various software applications; it is not the primary place to find information about FTP connections and activity.

index="botsv2" ftp sourcetype="suricata" 
| stats count by src_ip dest_ip
| sort - count

index="botsv2" ftp sourcetype="stream:ftp" 
| stats count by src_ip dest_ip
| sort - count

index="botsv2" ftp sourcetype="pan:threat" 
| stats count by src_ip dest_ip
| sort - count

index="botsv2" ftp sourcetype="pan:traffic" 
| stats count by src_ip dest_ip
| sort - count
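
The four searches above differ only in the sourcetype, so they can also be run as one search for a side-by-side comparison. This is a convenience sketch, not a step from the workshop:

```spl
index="botsv2" ftp sourcetype IN ("suricata", "stream:ftp", "pan:threat", "pan:traffic")
| stats count by sourcetype src_ip dest_ip
| sort sourcetype, -count
```

Keeping sourcetype in the by clause makes it obvious when a source and destination pair shows up in one data source but not in another.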

Based on the suricata and stream:ftp search results, we consistently see two internal machines (Billy Tun’s and Kevin Lagerfield’s workstations) exchanging data with an external IP address (160.153.91.7). The Palo Alto data, however, shows slightly different results. In the pan:traffic data, the vast majority of traffic is heading to the external interface of our firewall, which warrants additional inspection. The rest of the traffic shows an internal-to-external pattern similar to the suricata and stream:ftp sourcetypes, but it comes not just from these two workstations but from two other systems as well. Based on information in the asset center, we know that the two other systems are the servers mercury (R&D server) and venus (file server).

After carefully examining the network traffic data, we can look into the Windows Sysmon data to understand the types of ftp activities being conducted over the network.

index="botsv2" ftp sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"

Pivoting on the CommandLine field, we can see the commands being executed on these host systems (Billy’s workstation, Kevin’s workstation, and the venus server). In 7 out of 8 cases where ftp is executed, it is called with the -s switch. When you run ftp with -s, it reads commands from the specified script file and executes them one by one. This can be useful for automating FTP transfers, particularly in scripting and batch processing. The script is normally a plain text file, so seeing .dll files passed to the -s switch is out of the ordinary. DLL files contain code and data that multiple programs can use simultaneously, and they are a fundamental part of the Microsoft Windows operating system and many other software applications; they are not FTP scripts.

The last command is very interesting because it contains an open command, and the value after open is a specific domain that we should definitely research further. We may have uncovered an indicator that could be monitored. Because it’s a domain rather than an IP, it potentially has a greater level of permanence in the attacker’s infrastructure.
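
To surface this mismatch without reading every command line, the script file name can be pulled out of CommandLine and its extension checked. The sketch below assumes the Windows form of the switch (-s:filename) and introduces the hypothetical fields ftp_script and script_ext; treating anything other than .txt as suspicious is our own simplification.

```spl
index="botsv2" ftp sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" CommandLine="*ftp*"
| rex field=CommandLine "(?i)-s:(?<ftp_script>\S+)"
| eval script_ext=lower(mvindex(split(ftp_script, "."), -1))
| stats count values(host) AS hosts by ftp_script script_ext
| where script_ext!="txt"
```

Events without a -s switch produce no ftp_script value and drop out at the stats step, so what remains is the list of scripted ftp runs whose script file does not look like a script.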

Step 2: Conduct Time Series Analysis

Now that we have a sense of the traffic flow in terms of event volume, sources, destinations, and hosts, let’s take a look at the data based on time.

index="botsv2" ftp sourcetype="pan:*"  src=* dest=*
| eval uniq=src." ".dest
| timechart count by uniq

index="botsv2" ftp sourcetype="stream:ftp"   src=* dest=*
| eval uniq=src." ".dest
| timechart count by uniq

index="botsv2" ftp sourcetype="suricata" src_ip=* dest_ip=*
| eval uniq=src_ip." ".dest_ip
| timechart count by uniq

Based on the time series analysis that we’ve just run, we know that the ftp activities primarily took place between Aug 23–25. Now with this new information in mind, we can drill down our ftp search on time (Aug 23–26).
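
The time window can be set in the time picker, or written into the search itself with the earliest and latest modifiers. A sketch of the latter, using an hourly span chosen for illustration:

```spl
index="botsv2" ftp sourcetype="stream:ftp" earliest="08/23/2017:00:00:00" latest="08/26/2017:00:00:00"
| timechart span=1h count by src_ip
```

The latest boundary is exclusive, so this window runs from the start of Aug 23 up to, but not including, midnight at the start of Aug 26, which covers all of Aug 25.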

Step 3: Examine FTP Activities

index="botsv2" ftp sourcetype="stream:ftp"
| table _time method method_parameter reply_content src_ip dest_ip
| sort - _time
| reverse

Based on the search results, we can see that our adversary logged into the FTP server non-anonymously, using the username admin@hildegardsfarm.com. They then connected on various ports to download company office files. To make it even more suspicious, there’s a .hwp file written in Korean. A file with the HWP extension is a document created with software from the South Korean company Hancom. It’s similar to Microsoft Word’s .docx format, except that it can contain written Korean, making it one of the standard document formats used by the South Korean government.

Next, because there are two internal host machines that are referenced in the logs, we should compare activities for both sets of logs, and see if they present different behavior patterns.

index="botsv2" ftp sourcetype="stream:ftp" src_ip="10.0.2.107"
| table _time method method_parameter reply_content src_ip dest_ip
| sort - _time
| reverse

index="botsv2" ftp sourcetype="stream:ftp" src_ip="10.0.2.109"
| table _time method method_parameter reply_content src_ip dest_ip
| sort - _time
| reverse

We can see the exact same transactions, or download activities, for both internal host machines (10.0.2.109 and 10.0.2.107). We likely want to take these file-based artefacts and look for their execution after download.

index="botsv2" ftp sourcetype="stream:ftp" | search filename!=*.pdf  
| stats count by filename _time src_ip dest_ip
| sort - _time
| reverse

It is worrisome to see a file named “frothly_passwords.kdbx” among the downloaded files, because it may mean that many company credentials are compromised and a password overhaul may be warranted.

To investigate the post download execution of the abovementioned files, let’s take a look at event logs other than stream:ftp.

index="botsv2" sourcetype!="stream:ftp" 
(dns.py OR nc.exe OR psexec.exe OR *amd64.msi OR wget64.dll OR winsys64.dll OR *.hwp)

Let’s examine these log events host by host, starting with Kevin’s workstation, followed by Billy’s workstation and the venus server.

index="botsv2" sourcetype!="stream:ftp" (dns.py OR nc.exe OR psexec.exe OR *amd64.msi OR wget64.dll OR winsys64.dll OR *.hwp) 
dvc_nt_host="wrk-klagerf" source="WinEventLog:Microsoft-Windows-Sysmon/Operational"

index="botsv2" sourcetype!="stream:ftp"  (dns.py OR nc.exe OR psexec.exe OR *amd64.msi OR wget64.dll OR winsys64.dll OR *.hwp) 
host="wrk-btun" | sort - _time | reverse  | search source="WinEventLog:Microsoft-Windows-Sysmon/Operational"

index="botsv2" sourcetype!="stream:ftp" (dns.py OR nc.exe OR psexec.exe OR *amd64.msi OR wget64.dll OR winsys64.dll OR *.hwp) 
host=venus source="WinEventLog:Microsoft-Windows-Sysmon/Operational"

Looking at Kevin’s system, we can see Sysmon data showing ftp.exe using the -s argument with winsys64.dll. We saw that behaviour earlier. Billy’s workstation shows a similar pattern, but also contains additional ftp events that appear to attempt downloads on an incremental basis. Venus does not show anything pertaining to ftp or winsys64.dll like the other two systems. Instead, venus shows dns.py being executed via Python. If we look at the parent command line, we can see that PowerShell was launched with a base64-encoded value that certainly merits its own hunt. The remainder of the events on venus appear to run the same script every 10 minutes. This might be something else to look into, because it would appear we have some sort of scheduled, cron-like job running on our server.

Step 4: Lessons Learned & Suggested Security Measures

During this hunt, we did see FTP used for exfiltration, and we saw it used on different systems. Some of this activity was based on scripted actions that we observed in Sysmon. However, it is important to note that not everything was entirely successful. We did see multiple ftp executions, including one using an open command. The other important thing we need to understand is whether the crown jewels were actually extracted via FTP, or extracted in another manner. At the end of the hunt, what did we learn that we did not know previously?

Two workstations, Billy’s and Kevin’s, were seen communicating to external IP 160.153.91.7 over multiple sourcetypes.
Two servers mercury and venus were only seen in Palo Alto traffic.
The adversary used the ftp command on two workstations, primarily to attempt to exfiltrate data to this external IP address. A filename with a .dll extension was used to obfuscate the fact that a script was being called.
FTP events provide insight into both upload and download activity: the same files were downloaded to both workstations (Aug 23), and on August 25 PDF files from Froth.ly were uploaded successfully, with the uploads repeated multiple times, probably because TopSecretYeast.pdf kept failing to upload (likely blocked by firewalls).
Exfiltration traffic is destined for a domain called hildegardsfarm.com.

Based on our findings, here are some suggestions to operationalize better security in our company.

Keep an eye on the external IP address and domain that we’ve uncovered (160.153.91.7, hildegardsfarm.com)
We may want to perform some level of traffic analysis across our systems to determine what is normal traffic pattern for internal systems communicating with external systems
We may want to monitor protocols that we don’t want to see on the network. For example, if our company doesn’t corporately use FTP, maybe we should alert on its presence. Alternatively, if it is allowed, is it limited to certain users or systems? If so, monitor for exceptions and build it into the policy.
We can also monitor a specific group of files and see who is accessing them and from where. This could lead to false positives, but monitoring the crown jewels can be effective when only minimal access is needed for critical documents.
Looking at combinations of commands and arguments that don’t align could be useful when looking for attackers trying to mask their intentions. In this case, the -s switch with a .dll in the context of ftp is an outlier.

This brings us to the end of Splunk Boss of the SOC learning workshops part 1, where we’ve covered reconnaissance, initial access, data staging, and FTP exfiltration. In part 2, we will move on to discuss other exciting topics such as PowerShell Empire, lateral movement, supply chain attack detection, scheduled tasks, DNS exfiltration, account persistence, clearing logs, and adversary infrastructure. Stay tuned.

Written by Cindy (Shunxian) Ou